xen-devel.lists.xenproject.org archive mirror
* Linux 4.1 reports wrong number of pages to toolstack
@ 2015-09-04  0:40 Wei Liu
  2015-09-04  3:38 ` Juergen Gross
  2015-09-04  8:53 ` Ian Campbell
  0 siblings, 2 replies; 15+ messages in thread
From: Wei Liu @ 2015-09-04  0:40 UTC (permalink / raw)
  To: xen-devel, David Vrabel, Juergen Gross
  Cc: Ian Jackson, wei.liu2, Ian Campbell, Andrew Cooper

Hi David

This issue is exposed by the introduction of migration v2. The symptom is that
a guest with a 32-bit 4.1 kernel can't be restored because it asks for too
many pages.

Note that all guests have 512MB memory, which means they have 131072 pages.

Both 3.14 tests [2] [3] get the correct number of pages.  Like:

   xc: detail: max_pfn 0x1ffff, p2m_frames 256
   ...
   xc: detail: Memory: 2048/131072    1%
   ...

However, in both 4.1 tests [0] [1] the number of pages is quite wrong.

4.1 32 bit:

   xc: detail: max_pfn 0xfffff, p2m_frames 1024
   ...
   xc: detail: Memory: 11264/1048576    1%
   ...

It thinks it has 4096MB memory.

4.1 64 bit:

   xc: detail: max_pfn 0x3ffff, p2m_frames 512
   ...
   xc: detail: Memory: 3072/262144    1%
   ...

It thinks it has 1024MB memory.

The total number of pages is determined in libxc by calling
xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from the
hypervisor. That value is clearly modified by Linux in some way.
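
For reference, a rough sketch of how that value flows into the tools (my
reading of the code, not verbatim; the real implementation is in
tools/libxc/xc_domain.c and may differ in detail):

    /* Sketch only: xc_domain_nr_gpfns() asks Xen for the highest gpfn via
     * XENMEM_maximum_gpfn; for PV guests Xen derives that answer from
     * shared_info->arch.max_pfn, i.e. whatever the guest kernel last wrote
     * there. */
    int xc_domain_nr_gpfns(xc_interface *xch, domid_t domid, xen_pfn_t *pfns)
    {
        int rc = do_memory_op(xch, XENMEM_maximum_gpfn, &domid, sizeof(domid));

        if ( rc >= 0 )
        {
            *pfns = rc + 1;     /* highest gpfn + 1 == "number of pages" */
            rc = 0;
        }

        return rc;
    }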

I now think this is a bug in the Linux kernel. The biggest suspect is the
introduction of the linear P2M.  If you think this is a bug in the toolstack,
please let me know.

I don't know why 4.1 64-bit [0] can still be restored successfully. I
don't have a handy setup to experiment with. The restore path doesn't show
enough information to tell. The thing I worry about is that
migration v2 somehow makes the guest bigger than it should be. But that's
another topic.


Wei.

[0] 4.1 kernel 64 bit save restore:
http://logs.test-lab.xenproject.org/osstest/logs/60785/test-amd64-amd64-xl/16.ts-guest-saverestore.log

[1] 4.1 kernel 32 bit save restore:
http://logs.test-lab.xenproject.org/osstest/logs/60785/test-amd64-i386-xl/14.ts-guest-saverestore.log

[2] 3.14 kernel 64 bit save restore:
http://logs.test-lab.xenproject.org/osstest/logs/61263/test-amd64-amd64-xl/16.ts-guest-saverestore.log

[3] 3.14 kernel 32 bit save restore:
http://logs.test-lab.xenproject.org/osstest/logs/61263/test-amd64-i386-xl/16.ts-guest-saverestore.log


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04  0:40 Linux 4.1 reports wrong number of pages to toolstack Wei Liu
@ 2015-09-04  3:38 ` Juergen Gross
  2015-09-04  8:28   ` Jan Beulich
  2015-09-04  8:53 ` Ian Campbell
  1 sibling, 1 reply; 15+ messages in thread
From: Juergen Gross @ 2015-09-04  3:38 UTC (permalink / raw)
  To: Wei Liu, xen-devel, David Vrabel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell

On 09/04/2015 02:40 AM, Wei Liu wrote:
> Hi David
>
> This issue is exposed by the introduction of migration v2. The symptom is that
> a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
> many pages.
>
> Note that all guests have 512MB memory, which means they have 131072 pages.
>
> Both 3.14 tests [2] [3] get the correct number of pages.  Like:
>
>     xc: detail: max_pfn 0x1ffff, p2m_frames 256
>     ...
>     xc: detail: Memory: 2048/131072    1%
>     ...
>
> However in both 4.1 [0] [1] the number of pages are quite wrong.
>
> 4.1 32 bit:
>
>     xc: detail: max_pfn 0xfffff, p2m_frames 1024
>     ...
>     xc: detail: Memory: 11264/1048576    1%
>     ...
>
> It thinks it has 4096MB memory.
>
> 4.1 64 bit:
>
>     xc: detail: max_pfn 0x3ffff, p2m_frames 512
>     ...
>     xc: detail: Memory: 3072/262144    1%
>     ...
>
> It thinks it has 1024MB memory.
>
> The total number of pages is determined in libxc by calling
> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from
> hypervisor. And that value is clearly touched by Linux in some way.

Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
can handle. This is not the memory size of the domain.
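
A quick back-of-the-envelope check against the logged numbers (my own
arithmetic, assuming 4 KiB pages and 8-byte/4-byte p2m entries for
64-bit/32-bit guests):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long long page_size = 4096;

        /* 4.1 64-bit log: max_pfn 0x3ffff, p2m_frames 512 */
        unsigned long long pfns64 = 0x3ffff + 1;           /* 262144 pfns  */
        printf("64-bit: frames=%llu apparent=%llu MB\n",
               pfns64 * 8 / page_size,                     /* 512 frames   */
               pfns64 * page_size >> 20);                  /* "1024 MB"    */

        /* 4.1 32-bit log: max_pfn 0xfffff, p2m_frames 1024 */
        unsigned long long pfns32 = 0xfffff + 1;           /* 1048576 pfns */
        printf("32-bit: frames=%llu apparent=%llu MB\n",
               pfns32 * 4 / page_size,                     /* 1024 frames  */
               pfns32 * page_size >> 20);                  /* "4096 MB"    */

        /* Either way the domain really has 512 MB == 131072 pages; the p2m
         * is simply sized well beyond the populated range. */
        return 0;
    }

The p2m_frames and "Memory" figures in the logs fall straight out of the p2m
size, not out of the domain's actual allocation.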

> I now think this is a bug in Linux kernel. The biggest suspect is the
> introduction of linear P2M.  If you think this is a bug in toolstack,
> please let me know.

I absolutely think it is a toolstack bug. Even without the linear p2m
things would go wrong if a ballooned-down guest were migrated,
as shared_info->arch.max_pfn would hold the upper limit of the guest
in that case and not the current size.


Juergen


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04  3:38 ` Juergen Gross
@ 2015-09-04  8:28   ` Jan Beulich
  2015-09-04  9:35     ` Andrew Cooper
  2015-09-04 11:40     ` Wei Liu
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Beulich @ 2015-09-04  8:28 UTC (permalink / raw)
  To: Wei Liu, Juergen Gross
  Cc: Andrew Cooper, xen-devel, Ian Jackson, David Vrabel, Ian Campbell

>>> On 04.09.15 at 05:38, <JGross@suse.com> wrote:
> On 09/04/2015 02:40 AM, Wei Liu wrote:
>> This issue is exposed by the introduction of migration v2. The symptom is that
>> a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
>> many pages.
>>
>> Note that all guests have 512MB memory, which means they have 131072 pages.
>>
>> Both 3.14 tests [2] [3] get the correct number of pages.  Like:
>>
>>     xc: detail: max_pfn 0x1ffff, p2m_frames 256
>>     ...
>>     xc: detail: Memory: 2048/131072    1%
>>     ...
>>
>> However in both 4.1 [0] [1] the number of pages are quite wrong.
>>
>> 4.1 32 bit:
>>
>>     xc: detail: max_pfn 0xfffff, p2m_frames 1024
>>     ...
>>     xc: detail: Memory: 11264/1048576    1%
>>     ...
>>
>> It thinks it has 4096MB memory.
>>
>> 4.1 64 bit:
>>
>>     xc: detail: max_pfn 0x3ffff, p2m_frames 512
>>     ...
>>     xc: detail: Memory: 3072/262144    1%
>>     ...
>>
>> It thinks it has 1024MB memory.
>>
>> The total number of pages is determined in libxc by calling
>> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from
>> hypervisor. And that value is clearly touched by Linux in some way.
> 
> Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
> can handle. This is not the memory size of the domain.
> 
>> I now think this is a bug in Linux kernel. The biggest suspect is the
>> introduction of linear P2M.  If you think this is a bug in toolstack,
>> please let me know.
> 
> I absolutely think it is a toolstack bug. Even without the linear p2m
> things would go wrong in case a ballooned down guest would be migrated,
> as shared_info->arch.max_pfn would hold the upper limit of the guest
> in this case and not the current size.

I don't think this necessarily is a tool stack bug, at least not in
the sense implied above - since (afaik) migrating ballooned guests
(at least PV ones) has been working before, there ought to be
logic to skip ballooned pages (and I certainly recall having seen
migration slowly move up to e.g. 50% and then skip the other
half due to being ballooned, albeit that recollection certainly is
from before v2). And pages above the highest populated one
ought to be considered ballooned just as much. With the
information provided by Wei I don't think we can judge
this, since it only shows the values the migration process starts
from, not when, why, or how it fails.

Jan


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04  0:40 Linux 4.1 reports wrong number of pages to toolstack Wei Liu
  2015-09-04  3:38 ` Juergen Gross
@ 2015-09-04  8:53 ` Ian Campbell
  2015-09-04  9:28   ` Ian Campbell
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Ian Campbell @ 2015-09-04  8:53 UTC (permalink / raw)
  To: Wei Liu, xen-devel, David Vrabel, Juergen Gross
  Cc: Andrew Cooper, Ian Jackson

On Fri, 2015-09-04 at 01:40 +0100, Wei Liu wrote:
> Hi David
> 
> This issue is exposed by the introduction of migration v2. The symptom is that
> a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
> many pages.

FWIW my adhoc tests overnight gave me:

37858: b953c0d234bc72e8489d3bf51a276c5c4ec85345 v4.1		Fail
37862: 39a8804455fb23f09157341d3ba7db6d7ae6ee76 v4.0		Fail
37860: bfa76d49576599a4b9f9b7a71f23d73d6dcff735 v3.19		Fail

37872: e36f014edff70fc02b3d3d79cead1d58f289332e v3.19-rc7	Fail
37866: 26bc420b59a38e4e6685a73345a0def461136dce v3.19-rc6	Fail
37868: ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc v3.19-rc5	Fail
37864: eaa27f34e91a14cdceed26ed6c6793ec1d186115 v3.19-rc4	Fail *
37867: b1940cd21c0f4abdce101253e860feff547291b0 v3.19-rc3	Pass *
37865: b7392d2247cfe6771f95d256374f1a8e6a6f48d6 v3.19-rc2	Pass

37863: 97bf6af1f928216fd6c5a66e8a57bfa95a659672 v3.19-rc1	Pass

37861: b2776bf7149bddd1f4161f14f79520f17fc1d71d v3.18		Pass

I have set the adhoc bisector working on the ~200 commits between rc3 and
rc4. It's running in the Citrix instance (which is quieter) so the interim
results are only visible within our network at
http://osstest.xs.citrite.net/~osstest/testlogs/results-adhoc/bisect/xen-unstable/test-amd64-i386-xl..html.

So far it has confirmed the basis fail and it is now rechecking the basis
pass.

Slightly strange though is:
$ git log --oneline v3.19-rc3..v3.19-rc4 -- drivers/xen/ arch/x86/xen/ include/xen/
$

i.e. there are no relevant-seeming xen commits in that range. Maybe the
last one of these is more relevant?

$ git log --grep=[xX][eE][nN] --oneline v3.19-rc3..v3.19-rc4 -- 
bdec419 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
07ff890 xen-netback: fixing the propagation of the transmit shaper timeout
132978b x86: Fix step size adjustment during initial memory mapping
$

I don't think this particular issue is prone to false positives (i.e.
passing when it should fail) and the bisector has reconfirmed the fail case
already, so I think it is unlikely that the bisector is going to come back
and say it can't find a reliable basis for running.

Which might mean we have two issues: some as-yet-unknown issue between
v3.19-rc3 and -rc4, and the issue you have observed with the number of pages
the toolstack thinks it should be working on, which is masked by the
unknown issue (and could very well be a toolstack bug exposed by a change
in Linux, not a Linux bug at all).

I'm going to leave the bisector going; hopefully it'll tell us something
interesting in whatever it fingers...

Ian.


> 
> Note that all guests have 512MB memory, which means they have 131072 
> pages.
> 
> Both 3.14 tests [2] [3] get the correct number of pages.  Like:
> 
>    xc: detail: max_pfn 0x1ffff, p2m_frames 256
>    ...
>    xc: detail: Memory: 2048/131072    1%
>    ...
> 
> However in both 4.1 [0] [1] the number of pages are quite wrong.
> 
> 4.1 32 bit:
> 
>    xc: detail: max_pfn 0xfffff, p2m_frames 1024
>    ...
>    xc: detail: Memory: 11264/1048576    1%
>    ...
> 
> It thinks it has 4096MB memory.
> 
> 4.1 64 bit:
> 
>    xc: detail: max_pfn 0x3ffff, p2m_frames 512
>    ...
>    xc: detail: Memory: 3072/262144    1%
>    ...
> 
> It thinks it has 1024MB memory.
> 
> The total number of pages is determined in libxc by calling
> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from
> hypervisor. And that value is clearly touched by Linux in some way.
> 
> I now think this is a bug in Linux kernel. The biggest suspect is the
> introduction of linear P2M.  If you think this is a bug in toolstack,
> please let me know.
> 
> I don't know why 4.1 64 bit [0] can still be successfully restored. I
> don't have handy setup to experiment. The restore path doesn't show
> enough information to tell anything. The thing I worry about is that
> migration v2 somehow make the guest bigger than it should be. But that's
> another topic.
> 
> 
> Wei.
> 
> [0] 4.1 kernel 64 bit save restore:
> http://logs.test-lab.xenproject.org/osstest/logs/60785/test-amd64-amd64
> -xl/16.ts-guest-saverestore.log
> 
> [1] 4.1 kernel 32 bit save restore:
> http://logs.test-lab.xenproject.org/osstest/logs/60785/test-amd64-i386
> -xl/14.ts-guest-saverestore.log
> 
> [2] 3.14 kernel 64 bit save restore:
> http://logs.test-lab.xenproject.org/osstest/logs/61263/test-amd64-amd64
> -xl/16.ts-guest-saverestore.log
> 
> [3] 3.14 kernel 32 bit save restore:
> http://logs.test-lab.xenproject.org/osstest/logs/61263/test-amd64-i386
> -xl/16.ts-guest-saverestore.log


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04  8:53 ` Ian Campbell
@ 2015-09-04  9:28   ` Ian Campbell
  2015-09-04 14:42   ` David Vrabel
  2015-09-07  7:09   ` Jan Beulich
  2 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-09-04  9:28 UTC (permalink / raw)
  To: Wei Liu, xen-devel, David Vrabel, Juergen Gross
  Cc: Andrew Cooper, Ian Jackson

On Fri, 2015-09-04 at 09:53 +0100, Ian Campbell wrote:
> I have set the adhoc bisector working on the ~200 commits between rc3 and
> rc4. It's running in the Citrix instance (which is quieter) so the interim
> results are only visible within our network at http://osstest.xs.citrite.ne
> t/~osstest/testlogs/results-adhoc/bisect/xen-unstable/test-amd64-i386
> -xl..html.
> 
> So far it has confirmed the basis fail and it is now rechecking the basis
> pass.

It's checked the basis and is now actually bisecting.

I set up a periodic rsync to
http://xenbits.xen.org/people/ianc/tmp/adhoc/test-amd64-i386-xl..html for
anyone outside the Citrix network who wants to follow along...

The first hash in the tuple is the Linux one; all the others are the same
for all nodes (because I arranged the basis flights that way).

Ian.


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04  8:28   ` Jan Beulich
@ 2015-09-04  9:35     ` Andrew Cooper
  2015-09-04 11:35       ` Wei Liu
  2015-09-04 11:40     ` Wei Liu
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2015-09-04  9:35 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu, Juergen Gross
  Cc: xen-devel, Ian Jackson, David Vrabel, Ian Campbell

On 04/09/15 09:28, Jan Beulich wrote:
>>>> On 04.09.15 at 05:38, <JGross@suse.com> wrote:
>> On 09/04/2015 02:40 AM, Wei Liu wrote:
>>> This issue is exposed by the introduction of migration v2. The symptom is that
>>> a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
>>> many pages.
>>>
>>> Note that all guests have 512MB memory, which means they have 131072 pages.
>>>
>>> Both 3.14 tests [2] [3] get the correct number of pages.  Like:
>>>
>>>      xc: detail: max_pfn 0x1ffff, p2m_frames 256
>>>      ...
>>>      xc: detail: Memory: 2048/131072    1%
>>>      ...
>>>
>>> However in both 4.1 [0] [1] the number of pages are quite wrong.
>>>
>>> 4.1 32 bit:
>>>
>>>      xc: detail: max_pfn 0xfffff, p2m_frames 1024
>>>      ...
>>>      xc: detail: Memory: 11264/1048576    1%
>>>      ...
>>>
>>> It thinks it has 4096MB memory.
>>>
>>> 4.1 64 bit:
>>>
>>>      xc: detail: max_pfn 0x3ffff, p2m_frames 512
>>>      ...
>>>      xc: detail: Memory: 3072/262144    1%
>>>      ...
>>>
>>> It thinks it has 1024MB memory.
>>>
>>> The total number of pages is determined in libxc by calling
>>> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from
>>> hypervisor. And that value is clearly touched by Linux in some way.
>> Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
>> can handle. This is not the memory size of the domain.
>>
>>> I now think this is a bug in Linux kernel. The biggest suspect is the
>>> introduction of linear P2M.  If you think this is a bug in toolstack,
>>> please let me know.
>> I absolutely think it is a toolstack bug. Even without the linear p2m
>> things would go wrong in case a ballooned down guest would be migrated,
>> as shared_info->arch.max_pfn would hold the upper limit of the guest
>> in this case and not the current size.
> I don't think this necessarily is a tool stack bug, at least not in
> the sense implied above - since (afaik) migrating ballooned guests
> (at least PV ones) has been working before, there ought to be
> logic to skip ballooned pages (and I certainly recall having seen
> migration slowly move up to e.g. 50% and the skip the other
> half due to being ballooned, albeit that recollection certainly is
> from before v2). And pages above the highest populated one
> ought to be considered ballooned just as much. With the
> information provided by Wei I don't think we can judge about
> this, since it only shows the values the migration process starts
> from, not when, why, or how it fails.

Max pfn reported by migration v2 is max pfn, not the number of pages of 
RAM in the guest.

It is used for the size of the bitmaps used by migration v2, including 
the logdirty op calls.

All frames between 0 and max pfn will have their type queried, and acted 
upon appropriately, including doing nothing if the frame was ballooned out.
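
In pseudo-C, roughly (query_pfn_type() and send_page() below are just
placeholders for the batched logic in tools/libxc/xc_sr_save.c, not real
functions):

    /* Sketch of the behaviour described above, not the actual code. */
    static void send_all_frames(xen_pfn_t max_pfn)
    {
        for ( xen_pfn_t pfn = 0; pfn <= max_pfn; ++pfn )
        {
            /* Type lookup ultimately uses XEN_DOMCTL_getpageframeinfo3. */
            uint32_t type = query_pfn_type(pfn);

            if ( type == XEN_DOMCTL_PFINFO_XTAB )   /* ballooned out */
                continue;                           /* nothing to send */

            send_page(pfn, type);
        }
    }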

~Andrew


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04  9:35     ` Andrew Cooper
@ 2015-09-04 11:35       ` Wei Liu
  2015-09-04 18:39         ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Liu @ 2015-09-04 11:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Wei Liu, Ian Campbell, Ian Jackson, David Vrabel,
	Jan Beulich, xen-devel

On Fri, Sep 04, 2015 at 10:35:52AM +0100, Andrew Cooper wrote:
> On 04/09/15 09:28, Jan Beulich wrote:
> >>>>On 04.09.15 at 05:38, <JGross@suse.com> wrote:
> >>On 09/04/2015 02:40 AM, Wei Liu wrote:
> >>>This issue is exposed by the introduction of migration v2. The symptom is that
> >>>a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
> >>>many pages.
> >>>
> >>>Note that all guests have 512MB memory, which means they have 131072 pages.
> >>>
> >>>Both 3.14 tests [2] [3] get the correct number of pages.  Like:
> >>>
> >>>     xc: detail: max_pfn 0x1ffff, p2m_frames 256
> >>>     ...
> >>>     xc: detail: Memory: 2048/131072    1%
> >>>     ...
> >>>
> >>>However in both 4.1 [0] [1] the number of pages are quite wrong.
> >>>
> >>>4.1 32 bit:
> >>>
> >>>     xc: detail: max_pfn 0xfffff, p2m_frames 1024
> >>>     ...
> >>>     xc: detail: Memory: 11264/1048576    1%
> >>>     ...
> >>>
> >>>It thinks it has 4096MB memory.
> >>>
> >>>4.1 64 bit:
> >>>
> >>>     xc: detail: max_pfn 0x3ffff, p2m_frames 512
> >>>     ...
> >>>     xc: detail: Memory: 3072/262144    1%
> >>>     ...
> >>>
> >>>It thinks it has 1024MB memory.
> >>>
> >>>The total number of pages is determined in libxc by calling
> >>>xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from
> >>>hypervisor. And that value is clearly touched by Linux in some way.
> >>Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
> >>can handle. This is not the memory size of the domain.
> >>
> >>>I now think this is a bug in Linux kernel. The biggest suspect is the
> >>>introduction of linear P2M.  If you think this is a bug in toolstack,
> >>>please let me know.
> >>I absolutely think it is a toolstack bug. Even without the linear p2m
> >>things would go wrong in case a ballooned down guest would be migrated,
> >>as shared_info->arch.max_pfn would hold the upper limit of the guest
> >>in this case and not the current size.
> >I don't think this necessarily is a tool stack bug, at least not in
> >the sense implied above - since (afaik) migrating ballooned guests
> >(at least PV ones) has been working before, there ought to be
> >logic to skip ballooned pages (and I certainly recall having seen
> >migration slowly move up to e.g. 50% and the skip the other
> >half due to being ballooned, albeit that recollection certainly is
> >from before v2). And pages above the highest populated one
> >ought to be considered ballooned just as much. With the
> >information provided by Wei I don't think we can judge about
> >this, since it only shows the values the migration process starts
> >from, not when, why, or how it fails.
> 
> Max pfn reported by migration v2 is max pfn, not the number of pages of RAM
> in the guest.
> 

I understand that from looking at the code. It's just that the log itself
is very confusing.

I propose we reword the log message a bit. Maybe change "Memory" to "P2M" or
something else?

> It is used for the size of the bitmaps used by migration v2, including the
> logdirty op calls.
> 
> All frames between 0 and max pfn will have their type queried, and acted
> upon appropriately, including doing nothing if the frame was ballooned out.

In short, do you think this is a bug in migration v2?

When I looked at write_batch() I found some snippets that I thought were
wrong. But I didn't want to make that judgement while I didn't have a
clear head.

Wei.

> 
> ~Andrew


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04  8:28   ` Jan Beulich
  2015-09-04  9:35     ` Andrew Cooper
@ 2015-09-04 11:40     ` Wei Liu
  1 sibling, 0 replies; 15+ messages in thread
From: Wei Liu @ 2015-09-04 11:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Wei Liu, Ian Campbell, Andrew Cooper, Ian Jackson,
	David Vrabel, xen-devel

On Fri, Sep 04, 2015 at 02:28:41AM -0600, Jan Beulich wrote:
> >>> On 04.09.15 at 05:38, <JGross@suse.com> wrote:
> > On 09/04/2015 02:40 AM, Wei Liu wrote:
> >> This issue is exposed by the introduction of migration v2. The symptom is that
> >> a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
> >> many pages.
> >>
> >> Note that all guests have 512MB memory, which means they have 131072 pages.
> >>
> >> Both 3.14 tests [2] [3] get the correct number of pages.  Like:
> >>
> >>     xc: detail: max_pfn 0x1ffff, p2m_frames 256
> >>     ...
> >>     xc: detail: Memory: 2048/131072    1%
> >>     ...
> >>
> >> However in both 4.1 [0] [1] the number of pages are quite wrong.
> >>
> >> 4.1 32 bit:
> >>
> >>     xc: detail: max_pfn 0xfffff, p2m_frames 1024
> >>     ...
> >>     xc: detail: Memory: 11264/1048576    1%
> >>     ...
> >>
> >> It thinks it has 4096MB memory.
> >>
> >> 4.1 64 bit:
> >>
> >>     xc: detail: max_pfn 0x3ffff, p2m_frames 512
> >>     ...
> >>     xc: detail: Memory: 3072/262144    1%
> >>     ...
> >>
> >> It thinks it has 1024MB memory.
> >>
> >> The total number of pages is determined in libxc by calling
> >> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from
> >> hypervisor. And that value is clearly touched by Linux in some way.
> > 
> > Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
> > can handle. This is not the memory size of the domain.
> > 
> >> I now think this is a bug in Linux kernel. The biggest suspect is the
> >> introduction of linear P2M.  If you think this is a bug in toolstack,
> >> please let me know.
> > 
> > I absolutely think it is a toolstack bug. Even without the linear p2m
> > things would go wrong in case a ballooned down guest would be migrated,
> > as shared_info->arch.max_pfn would hold the upper limit of the guest
> > in this case and not the current size.
> 
> I don't think this necessarily is a tool stack bug, at least not in
> the sense implied above - since (afaik) migrating ballooned guests
> (at least PV ones) has been working before, there ought to be
> logic to skip ballooned pages (and I certainly recall having seen

Yes, there is.

Migration v2 has logic to skip a gpfn when the underlying mfn is
INVALID_MFN. I'm not too convinced the code that implements that logic is
working correctly. I need to have a closer look today.

> migration slowly move up to e.g. 50% and the skip the other
> half due to being ballooned, albeit that recollection certainly is
> from before v2). And pages above the highest populated one
> ought to be considered ballooned just as much. With the
> information provided by Wei I don't think we can judge about
> this, since it only shows the values the migration process starts
> from, not when, why, or how it fails.
> 

It fails on the receiving end when the helper tries to populate more pages
than the guest can have. In the specific case above, the helper tries to
populate the 131073rd page (one more than the guest's 131072) and fails.
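
Roughly speaking, on the restore side it boils down to something like this
(a sketch, assuming the 512MB domain is capped at max_pages == 131072; the
real code is the populate path in tools/libxc/xc_sr_restore.c):

    /* Sketch of the failure mode, not the actual restore code. */
    xen_pfn_t pfn = 131072;     /* the 131073rd page, beyond the 512MB cap */
    int rc;

    rc = xc_domain_populate_physmap_exact(xch, domid,
                                          1 /* nr_extents */, 0 /* order */,
                                          0 /* mem_flags */, &pfn);
    if ( rc )
        /* Xen refuses: the domain is already at its max_pages limit. */
        return rc;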

Wei.

> Jan


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04  8:53 ` Ian Campbell
  2015-09-04  9:28   ` Ian Campbell
@ 2015-09-04 14:42   ` David Vrabel
  2015-09-04 14:53     ` Wei Liu
  2015-09-07  7:09   ` Jan Beulich
  2 siblings, 1 reply; 15+ messages in thread
From: David Vrabel @ 2015-09-04 14:42 UTC (permalink / raw)
  To: Ian Campbell, Wei Liu, xen-devel, Juergen Gross
  Cc: Juergen Gross, Andrew Cooper, Ian Jackson

On 04/09/15 09:53, Ian Campbell wrote:
> On Fri, 2015-09-04 at 01:40 +0100, Wei Liu wrote:
>> Hi David
>>
>> This issue is exposed by the introduction of migration v2. The symptom is that
>> a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
>> many pages.
> 
> FWIW my adhoc tests overnight gave me:
> 
> 37858: b953c0d234bc72e8489d3bf51a276c5c4ec85345 v4.1		Fail
> 37862: 39a8804455fb23f09157341d3ba7db6d7ae6ee76 v4.0		Fail
> 37860: bfa76d49576599a4b9f9b7a71f23d73d6dcff735 v3.19		Fail
> 
> 37872: e36f014edff70fc02b3d3d79cead1d58f289332e v3.19-rc7	Fail
> 37866: 26bc420b59a38e4e6685a73345a0def461136dce v3.19-rc6	Fail
> 37868: ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc v3.19-rc5	Fail
> 37864: eaa27f34e91a14cdceed26ed6c6793ec1d186115 v3.19-rc4	Fail *
> 37867: b1940cd21c0f4abdce101253e860feff547291b0 v3.19-rc3	Pass *
> 37865: b7392d2247cfe6771f95d256374f1a8e6a6f48d6 v3.19-rc2	Pass
> 
> 37863: 97bf6af1f928216fd6c5a66e8a57bfa95a659672 v3.19-rc1	Pass
> 
> 37861: b2776bf7149bddd1f4161f14f79520f17fc1d71d v3.18		Pass
> 
> I have set the adhoc bisector working on the ~200 commits between rc3 and
> rc4. It's running in the Citrix instance (which is quieter) so the interim
> results are only visible within our network at http://osstest.xs.citrite.ne
> t/~osstest/testlogs/results-adhoc/bisect/xen-unstable/test-amd64-i386
> -xl..html.
> 
> So far it has confirmed the basis fail and it is now rechecking the basis
> pass.
> 
> Slightly strange though is:
> $ git log --oneline v3.19-rc3..v3.19-rc4 -- drivers/xen/ arch/x86/xen/ include/xen/
> $
> 
> i.e. there are no relevant seeming xen commits in that range. Maybe the
> last one of this is more relevant?

Since this bisect attempt appears to have disappeared into the weeds, I
did my own and it fingered:

633d6f17cd91ad5bf2370265946f716e42d388c6 (x86/xen: prepare p2m list for
memory hotplug) which was introduced in 4.0-rc7.

This looks a lot more plausible as the Linux change triggering the
migration failures.

David


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04 14:42   ` David Vrabel
@ 2015-09-04 14:53     ` Wei Liu
  2015-09-04 14:58       ` David Vrabel
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Liu @ 2015-09-04 14:53 UTC (permalink / raw)
  To: David Vrabel
  Cc: Juergen Gross, Wei Liu, Ian Campbell, Andrew Cooper, Ian Jackson,
	xen-devel

On Fri, Sep 04, 2015 at 03:42:06PM +0100, David Vrabel wrote:
> On 04/09/15 09:53, Ian Campbell wrote:
> > On Fri, 2015-09-04 at 01:40 +0100, Wei Liu wrote:
> >> Hi David
> >>
> >> This issue is exposed by the introduction of migration v2. The symptom is that
> >> a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
> >> many pages.
> > 
> > FWIW my adhoc tests overnight gave me:
> > 
> > 37858: b953c0d234bc72e8489d3bf51a276c5c4ec85345 v4.1		Fail
> > 37862: 39a8804455fb23f09157341d3ba7db6d7ae6ee76 v4.0		Fail
> > 37860: bfa76d49576599a4b9f9b7a71f23d73d6dcff735 v3.19		Fail
> > 
> > 37872: e36f014edff70fc02b3d3d79cead1d58f289332e v3.19-rc7	Fail
> > 37866: 26bc420b59a38e4e6685a73345a0def461136dce v3.19-rc6	Fail
> > 37868: ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc v3.19-rc5	Fail
> > 37864: eaa27f34e91a14cdceed26ed6c6793ec1d186115 v3.19-rc4	Fail *
> > 37867: b1940cd21c0f4abdce101253e860feff547291b0 v3.19-rc3	Pass *
> > 37865: b7392d2247cfe6771f95d256374f1a8e6a6f48d6 v3.19-rc2	Pass
> > 
> > 37863: 97bf6af1f928216fd6c5a66e8a57bfa95a659672 v3.19-rc1	Pass
> > 
> > 37861: b2776bf7149bddd1f4161f14f79520f17fc1d71d v3.18		Pass
> > 
> > I have set the adhoc bisector working on the ~200 commits between rc3 and
> > rc4. It's running in the Citrix instance (which is quieter) so the interim
> > results are only visible within our network at http://osstest.xs.citrite.ne
> > t/~osstest/testlogs/results-adhoc/bisect/xen-unstable/test-amd64-i386
> > -xl..html.
> > 
> > So far it has confirmed the basis fail and it is now rechecking the basis
> > pass.
> > 
> > Slightly strange though is:
> > $ git log --oneline v3.19-rc3..v3.19-rc4 -- drivers/xen/ arch/x86/xen/ include/xen/
> > $
> > 
> > i.e. there are no relevant seeming xen commits in that range. Maybe the
> > last one of this is more relevant?
> 
> Since this bisect attempt appears to have disappeared into the weeds I
> did my own and it fingered:
> 
> 633d6f17cd91ad5bf2370265946f716e42d388c6 (x86/xen: prepare p2m list for
> memory hotplug) which was introduced in 4.0-rc7.
> 
> This looks a lot more plausible as the Linux change triggering the
> migration failures.
> 

FWIW: same 32-bit kernel, 128MB memory, migration is OK.

> David


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04 14:53     ` Wei Liu
@ 2015-09-04 14:58       ` David Vrabel
  0 siblings, 0 replies; 15+ messages in thread
From: David Vrabel @ 2015-09-04 14:58 UTC (permalink / raw)
  To: Wei Liu; +Cc: Juergen Gross, xen-devel, Ian Jackson, Ian Campbell,
	Andrew Cooper

On 04/09/15 15:53, Wei Liu wrote:
> On Fri, Sep 04, 2015 at 03:42:06PM +0100, David Vrabel wrote:
>> On 04/09/15 09:53, Ian Campbell wrote:
>>> On Fri, 2015-09-04 at 01:40 +0100, Wei Liu wrote:
>>>> Hi David
>>>>
>>>> This issue is exposed by the introduction of migration v2. The symptom is that
>>>> a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
>>>> many pages.
>>>
>>> FWIW my adhoc tests overnight gave me:
>>>
>>> 37858: b953c0d234bc72e8489d3bf51a276c5c4ec85345 v4.1		Fail
>>> 37862: 39a8804455fb23f09157341d3ba7db6d7ae6ee76 v4.0		Fail
>>> 37860: bfa76d49576599a4b9f9b7a71f23d73d6dcff735 v3.19		Fail
>>>
>>> 37872: e36f014edff70fc02b3d3d79cead1d58f289332e v3.19-rc7	Fail
>>> 37866: 26bc420b59a38e4e6685a73345a0def461136dce v3.19-rc6	Fail
>>> 37868: ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc v3.19-rc5	Fail
>>> 37864: eaa27f34e91a14cdceed26ed6c6793ec1d186115 v3.19-rc4	Fail *
>>> 37867: b1940cd21c0f4abdce101253e860feff547291b0 v3.19-rc3	Pass *
>>> 37865: b7392d2247cfe6771f95d256374f1a8e6a6f48d6 v3.19-rc2	Pass
>>>
>>> 37863: 97bf6af1f928216fd6c5a66e8a57bfa95a659672 v3.19-rc1	Pass
>>>
>>> 37861: b2776bf7149bddd1f4161f14f79520f17fc1d71d v3.18		Pass
>>>
>>> I have set the adhoc bisector working on the ~200 commits between rc3 and
>>> rc4. It's running in the Citrix instance (which is quieter) so the interim
>>> results are only visible within our network at http://osstest.xs.citrite.ne
>>> t/~osstest/testlogs/results-adhoc/bisect/xen-unstable/test-amd64-i386
>>> -xl..html.
>>>
>>> So far it has confirmed the basis fail and it is now rechecking the basis
>>> pass.
>>>
>>> Slightly strange though is:
>>> $ git log --oneline v3.19-rc3..v3.19-rc4 -- drivers/xen/ arch/x86/xen/ include/xen/
>>> $
>>>
>>> i.e. there are no relevant seeming xen commits in that range. Maybe the
>>> last one of this is more relevant?
>>
>> Since this bisect attempt appears to have disappeared into the weeds I
>> did my own and it fingered:
>>
>> 633d6f17cd91ad5bf2370265946f716e42d388c6 (x86/xen: prepare p2m list for
>> memory hotplug) which was introduced in 4.0-rc7.
>>
>> This looks a lot more plausible as the Linux change triggering the
>> migration failures.
>>
> 
> FWIW. Same 32bit kernel, 128MB memory, migration is OK.

This commit is only bad with 64-bit guests -- with a 32-bit guest the
maximum p2m size covers only 64 GiB.  It also requires
XEN_BALLOON_MEMORY_HOTPLUG to be enabled.

This commit is exposing a toolstack bug.

David


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04 11:35       ` Wei Liu
@ 2015-09-04 18:39         ` Andrew Cooper
  2015-09-04 19:46           ` Wei Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2015-09-04 18:39 UTC (permalink / raw)
  To: Wei Liu
  Cc: Juergen Gross, Ian Campbell, Ian Jackson, David Vrabel,
	Jan Beulich, xen-devel



On 04/09/15 12:35, Wei Liu wrote:
> On Fri, Sep 04, 2015 at 10:35:52AM +0100, Andrew Cooper wrote:
>> On 04/09/15 09:28, Jan Beulich wrote:
>>>>>> On 04.09.15 at 05:38, <JGross@suse.com> wrote:
>>>> On 09/04/2015 02:40 AM, Wei Liu wrote:
>>>>> This issue is exposed by the introduction of migration v2. The symptom is that
>>>>> a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
>>>>> many pages.
>>>>>
>>>>> Note that all guests have 512MB memory, which means they have 131072 pages.
>>>>>
>>>>> Both 3.14 tests [2] [3] get the correct number of pages.  Like:
>>>>>
>>>>>      xc: detail: max_pfn 0x1ffff, p2m_frames 256
>>>>>      ...
>>>>>      xc: detail: Memory: 2048/131072    1%
>>>>>      ...
>>>>>
>>>>> However in both 4.1 [0] [1] the number of pages are quite wrong.
>>>>>
>>>>> 4.1 32 bit:
>>>>>
>>>>>      xc: detail: max_pfn 0xfffff, p2m_frames 1024
>>>>>      ...
>>>>>      xc: detail: Memory: 11264/1048576    1%
>>>>>      ...
>>>>>
>>>>> It thinks it has 4096MB memory.
>>>>>
>>>>> 4.1 64 bit:
>>>>>
>>>>>      xc: detail: max_pfn 0x3ffff, p2m_frames 512
>>>>>      ...
>>>>>      xc: detail: Memory: 3072/262144    1%
>>>>>      ...
>>>>>
>>>>> It thinks it has 1024MB memory.
>>>>>
>>>>> The total number of pages is determined in libxc by calling
>>>>> xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from
>>>>> hypervisor. And that value is clearly touched by Linux in some way.
>>>> Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
>>>> can handle. This is not the memory size of the domain.
>>>>
>>>>> I now think this is a bug in Linux kernel. The biggest suspect is the
>>>>> introduction of linear P2M.  If you think this is a bug in toolstack,
>>>>> please let me know.
>>>> I absolutely think it is a toolstack bug. Even without the linear p2m
>>>> things would go wrong in case a ballooned down guest would be migrated,
>>>> as shared_info->arch.max_pfn would hold the upper limit of the guest
>>>> in this case and not the current size.
>>> I don't think this necessarily is a tool stack bug, at least not in
>>> the sense implied above - since (afaik) migrating ballooned guests
>>> (at least PV ones) has been working before, there ought to be
>>> logic to skip ballooned pages (and I certainly recall having seen
>>> migration slowly move up to e.g. 50% and the skip the other
>>> half due to being ballooned, albeit that recollection certainly is
>>> from before v2). And pages above the highest populated one
>>> ought to be considered ballooned just as much. With the
>>> information provided by Wei I don't think we can judge about
>>> this, since it only shows the values the migration process starts
>>> from, not when, why, or how it fails.
>> Max pfn reported by migration v2 is max pfn, not the number of pages of RAM
>> in the guest.
>>
> I understand that by looking at the code. Just the log itself
> is very confusing.
>
> I propose we rename the log a bit. Maybe change "Memory" to "P2M" or
> something else?

P2M would be wrong for HVM guests.  Memory was the same term used by the 
legacy code iirc.

"Frames" is probably the best term.

>
>> It is used for the size of the bitmaps used by migration v2, including the
>> logdirty op calls.
>>
>> All frames between 0 and max pfn will have their type queried, and acted
>> upon appropriately, including doing nothing if the frame was ballooned out.
> In short, do you think this is a bug in migration v2?

There is insufficient information in this thread to say either way. 
Maybe.  Maybe a Linux kernel bug.

>
> When I looked at write_batch() I found some snippets that I thought to
> be wrong. But I didn't what to make the judgement when I didn't have a
> clear head.

write_batch() is a complicated function but it can't usefully be split 
any further.  I would be happy to explain bits or expand the existing 
comments, but it is also possible that it is buggy.

~Andrew


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04 18:39         ` Andrew Cooper
@ 2015-09-04 19:46           ` Wei Liu
  2015-09-04 20:32             ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Liu @ 2015-09-04 19:46 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Wei Liu, Ian Campbell, Ian Jackson, David Vrabel,
	Jan Beulich, xen-devel

On Fri, Sep 04, 2015 at 07:39:27PM +0100, Andrew Cooper wrote:
> 
> 
> On 04/09/15 12:35, Wei Liu wrote:
> >On Fri, Sep 04, 2015 at 10:35:52AM +0100, Andrew Cooper wrote:
> >>On 04/09/15 09:28, Jan Beulich wrote:
> >>>>>>On 04.09.15 at 05:38, <JGross@suse.com> wrote:
> >>>>On 09/04/2015 02:40 AM, Wei Liu wrote:
> >>>>>This issue is exposed by the introduction of migration v2. The symptom is that
> >>>>>a guest with 32 bit 4.1 kernel can't be restored because it's asking for too
> >>>>>many pages.
> >>>>>
> >>>>>Note that all guests have 512MB memory, which means they have 131072 pages.
> >>>>>
> >>>>>Both 3.14 tests [2] [3] get the correct number of pages.  Like:
> >>>>>
> >>>>>     xc: detail: max_pfn 0x1ffff, p2m_frames 256
> >>>>>     ...
> >>>>>     xc: detail: Memory: 2048/131072    1%
> >>>>>     ...
> >>>>>
> >>>>>However in both 4.1 [0] [1] the number of pages are quite wrong.
> >>>>>
> >>>>>4.1 32 bit:
> >>>>>
> >>>>>     xc: detail: max_pfn 0xfffff, p2m_frames 1024
> >>>>>     ...
> >>>>>     xc: detail: Memory: 11264/1048576    1%
> >>>>>     ...
> >>>>>
> >>>>>It thinks it has 4096MB memory.
> >>>>>
> >>>>>4.1 64 bit:
> >>>>>
> >>>>>     xc: detail: max_pfn 0x3ffff, p2m_frames 512
> >>>>>     ...
> >>>>>     xc: detail: Memory: 3072/262144    1%
> >>>>>     ...
> >>>>>
> >>>>>It thinks it has 1024MB memory.
> >>>>>
> >>>>>The total number of pages is determined in libxc by calling
> >>>>>xc_domain_nr_gpfns, which yanks shared_info->arch.max_pfn from
> >>>>>hypervisor. And that value is clearly touched by Linux in some way.
> >>>>Sure. shared_info->arch.max_pfn holds the number of pfns the p2m list
> >>>>can handle. This is not the memory size of the domain.
> >>>>
> >>>>>I now think this is a bug in Linux kernel. The biggest suspect is the
> >>>>>introduction of linear P2M.  If you think this is a bug in toolstack,
> >>>>>please let me know.
> >>>>I absolutely think it is a toolstack bug. Even without the linear p2m
> >>>>things would go wrong in case a ballooned down guest would be migrated,
> >>>>as shared_info->arch.max_pfn would hold the upper limit of the guest
> >>>>in this case and not the current size.
> >>>I don't think this necessarily is a tool stack bug, at least not in
> >>>the sense implied above - since (afaik) migrating ballooned guests
> >>>(at least PV ones) has been working before, there ought to be
> >>>logic to skip ballooned pages (and I certainly recall having seen
> >>>migration slowly move up to e.g. 50% and the skip the other
> >>>half due to being ballooned, albeit that recollection certainly is
> >>>from before v2). And pages above the highest populated one
> >>>ought to be considered ballooned just as much. With the
> >>>information provided by Wei I don't think we can judge about
> >>>this, since it only shows the values the migration process starts
> >>>from, not when, why, or how it fails.
> >>Max pfn reported by migration v2 is max pfn, not the number of pages of RAM
> >>in the guest.
> >>
> >I understand that by looking at the code. Just the log itself
> >is very confusing.
> >
> >I propose we rename the log a bit. Maybe change "Memory" to "P2M" or
> >something else?
> 
> P2M would be wrong for HVM guests.  Memory was the same term used by the
> legacy code iirc.
> 
> "Frames" is probably the best term.
> 
> >
> >>It is used for the size of the bitmaps used by migration v2, including the
> >>logdirty op calls.
> >>
> >>All frames between 0 and max pfn will have their type queried, and acted
> >>upon appropriately, including doing nothing if the frame was ballooned out.
> >In short, do you think this is a bug in migration v2?
> 
> There is insufficient information in this thread to say either way. Maybe.
> Maybe a Linux kernel bug.
> 
> >
> >When I looked at write_batch() I found some snippets that I thought to
> >be wrong. But I didn't what to make the judgement when I didn't have a
> >clear head.
> 
> write_batch() is a complicated function but it can't usefully be split any
> further.  I would be happy to explain bits or expand the existing comments,
> but it is also possible that it is buggy.
> 

I think write_batch() is correct; I had overlooked one function call. I'm not
overly happy with the handling of ballooned pages and the use of the deferred
array in non-live transfers, but those things are not buggy in themselves.

See my patch series for the real bug I discovered. Gosh, it took me a whole
day to identify the culprit.

Wei.

> ~Andrew


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04 19:46           ` Wei Liu
@ 2015-09-04 20:32             ` Andrew Cooper
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Cooper @ 2015-09-04 20:32 UTC (permalink / raw)
  To: Wei Liu
  Cc: Juergen Gross, Ian Campbell, Ian Jackson, David Vrabel,
	Jan Beulich, xen-devel

On 04/09/15 20:46, Wei Liu wrote:
>
>>> When I looked at write_batch() I found some snippets that I thought to
>>> be wrong. But I didn't what to make the judgement when I didn't have a
>>> clear head.
>> write_batch() is a complicated function but it can't usefully be split any
>> further.  I would be happy to explain bits or expand the existing comments,
>> but it is also possible that it is buggy.
>>
> I think write_batch is correct. I overlooked one function call. I'm not
> overly happy with the handling of balloon pages and the use of deferred
> array in non-live transfer, but those things are not buggy in itself.

Handling of ballooned pages is broken at several layers.  This was 
covered in my talk at Seattle.  Fixing it is non-trivial.

The use of the deferred array is necessary for live migrates, and used 
in non-live migrates to avoid diverging the algorithm.  Nothing in the 
non-live side queries the deferred array (which itself is a contributory 
factor to the ballooning issue, as there is no interlock to prevent 
something else issuing population/depopulation hypercalls on behalf of 
the paused domain).

~Andrew


* Re: Linux 4.1 reports wrong number of pages to toolstack
  2015-09-04  8:53 ` Ian Campbell
  2015-09-04  9:28   ` Ian Campbell
  2015-09-04 14:42   ` David Vrabel
@ 2015-09-07  7:09   ` Jan Beulich
  2 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2015-09-07  7:09 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: Ian Jackson, Andrew Cooper, Wei Liu, David Vrabel, Juergen Gross

>>> On 04.09.15 at 10:53, <ian.campbell@citrix.com> wrote:
> I have set the adhoc bisector working on the ~200 commits between rc3 and
> rc4. It's running in the Citrix instance (which is quieter) so the interim
> results are only visible within our network at http://osstest.xs.citrite.ne 
> t/~osstest/testlogs/results-adhoc/bisect/xen-unstable/test-amd64-i386
> -xl..html.
> 
> So far it has confirmed the basis fail and it is now rechecking the basis
> pass.
> 
> Slightly strange though is:
> $ git log --oneline v3.19-rc3..v3.19-rc4 -- drivers/xen/ arch/x86/xen/ 
> include/xen/
> $
> 
> i.e. there are no relevant seeming xen commits in that range. Maybe the
> last one of this is more relevant?
> 
> $ git log --grep=[xX][eE][nN] --oneline v3.19-rc3..v3.19-rc4 -- 
> bdec419 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> 07ff890 xen-netback: fixing the propagation of the transmit shaper timeout
> 132978b x86: Fix step size adjustment during initial memory mapping
> $

So if I'm interpreting the graph right it was indeed the last of these
which got fingered, which is mine. Yet having looked at it again in close
detail just now, I can't see how it could be wrong, or even have an
effect on post-boot state: all it does is adjust the block sizes in
which the 1:1 mapping gets established. The final result ought to
still be the same (with - obviously - the exception of which pages
may get used for page tables). No change to any global variables
afaics.

Jan

