xen-devel.lists.xenproject.org archive mirror
* Re: [Xen-users] rebased openSUSE Xen dom0 Patches
@ 2010-04-15 18:36 Simon Graham
  2010-04-15 18:41 ` Keir Fraser
  2010-04-16  7:58 ` Jan Beulich
  0 siblings, 2 replies; 10+ messages in thread
From: Simon Graham @ 2010-04-15 18:36 UTC (permalink / raw)
  To: xen-devel

On Tue, Apr 06, 2010 at 02:37:35PM +0100, Andrew Lyon wrote:
> I've uploaded updated 2.6.31 and 2.6.32 rebased openSUSE Xen dom0
> patches and ebuilds to
> http://code.google.com/p/gentoo-xen-kernel/downloads/list
> 
> Notable change is that both include the online resize feature recently
> posted to xen-devel.
>

We're currently testing these patches with the Ubuntu 10.04 kernel and Xen 3.4.2 and have encountered some problems in the Xen mm code shortly after Dom0 starts to run.

The typical failure is that Dom0 is trying to write to a page table (an L4 PT in the example below) but finds the page it is writing to is a PGT_writable_page:

...
(XEN) mm.c:2410:d0 Bad type (saw e800000000000001 != exp 8000000000000000) for mfn 0x114702 (pfn 0x702)
(XEN) mm.c:2802:d0 Error while pinning mfn 114702
[   18.549043] HYPERVISOR_mmuext_op failed:  pgd ffff880000703000 cmd0=3 mfn0=114702 cmd1=3 mfn1=114703 err=-22
[   18.549689] ------------[ cut here ]------------
[   18.549975] kernel BUG at /sandbox/orc-tree-10.4/orc-kernel/linux-2.6.32/arch/x86/mm/hypervisor.c:680!
[   18.550542] invalid opcode: 0000 [#1] SMP 

Another common failure is shown below.  Dom0 expects a page to be PGT_l1_page_table but instead finds the page's type is PGT_writable_page:

(XEN) mm.c:2410:d0 Bad type (saw e800000000000001 != exp 2000000000000000) for mfn 0x114711 (pfn 0x711)
(XEN) mm.c:2413:d0 Writable page alloc'd from ptwr_do_page_fault:4481
(XEN) mm.c:821:d0 Attempt to create linear p.t. with write perms
[   21.159495] HYPERVISOR_multicall_check failed on call 1 rc 4294967274
[   21.159498] HYPERVISOR_multicall_check failed page=ffff8800020bbc00 pfn=1808
[   21.159503] ------------[ cut here ]------------
[   21.159505] kernel BUG at /sandbox/orc-tree-10.4/orc-kernel/linux-2.6.32/arch/x86/mm/hypervisor.c:480!

It is not clear whether Dom0 is erroneously using pages that are and truly should be writable, or whether somehow pages have erroneously become writable; if anyone can suggest likely places to look for problems that would be awesome!

Simon
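
The "saw ... != exp ..." values in the two Xen log lines above can be decoded by hand. The following is a minimal sketch in C, assuming the page type lives in the top three bits of the type word (which is consistent with the logged values: 0x2... for PGT_l1_page_table, 0x8... for the L4 table, 0xe... for PGT_writable_page). The position of the "validated" flag and the width of the use count are assumptions, not taken from this thread, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not stated in the thread): type in bits 63..61,
 * a "validated" flag in bit 59, and a use count in the low bits. */
#define TYPE_SHIFT     61
#define VALIDATED_FLAG (1ULL << 59)
#define COUNT_MASK     ((1ULL << 55) - 1)

/* Extract the 3-bit type index from a type word. */
static unsigned int type_index(uint64_t type_info)
{
    return (unsigned int)(type_info >> TYPE_SHIFT);
}

/* Name only the three types that appear in the thread's logs. */
static const char *type_name(unsigned int idx)
{
    static const char *const names[8] = {
        [1] = "l1_page_table",
        [4] = "l4_page_table",
        [7] = "writable_page",
    };

    assert(idx < 8);
    return names[idx] ? names[idx] : "other";
}

/* Number of outstanding type references recorded in the word. */
static uint64_t type_count(uint64_t type_info)
{
    return type_info & COUNT_MASK;
}
```

Under these assumptions, the "saw" value e800000000000001 decodes as a validated writable page with one outstanding type reference, while the expected values 8000000000000000 and 2000000000000000 are the L4 and L1 page-table types.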


* Re: Re: [Xen-users] rebased openSUSE Xen dom0 Patches
  2010-04-15 18:36 [Xen-users] rebased openSUSE Xen dom0 Patches Simon Graham
@ 2010-04-15 18:41 ` Keir Fraser
  2010-04-16  7:58 ` Jan Beulich
  1 sibling, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2010-04-15 18:41 UTC (permalink / raw)
  To: Simon Graham, xen-devel@lists.xensource.com

On 15/04/2010 19:36, "Simon Graham" <simon.graham@virtualcomputer.com>
wrote:

> It is not clear whether Dom0 is erroneously using pages that are and truly
> should be writable, or whether somehow pages have erroneously become writable;
> if anyone can suggest likely places to look for problems that would be
> awesome!

Typically it means that the guest kernel hasn't managed to zap all of its
writable mappings of a page before assigning it for use as a page table.
That doesn't immediately point to the bug, unfortunately. ;-)

 -- Keir


* Re: [Xen-users] rebased openSUSE Xen dom0 Patches
  2010-04-15 18:36 [Xen-users] rebased openSUSE Xen dom0 Patches Simon Graham
  2010-04-15 18:41 ` Keir Fraser
@ 2010-04-16  7:58 ` Jan Beulich
  2010-04-16 13:42   ` Simon Graham
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2010-04-16  7:58 UTC (permalink / raw)
  To: Andrew Lyon, Simon Graham; +Cc: xen-devel

>>> "Simon Graham" <simon.graham@virtualcomputer.com> 15.04.10 20:36 >>>
>On Tue, Apr 06, 2010 at 02:37:35PM +0100, Andrew Lyon wrote:
>> I've uploaded updated 2.6.31 and 2.6.32 rebased openSUSE Xen dom0
>> patches and ebuilds to
>> http://code.google.com/p/gentoo-xen-kernel/downloads/list 
>> 
>> Notable change is that both include the online resize feature recently
>> posted to xen-devel.
>>
>
>We're currently testing these patches with the Ubuntu 10.4 kernel and Xen 3.4.2 and have encountered some problems in the Xen mm code shortly after Dom0 starts to run.
>
>The typical failure is that Dom0 is trying to write to a page table (a L4 PT in the example below) but finds the page it is writing to is a PGT_writable_page
>
>...
>Another common failure is shown below.  Dom0 expects a page to be PGT_l1_page_table but instead finds the page's type is PGT_writable_page:

Well, we certainly aren't experiencing anything like that with our
"original" version of those patches, and I would suppose Andy didn't
see such either. Hence perhaps a problem that got introduced
by how you made use of the patches and/or some specifics of the
source tree you applied them to?

Jan


* RE: Re: [Xen-users] rebased openSUSE Xen dom0 Patches
  2010-04-16  7:58 ` Jan Beulich
@ 2010-04-16 13:42   ` Simon Graham
  2010-04-19  8:41     ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Graham @ 2010-04-16 13:42 UTC (permalink / raw)
  To: Jan Beulich, Andrew Lyon; +Cc: xen-devel

Thanks Jan,

> >...
> >Another common failure is shown below.  Dom0 expects a page to be
> PGT_l1_page_table but instead finds the page's type is
> PGT_writable_page:
> 
> Well, we certainly aren't experiencing anything like that with our
> "original" version of those patches, and I would suppose Andy didn't
> see such either. Hence perhaps a problem that got introduced
> by how you made use of the patches and/or some specifics of the
> source tree you applied them to?
> 

The patches seem to apply cleanly to the Ubuntu 10.04 kernel source tree
(but I agree that this might be the problem)...

We've actually narrowed the problem down a bit -- the pages we fail on
are always in the range of those freed by free_init_pages("unused kernel
memory") from free_initmem(). Now, the specific problem is that a
writable page can't be turned into a page table page because its page
type ref count is non-zero -- I see in the free_init_pages() routine
that two hypercalls are made for each page, one of which sets the pte to
zero (which would decrement the page type ref count, I think) and one of
which does not -- doesn't this leave the page type ref count at 1, which
in turn means the page can't be turned into a page table page? Or is
there some other magic that occurs later on that should decrement the
page type ref count before attempting to use the page as a page table
page?

Here's the extract of the code I am talking about (yes, we are using a
64-bit Dom0):

        printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);

        for (; addr < end; addr += PAGE_SIZE) {
                ClearPageReserved(virt_to_page(addr));
                init_page_count(virt_to_page(addr));
                memset((void *)(addr & ~(PAGE_SIZE-1)),
                        POISON_FREE_INITMEM, PAGE_SIZE);
#ifdef CONFIG_X86_64
                if (addr >= __START_KERNEL_map) {
                        /* make_readonly() reports all kernel addresses. */
                        if (HYPERVISOR_update_va_mapping((unsigned long)__va(__pa(addr)),
                                                         pfn_pte(__pa(addr) >> PAGE_SHIFT,
                                                                 PAGE_KERNEL),
                                                         0))
                                BUG();
                        if (HYPERVISOR_update_va_mapping(addr, __pte(0), 0))
                                BUG();
                }
#endif
...

Simon


* RE: Re: [Xen-users] rebased openSUSE Xen dom0 Patches
  2010-04-16 13:42   ` Simon Graham
@ 2010-04-19  8:41     ` Jan Beulich
  2010-04-19 14:52       ` Simon Graham
  2010-04-20 16:07       ` Simon Graham
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Beulich @ 2010-04-19  8:41 UTC (permalink / raw)
  To: Andrew Lyon, Simon Graham; +Cc: xen-devel

>>> "Simon Graham" <simon.graham@virtualcomputer.com> 16.04.10 15:42 >>>
>We've actually narrowed the problem down a bit -- the pages we fail on
>are always in the range of those freed by free_init_pages("unused kernel
>memory") from free_initmem(). Now, the specific problem is that a
>writable page cant be turned into a page table page because it's page
>type ref count is non-zero -- I see in the free_init_pages() routine
>that two hypercalls are made for each page, one of which sets the pte to
>zero (which would decrement the page type ref count I think) and one of
>which does not -- doesn't this leave the page type ref count at 1 which
>in turn means the page cant be turned into a page table page? Or is
>there some other magic that occurs later on that should decrement the
>page type ref count before attempting to use the page as a page table
>page?

Are you observing this with both the .31 and .32 patches?

>Here's the extract of the code I am talking about (yes, we are using a
>64-bit Dom0):
>...

But that code is precisely what guarantees that the pages *can* be
converted to page table pages (by completely unmapping them from
the kernel image part of the address space). So your explanation is
rather confusing than clarifying to me...

Jan


* RE: Re: [Xen-users] rebased openSUSE Xen dom0 Patches
  2010-04-19  8:41     ` Jan Beulich
@ 2010-04-19 14:52       ` Simon Graham
  2010-04-19 15:09         ` Jan Beulich
  2010-04-20 16:07       ` Simon Graham
  1 sibling, 1 reply; 10+ messages in thread
From: Simon Graham @ 2010-04-19 14:52 UTC (permalink / raw)
  To: Jan Beulich, Andrew Lyon; +Cc: xen-devel

> >in turn means the page cant be turned into a page table page? Or is
> >there some other magic that occurs later on that should decrement the
> >page type ref count before attempting to use the page as a page table
> >page?
> 
> Are you observing this with both the .31 and .32 patches?
> 

We're only testing the .32 patches.

> >Here's the extract of the code I am talking about (yes, we are using a
> >64-bit Dom0):
> >...
> 
> But that code is precisely what guarantees that the pages *can* be
> converted to page table pages (by completely unmapping them from
> the kernel image part of the address space). So your explanation is
> rather confusing than clarifying to me...

I agree that that is the intent of this code -- what we _seem_ to observe
(and this is hard to prove) is that the page type ref count is not being
decremented by this code, which would imply that the unmapping is not
happening for some reason. The only real evidence I have for this is that
the failure always occurs on one of these pages.

Now, the first of these hypercalls creates a pte with PAGE_KERNEL as the
opts, and I think this includes read-write access, whereas the second one
completely deletes the pte for the alternate mapping -- the combined effect
should leave the page type ref count as one, shouldn't it? (for the
read-write kernel mapping)

That being the case, I'm not sure how the page type ref count is supposed
to get to zero when reusing one of these pages as a page table page later
on.

Thanks for your help
Simon


* RE: Re: [Xen-users] rebased openSUSE Xen dom0 Patches
  2010-04-19 14:52       ` Simon Graham
@ 2010-04-19 15:09         ` Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2010-04-19 15:09 UTC (permalink / raw)
  To: Simon Graham; +Cc: Andrew Lyon, xen-devel

>>> "Simon Graham" <simon.graham@virtualcomputer.com> 19.04.10 16:52 >>>
>Now, the first of these hypercalls creates a pte with PAGE_KERNEL as the
>opts and
>I think this includes read-write access whereas the second one
>completely deletes
>the pte for the alternate mapping -- the combined affect should leave
>the page type 
>ref count as one shouldn't it? (for the read-write kernel mapping)
>
>That being the case, I'm not sure how the page type ref count is
>supposed to get to
>zero when reusing one of these pages as a page table page later on.

Such pages get converted to read-only when they get allocated for
the purpose of being a page table. Hence there being a type refcount
of 1 at the point of the insertion means that there's a second mapping
to the page somewhere else. That's what you want to find.

Jan
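
The type-refcount rule Jan describes can be illustrated with a toy model. This is only a sketch under stated assumptions, not Xen's implementation: the struct, the enum, and the functions get_type(), put_type() and reuse_as_page_table() are hypothetical names invented for illustration. The model assumes that every writable mapping holds one reference on the "writable" type and that a page may change type only once that count has dropped to zero.

```c
#include <assert.h>
#include <errno.h>

enum page_type { TYPE_NONE, TYPE_L1_TABLE, TYPE_WRITABLE };

/* Toy stand-in for the hypervisor's per-page type bookkeeping. */
struct page_model {
    enum page_type type;
    unsigned int type_count;
};

/* Take a type reference; changing type is refused while references remain. */
static int get_type(struct page_model *pg, enum page_type want)
{
    if (pg->type_count != 0 && pg->type != want)
        return -EINVAL;
    pg->type = want;
    pg->type_count++;
    return 0;
}

/* Drop one type reference, e.g. when a writable mapping goes away. */
static void put_type(struct page_model *pg)
{
    assert(pg->type_count > 0);
    pg->type_count--;
}

/* A page mapped writable twice (direct map plus kernel-image alias) is
 * later reused as a page table.  Allocation as a page table turns the
 * direct mapping read-only; the alias goes away only if it was unmapped. */
static int reuse_as_page_table(int alias_was_unmapped)
{
    struct page_model pg = { TYPE_NONE, 0 };

    get_type(&pg, TYPE_WRITABLE);   /* direct-map mapping, read-write */
    get_type(&pg, TYPE_WRITABLE);   /* kernel-image alias, read-write */

    if (alias_was_unmapped)
        put_type(&pg);              /* alias pte cleared */
    put_type(&pg);                  /* direct mapping made read-only */

    return get_type(&pg, TYPE_L1_TABLE);
}
```

In this model reuse_as_page_table(1) succeeds, while reuse_as_page_table(0) returns -EINVAL because one writable reference is still outstanding. That matches the shape of the failure reported earlier in the thread (err=-22).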


* RE: Re: [Xen-users] rebased openSUSE Xen dom0 Patches
  2010-04-19  8:41     ` Jan Beulich
  2010-04-19 14:52       ` Simon Graham
@ 2010-04-20 16:07       ` Simon Graham
  2010-04-20 19:01         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 10+ messages in thread
From: Simon Graham @ 2010-04-20 16:07 UTC (permalink / raw)
  To: Simon Graham, Jan Beulich, Andrew Lyon; +Cc: xen-devel

> >
> > But that code is precisely what guarantees that the pages *can* be
> > converted to page table pages (by completely unmapping them from
> > the kernel image part of the address space). So your explanation is
> > rather confusing than clarifying to me...
> 
> I agree that that is the intent of this code -- what we _seem_ to
> observe (and this
> is hard to prove) is that the page type ref count is not being
> decremented by this
> code which would imply that the unmapping is not happening for some
> reason. The only
> real evidence I have for this is that the failure always occurs on one
> of these pages.
> 

We now think we've found the problem which seems to be due to the
following two calls in Linux within mark_rodata_ro():

    free_init_pages("unused kernel memory",
                    (unsigned long)
                     page_address(virt_to_page(text_end)),
                    (unsigned long)
                     page_address(virt_to_page(rodata_start)));
    free_init_pages("unused kernel memory",
                    (unsigned long)
                     page_address(virt_to_page(rodata_end)),
                    (unsigned long)
                     page_address(virt_to_page(data_start)));

The first of these calls is trying to free the range
page_address(virt_to_page(text_end)) through
page_address(virt_to_page(rodata_start)).

With text_end == 0xffffffff80610000 and  rodata_start ==
0xffffffff80800000 the actual values received by free_init_pages() are
0xffff880000610000 and 0xffff880000800000 (i.e. within the 64-bit direct
mapping region).

In free_init_pages() there is a test of addr >= __START_KERNEL_map
(which is 0xffffffff80000000). Because of this test, the two calls to
HYPERVISOR_update_va_mapping() are not made.

The net effect (we believe) is that this range of pages is freed from
Linux's viewpoint but the pages are still marked as PGT_writable_page
with a non-zero page type ref count in the hypervisor. When Linux tries
to use these pages later on for page table pages, the hypervisor traps.

Note, we have traced all uses of the pages in question.  Apparently they
are never used by Linux prior to the trap. Our traces show them being
initialized in the hypervisor by construct_dom0(), marked as readonly in
Linux by mark_rodata_ro() and then causing the hypervisor trap when
Linux tries to use one of them for a page table.

Presumably the correct fix will be to change the address range test in
free_init_pages...
Simon
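
The address arithmetic in this message can be checked with a few lines of C. This is a minimal sketch: the value of __START_KERNEL_map is the one quoted above, while the direct-mapping base 0xffff880000000000 is an assumption inferred from the translated addresses in the message, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL  /* quoted in the message */
#define DIRECT_MAP_BASE  0xffff880000000000ULL  /* assumption, inferred */

/* Model of page_address(virt_to_page(x)) for a kernel-image address:
 * the same physical page, seen through the direct mapping instead. */
static uint64_t kernel_to_direct(uint64_t kaddr)
{
    assert(kaddr >= START_KERNEL_MAP);
    return kaddr - START_KERNEL_MAP + DIRECT_MAP_BASE;
}

/* The range test in free_init_pages() that guards the two hypercalls. */
static bool takes_unmap_path(uint64_t addr)
{
    return addr >= START_KERNEL_MAP;
}
```

With text_end == 0xffffffff80610000, kernel_to_direct() yields 0xffff880000610000, and takes_unmap_path() is false for that value, so the HYPERVISOR_update_va_mapping() calls are skipped for the whole range.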


* Re: Re: [Xen-users] rebased openSUSE Xen dom0 Patches
  2010-04-20 16:07       ` Simon Graham
@ 2010-04-20 19:01         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-04-20 19:01 UTC (permalink / raw)
  To: Simon Graham; +Cc: Andrew Lyon, xen-devel, Jan Beulich

On Tue, Apr 20, 2010 at 11:07:54AM -0500, Simon Graham wrote:
> > >
> > > But that code is precisely what guarantees that the pages *can* be
> > > converted to page table pages (by completely unmapping them from
> > > the kernel image part of the address space). So your explanation is
> > > rather confusing than clarifying to me...
> > 
> > I agree that that is the intent of this code -- what we _seem_ to
> > observe (and this
> > is hard to prove) is that the page type ref count is not being
> > decremented by this
> > code which would imply that the unmapping is not happening for some
> > reason. The only
> > real evidence I have for this is that the failure always occurs on one
> > of these pages.
> > 
> 
> We now think we've found the problem which seems to be due to the
> following two calls in Linux within mark_rodata_ro():
> 
>     free_init_pages("unused kernel memory",
>                     (unsigned long)
>                      page_address(virt_to_page(text_end)),
>                     (unsigned long)
>                      page_address(virt_to_page(rodata_start)));
>     free_init_pages("unused kernel memory",
>                     (unsigned long)
>                      page_address(virt_to_page(rodata_end)),
>                     (unsigned long)
>                      page_address(virt_to_page(data_start)));
> 
> The first of these calls is trying to free the range
> page_address(virt_to_page(text_end)) through
> page_address(virt_to_page(rodata_start)).
> 
> With text_end == 0xffffffff80610000 and  rodata_start ==
> 0xffffffff80800000 the actual values received by free_init_pages() are
> 0xffff880000610000 and 0xffff880000800000 (i.e. within the 64-bit direct
> mapping region).
> 
> In free_init_pages() there is a test of addr >= __start_kernel_map
> (which is 0xffffffff80000000). Because of this test, the two calls to
> HYPERVISOR_update_va_mapping() are not made.
> 
> The net effect (we believe) is that this range of pages is freed from
> Linux's viewpoint but the pages are still marked as PGT_writable_page
> with a non-zero page type ref count in the hypervisor. When Linux tries
> to use these pages later on for page table pages, the hypervisor traps.
> 
> Note, we have traced all uses of the pages in question.  Apparently they
> are never used by Linux prior to the trap. Our traces show them being
> initialized in the hypervisor by construct_dom0(), marked as readonly in
> Linux by mark_rodata_ro() and then causing the hypervisor trap when
> Linux tries to use one them for a page tables.

Oh man, I remember this one. I submitted an initial patch for this.
https://patchwork.kernel.org/patch/79086/
> 
> Presumably the correct fix will be to change the address range test in
> free_init_pages...

And this was the final fix:
http://marc.info/?l=linux-kernel&m=126652277705569&w=2

The end result was to use a different mechanism to get the kernel address,
use that to set _PAGE_RW on the pages, and ignore the other mapping.
I think; this was some time ago.


* RE: Re: [Xen-users] rebased openSUSE Xen dom0 Patches
@ 2010-04-21  7:04 Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2010-04-21  7:04 UTC (permalink / raw)
  To: andrew.lyon, simon.graham; +Cc: xen-devel

>>> "Simon Graham"  04/20/10 6:08 PM >>>
>We now think we've found the problem which seems to be due to the
>following two calls in Linux within mark_rodata_ro():
>
>    free_init_pages("unused kernel memory",
>                    (unsigned long)
>                     page_address(virt_to_page(text_end)),
>                    (unsigned long)
>                     page_address(virt_to_page(rodata_start)));
>    free_init_pages("unused kernel memory",
>                    (unsigned long)
>                     page_address(virt_to_page(rodata_end)),
>                    (unsigned long)
>                     page_address(virt_to_page(data_start)));

This code is not present in 2.6.32.11, so your kernel source tree must have extra patches that require proper Xen equivalents. In particular, in our tree the Xen counterpart of that change avoids the (pointless and perhaps wasteful) aligning to 2MB boundaries in arch/x86/kernel/vmlinux.lds, which turns those two calls to free_init_pages() into no-ops.

Jan
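
The effect of the 2MB alignment can be seen from the addresses quoted earlier in the thread. The sketch below assumes that rodata_start is simply text_end rounded up to the next 2MB boundary; this is an inference from the two quoted values, not something stated in the thread. The helper names and the 4KB page size are likewise assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_2MB  (2ULL << 20)
#define PAGE_BYTES 4096ULL          /* assumed 4KB pages */

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Number of whole pages in the padding between start and end. */
static uint64_t gap_pages(uint64_t start, uint64_t end)
{
    return (end - start) / PAGE_BYTES;
}
```

Rounding text_end == 0xffffffff80610000 up to 2MB gives 0xffffffff80800000, the quoted rodata_start, leaving 496 pages of padding for the first free_init_pages() call to release. Without the alignment the two addresses would presumably fall in the same or adjacent pages, and the range would be empty.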
