Tracking "Cannot allocate memory" error in shadow_alloc_p2m

All of lore.kernel.org
 help / color / mirror / Atom feed

* Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
@ 2007-01-09 21:16 Chris Lalancette
  2007-01-10 11:32 ` Tim Deegan
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Lalancette @ 2007-01-09 21:16 UTC (permalink / raw)
  To: xen-devel

Hello,
     I've been working on tracking down a "Cannot allocate memory" error when trying to start FV domains.
I finally found a 16GB box where I could reliably reproduce the problem.  By turning on trace debugging and adding a few of my own prints, I was able to see that the error came from this code path:

shadow_alloc_p2m_table() -> shadow_set_p2m_entry -> p2m_next_level -> p2m_find_entry

(all in arch/x86/mm/shadow/common.c).  What's happening is that the gfn_remainder passed into
p2m_find_entry is something like 0x3a3815 which, when shifted by shift (which would happen to be 18 in
the case of the 3rd-level page table in i686 PAE), it would end up being larger than the max (which is 8),
and hence causing the failure.  Looking deeper into it, there is something I really don't understand,
though.  shadow_alloc_p2m_table fetches each page in the page_list, then gets the page struct, then
converts to mfn using page_to_mfn, and finally gets the gfn using get_gpfn_from_mfn.  get_gpfn_from_mfn
is defined in include/asm-x86/mm.h, and looks to be just pulling the mfn -> gpfn mapping out of the
machine_to_phys_mapping table.  The problem, as I see it, is that no one ever put a valid entry into
machine_to_phys_mapping, so the data returned there is bogus.  Once I commented out this section of code
in shadow_alloc_p2m_table():

/*
    for ( entry = d->page_list.next;
          entry != &d->page_list;
          entry = entry->next )
    {
        page = list_entry(entry, struct page_info, list);
        mfn = page_to_mfn(page);
        gfn = get_gpfn_from_mfn(mfn_x(mfn));
        page_count++;
        if (
#ifdef __x86_64__
            (gfn != 0x5555555555555555L)
#else
            (gfn != 0x55555555L)
#endif
             && gfn != INVALID_M2P_ENTRY
             && !shadow_set_p2m_entry(d, gfn, mfn) )
            goto error;
    }
*/

I was able to successfully start some FV domains.

1)  Am I missing something here?  Is there some sort of initialization of the machine_to_phys_mapping
table that I missed?

2)  Why do we need the above code during the creation of a new domain?  I would think there aren't any
valid pages in the page_list we would have to worry about at that point.

Of course, I am by no means a shadow table expert, so I'm sure I'm missing something.  Any ideas?

Thanks,
Chris Lalancette

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
  2007-01-09 21:16 Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table Chris Lalancette
@ 2007-01-10 11:32 ` Tim Deegan
  2007-01-10 15:41   ` Chris Lalancette
  2007-01-12  5:13   ` Steven Rostedt
  0 siblings, 2 replies; 8+ messages in thread
From: Tim Deegan @ 2007-01-10 11:32 UTC (permalink / raw)
  To: Chris Lalancette; +Cc: xen-devel

Hi,

At 16:16 -0500 on 09 Jan (1168359405), Chris Lalancette wrote:
> What's happening is that the gfn_remainder passed into p2m_find_entry
> is something like 0x3a3815 which, when shifted by shift (which would
> happen to be 18 in the case of the 3rd-level page table in i686 PAE),
> it would end up being larger than the max (which is 8), and hence
> causing the failure.

Are you giving this domain more than 7GB of RAM on a PAE hypervisor?

> 1)  Am I missing something here?  Is there some sort of initialization of the machine_to_phys_mapping
> table that I missed?

The code you cut out is start-of-day code that builds the p2m map of a
domain from the m2p entries of its allocated pages.  It was needed
originally because the domain builder tools would set up the guest's
memory and m2p mapping and _then_ enable shadow-translate mode.  It may
not be necessary now that HVM domains are put in shadow mode at creation
time.

AFAICS, either the domain should have no memory assigned yet (in which
case it does nothing), or the domain's pages should have m2p entries
that are valid, explicitly invalid, or set to the "debug pattern" of all
5s.  I'll look at a more sensible failure mode then -ENOMEM, though.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
  2007-01-10 11:32 ` Tim Deegan
@ 2007-01-10 15:41   ` Chris Lalancette
  2007-01-12  5:13   ` Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: Chris Lalancette @ 2007-01-10 15:41 UTC (permalink / raw)
  To: Tim Deegan; +Cc: xen-devel

Tim Deegan wrote:
> Hi,
> 
> At 16:16 -0500 on 09 Jan (1168359405), Chris Lalancette wrote:
> 
>>What's happening is that the gfn_remainder passed into p2m_find_entry
>>is something like 0x3a3815 which, when shifted by shift (which would
>>happen to be 18 in the case of the 3rd-level page table in i686 PAE),
>>it would end up being larger than the max (which is 8), and hence
>>causing the failure.
> 
> 
> Are you giving this domain more than 7GB of RAM on a PAE hypervisor?

Yes, the Hypervisor is i686 PAE, but no, the FV domain that I am trying to start up only has 512MB of
memory.

> 
> 
>>1)  Am I missing something here?  Is there some sort of initialization of the machine_to_phys_mapping
>>table that I missed?
> 
> 
> The code you cut out is start-of-day code that builds the p2m map of a
> domain from the m2p entries of its allocated pages.  It was needed
> originally because the domain builder tools would set up the guest's
> memory and m2p mapping and _then_ enable shadow-translate mode.  It may
> not be necessary now that HVM domains are put in shadow mode at creation
> time.
> 
> AFAICS, either the domain should have no memory assigned yet (in which
> case it does nothing), or the domain's pages should have m2p entries
> that are valid, explicitly invalid, or set to the "debug pattern" of all
> 5s.  I'll look at a more sensible failure mode then -ENOMEM, though.

Ah, that makes sense.  Yeah, I was thinking that the other way to solve this was to initialize the whole
machine_to_phys_mapping table to something sensible, but then I just didn't even see why it needed to be
done at this early stage, so I took the code out.  Either way is fine with me...I would just like to see
the FV domains start working on this large memory machine :).

Thanks,
Chris Lalancette

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
  2007-01-10 11:32 ` Tim Deegan
  2007-01-10 15:41   ` Chris Lalancette
@ 2007-01-12  5:13   ` Steven Rostedt
  2007-01-12 11:07     ` Tim Deegan
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2007-01-12  5:13 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Chris Lalancette, xen-devel

Tim Deegan wrote:

> The code you cut out is start-of-day code that builds the p2m map of a
> domain from the m2p entries of its allocated pages.  It was needed
> originally because the domain builder tools would set up the guest's
> memory and m2p mapping and _then_ enable shadow-translate mode.  It may
> not be necessary now that HVM domains are put in shadow mode at creation
> time.
> 
> AFAICS, either the domain should have no memory assigned yet (in which
> case it does nothing), or the domain's pages should have m2p entries
> that are valid, explicitly invalid, or set to the "debug pattern" of all
> 5s.  I'll look at a more sensible failure mode then -ENOMEM, though.
> 

OK, I've been debugging the hell out of this one.  And I don't see where 
the m2p mapping is updated for the guest yet!  I put in prints to show 
me where the m2p is updated for the guest and this seems to have shown 
me that the m2p mapping doesn't know about the new guest at this point.

I used my internal log tracer (ring buffer) to map output, and this is 
what it showed me.

Note: I'm using RHEL5 HV for this, but it goes the same for Xen.

(XEN) [   98.395925] cpu:0 share_xen_page_with_guest:264 guest 1 mapping 
pages page=f6811280 mfn=2928 from:ff106fa2
(XEN) [   98.395927] cpu:0 share_xen_page_with_guest:264 guest 1 mapping 
pages page=f6811298 mfn=2929 from:ff106fa2
(XEN) [   98.395928] cpu:0 share_xen_page_with_guest:264 guest 1 mapping 
pages page=f68112b0 mfn=2930 from:ff106fa2
(XEN) [   98.395929] cpu:0 share_xen_page_with_guest:264 guest 1 mapping 
pages page=f68112c8 mfn=2931 from:ff106fa2
(XEN) [   98.395936] cpu:0 share_xen_page_with_guest:264 guest 1 mapping 
pages page=f68112f8 mfn=2933 from:ff11db10

The above is not filtered (meaning that I didn't put in any limit to the 
amount of prints it will make, or snip it from this email) so the above 
is the only calls to share_xen_page_with_guest (that updates m2p 
mapping) before .

These show me that this was called by grant_table_create (0xff106fa2), 
and arch_domain_create (0xff11db10).

(XEN) [   99.008441] cpu:1 set_sh_allocation:1201 alloc_domheap_pages 
(1) pages=768
(XEN) [   99.008444] cpu:1 set_sh_allocation:1205  arch total_pages=0

The above shows that we are going to call alloc_domheap_pages next (768 
pages to allocate) from set_sh_allocation, and the total_pages currently 
set by the domain is zero.

This only allocates the pages for the domain (puts it on its 
d->page_list), but it does not modify the m2p mapping here. Perhaps we 
should have this invalidate the m2p mapping for the page?

(XEN) [   99.054858] cpu:0 shadow_alloc_p2m_table:1040 page=f7049e70 
gfn=2669a mfn=362138 cnt=0
(XEN) [   99.054861] cpu:0 shadow_alloc_p2m_table:1040 page=f73b3ba8 
gfn=752e1 mfn=511271 cnt=1
(XEN) [   99.054867] cpu:0 shadow_alloc_p2m_table:1040 page=f7049e58 
gfn=26699 mfn=362137 cnt=2
(XEN) [   99.054868] cpu:0 shadow_alloc_p2m_table:1040 page=f7049e40 
gfn=26698 mfn=362136 cnt=3
(XEN) [   99.054869] cpu:0 shadow_alloc_p2m_table:1040 page=f73b3c98 
gfn=752d7 mfn=511281 cnt=4

The above is next, but I do filter it, since it prints out for every page.

mfn is in decimal (don't ask why, it was my first default)
gfn's will be in hex, and page is the pointer.

This is the code that we are having a problem with. Basically, we are 
updating the shadow page tables with bogus memory!  Unless we wanted the 
5  pages that were mapped for the grant table, and the one for the arch 
setup, but I don't even think those are in this.

Why do we need that code, it's using stale data, and updating the shadow 
page tables with a m2p mapping that is from a older domain. See more below:

[...]

(XEN) [   99.054911] cpu:0 shadow_alloc_p2m_table:1040 page=f73b2000 
gfn=2000 mfn=510976 cnt=49
(XEN) [   99.076309] cpu:0 shadow_guest_physmap_add_page:2901 guest 1 
calling sh_p2m_remove_page ogfn=2669a
(XEN) [   99.076317] cpu:0 sh_p2m_remove_page:2832 guest 1 mapping pages 
invalid(XEN) [   99.076319] cpu:0 shadow_guest_physmap_add_page:2912 
guest 1 mapping pages gfn=0 mfn=362138 from:ff12d10f
(XEN) [   99.076321] cpu:0 shadow_guest_physmap_add_page:2901 guest 1 
calling sh_p2m_remove_page ogfn=752e1
(XEN) [   99.076322] cpu:0 sh_p2m_remove_page:2832 guest 1 mapping pages 
invalid(XEN) [   99.076323] cpu:0 shadow_guest_physmap_add_page:2912 
guest 1 mapping pages gfn=1 mfn=511271 from:ff12d10f
(XEN) [   99.076324] cpu:0 shadow_guest_physmap_add_page:2901 guest 1 
calling sh_p2m_remove_page ogfn=26699
(XEN) [   99.076326] cpu:0 sh_p2m_remove_page:2832 guest 1 mapping pages 
invalid(XEN) [   99.076327] cpu:0 shadow_guest_physmap_add_page:2912 
guest 1 mapping pages gfn=2 mfn=362137 from:ff12d10f

Finally we are calling shadow_guest_physmap_add_page, and this is called 
from ff12d10f or do_mmu_update which I believe (correct me if I'm wrong) 
is being called by xend and friends.

First thing that is done, is that it *removes* the m2p mapping 
(sh_p2m_remove_page). It actually does both, remove the shadow mapping 
that we set up before, and it invalidates the m2p mapping.

We then set up this page to be the correct shadow mapping, with the call 
to shadow_guest_physmap_add_page and this also updates to the correct 
m2p mapping.

So in conclusion, I don't see why we need the code that does this bogus 
mapping, and that the patch that Chris proposed, in removing that code 
looks like the better approach.

If you think I'm wrong, please feel free to convince me. I've been wrong 
before, but I don't thing that we should have code that just undoes itself.

Oh, I also tried this logging after removing the code, and I got this at 
the end:

(XEN) [   52.764736] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 
mapping pages gfn=0 mfn=346470 from:ff12d10f
(XEN) [   52.764739] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 
mapping pages gfn=1 mfn=346467 from:ff12d10f
(XEN) [   52.764741] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 
mapping pages gfn=2 mfn=346465 from:ff12d10f
(XEN) [   52.764742] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 
mapping pages gfn=3 mfn=346462 from:ff12d10f
(XEN) [   52.764744] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 
mapping pages gfn=4 mfn=346402 from:ff12d10f
(XEN) [   52.764749] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 
mapping pages gfn=5 mfn=346426 from:ff12d10f

Notice that there's no effort in removing the pages before setting them up.

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
  2007-01-12  5:13   ` Steven Rostedt
@ 2007-01-12 11:07     ` Tim Deegan
  2007-01-12 13:05       ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Deegan @ 2007-01-12 11:07 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: xen-devel, Tim Deegan, Chris Lalancette

At 00:13 -0500 on 12 Jan (1168560825), Steven Rostedt wrote:
> Why do we need that code, it's using stale data, and updating the shadow 
> page tables with a m2p mapping that is from a older domain.

The domain builder for translated PV guests still populates the physmap
before enabling shadow-translate mode, so needs this init code.  Now
that paravirt-ops kernels use proper mmu operations that can probably be
fixed up.

In the meantime, cset 13326:dc0638bb4628 of xen-unstable should have
stopped it from killing the guest if it finds an entry that's too big.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
  2007-01-12 11:07     ` Tim Deegan
@ 2007-01-12 13:05       ` Steven Rostedt
  2007-01-12 14:36         ` Steven Rostedt
  2007-01-12 14:37         ` Tim Deegan
  0 siblings, 2 replies; 8+ messages in thread
From: Steven Rostedt @ 2007-01-12 13:05 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Chris Lalancette, xen-devel

Tim Deegan wrote:
> At 00:13 -0500 on 12 Jan (1168560825), Steven Rostedt wrote:
>> Why do we need that code, it's using stale data, and updating the shadow 
>> page tables with a m2p mapping that is from a older domain.
> 
> The domain builder for translated PV guests still populates the physmap
> before enabling shadow-translate mode, so needs this init code.  Now
> that paravirt-ops kernels use proper mmu operations that can probably be
> fixed up.
> 
> In the meantime, cset 13326:dc0638bb4628 of xen-unstable should have
> stopped it from killing the guest if it finds an entry that's too big.
> 

Is there a way to tell if this is a translated PV guest?  I'd rather not 
do this code at all if it is not needed.  We are mapping pages into a 
guess from junk data. I hate to find out later on that something is 
missed, and we allow for the guest to have access to memory it should 
not have.

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
  2007-01-12 13:05       ` Steven Rostedt
@ 2007-01-12 14:36         ` Steven Rostedt
  2007-01-12 14:37         ` Tim Deegan
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2007-01-12 14:36 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: xen-devel, Herbert Xu, Tim Deegan, Chris Lalancette

OK,

Herbert has been looking into this too, and he's discovered that it's a 
problem with the RHEL5 snapshot.  The allocation of the shadow_enable is 
done later on in RHEL5 and thus the d->page_list is not empty, where as 
in the xen-unstable/testing version, shadow_enable is called in the 
hypervisor, so it is not affected by this bug.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
  2007-01-12 13:05       ` Steven Rostedt
  2007-01-12 14:36         ` Steven Rostedt
@ 2007-01-12 14:37         ` Tim Deegan
  1 sibling, 0 replies; 8+ messages in thread
From: Tim Deegan @ 2007-01-12 14:37 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: xen-devel, Tim Deegan, Chris Lalancette

At 08:05 -0500 on 12 Jan (1168589116), Steven Rostedt wrote:
> Is there a way to tell if this is a translated PV guest?  I'd rather not 
> do this code at all if it is not needed. 

It just needs the PV loader to be tweaked, and we can drop it entirely.

> We are mapping pages into a 
> guess from junk data. I hate to find out later on that something is 
> missed, and we allow for the guest to have access to memory it should 
> not have.

That's not a risk.  The MFNs we're putting into the p2m map are
definitely correct: they're taken from the domain's page list.  It's
just that they're going in at the wrong guest-physical addresses.

Tim.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-01-12 14:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-01-09 21:16 Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table Chris Lalancette
2007-01-10 11:32 ` Tim Deegan
2007-01-10 15:41   ` Chris Lalancette
2007-01-12  5:13   ` Steven Rostedt
2007-01-12 11:07     ` Tim Deegan
2007-01-12 13:05       ` Steven Rostedt
2007-01-12 14:36         ` Steven Rostedt
2007-01-12 14:37         ` Tim Deegan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.