* dom0 pvops and rearranging memory layout
@ 2015-01-23 10:32 Juergen Gross
  2015-01-23 11:35 ` Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Juergen Gross @ 2015-01-23 10:32 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com, David Vrabel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky
  Cc: Jan Beulich

Hi,

while testing new patches to support dom0 with more than 512 GB I
stumbled over an issue which - I think - has been present in pvops for
some time now.

On boot the kernel rearranges the memory layout to match the host
E820 map. This is done to be able to access all I/O areas with
identity mapped pfns (pfn == mfn). So basically some memory pages
change their pfns while the mfns stay the same.

There is no check done whether the moved memory areas are actually
in use (e.g. via memblock_is_reserved()). This can lead to cases
where memory in use is put to an area which is made available for
new memory allocations soon afterwards. Memory in question could
be the initrd, the p2m map presented to dom0 by the hypervisor, or
(hopefully in theory only) even the kernel itself or its initial
page tables built by the hypervisor.
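
A minimal sketch of the missing check, not kernel code: it models pfn
ranges as plain intervals and reports every in-use range that falls
inside a host non-RAM region and would therefore be remapped. The names
PfnRange, find_conflicts and the example layout are hypothetical; in the
kernel the "in use" test would be something like memblock_is_reserved().

```python
from typing import List, NamedTuple, Optional, Tuple

PAGE_SHIFT = 12  # 4 KiB pages, as on x86


class PfnRange(NamedTuple):
    start: int  # first pfn, inclusive
    end: int    # last pfn, exclusive
    label: str


def overlap(a: PfnRange, b: PfnRange) -> Optional[Tuple[int, int]]:
    """Return the pfn interval shared by a and b, or None if disjoint."""
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    return (lo, hi) if lo < hi else None


def find_conflicts(non_ram: List[PfnRange],
                   in_use: List[PfnRange]) -> List[Tuple[str, int, int]]:
    """List (label, lo, hi) for each in-use range that lies in a non-RAM
    region of the host E820 map and would be moved by the remapping."""
    conflicts = []
    for hole in non_ram:
        for used in in_use:
            hit = overlap(hole, used)
            if hit:
                conflicts.append((used.label, hit[0], hit[1]))
    return conflicts


def gib_to_pfn(n: int) -> int:
    """Convert an address given in GiB to a pfn."""
    return (n << 30) >> PAGE_SHIFT


# Hypothetical layout: a host hole between 2GB and 4GB, a large p2m
# reaching into it, an initrd completely inside it, and a kernel below it.
non_ram = [PfnRange(gib_to_pfn(2), gib_to_pfn(4), "host hole")]
in_use = [
    PfnRange(gib_to_pfn(1), gib_to_pfn(3), "p2m"),
    PfnRange(gib_to_pfn(3), gib_to_pfn(3) + 0x8000, "initrd"),
    PfnRange(0x1000, 0x4000, "kernel"),
]

for label, lo, hi in find_conflicts(non_ram, in_use):
    print(f"{label}: pfns {lo:#x}-{hi:#x} would be remapped while in use")
```

With this layout the p2m and the initrd are reported and the kernel is
not. A remap routine could use such a list to move or skip those ranges
before the freed area is handed to the allocator.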

In my test I had a p2m map of nearly 2GB size and the area between
2GB and 4GB had no RAM. So parts of the p2m map and the complete
initrd were subject to remapping, which led to an early PANIC.

I'll try to add some special handling for the initrd and the p2m
map. In case someone has a better idea: please tell me.


Juergen


* Re: dom0 pvops and rearranging memory layout
  2015-01-23 10:32 dom0 pvops and rearranging memory layout Juergen Gross
@ 2015-01-23 11:35 ` Andrew Cooper
  2015-01-23 11:41   ` Juergen Gross
                     ` (2 more replies)
  2015-01-23 11:58 ` David Vrabel
  2015-01-23 15:09 ` Konrad Rzeszutek Wilk
  2 siblings, 3 replies; 9+ messages in thread
From: Andrew Cooper @ 2015-01-23 11:35 UTC (permalink / raw)
  To: Juergen Gross, xen-devel@lists.xensource.com, David Vrabel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky
  Cc: Jan Beulich

On 23/01/15 10:32, Juergen Gross wrote:
> Hi,
>
> while testing new patches to support dom0 with more than 512 GB I
> stumbled over an issue which - I think - is present in pvops for
> some time now.
>
> On boot the kernel rearranges the memory layout to match the host
> E820 map. This is done to be able to access all I/O areas with
> identity mapped pfns (pfn == mfn). So basically some memory pages
> change their pfns while the mfns stay the same.
>
> There is no check done whether the moved memory areas are actually
> in use (e.g. via memblock_is_reserved()). This can lead to cases
> where memory in use is put to an area which is made available for
> new memory allocations soon afterwards. Memory in question could
> be the initrd, the p2m map presented to dom0 by the hypervisor, or
> (hopefully in theory only) even the kernel itself or its initial
> page tables built by the hypervisor.
>
> In my test I had a p2m map of nearly 2GB size and the area between
> 2GB and 4GB had no RAM. So parts of the p2m map and the complete
> initrd were subject to be remapped which led to an early PANIC.
>
> I'll try to add some special handling for the initrd and the p2m
> map. In case someone has a better idea: please tell me.
>

The relocation is done based only on the E820, is it not?

I wonder whether it might be reasonable to extend construct_dom0/libelf
to avoid constructing a p2m where pfns of built data (kernel, initrd,
p2m and initial pagetables) aliased with host io regions.

~Andrew


* Re: dom0 pvops and rearranging memory layout
  2015-01-23 11:35 ` Andrew Cooper
@ 2015-01-23 11:41   ` Juergen Gross
  2015-01-23 12:03   ` Jan Beulich
       [not found]   ` <54C246980200007800058C51@suse.com>
  2 siblings, 0 replies; 9+ messages in thread
From: Juergen Gross @ 2015-01-23 11:41 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel@lists.xensource.com, David Vrabel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky
  Cc: Jan Beulich

On 01/23/2015 12:35 PM, Andrew Cooper wrote:
> On 23/01/15 10:32, Juergen Gross wrote:
>> Hi,
>>
>> while testing new patches to support dom0 with more than 512 GB I
>> stumbled over an issue which - I think - is present in pvops for
>> some time now.
>>
>> On boot the kernel rearranges the memory layout to match the host
>> E820 map. This is done to be able to access all I/O areas with
>> identity mapped pfns (pfn == mfn). So basically some memory pages
>> change their pfns while the mfns stay the same.
>>
>> There is no check done whether the moved memory areas are actually
>> in use (e.g. via memblock_is_reserved()). This can lead to cases
>> where memory in use is put to an area which is made available for
>> new memory allocations soon afterwards. Memory in question could
>> be the initrd, the p2m map presented to dom0 by the hypervisor, or
>> (hopefully in theory only) even the kernel itself or its initial
>> page tables built by the hypervisor.
>>
>> In my test I had a p2m map of nearly 2GB size and the area between
>> 2GB and 4GB had no RAM. So parts of the p2m map and the complete
>> initrd were subject to be remapped which led to an early PANIC.
>>
>> I'll try to add some special handling for the initrd and the p2m
>> map. In case someone has a better idea: please tell me.
>>
>
> The relocation is done based only on the E820, is it not?

Yes.

> I wonder whether it might be reasonable to extend construct_dom0/libelf
> to avoid constructing a p2m where pfns of built data (kernel, initrd,
> p2m and initial pagetables) aliased with host io regions.

That was my first idea, too. OTOH this would require a rather new
hypervisor with this functionality to be able to run a pvops dom0 on
such a machine.

And can we be sure that an existing non-pvops dom0 (or even an old pvops
one) can work with such a change?


Juergen


* Re: dom0 pvops and rearranging memory layout
  2015-01-23 10:32 dom0 pvops and rearranging memory layout Juergen Gross
  2015-01-23 11:35 ` Andrew Cooper
@ 2015-01-23 11:58 ` David Vrabel
  2015-01-23 15:09 ` Konrad Rzeszutek Wilk
  2 siblings, 0 replies; 9+ messages in thread
From: David Vrabel @ 2015-01-23 11:58 UTC (permalink / raw)
  To: Juergen Gross, xen-devel@lists.xensource.com,
	Konrad Rzeszutek Wilk, Boris Ostrovsky
  Cc: Jan Beulich

On 23/01/15 10:32, Juergen Gross wrote:
> Hi,
> 
> while testing new patches to support dom0 with more than 512 GB I
> stumbled over an issue which - I think - is present in pvops for
> some time now.
> 
> On boot the kernel rearranges the memory layout to match the host
> E820 map. This is done to be able to access all I/O areas with
> identity mapped pfns (pfn == mfn). So basically some memory pages
> change their pfns while the mfns stay the same.
> 
> There is no check done whether the moved memory areas are actually
> in use (e.g. via memblock_is_reserved()). This can lead to cases
> where memory in use is put to an area which is made available for
> new memory allocations soon afterwards. Memory in question could
> be the initrd, the p2m map presented to dom0 by the hypervisor, or
> (hopefully in theory only) even the kernel itself or its initial
> page tables built by the hypervisor.
> 
> In my test I had a p2m map of nearly 2GB size and the area between
> 2GB and 4GB had no RAM. So parts of the p2m map and the complete
> initrd were subject to be remapped which led to an early PANIC.
> 
> I'll try to add some special handling for the initrd and the p2m
> map. In case someone has a better idea: please tell me.

I would suggest:

Pass 1: relocate p2m and page tables to "safe" areas.
Pass 2: relocate frames from holes/reserved regions.
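
A minimal sketch of the two passes, using a toy p2m (a dict from pfn to
mfn) rather than real kernel structures. All names (two_pass_relocate,
holes, critical, spare) are hypothetical, and the choice of target pfns
is deliberately naive. The only difference between the passes in this
model is that targets chosen in pass 1 are recorded as reserved, so they
are never handed to the allocator.

```python
from typing import Dict, List, Set, Tuple


def two_pass_relocate(p2m: Dict[int, int],
                      holes: Set[int],
                      critical: Set[int],
                      spare: List[int]):
    """Move populated frames out of host holes in two passes.

    p2m      -- pfn -> mfn for all populated guest frames
    holes    -- pfns that must end up identity mapped (pfn == mfn)
    critical -- pfns holding the p2m or the initial page tables
    spare    -- unpopulated pfns that may receive moved frames

    Returns (new_p2m, moves, reserved); moves holds
    (old_pfn, new_pfn, pass_number) tuples.
    """
    new_p2m = dict(p2m)
    targets = [pfn for pfn in spare if pfn not in holes and pfn not in p2m]
    moves: List[Tuple[int, int, int]] = []
    reserved: Set[int] = set()

    def move(pfn: int, pass_no: int) -> int:
        if not targets:
            raise RuntimeError("no spare pfn left for relocation")
        dst = targets.pop(0)
        new_p2m[dst] = new_p2m.pop(pfn)  # same mfn, new pfn
        moves.append((pfn, dst, pass_no))
        return dst

    # Pass 1: critical frames go to "safe" pfns that stay reserved.
    for pfn in sorted(critical & holes & set(p2m)):
        reserved.add(move(pfn, 1))

    # Pass 2: every other populated frame still sitting in a hole.
    for pfn in sorted(holes & set(new_p2m)):
        move(pfn, 2)

    # The holes themselves become identity mapped.
    for pfn in holes:
        new_p2m[pfn] = pfn

    return new_p2m, moves, reserved


# Toy example: pfns 2 and 3 are host holes, pfn 3 holds part of the p2m.
new_p2m, moves, reserved = two_pass_relocate(
    p2m={0: 100, 1: 101, 2: 102, 3: 103},
    holes={2, 3},
    critical={3},
    spare=[4, 5, 6],
)
print(new_p2m, moves, reserved)
```

In the toy example the critical frame at pfn 3 is moved first and its new
pfn is reserved; the ordinary frame at pfn 2 is moved afterwards and its
new pfn remains ordinary memory.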

I don't think we want to change the hypervisor to work around this.

David


* Re: dom0 pvops and rearranging memory layout
  2015-01-23 11:35 ` Andrew Cooper
  2015-01-23 11:41   ` Juergen Gross
@ 2015-01-23 12:03   ` Jan Beulich
       [not found]   ` <54C246980200007800058C51@suse.com>
  2 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2015-01-23 12:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Juergen Gross, xen-devel, Boris Ostrovsky, David Vrabel

>>> On 23.01.15 at 12:35, <andrew.cooper3@citrix.com> wrote:
> I wonder whether it might be reasonable to extend construct_dom0/libelf
> to avoid constructing a p2m where pfns of built data (kernel, initrd,
> p2m and initial pagetables) aliased with host io regions.

For one, the problem goes away to 99.999% if using the advanced
capabilities of relocating the P2M and not mapping the initrd at all.

And then, such a change could easily end up being incompatible with
older kernels, which may (and do) build on the initial memory map
being a single chunk.

Jan


* Re: dom0 pvops and rearranging memory layout
       [not found]   ` <54C246980200007800058C51@suse.com>
@ 2015-01-23 12:08     ` Juergen Gross
  2015-01-23 12:42       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Juergen Gross @ 2015-01-23 12:08 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel, Boris Ostrovsky, David Vrabel

On 01/23/2015 01:03 PM, Jan Beulich wrote:
>>>> On 23.01.15 at 12:35, <andrew.cooper3@citrix.com> wrote:
>> I wonder whether it might be reasonable to extend construct_dom0/libelf
>> to avoid constructing a p2m where pfns of built data (kernel, initrd,
>> p2m and initial pagetables) aliased with host io regions.
>
> For one, the problem goes away to 99.999% if using the advanced
> capabilities of relocating the P2M and not mapping the initrd at all.

No, it does not. I'm doing both and the system dies at once.

Even relocating the P2M to another virtual address and not mapping
the initrd won't help if the related PFNs are relocated and, even worse,
are made available for new memory allocations.

> And then, such a change could easily end up being incompatible with
> older kernels, which may (and do) build on the initial memory map
> being a single chunk.

Yes, I fear so, too.


Juergen


* Re: dom0 pvops and rearranging memory layout
  2015-01-23 12:08     ` Juergen Gross
@ 2015-01-23 12:42       ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2015-01-23 12:42 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Andrew Cooper, Boris Ostrovsky, David Vrabel, xen-devel

>>> On 23.01.15 at 13:08, <JGross@suse.com> wrote:
> On 01/23/2015 01:03 PM, Jan Beulich wrote:
>>>>> On 23.01.15 at 12:35, <andrew.cooper3@citrix.com> wrote:
>>> I wonder whether it might be reasonable to extend construct_dom0/libelf
>>> to avoid constructing a p2m where pfns of built data (kernel, initrd,
>>> p2m and initial pagetables) aliased with host io regions.
>>
>> For one, the problem goes away to 99.999% if using the advanced
>> capabilities of relocating the P2M and not mapping the initrd at all.
> 
> No, it does not. I'm doing both and the system dies at once.
> 
> Even relocating the P2M to another virtual address and not mapping
> the initrd won't help if the related PFNs are relocated and, even worse,
> are made available for new memory allocations.

Oh, right, the placement in VA space doesn't matter here.

Jan


* Re: dom0 pvops and rearranging memory layout
  2015-01-23 10:32 dom0 pvops and rearranging memory layout Juergen Gross
  2015-01-23 11:35 ` Andrew Cooper
  2015-01-23 11:58 ` David Vrabel
@ 2015-01-23 15:09 ` Konrad Rzeszutek Wilk
  2015-01-23 15:16   ` Juergen Gross
  2 siblings, 1 reply; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-01-23 15:09 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Boris Ostrovsky, xen-devel@lists.xensource.com, David Vrabel,
	Jan Beulich

On Fri, Jan 23, 2015 at 11:32:20AM +0100, Juergen Gross wrote:
> Hi,
> 
> while testing new patches to support dom0 with more than 512 GB I
> stumbled over an issue which - I think - is present in pvops for
> some time now.
> 
> On boot the kernel rearranges the memory layout to match the host
> E820 map. This is done to be able to access all I/O areas with
> identity mapped pfns (pfn == mfn). So basically some memory pages
> change their pfns while the mfns stay the same.
> 
> There is no check done whether the moved memory areas are actually
> in use (e.g. via memblock_is_reserved()). This can lead to cases
> where memory in use is put to an area which is made available for
> new memory allocations soon afterwards. Memory in question could
> be the initrd, the p2m map presented to dom0 by the hypervisor, or
> (hopefully in theory only) even the kernel itself or its initial
> page tables built by the hypervisor.
> 
> In my test I had a p2m map of nearly 2GB size and the area between

Oh my. That is huge. Could you compress it? This would of course require
a new type of P2M, in which we would mark which MFNs are contiguous.

And then during booting you could read over and find these special
ones and when creating the new P2M do the right uncompression?
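
A minimal sketch of what such a compression could look like, assuming a
simple run-length scheme over contiguous MFNs. The extent format
(first_pfn, first_mfn, count) and the function names are hypothetical
and do not correspond to any existing Xen interface.

```python
from typing import List, Tuple

Extent = Tuple[int, int, int]  # (first_pfn, first_mfn, count)


def compress_p2m(mfns: List[int]) -> List[Extent]:
    """Collapse a flat p2m list (index == pfn) into extents in which
    consecutive pfns map to consecutive mfns."""
    extents: List[Extent] = []
    for pfn, mfn in enumerate(mfns):
        if extents:
            first_pfn, first_mfn, count = extents[-1]
            if mfn == first_mfn + count:
                # mfn continues the current run, so extend it by one.
                extents[-1] = (first_pfn, first_mfn, count + 1)
                continue
        extents.append((pfn, mfn, 1))
    return extents


def expand_p2m(extents: List[Extent]) -> List[int]:
    """Rebuild the flat p2m list from the extents."""
    mfns: List[int] = []
    for first_pfn, first_mfn, count in extents:
        if first_pfn != len(mfns):
            raise ValueError("extents must cover pfns without gaps")
        mfns.extend(range(first_mfn, first_mfn + count))
    return mfns


flat = [100, 101, 102, 500, 501, 7]
packed = compress_p2m(flat)
print(packed)
print(expand_p2m(packed) == flat)
```

How much this saves depends entirely on how contiguous the machine
frames are; a fully fragmented p2m would need one extent per page.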

> 2GB and 4GB had no RAM. So parts of the p2m map and the complete
> initrd were subject to be remapped which led to an early PANIC.
> 
> I'll try to add some special handling for the initrd and the p2m
> map. In case someone has a better idea: please tell me.
> 
> 
> Juergen


* Re: dom0 pvops and rearranging memory layout
  2015-01-23 15:09 ` Konrad Rzeszutek Wilk
@ 2015-01-23 15:16   ` Juergen Gross
  0 siblings, 0 replies; 9+ messages in thread
From: Juergen Gross @ 2015-01-23 15:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Boris Ostrovsky, xen-devel@lists.xensource.com, David Vrabel,
	Jan Beulich

On 01/23/2015 04:09 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 23, 2015 at 11:32:20AM +0100, Juergen Gross wrote:
>> Hi,
>>
>> while testing new patches to support dom0 with more than 512 GB I
>> stumbled over an issue which - I think - is present in pvops for
>> some time now.
>>
>> On boot the kernel rearranges the memory layout to match the host
>> E820 map. This is done to be able to access all I/O areas with
>> identity mapped pfns (pfn == mfn). So basically some memory pages
>> change their pfns while the mfns stay the same.
>>
>> There is no check done whether the moved memory areas are actually
>> in use (e.g. via memblock_is_reserved()). This can lead to cases
>> where memory in use is put to an area which is made available for
>> new memory allocations soon afterwards. Memory in question could
>> be the initrd, the p2m map presented to dom0 by the hypervisor, or
>> (hopefully in theory only) even the kernel itself or its initial
>> page tables built by the hypervisor.
>>
>> In my test I had a p2m map of nearly 2GB size and the area between
>
> Oh my. That is huge. Could you compress it? This would require of course
> a new type of P2M - where we would mark which MFNs are contiguous.
>
> And then during booting you could read over and find these special
> ones and when creating the new P2M do the right uncompression?

I don't think that's the correct solution. It would require a new
hypervisor as well and we still wouldn't have a guarantee it will
work.

That's "only" for 1TB of memory. I think we want to support much
more.
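
For reference, the size of a flat p2m list can be estimated as below.
This is a back-of-the-envelope sketch assuming 4 KiB pages and one
8-byte entry per page (a 64-bit guest); the helper name p2m_bytes is
made up for illustration.

```python
PAGE_SIZE = 4096      # bytes per page (x86)
P2M_ENTRY_SIZE = 8    # assumption: one 64-bit entry per pfn

GIB = 1 << 30
TIB = 1 << 40


def p2m_bytes(mem_bytes: int) -> int:
    """Size in bytes of a flat p2m list covering mem_bytes of memory."""
    pages = -(-mem_bytes // PAGE_SIZE)  # round up to whole pages
    return pages * P2M_ENTRY_SIZE


for mem in (512 * GIB, 1 * TIB, 4 * TIB):
    print(f"{mem // GIB:5d} GiB of memory -> "
          f"{p2m_bytes(mem) / GIB:.0f} GiB of p2m")
```

Under these assumptions 1 TiB of memory needs a 2 GiB list, which is
consistent with the "nearly 2GB" p2m mentioned at the start of the
thread, and the list grows linearly with the memory size.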

And even if the p2m is okay, a huge initrd will still blow us up.


Juergen

>
>> 2GB and 4GB had no RAM. So parts of the p2m map and the complete
>> initrd were subject to be remapped which led to an early PANIC.
>>
>> I'll try to add some special handling for the initrd and the p2m
>> map. In case someone has a better idea: please tell me.
>>
>>
>> Juergen
>
