* address space reorganization
@ 2005-04-13 17:59 Gerd Knorr
  2005-04-13 19:29 ` Keir Fraser
  0 siblings, 1 reply; 14+ messages in thread
From: Gerd Knorr @ 2005-04-13 17:59 UTC (permalink / raw)
  To: xen-devel

  Hi,

On my devel machine Xen comes up fine with PAE paging
enabled.  Well, it doesn't boot very far yet; right now it
stops at the end of paging_init().  The address space issues
need fixing before I can attempt to boot domain 0 with
PAE ...

At the moment the xen virtual address space (top 64 MB)
looks like this:

  0xffc0  |  4 MB  |  ioremap area
  0xff80  |  4 MB  |  mapping cache
  0xff40  |  4 MB  |  per domain mapping (gdt, ...)
  0xff00  |  4 MB  |  shadow linear page tables
  0xfec0  |  4 MB  |  linear page tables
  0xfd40  | 24 MB  |  frame table
  0xfd00  |  4 MB  |  MPT (rw)
  0xfc40  | 12 MB  |  low mem, xen code, xen heap
  0xfc00  |  4 MB  |  MPT (ro)

For PAE we'll have to change:

  * linear page tables and linear shadow tables need 8 MB
    each (because pte size is doubled with PAE).
  * frame table needs to grow, the size depends on the
    amount of memory we are willing to support (total),
    for 16 GB it would be 96 MB.
  * MPT might need more space, depending on how much memory
    we are willing to support (per domain).  With a 4GB
    per-domain limit the current 4 MB size would be fine.
    [ side note: the shadow code seems to reuse the MPT
      address space for something else in some cases, not
      sure what implications this has ]
  * not sure about xen's heap.  What is it used for?
    Might we need more space here as well to support large
    amounts of memory?
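
The sizes in the list above can be sanity-checked with a little arithmetic. The following is a minimal C sketch, not code from Xen: the function names are made up for illustration, and the 24-byte frame table entry size is an assumption, chosen because it is consistent with the figures quoted (a 24 MB frame table today, 96 MB for 16 GB).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KB pages on x86 */

/* Virtual address space needed to linearly map the page tables that
 * cover a 4 GB (32-bit) address space.  pte_size is 4 bytes without
 * PAE and 8 bytes with PAE. */
uint64_t linear_pt_bytes(unsigned pte_size)
{
    uint64_t ptes = (1ULL << 32) >> PAGE_SHIFT;  /* one PTE per 4 KB page */
    return ptes * pte_size;
}

/* Size of a frame table holding one fixed-size entry per machine page.
 * entry_size is an assumption (see the text above), not a value taken
 * from the Xen source. */
uint64_t frame_table_bytes(uint64_t mem_bytes, unsigned entry_size)
{
    assert(entry_size > 0);
    return (mem_bytes >> PAGE_SHIFT) * entry_size;
}
```

With these definitions, linear_pt_bytes(4) gives 4 MB and linear_pt_bytes(8) gives 8 MB, and frame_table_bytes(16ULL << 30, 24) gives 96 MB, matching the numbers in the list.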

If we touch the address space anyway we might fix some other
issues along the way.  Ian mentioned he wants to move the
ioremap area to the bottom.  I guess next to the ro MPT
table, so it's easy to grant domains read-only access to
ACPI tables?

Is it possible (and/or useful) to make the address layout
dynamic?  So the size of the frametable can be adjusted at
boot time depending on the amount of memory installed in the
machine?  That would imply the ro MPT doesn't have a fixed
address any more, not sure this is possible ...

In any case I'd try to make the memory layout as fixed as
possible: the fixed size stuff at the top, below that the
data structures which are not fixed-size, and at the bottom
the ro MPT + ioremap area for r/o domain access, i.e.
something like this:

[ fixed size ]
    0xff00  | 16 MB  |  low mem, xen code, xen heap
    0xfec0  |  4 MB  |  mapping cache
    0xfe80  |  4 MB  |  per domain mapping (gdt, ...)

[ Hmm, debatable whether to make these fixed-size or not.
  It would waste some address space in the non-pae case,
  on the other hand the memory layout would be identical
  for both pae and non-pae. ]
    0xfe00  |  8 MB  |  shadow linear page tables
    0xfd80  |  8 MB  |  linear page tables
    0xfbc0  |  4 MB  |  MPT (rw)

[ not fixed size ]
    0xfc00  | 24 MB  |  frame table (larger for PAE ...)

[ r/o access for domains ]
    0xfb80  |  4 MB  |  MPT (ro)
    0xfb40  |  4 MB  |  ioremap area
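
A layout table like this is easy to get subtly wrong, so a small checker can help. The sketch below is illustrative only: struct region, layout_is_contiguous() and the proposed_layout array are hypothetical names, with the array transcribed from the table above. It sorts the regions by start address and verifies that they tile the range up to 4 GB with no gaps or overlaps. Going by the addresses, the MPT (rw) at 0xfbc0 sits below the frame table at 0xfc00 even though it is listed above it, which is why the check sorts first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct region {
    uint32_t start;     /* virtual start address */
    uint32_t size_mb;   /* size in MB */
    const char *name;
};

/* Proposed layout, transcribed from the table above. */
struct region proposed_layout[] = {
    { 0xff000000u, 16, "low mem, xen code, xen heap" },
    { 0xfec00000u,  4, "mapping cache" },
    { 0xfe800000u,  4, "per domain mapping" },
    { 0xfe000000u,  8, "shadow linear page tables" },
    { 0xfd800000u,  8, "linear page tables" },
    { 0xfbc00000u,  4, "MPT (rw)" },
    { 0xfc000000u, 24, "frame table" },
    { 0xfb800000u,  4, "MPT (ro)" },
    { 0xfb400000u,  4, "ioremap area" },
};
#define PROPOSED_LAYOUT_LEN \
    (sizeof(proposed_layout) / sizeof(proposed_layout[0]))

/* qsort comparator: ascending start address. */
int cmp_start(const void *a, const void *b)
{
    const struct region *ra = a, *rb = b;
    return (ra->start > rb->start) - (ra->start < rb->start);
}

/* Sorts r in place and returns 1 if the regions cover the range from
 * the lowest start address up to 4 GB without gaps or overlaps. */
int layout_is_contiguous(struct region *r, size_t n)
{
    uint64_t expect;
    size_t i;

    assert(r != NULL);
    if (n == 0)
        return 0;
    qsort(r, n, sizeof(*r), cmp_start);
    expect = r[0].start;
    for (i = 0; i < n; i++) {
        if (r[i].start != expect)
            return 0;                   /* gap or overlap */
        expect = (uint64_t)r[i].start + ((uint64_t)r[i].size_mb << 20);
    }
    return expect == (1ULL << 32);      /* must end exactly at 4 GB */
}
```

Calling layout_is_contiguous(proposed_layout, PROPOSED_LAYOUT_LEN) returns 1 for the table as written, so the proposal occupies the top 76 MB without holes.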

Comments?  Anything else to consider when touching the
address layout anyway?

  Gerd

PS: my current patches are @ http://dl.bytesex.org/patches/xen/


* Re: address space reorganization
  2005-04-13 17:59 Gerd Knorr
@ 2005-04-13 19:29 ` Keir Fraser
  2005-04-14  1:45   ` Keir Fraser
  0 siblings, 1 reply; 14+ messages in thread
From: Keir Fraser @ 2005-04-13 19:29 UTC (permalink / raw)
  To: Gerd Knorr; +Cc: xen-devel


On 13 Apr 2005, at 18:59, Gerd Knorr wrote:

> If we touch the address space anyway we might fix some other
> issues along the way.  Ian mentioned he wants to move the
> ioremap area to the bottom.  I guess next to the ro MPT
> table, so it's easy to grant domains read-only access to
> ACPI tables?

Ian was talking about the ioremap area in XenLinux, not in Xen. Doing 
this for XenLinux would be useful if we were to make HYPERVISOR_START a 
run-time, rather than compile-time, constant. However, I am not 
convinced that this is a very good idea (see my reasoning below).

Given that, it is also not clear that reorg'ing the Xen address space 
is a worthwhile effort. Yes, things like the location of the Xen heap 
will be different on PAE vs. non-PAE builds, but it will still be a 
constant decided at compile time.  Reorg'ing is hopefully a pain we can 
do without for the time being. We should revisit these considerations 
if/when we want a single Xen binary that can do both PAE and non-PAE. 
But let's just get the basic support working first. :-)

> Is it possible (and/or useful) to make the address layout
> dynamic?  So the size of the frametable can be adjusted at
> boot time depending on the amount of memory installed in the
> machine?  That would imply the ro MPT doesn't have a fixed
> address any more, not sure this is possible ...

I do not think that we should have run-time selected 
HYPERVISOR_VIRT_START. This is because it will make it impossible to 
migrate a guest to a machine with more memory than the one on which it 
is currently executing.

The reasoning is: the new target machine will have a lower 
HYPERVISOR_VIRT_START, but the guest will have sized its lowmem area 
according to the available space on the original machine. It therefore 
cannot run on the new target because its lowmem area simply will not 
fit.

I think we should just fix on two VIRT_START values: -64MB for non-PAE 
and something like -192MB for PAE (or whatever allows us to map up to 
16GB -- I think we will treat bigger memory configs than that as rare 
enough to ignore).
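
To make the argument concrete, here is a small C sketch of the two proposed values and of the fit check that breaks migration. The helper names virt_start_for_hole() and guest_fits() are hypothetical, and the model (a guest whose lowmem area ends exactly where the source machine's hypervisor area begins) is a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of a hypervisor area occupying the top hole_mb MB of a
 * 32-bit virtual address space ("-64MB" means 64 MB below 4 GB). */
uint32_t virt_start_for_hole(uint32_t hole_mb)
{
    assert(hole_mb > 0 && hole_mb < 4096);
    return (uint32_t)((1ULL << 32) - ((uint64_t)hole_mb << 20));
}

/* A guest that sized its lowmem area to end at guest_lowmem_end
 * (exclusive) can only run on a target whose hypervisor area starts at
 * or above that address. */
int guest_fits(uint32_t guest_lowmem_end, uint32_t target_virt_start)
{
    return guest_lowmem_end <= target_virt_start;
}
```

Here virt_start_for_hole(64) is 0xfc000000 and virt_start_for_hole(192) is 0xf4000000. A guest sized against the first value fails guest_fits() on a machine using the second, which is the migration problem described above.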

  -- Keir


* Re: address space reorganization
  2005-04-13 19:29 ` Keir Fraser
@ 2005-04-14  1:45   ` Keir Fraser
  2005-04-14  2:18     ` Jacob Gorm Hansen
  0 siblings, 1 reply; 14+ messages in thread
From: Keir Fraser @ 2005-04-14  1:45 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Gerd Knorr


On 13 Apr 2005, at 20:29, Keir Fraser wrote:

> I do not think that we should have run-time selected 
> HYPERVISOR_VIRT_START. This is because it will make it impossible to 
> migrate a guest to a machine with more memory than the one on which it 
> is currently executing.
>
> The reasoning is: the new target machine will have a lower 
> HYPERVISOR_VIRT_START, but the guest will have sized its lowmem area 
> according to the available space on the original machine. It therefore 
> cannot run on the new target because its lowmem area simply will not 
> fit.
>
> I think we should just fix on two VIRT_START values: -64MB for non-PAE 
> and something like -192MB for PAE (or whatever allows us to map up to 
> 16GB -- I think we will treat bigger memory configs than that as rare 
> enough to ignore).

I chatted to Christian a bit about this and he changed my mind. There 
probably are some situations where a variable virt_start would be 
useful for us, although we still may not want to do it for an initial 
pae patch.

We need generally to think about how flexible we want to be in allowing 
migration between different machine configurations. Should we require 
identical h/w specs, or allow differences in I/O devices, CPU and/or 
memory? We will already have to be careful about downgrading cpu specs 
when we migrate (e.g., Linux locks onto using multimedia instructions 
for software raid that are unavailable post-migrate). A pragmatic 
middle ground may be that, if people want to migrate in a heterogeneous 
cluster, we require them to configure 'worst-case specs' up front when 
building a domain (e.g., lowest-spec cpu the domain should run on; 
biggest hypervisor address-space hole the domain should work with).

  -- Keir


* Re: address space reorganization
  2005-04-14  1:45   ` Keir Fraser
@ 2005-04-14  2:18     ` Jacob Gorm Hansen
  0 siblings, 0 replies; 14+ messages in thread
From: Jacob Gorm Hansen @ 2005-04-14  2:18 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Gerd Knorr

Keir Fraser wrote:

> We need generally to think about how flexible we want to be in allowing 
> migration between different machine configurations. Should we require 
> identical h/w specs, or allow differences in I/O devices, CPU and/or 
> memory? We will already have to be careful about downgrading cpu specs 
> when we migrate (e.g., Linux locks onto using multimedia instructions 
> for software raid that are unavailable post-migrate).

Why not treat the functions that use special mm-instructions (like the 
software RAID code) as critical sections that cannot overlap with 
migration, and then have the guestOS re-calibrate its use of these 
features upon arrival?

[ insert standard plug of self-migration here :-) ]

Jacob


* RE: address space reorganization
@ 2005-04-14  2:26 Ian Pratt
  2005-04-14  3:08 ` Jacob Gorm Hansen
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Pratt @ 2005-04-14  2:26 UTC (permalink / raw)
  To: Jacob Gorm Hansen, Keir Fraser; +Cc: xen-devel, Gerd Knorr

> > We need generally to think about how flexible we want to be in
> > allowing migration between different machine configurations.
> > Should we require identical h/w specs, or allow differences in
> > I/O devices, CPU and/or memory? We will already have to be careful
> > about downgrading cpu specs when we migrate (e.g., Linux locks onto
> > using multimedia instructions for software raid that are
> > unavailable post-migrate).
> 
> Why not treat the functions that use special mm-instructions 
> (like the software RAID code) as critical sections that 
> cannot overlap with migration, and then have the guestOS 
> re-calibrate its use of these features upon arrival?

That works OK for the kernel, but you might have user space apps that
have adapted their behaviour based on what they've found in /proc/cpuinfo.

A particularly nasty case is apps or libraries that go at 'cpuid'
directly, as we can't trap that instruction. I guess VMware have the
same problem, as I don't believe they translate ring 3 code.

As regards your proposed critical region, we already effectively do
this.  We don't recalibrate stuff after a migration yet though (some of
the tests are quite slow, so I'm not sure you'd want to do them all
anyhow).

Ian


* Re: address space reorganization
  2005-04-14  2:26 address space reorganization Ian Pratt
@ 2005-04-14  3:08 ` Jacob Gorm Hansen
  2005-04-14  3:14   ` Kip Macy
  0 siblings, 1 reply; 14+ messages in thread
From: Jacob Gorm Hansen @ 2005-04-14  3:08 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel, Gerd Knorr

Ian Pratt wrote:

> That works OK for the kernel, but you might have user space apps that
> have adapted their behaviour based on what they've found in /proc/cpuinfo.

A compromise then would be to lie to userspace and still recalibrate the 
kernel.

> A particularly nasty case is apps or libraries that go at 'cpuid'
> directly, as we can't trap that instruction. I guess VMware have the
> same problem, as I don't believe they translate ring 3 code.

Yeah, nothing we can do there really, except tell people not to :-(

> As regards your proposed critical region, we already effectively do
> this.  We don't recalibrate stuff after a migration yet though (some of
> the tests are quite slow, so I'm not sure you'd want to do them all
> anyhow).

Could perhaps do them in advance, or during the pre-copy phase as they 
are likely to stay constant for the lifetime of the machine, but that 
demands that the receiving side knows what you are looking for.
My system uploads a bootstrapper to the target VM in advance, so I could 
have this info ready when the rest of the OS arrives.

Jacob


* Re: address space reorganization
  2005-04-14  3:08 ` Jacob Gorm Hansen
@ 2005-04-14  3:14   ` Kip Macy
  2005-04-14  3:38     ` Jacob Gorm Hansen
  2005-04-14 17:54     ` Rik van Riel
  0 siblings, 2 replies; 14+ messages in thread
From: Kip Macy @ 2005-04-14  3:14 UTC (permalink / raw)
  To: Jacob Gorm Hansen; +Cc: xen-devel

> > That works OK for the kernel, but you might have user space apps that
> > have adapted their behaviour based on what they've found in /proc/cpuinfo.
> 
> A compromise then would be to lie to userspace and still recalibrate the
> kernel.

If you're saying what I think you are - I don't think people who have
customized their apps for SSE2 would like that.

You'll either have to have xend propagate a bitmask of the cpuid
capabilities so that xfrd won't migrate it or you tell users that "the
behaviour is undefined" if they use migration in a heterogeneous
cluster.
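
The bitmask check itself is simple. Below is a minimal C sketch of the idea, not xend or xfrd code: missing_features() and migration_allowed() are hypothetical names. The three constants use the bit positions of MMX, SSE and SSE2 in the EDX feature flags returned by CPUID leaf 1.

```c
#include <stdint.h>

/* Feature bits as reported in EDX by CPUID leaf 1. */
#define FEAT_MMX  (1u << 23)
#define FEAT_SSE  (1u << 25)
#define FEAT_SSE2 (1u << 26)

/* Features the guest has been told about that the target CPU lacks. */
uint32_t missing_features(uint32_t guest_mask, uint32_t target_mask)
{
    return guest_mask & ~target_mask;
}

/* Migration is safe with respect to CPU features only if everything
 * the guest may be using is also present on the target. */
int migration_allowed(uint32_t guest_mask, uint32_t target_mask)
{
    return missing_features(guest_mask, target_mask) == 0;
}
```

A guest started on an SSE2 machine would be refused migration to an SSE-only target, while moving in the other direction is allowed because the target offers a superset of the features.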

      -Kip


* Re: address space reorganization
  2005-04-14  3:14   ` Kip Macy
@ 2005-04-14  3:38     ` Jacob Gorm Hansen
  2005-04-14 16:00       ` Adam Heath
  2005-04-14 17:54     ` Rik van Riel
  1 sibling, 1 reply; 14+ messages in thread
From: Jacob Gorm Hansen @ 2005-04-14  3:38 UTC (permalink / raw)
  To: xen-devel

Kip Macy wrote:
>>>That works OK for the kernel, but you might have user space apps that
>>>have adapted their behaviour based on what they've found in /proc/cpuinfo.
>>
>>A compromise then would be to lie to userspace and still recalibrate the
>>kernel.
> 
> 
> If you're saying what I think you are - I don't think people who have
> customized their apps for SSE2 would like that.

This is what Keir was suggesting, but still giving the kernel a chance
to use SSE[12] for its own purposes. It is not perfect; the only perfect
solution would alert user-space apps about the impending change, so that
they can react to or veto the migration.

Jacob


* Re: address space reorganization
  2005-04-14  3:38     ` Jacob Gorm Hansen
@ 2005-04-14 16:00       ` Adam Heath
  0 siblings, 0 replies; 14+ messages in thread
From: Adam Heath @ 2005-04-14 16:00 UTC (permalink / raw)
  Cc: xen-devel

On Wed, 13 Apr 2005, Jacob Gorm Hansen wrote:

> Kip Macy wrote:
> >>>That works OK for the kernel, but you might have user space apps that
> >>>have adapted their behaviour based on what they've found in /proc/cpuinfo.
> >>
> >>A compromise then would be to lie to userspace and still recalibrate the
> >>kernel.
> >
> >
> > If you're saying what I think you are - I don't think people who have
> > customized their apps for SSE2 would like that.
>
> This is what Keir was suggesting, but still giving the kernel a chance
> to use SSE[12] for its own purposes. It is not perfect; the only perfect
> solution would alert user-space apps about the impending change, so that
> they can react to or veto the migration.

How does beowulf handle this situation?


* Re: address space reorganization
  2005-04-14  3:14   ` Kip Macy
  2005-04-14  3:38     ` Jacob Gorm Hansen
@ 2005-04-14 17:54     ` Rik van Riel
  2005-04-14 18:15       ` Kip Macy
  1 sibling, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2005-04-14 17:54 UTC (permalink / raw)
  To: Kip Macy; +Cc: xen-devel, Jacob Gorm Hansen

On Wed, 13 Apr 2005, Kip Macy wrote:

> You'll either have to have xend propagate a bitmask of the cpuid
> capabilities so that xfrd won't migrate it or you tell users that "the
> behaviour is undefined" if they use migration in a heterogeneous
> cluster.

I'd prefer the latter.   It really is a configuration problem,
and not something I'd want the software to solve for me.

People whose applications need SSE2 will install CPUs that
have those instructions.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


* Re: address space reorganization
  2005-04-14 17:54     ` Rik van Riel
@ 2005-04-14 18:15       ` Kip Macy
  2005-04-14 18:20         ` Rik van Riel
  0 siblings, 1 reply; 14+ messages in thread
From: Kip Macy @ 2005-04-14 18:15 UTC (permalink / raw)
  To: Rik van Riel; +Cc: xen-devel, Jacob Gorm Hansen

> People whose applications need SSE2 will install CPUs that
> have those instructions.
> 
One would hope, but just because a customer has lots of money to spend
on hardware doesn't mean that he is rowing with both oars. This is a
supportability issue. The xen{source} folks would do themselves a
favor by trapping a guest's use of unsupported instructions and
logging it. That would make it easy enough to track down if a
customer's apps stop working when using migration.


           -Kip


* Re: address space reorganization
  2005-04-14 18:15       ` Kip Macy
@ 2005-04-14 18:20         ` Rik van Riel
  0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2005-04-14 18:20 UTC (permalink / raw)
  To: Kip Macy; +Cc: xen-devel, Jacob Gorm Hansen

On Thu, 14 Apr 2005, Kip Macy wrote:

> > People whose applications need SSE2 will install CPUs that
> > have those instructions.
> > 
> One would hope, but just because a customer has lots of money to spend
> on hardware doesn't mean that he is rowing with both oars. This is a
> supportability issue. The xen{source} folks would do themselves a
> favor by trapping a guest's use of unsupported instructions and
> logging it. That would make it easy enough to track down if a
> customer's apps stop working when using migration.

Good idea, trapping unsupported instructions and printing out
the category the instruction belongs to (eg. SSE2) will make
things a lot easier to track.  I like this idea a lot...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


* RE: address space reorganization
@ 2005-04-14 19:03 Ian Pratt
  2005-04-19 19:52 ` Chris Wedgwood
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Pratt @ 2005-04-14 19:03 UTC (permalink / raw)
  To: Rik van Riel, Kip Macy; +Cc: xen-devel, Jacob Gorm Hansen

 
> > One would hope, but just because a customer has lots of money to
> > spend on hardware doesn't mean that he is rowing with both oars.
> > This is a supportability issue. The xen{source} folks would do
> > themselves a favor by trapping a guest's use of unsupported
> > instructions and logging it. That would make it easy enough to
> > track down if a customer's apps stop working when using migration.
> 
> Good idea, trapping unsupported instructions and printing out 
> the category the instruction belongs to (eg. SSE2) will make 
> things a lot easier to track.  I like this idea a lot...

Yep, although we can't trap the cpuid, we can trap the use of e.g. SSE2.

We have to be a bit careful though, to prevent DoS of the Xen console.
We'd need to rate limit such messages. Patches welcome :-)
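
One possible shape for such a rate limiter is a fixed-window counter: allow at most a given number of messages per time window and count the rest as suppressed. The C sketch below is only an illustration under that assumption. struct ratelimit and its functions are hypothetical names, not an existing Xen interface, and the caller supplies the current time in whatever unit it likes.

```c
#include <assert.h>
#include <stdint.h>

struct ratelimit {
    uint64_t interval;      /* window length, in caller-defined time units */
    unsigned burst;         /* max messages allowed per window */
    uint64_t window_start;  /* start time of the current window */
    unsigned printed;       /* messages allowed in the current window */
    unsigned suppressed;    /* total messages dropped so far */
};

void ratelimit_init(struct ratelimit *rl, uint64_t interval, unsigned burst)
{
    assert(rl != 0 && interval > 0);
    rl->interval = interval;
    rl->burst = burst;
    rl->window_start = 0;
    rl->printed = 0;
    rl->suppressed = 0;
}

/* Returns 1 if a message may be logged at time now, 0 if it should be
 * dropped.  A new window starts once interval units have elapsed. */
int ratelimit_allow(struct ratelimit *rl, uint64_t now)
{
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    rl->suppressed++;
    return 0;
}
```

A trap handler would call ratelimit_allow() before printing, so a guest that executes an unsupported instruction in a tight loop produces at most burst console lines per interval. Keeping one limiter per domain, rather than one global one, would stop a noisy guest from hiding messages about the others.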

Ian


* Re: address space reorganization
  2005-04-14 19:03 Ian Pratt
@ 2005-04-19 19:52 ` Chris Wedgwood
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Wedgwood @ 2005-04-19 19:52 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Kip Macy, xen-devel, Jacob Gorm Hansen

On Thu, Apr 14, 2005 at 08:03:18PM +0100, Ian Pratt wrote:

> Yep, although we can't trap the cpuid, we can trap the use of
> e.g. SSE2.

and emulate them or just warn?  for the former it's going to be slow
as hell, and i would almost argue you could stop the domain doing this
with a warning...
