* More on kexec/purgatory handover
@ 2015-05-09 13:21 Petr Tesarik
  2015-05-13  5:26 ` Eric W. Biederman
  2015-05-13  5:26 ` Eric W. Biederman
  0 siblings, 2 replies; 19+ messages in thread
From: Petr Tesarik @ 2015-05-09 13:21 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: kexec, David Vrabel, Jan Beulich, xen-devel

Hi all,

note that I'm not subscribed to the xen-devel mailing list, but Jan
quoted from this mail of Andrew's in SUSE Bugzilla:

> This is all from a while ago.  It is quite possible that we didn't
> actually test the compatibility case with a 64bit dom0 kernel,
> although I certainly did test earlier versions of the series with a
> 32bit dom0 kernel.  The work was done long before XenServer moved to a
> 64bit dom0, and was done by deleting everything and starting from scratch.
> 
> IIRC, the low 640k mappings is a purgatory bug rather than Linux, and
> has been fixed upstream in kexec-tools since.  (I recall that it used to
> take a backup copy of the IVT for some reason)

This is not entirely correct.

Originally, kexec (in Linux kernel) was supposed to provide an
environment which is equivalent to the boot loader, i.e. kexec is just
another bootloader like LILO or GRUB. The first implementation indeed
switched back to 16-bit real mode before passing control to the
secondary kernel's boot code...

It was at that time that the need arose to save the low 640K of RAM
somewhere else, because the 16-bit bootloader had to use parts of this
memory range, not least because it also made BIOS calls, and BIOS
used this range for its data.

This solution was suboptimal for numerous reasons, e.g. very limited
location of the purgatory code in physical RAM, or incompatibility with
UEFI booting. As an improvement, a 32-bit boot protocol was introduced.
At entry, the CPU must be in 32-bit protected mode with paging
disabled. This explains why you never noticed any issues related to
pagetables with 32-bit kernels. Since paging is disabled, there are
none. ;-)

The 32-bit protocol limits the location of the secondary kernel to the
low 4G of physical RAM (for obvious reasons). This is now solved by a
64-bit boot protocol. Since paging must always be enabled in Long Mode,
it must be set up somehow. The Linux documentation says: "The range with
setup_header.init_size from start address of loaded kernel and zero
page and command line buffer get ident mapping".
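
As a rough illustration of what the quoted requirement implies for
whoever builds the page tables, the following C sketch computes the
smallest run of 2 MiB pages that covers a given physical range (for
example the kernel image plus init_size, the zero page, or the command
line buffer). The struct span type, the cover_2m() helper and the choice
of 2 MiB pages are assumptions made for this example; they are not taken
from the kernel or from kexec-tools.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (2ull << 20)

/* Hypothetical result type: 2 MiB-aligned base and number of 2 MiB pages. */
struct span {
    uint64_t base;
    uint64_t pages;
};

/* Smallest run of 2 MiB pages that covers [start, start + len). */
static struct span cover_2m(uint64_t start, uint64_t len)
{
    /* Non-empty range, and small enough that rounding up cannot wrap. */
    assert(len > 0 && start <= (1ull << 52) && len <= (1ull << 52));

    uint64_t base = start & ~(SZ_2M - 1);
    uint64_t end  = (start + len + SZ_2M - 1) & ~(SZ_2M - 1);

    return (struct span){ base, (end - base) / SZ_2M };
}
```

For instance, cover_2m(0x1000000, 0x1234000) yields base 0x1000000 and
10 pages, so ten identity-mapped 2 MiB entries would cover that range.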

The problematic part here is that Linux kexec code is split between
kernel and purgatory. Unfortunately, the handover between the old
kernel and the purgatory is not so well defined, so the actual kexec
code is probably the best documentation available.

There are currently two versions of the Linux purgatory: in kexec-tools
and in the kernel. Neither of them sets CR3. On the other hand, the Linux
kernel does set CR3 (see arch/x86/kernel/relocate_kernel_64.S). This
makes me believe that the 64-bit kexec entry point expects that paging
is set up by the old kernel. If Xen plays the role of the old kernel,
it must also set up paging. The question is how.

Let's start a discussion on the kexec mailing list (in Cc) to clarify
what should be done by the old kernel and what should be done by the
purgatory code.

Petr Tesarik

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: More on kexec/purgatory handover
  2015-05-09 13:21 More on kexec/purgatory handover Petr Tesarik
  2015-05-13  5:26 ` Eric W. Biederman
@ 2015-05-13  5:26 ` Eric W. Biederman
  2015-05-13  6:55   ` Jan Beulich
  2015-05-13  6:55   ` Jan Beulich
  1 sibling, 2 replies; 19+ messages in thread
From: Eric W. Biederman @ 2015-05-13  5:26 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Andrew Cooper, kexec, David Vrabel, Jan Beulich, xen-devel

Petr Tesarik <ptesarik@suse.cz> writes:

> Hi all,
>
> note that I'm not subscribed to the xen-devel mailing list, but Jan
> quoted from this mail of Andrew's in SUSE Bugzilla:
>
>> This is all from a while ago.  It is quite possible that we didn't
>> actually test the compatibility case with a 64bit dom0 kernel,
>> although I certainly did test earlier versions of the series with a
>> 32bit dom0 kernel.  The work was done long before XenServer moved to a
>> 64bit dom0, and was done by deleting everything and starting from scratch.
>> 
>> IIRC, the low 640k mappings is a purgatory bug rather than Linux, and
>> has been fixed upstream in kexec-tools since.  (I recall that it used to
>> take a backup copy of the IVT for some reason)
>
> This is not entirely correct.
>
> Originally, kexec (in Linux kernel) was supposed to provide an
> environment which is equivalent to the boot loader, i.e. kexec is just
> another bootloader like LILO or GRUB. The first implementation indeed
> switched back to 16-bit real mode before passing control to the
> secondary kernel's boot code...
>
> It was at that time that the need arose to save the low 640K of RAM
> somewhere else, because the 16-bit bootloader had to use parts of this
> memory range, not least because it also made BIOS calls, and BIOS
> used this range for its data.
>
> This solution was suboptimal for numerous reasons, e.g. very limited
> location of the purgatory code in physical RAM, or incompatibility with
> UEFI booting. As an improvement, a 32-bit boot protocol was introduced.
> At entry, the CPU must be in 32-bit protected mode with paging
> disabled. This explains why you never noticed any issues related to
> pagetables with 32-bit kernels. Since paging is disabled, there are
> none. ;-)
>
> The 32-bit protocol limits the location of the secondary kernel to the
> low 4G of physical RAM (for obvious reasons). This is now solved by a
> 64-bit boot protocol. Since paging must always be enabled in Long Mode,
> it must be set up somehow. The Linux documentation says: "The range with
> setup_header.init_size from start address of loaded kernel and zero
> page and command line buffer get ident mapping".
>
> The problematic part here is that Linux kexec code is split between
> kernel and purgatory. Unfortunately, the handover between the old
> kernel and the purgatory is not so well defined, so the actual kexec
> code is probably the best documentation available.
>
> There are currently two versions of the Linux purgatory: in kexec-tools
> and in the kernel. Neither of them sets CR3. On the other hand, the Linux
> kernel does set CR3 (see arch/x86/kernel/relocate_kernel_64.S). This
> makes me believe that the 64-bit kexec entry point expects that paging
> is set up by the old kernel. If Xen plays the role of the old kernel,
> it must also set up paging. The question is how.
>
> Let's start a discussion on the kexec mailing list (in Cc) to clarify
> what should be done by the old kernel and what should be done by the
> purgatory code.


Yes.  The assumption is that for the addresses claimed by the image
that is loaded (think the physical addresses in ELF PHDRS), the
physical addresses are mapped one-to-one to virtual addresses.
In practice I think I wound up using huge pages and mapping everything
one-to-one on x86_64 because it was easier than a specific subset.

The low 640k was weird.   We copied it off in purgatory so that it could
be captured in a dump.  The Linux kernel itself winds up using that
memory fundamentally because to fire up subsequent processors you have
to have memory in the low 640k as processors start in real-mode and
the startup IPI only takes an address in the low 1M.  I also remember
things with interrupt descriptor tables.
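
The low-1M constraint follows from how a startup IPI encodes its target:
it carries an 8-bit vector, and the woken processor starts in real mode
at physical address (vector << 12). A minimal C sketch of that encoding
is below; the helper names sipi_addr_ok() and sipi_vector() are made up
for this example and do not come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIPI_PAGE_SHIFT 12          /* vector selects a 4 KiB page */
#define LOW_1M          0x100000u   /* first address an 8-bit vector cannot reach */

/* True if a real-mode trampoline at this physical address can be the
 * target of a startup IPI: 4 KiB aligned and below 1 MiB. */
static bool sipi_addr_ok(uint64_t phys)
{
    return phys < LOW_1M && (phys & ((1u << SIPI_PAGE_SHIFT) - 1)) == 0;
}

/* Encode the trampoline address as the 8-bit SIPI vector field. */
static uint8_t sipi_vector(uint64_t phys)
{
    assert(sipi_addr_ok(phys));
    return (uint8_t)(phys >> SIPI_PAGE_SHIFT);
}
```

A trampoline at 0x9000 would thus use vector 0x09, while any address at
or above 0x100000 cannot be expressed at all.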

Eric





* Re: More on kexec/purgatory handover
  2015-05-13  5:26 ` Eric W. Biederman
@ 2015-05-13  6:55   ` Jan Beulich
  2015-05-13  7:35     ` Eric W. Biederman
  2015-05-13  7:35     ` Eric W. Biederman
  2015-05-13  6:55   ` Jan Beulich
  1 sibling, 2 replies; 19+ messages in thread
From: Jan Beulich @ 2015-05-13  6:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Cooper, kexec, David Vrabel, Petr Tesarik, xen-devel

>>> On 13.05.15 at 07:26, <ebiederm@xmission.com> wrote:
> The low 640k was weird.   We copied it off in purgatory so that it could
> be captured in a dump.  The Linux kernel itself winds up using that
> memory fundamentally because to fire up subsequent processors you have
> to have memory in the low 640k as processors start in real-mode and
> the startup IPI only takes an address in the low 1M.  I also remember
> things with interrupt descriptor tables.

Right, but the point to clarify is whether it is reasonable for the
purgatory and/or new kernel to expect the old kernel to set up
mappings for special regions like this one. As said before - I don't
think this should be done; the old (possibly half broken) kernel
shouldn't be forced to do any more than the absolute minimum
amount of work to be able to transfer control.

Jan



* Re: More on kexec/purgatory handover
  2015-05-13  6:55   ` Jan Beulich
  2015-05-13  7:35     ` Eric W. Biederman
@ 2015-05-13  7:35     ` Eric W. Biederman
  2015-05-13  8:12       ` Jan Beulich
  2015-05-13  8:12       ` Jan Beulich
  1 sibling, 2 replies; 19+ messages in thread
From: Eric W. Biederman @ 2015-05-13  7:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, kexec, David Vrabel, Petr Tesarik, xen-devel

"Jan Beulich" <JBeulich@suse.com> writes:

>>>> On 13.05.15 at 07:26, <ebiederm@xmission.com> wrote:
>> The low 640k was weird.   We copied it off in purgatory so that it could
>> be captured in a dump.  The Linux kernel itself winds up using that
>> memory fundamentally because to fire up subsequent processors you have
>> to have memory in the low 640k as processors start in real-mode and
>> the startup IPI only takes an address in the low 1M.  I also remember
>> things with interrupt descriptor tables.
>
> Right, but the point to clarify is whether it is reasonable for the
> purgatory and/or new kernel to expect the old kernel to set up
> mappings for special regions like this one. As said before - I don't
> think this should be done; the old (possibly half broken) kernel
> shouldn't be forced to do any more than the absolute minimum
> amount of work to be able to transfer control.

When transferring control in 64bit mode on x86_64, a one-to-one virtual
to physical identity mapping must be set up.  That identity map must be
set up before transferring control to the kexec destination.  That
mapping must cover all pages in the destination image.

Those page tables should be created before the old kernel gets into a
broken state.

Fundamentally if you are transferring control in long mode you have to
set up some page table.  A giant identity mapped page table that can use
1G or 2M pages takes up very little memory, and can be set up very simply
and easily before the transfer of control takes place.
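
To put a number on "very little memory", here is a small C sketch that
counts the 4 KiB table pages needed to identity-map [0, bytes) with
4-level paging, using either 2 MiB or 1 GiB leaf pages. It assumes the
mapping starts at address 0 and ignores holes; ident_table_pages() is a
name invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES 512ull          /* entries per x86-64 page-table page */
#define SZ_2M   (2ull << 20)
#define SZ_1G   (1ull << 30)

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* Number of 4 KiB table pages needed to identity-map [0, bytes) with
 * 4-level paging and leaf pages of leaf_size (2 MiB or 1 GiB). */
static uint64_t ident_table_pages(uint64_t bytes, uint64_t leaf_size)
{
    assert(leaf_size == SZ_2M || leaf_size == SZ_1G);
    /* One PML4 can address at most 512 * 512 GiB = 256 TiB. */
    assert(bytes > 0 && bytes <= ENTRIES * ENTRIES * SZ_1G);

    /* Each PDPT covers 512 GiB; each PD (2 MiB leaves only) covers 1 GiB. */
    uint64_t pdpts = div_round_up(bytes, ENTRIES * SZ_1G);
    uint64_t pds   = (leaf_size == SZ_2M) ? div_round_up(bytes, SZ_1G) : 0;

    return 1 /* PML4 */ + pdpts + pds;
}
```

Under these assumptions, 64 GiB needs 2 table pages (8 KiB) with 1 GiB
leaves or 66 pages (264 KiB) with 2 MiB leaves, and 1 TiB needs 3 or
1027 pages respectively.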

All you have to do when you are in a half broken state is load cr3.
Possibly after verifying a checksum.
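
A sketch of that split between load time and crash time, in plain C that
runs in user space: the table is filled and checksummed while the system
is healthy, and only the cheap comparison is left for the crash path.
The rotate-and-XOR checksum and all function names are assumptions for
illustration (this is not what the kernel or Xen does), and the actual
CR3 load is omitted because it cannot be done from user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT (1ull << 0)
#define PTE_WRITE   (1ull << 1)
#define PTE_PSE     (1ull << 7)   /* large-page leaf in a PD entry */
#define ENTRIES     512
#define SZ_2M       (2ull << 20)

/* Load time: fill one page directory so that entry i maps the 2 MiB
 * page at base + i * 2 MiB onto itself (virtual == physical). */
static void fill_ident_pd(uint64_t pd[ENTRIES], uint64_t base)
{
    assert((base & (SZ_2M - 1)) == 0);
    for (size_t i = 0; i < ENTRIES; i++)
        pd[i] = (base + i * SZ_2M) | PTE_PRESENT | PTE_WRITE | PTE_PSE;
}

/* Illustrative rotate-and-XOR checksum over the table words. */
static uint64_t table_checksum(const uint64_t *tbl, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = ((sum << 1) | (sum >> 63)) ^ tbl[i];
    return sum;
}

/* Crash time: the only work left before loading CR3 is this comparison. */
static bool tables_intact(const uint64_t *tbl, size_t n, uint64_t expected)
{
    return table_checksum(tbl, n) == expected;
}
```

The point of the design is that nothing on the crash path allocates or
walks kernel data structures; it only reads memory prepared earlier.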

640k in this case I don't think is particularly special, and certainly
not worth a special case.  The in-kernel implementation on x86_64 sets
up a page table for all of memory which because of the availability of
huge pages winds up being simple and trivial.

Weird things like copying off the 640k region for the kexec-on-panic
case can be done in the adapter/purgatory piece that lives between the
two kernels.

So at a very practical level I think we shouldn't have mappings for
special regions; we should just have mappings for all of memory.
KISS.

Eric


* Re: More on kexec/purgatory handover
  2015-05-13  7:35     ` Eric W. Biederman
  2015-05-13  8:12       ` Jan Beulich
@ 2015-05-13  8:12       ` Jan Beulich
  2015-05-13  9:07           ` Petr Tesarik
                           ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Jan Beulich @ 2015-05-13  8:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Cooper, kexec, David Vrabel, Petr Tesarik, xen-devel

>>> On 13.05.15 at 09:35, <ebiederm@xmission.com> wrote:
> Fundamentally if you are transferring control in long mode you have to
> set up some page table.  A giant identity mapped page table that can use
> 1G or 2M pages takes up very little memory, and can be set up very simply
> and easily before the transfer of control takes place.
> 
> All you have to do when you are in a half broken state is load cr3.
> Possibly after verifying a checksum.
> 
> 640k in this case I don't think is particularly special, and certainly
> not worth a special case.  The in-kernel implementation on x86_64 sets
> up a page table for all of memory which because of the availability of
> huge pages winds up being simple and trivial.
> 
> Weird things like copying off the 640k region for the kexec-on-panic
> case can be done in the adapter/purgatory piece that lives between the
> two kernels.
> 
> So at a very practical level I think we shouldn't have mappings for
> special regions; we should just have mappings for all of memory.

But in all of the above you (a) forget that setting up 1:1
mappings for all memory isn't as simple as putting in place a
couple of 1G pages - holes need to be accounted for and must
at best be mapped UC (that's especially an issue with the low
640k) and (b) imply that whatever Linux behavior there is, Xen
should mimic it (ignoring for example the fact that with the non-
kernel based kexec which newer Xen and tools support such 1:1
mapping setup doesn't appear to be required, i.e. [supposed]
requirements change).
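
To illustrate point (a), the following C sketch picks the cache
attributes for an identity-mapped 2 MiB page from a list of RAM ranges:
write-back if the whole page is RAM, uncacheable (UC) otherwise. It
assumes the power-on default PAT, where PCD=1 and PWT=1 selects UC, and
it ignores MTRRs, which also influence the effective memory type. The
struct ram_range type and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT (1ull << 0)
#define PTE_WRITE   (1ull << 1)
#define PTE_PWT     (1ull << 3)   /* page-level write-through */
#define PTE_PCD     (1ull << 4)   /* page-level cache disable */
#define PTE_PSE     (1ull << 7)   /* large-page leaf in a PD entry */
#define SZ_2M       (2ull << 20)

/* Hypothetical memory-map entry describing usable RAM as [start, end). */
struct ram_range {
    uint64_t start;
    uint64_t end;
};

/* True if the whole 2 MiB page at addr lies inside a single RAM range. */
static bool page_is_ram(const struct ram_range *map, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= map[i].start && addr + SZ_2M <= map[i].end)
            return true;
    return false;
}

/* Leaf entry for an identity-mapped 2 MiB page: write-back for pure RAM,
 * UC (PCD | PWT under the default PAT) for anything touching a hole. */
static uint64_t ident_pde(const struct ram_range *map, size_t n, uint64_t addr)
{
    assert((addr & (SZ_2M - 1)) == 0);

    uint64_t pde = addr | PTE_PRESENT | PTE_WRITE | PTE_PSE;
    if (!page_is_ram(map, n, addr))
        pde |= PTE_PCD | PTE_PWT;
    return pde;
}
```

With RAM at 0-640 KiB and from 1 MiB upwards, the first 2 MiB page
straddles the hole and comes out UC as a whole, which is exactly why the
low 640k cannot be handled well by a single huge page: either RAM is
mapped uncached or the region has to be split into 4 KiB mappings.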

Jan



* Re: More on kexec/purgatory handover
  2015-05-13  8:12       ` Jan Beulich
@ 2015-05-13  9:07           ` Petr Tesarik
  2015-05-13  9:53         ` [Xen-devel] " David Vrabel
  2015-05-13  9:53         ` David Vrabel
  2 siblings, 0 replies; 19+ messages in thread
From: Petr Tesarik @ 2015-05-13  9:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, xen-devel, kexec, Eric W. Biederman, David Vrabel

On Wed, 13 May 2015 09:12:47 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 13.05.15 at 09:35, <ebiederm@xmission.com> wrote:
> > Fundamentally if you are transferring control in long mode you have to
> > set up some page table.  A giant identity mapped page table that can use
> > 1G or 2M pages takes up very little memory, and can be set up very simply
> > and easily before the transfer of control takes place.
> > 
> > All you have to do when you are in a half broken state is load cr3.
> > Possibly after verifying a checksum.
> > 
> > 640k in this case I don't think is particularly special, and certainly
> > not worth a special case.  The in-kernel implementation on x86_64 sets
> > up a page table for all of memory which because of the availability of
> > huge pages winds up being simple and trivial.
> > 
> > Weird things like copying off the 640k region for the kexec-on-panic
> > case can be done in the adapter/purgatory piece that lives between the
> > two kernels.
> > 
> > So at a very practical level I think we shouldn't have mappings for
> > special regions; we should just have mappings for all of memory.
> 
> But in all of the above you (a) forget that setting up 1:1
> mappings for all memory isn't as simple as putting in place a
> couple of 1G pages - holes need to be accounted for and must
> at best be mapped UC (that's especially an issue with the low
> 640k) and (b) imply that whatever Linux behavior there is, Xen
> should mimic it (ignoring for example the fact that with the non-
> kernel based kexec which newer Xen and tools support such 1:1
> mapping setup doesn't appear to be required, i.e. [supposed]
> requirements change).

IIUC the problem here is that the Xen hypervisor has reused part of
kexec infrastructure without providing full compatibility.

In other words, what prevents you from using your own bootloader glue
code (aka purgatory in Linux kexec)?

OTOH I agree that the pagetables could be constructed by the 64-bit
purgatory code itself to make it less dependent on the old kernel.
Especially if Eric considers the task trivial...

Petr T


* Re: [Xen-devel] More on kexec/purgatory handover
  2015-05-13  8:12       ` Jan Beulich
  2015-05-13  9:07           ` Petr Tesarik
@ 2015-05-13  9:53         ` David Vrabel
  2015-05-13 10:01           ` Jan Beulich
  2015-05-13 10:01           ` [Xen-devel] " Jan Beulich
  2015-05-13  9:53         ` David Vrabel
  2 siblings, 2 replies; 19+ messages in thread
From: David Vrabel @ 2015-05-13  9:53 UTC (permalink / raw)
  To: Jan Beulich, Eric W. Biederman
  Cc: Andrew Cooper, kexec, David Vrabel, Petr Tesarik, xen-devel

On 13/05/15 09:12, Jan Beulich wrote:
>>>> On 13.05.15 at 09:35, <ebiederm@xmission.com> wrote:
>> Fundamentally if you are transferring control in long mode you have to
>> set up some page table.  A giant identity mapped page table that can use
>> 1G or 2M pages takes up very little memory, and can be set up very simply
>> and easily before the transfer of control takes place.
>>
>> All you have to do when you are in a half broken state is load cr3.
>> Possibly after verifying a checksum.
>>
>> 640k in this case I don't think is particularly special, and certainly
>> not worth a special case.  The in-kernel implementation on x86_64 sets
>> up a page table for all of memory which because of the availability of
>> huge pages winds up being simple and trivial.
>>
>> Weird things like copying off the 640k region for the kexec-on-panic
>> case can be done in the adapter/purgatory piece that lives between the
>> two kernels.
>>
>> So at a very practical level I think we shouldn't have mappings for
>> special regions; we should just have mappings for all of memory.
> 
> But in all of the above you (a) forget that setting up 1:1
> mappings for all memory isn't as simple as putting in place a
> couple of 1G pages - holes need to be accounted for and must
> at best be mapped UC (that's especially an issue with the low
> 640k) and (b) imply that whatever Linux behavior there is, Xen
> should mimic it (ignoring for example the fact that with the non-
> kernel based kexec which newer Xen and tools support such 1:1
> mapping setup doesn't appear to be required, i.e. [supposed]
> requirements change).

Xen's V2 kexec ABI builds 1:1 page tables for the source and
destination pages and any additional regions requested by the guest (see
calls to machine_kexec_add_page()).  kexec-tools adds a "map-only"
segment for 0-1MiB when using the V2 ABI.

These page tables are built at load time (not at exec time) and in the
crash case are placed in the crash memory area.

When using the V1 ABI, there is no way for the tools to provide an
additional "map-only" segment so you'd have to get purgatory to add
mappings for 0-1MiB, or get Xen (in the V1 path only) to do so.

David
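
A rough illustration of the page-table cost discussed above, as a minimal sketch rather than anything taken from Xen or kexec-tools. It assumes x86-64 4-level paging (512 entries per 4 KiB table) and an identity map of [0, top) starting at address zero; the function and enum names are made up for this example. It deliberately ignores the holes and cache attributes Jan raises, so it only shows the size of the tables, not how they should be populated.

```c
#include <stdint.h>

/* Assumption: 4-level x86-64 paging, each table is one 4 KiB page. */
#define TABLE_BYTES 4096ULL
#define SPAN_PT     (2ULL << 20)    /* one page table (PT) covers 2 MiB  */
#define SPAN_PD     (1ULL << 30)    /* one page directory covers 1 GiB   */
#define SPAN_PDPT   (512ULL << 30)  /* one PDPT covers 512 GiB           */

/* Hypothetical names: which level holds the leaf (final) entries. */
enum leaf_size {
    LEAF_4K,  /* leaves live in PTs   */
    LEAF_2M,  /* leaves live in PDs   */
    LEAF_1G   /* leaves live in PDPTs */
};

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/*
 * Bytes of page tables needed to identity-map [0, top).
 * top is expected to be at most 2^48; top == 0 yields just the PML4.
 */
uint64_t identity_map_table_bytes(uint64_t top, enum leaf_size leaf)
{
    uint64_t tables = 1;                        /* the single PML4 */

    tables += div_round_up(top, SPAN_PDPT);     /* PDPTs are always needed */
    if (leaf == LEAF_2M || leaf == LEAF_4K)
        tables += div_round_up(top, SPAN_PD);   /* PDs for 2M or 4K leaves */
    if (leaf == LEAF_4K)
        tables += div_round_up(top, SPAN_PT);   /* PTs for 4K leaves only */

    return tables * TABLE_BYTES;
}
```

Under these assumptions, mapping 64 GiB with 1G pages needs two tables (8 KiB), with 2M pages 66 tables (264 KiB), and the 0-1MiB range alone with 4K pages needs four tables (16 KiB).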


* Re: [Xen-devel] More on kexec/purgatory handover
  2015-05-13  9:53         ` [Xen-devel] " David Vrabel
  2015-05-13 10:01           ` Jan Beulich
@ 2015-05-13 10:01           ` Jan Beulich
  2015-05-13 12:12             ` Petr Tesarik
  2015-05-13 12:12             ` Petr Tesarik
  1 sibling, 2 replies; 19+ messages in thread
From: Jan Beulich @ 2015-05-13 10:01 UTC (permalink / raw)
  To: David Vrabel
  Cc: Andrew Cooper, kexec, Eric W. Biederman, Petr Tesarik, xen-devel

>>> On 13.05.15 at 11:53, <david.vrabel@citrix.com> wrote:
> On 13/05/15 09:12, Jan Beulich wrote:
>>>>> On 13.05.15 at 09:35, <ebiederm@xmission.com> wrote:
>>> Fundamentally if you are transferring control in long mode you have to
>>> set up some page table.  A giant identity mapped page table that can use
>>> 1G or 2M pages takes up very little memory, and can be very simply
>>> and easily set up before the transfer of control takes place.
>>>
>>> All you have to do when you are in a half broken state is load cr3.
>>> Possibly after verifying a checksum.
>>>
>>> 640k in this case I don't think is particularly special, and certainly
>>> not worth a special case.  The in-kernel implementation on x86_64 sets
>>> up a page table for all of memory which because of the availability of
>>> huge pages winds up being simple and trivial.
>>>
>>> Weird things like copying off the 640k region for the kexec-on-panic
>>> case can be done in the adapter/purgatory piece that lives between the
>>> two kernels.
>>>
>>> So at a very practical level I think we shouldn't have mappings for
>>> special regions we should just have mappings for all of memory.
>> 
>> But in all of the above you (a) forget that setting up 1:1
>> mappings for all memory isn't as simple as putting in place a
>> couple of 1G pages - holes need to be accounted for and must
>> at best be mapped UC (that's especially an issue with the low
>> 640k) and (b) imply that whatever Linux behavior there is, Xen
>> should mimic it (ignoring for example the fact that with the non-
>> kernel based kexec which newer Xen and tools support such 1:1
>> mapping setup doesn't appear to be required, i.e. [supposed]
>> requirements change).
> 
> Xen's V2 kexec ABI builds 1:1 page tables for the source and
> destination pages and any additional regions requested by the guest (see
> calls to machine_kexec_add_page()).  kexec-tools adds a "map-only"
> segment for 0-1MiB when using the V2 ABI.
> 
> These page tables are built at load time (not at exec time) and in the
> crash case are placed in the crash memory area.
> 
> When using the V1 ABI, there is no way for the tools to provide an
> additional "map-only" segment so you'd have to get purgatory to add
> mappings for 0-1MiB, or get Xen (in the V1 path only) to do so.

Okay, if the tools do this in v2, then I think the compatibility v1
path should indeed do so too (in the hypervisor).

Jan
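
To make the "map-only" 0-1MiB idea concrete, here is a minimal sketch of walking a range page by page and requesting an identity mapping for each frame. The callback type `add_page_fn` and the functions `map_only_range` and `count_pages` are hypothetical stand-ins invented for this example; this is not the signature of machine_kexec_add_page() and not code from Xen or kexec-tools.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/*
 * Hypothetical callback: record a mapping of one page at virtual
 * address vaddr to machine address maddr.  Returns 0 on success.
 */
typedef int (*add_page_fn)(void *ctx, uint64_t vaddr, uint64_t maddr);

/*
 * Request a 1:1 mapping (vaddr == maddr) for every 4 KiB page that
 * overlaps [start, end).  No data is copied; the range is only mapped.
 */
int map_only_range(uint64_t start, uint64_t end,
                   add_page_fn add_page, void *ctx)
{
    uint64_t addr;

    for (addr = start & ~(PAGE_SIZE - 1); addr < end; addr += PAGE_SIZE) {
        int rc = add_page(ctx, addr, addr);
        if (rc)
            return rc;  /* stop at the first failure */
    }
    return 0;
}

/* Example callback for illustration: count the identity mappings requested. */
int count_pages(void *ctx, uint64_t vaddr, uint64_t maddr)
{
    if (vaddr != maddr)
        return -1;      /* not an identity mapping */
    ++*(size_t *)ctx;
    return 0;
}
```

With the counting callback, the range [0, 1 MiB) results in 256 page-sized mapping requests. A real V1 compatibility path would presumably feed such requests into the same page-table builder that the V2 path uses, but that wiring is outside this sketch.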


* Re: [Xen-devel] More on kexec/purgatory handover
  2015-05-13 10:01           ` [Xen-devel] " Jan Beulich
@ 2015-05-13 12:12             ` Petr Tesarik
  2015-05-13 12:28               ` Jan Beulich
  2015-05-13 12:28               ` Jan Beulich
  2015-05-13 12:12             ` Petr Tesarik
  1 sibling, 2 replies; 19+ messages in thread
From: Petr Tesarik @ 2015-05-13 12:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, xen-devel, kexec, David Vrabel, Eric W. Biederman

On Wed, 13 May 2015 11:01:24 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 13.05.15 at 11:53, <david.vrabel@citrix.com> wrote:
> > On 13/05/15 09:12, Jan Beulich wrote:
> >>>>> On 13.05.15 at 09:35, <ebiederm@xmission.com> wrote:
> >>> Fundamentally if you are transferring control in long mode you have to
> >>> set up some page table.  A giant identity mapped page table that can use
> >>> 1G or 2M pages takes up very little memory, and can be very simply
> >>> and easily set up before the transfer of control takes place.
> >>>
> >>> All you have to do when you are in a half broken state is load cr3.
> >>> Possibly after verifying a checksum.
> >>>
> >>> 640k in this case I don't think is particularly special, and certainly
> >>> not worth a special case.  The in-kernel implementation on x86_64 sets
> >>> up a page table for all of memory which because of the availability of
> >>> huge pages winds up being simple and trivial.
> >>>
> >>> Weird things like copying off the 640k region for the kexec-on-panic
> >>> case can be done in the adapter/purgatory piece that lives between the
> >>> two kernels.
> >>>
> >>> So at a very practical level I think we shouldn't have mappings for
> >>> special regions we should just have mappings for all of memory.
> >> 
> >> But in all of the above you (a) forget that setting up 1:1
> >> mappings for all memory isn't as simple as putting in place a
> >> couple of 1G pages - holes need to be accounted for and must
> >> at best be mapped UC (that's especially an issue with the low
> >> 640k) and (b) imply that whatever Linux behavior there is, Xen
> >> should mimic it (ignoring for example the fact that with the non-
> >> kernel based kexec which newer Xen and tools support such 1:1
> >> mapping setup doesn't appear to be required, i.e. [supposed]
> >> requirements change).
> > 
> > Xen's V2 kexec ABI builds 1:1 page tables for the source and
> > destination pages and any additional regions requested by the guest (see
> > calls to machine_kexec_add_page()).  kexec-tools adds a "map-only"
> > segment for 0-1MiB when using the V2 ABI.
> > 
> > These page tables are built at load time (not at exec time) and in the
> > crash case are placed in the crash memory area.
> > 
> > When using the V1 ABI, there is no way for the tools to provide an
> > additional "map-only" segment so you'd have to get purgatory to add
> > mappings for 0-1MiB, or get Xen (in the V1 path only) to do so.
> 
> Okay, if the tools do this in v2, then I think the compatibility v1
> path should indeed do so too (in the hypervisor).

Are you working on a patch?

Petr T

* Re: [Xen-devel] More on kexec/purgatory handover
  2015-05-13 12:12             ` Petr Tesarik
@ 2015-05-13 12:28               ` Jan Beulich
  2015-05-13 12:28               ` Jan Beulich
  1 sibling, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2015-05-13 12:28 UTC (permalink / raw)
  To: Petr Tesarik
  Cc: Andrew Cooper, Eric W. Biederman, kexec, David Vrabel, xen-devel

>>> On 13.05.15 at 14:12, <ptesarik@suse.cz> wrote:
> On Wed, 13 May 2015 11:01:24 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
>> Okay, if the tools do this in v2, then I think the compatibility v1
>> path should indeed do so too (in the hypervisor).
> 
> Are you working on a patch?

Not yet, but I'm going to as soon as I can find the time.

Jan

