* Re: Saveoops: Making Kexec purgatory position-independent?
From: H. Peter Anvin @ 2011-02-27 1:15 UTC (permalink / raw)
To: Eric W. Biederman
Cc: X86-ML, KEXEC-ML, Ahmed S. Darwish, Haren Myneni, Simon Horman,
Ingo Molnar, Vivek Goyal
On 02/26/2011 04:57 PM, Eric W. Biederman wrote:
>>
>> I can't see any sane reason to *not* make kexec purgatory
>> position-independent. It is the obvious thing to do.
>
> This isn't a case of the code not being position independent. This is
> a case of where the relocations are applied.
>
> I can see a couple of ways of handling this, with different tradeoffs.
>
> 1) We teach bootloaders how to load two kernels at once. This
> completely avoids the purgatory, as it is replaced by code in the
> bootloader that already exists to load the primary kernel and set up
> its arguments.
>
> 2) We add minimal relocation processing to purgatory, allowing us to do
> the setup for the second kernel extremely early and allow it to be
> compiled into the first kernel.
>
> 3) We come up with a scheme where we don't share code and the first
> kernel copies the firmware information to a place where the second
> kernel can get at it, and uses its own home-grown stub and not
> purgatory.
>
> I think this whole thing can be prototyped easily by getting /sbin/kexec
> to load to a fixed address and then baking that section into the primary
> kernel. I'm not convinced that directly using /sbin/kexec is the right
> way forward to handle the general case. This is something where the
> devil is in the details.
>
OK... I'm clearly missing something... what code is not being
position-independent and which code needs relocations applied?
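A minimal sketch (Python, purely as a model of the arithmetic) of the distinction this question turns on. It uses the x86-64 psABI formulas for two common relocation types: an absolute 64-bit reference (R_X86_64_64, value S + A) and a PC-relative 32-bit reference (R_X86_64_PC32, value S + A - P). The toy image, symbol table and function names are hypothetical, and this is not a claim about which relocations purgatory actually contains. When the whole image moves by one delta, S and P shift together, so the PC-relative value stays the same while the absolute one must be re-patched for every load address.

```python
import struct

R_X86_64_64 = 1    # absolute 64-bit reference:   S + A
R_X86_64_PC32 = 2  # PC-relative 32-bit reference: S + A - P


def reloc_value(rtype, sym_addr, addend, place):
    """Value to store at address `place` for one relocation."""
    if rtype == R_X86_64_64:
        return (sym_addr + addend) & 0xFFFFFFFFFFFFFFFF
    if rtype == R_X86_64_PC32:
        value = sym_addr + addend - place
        if not -2**31 <= value < 2**31:
            raise OverflowError("PC32 target out of +/-2 GiB range")
        return value
    raise ValueError(f"unsupported relocation type {rtype}")


def apply_relocs(image, load_addr, relocs, symbols):
    """Patch `image` (a bytearray) for a load at `load_addr`.

    relocs:  list of (offset_in_image, type, symbol_name, addend)
    symbols: symbol_name -> offset of the symbol inside the image
    """
    for offset, rtype, name, addend in relocs:
        sym_addr = load_addr + symbols[name]  # S
        place = load_addr + offset            # P
        value = reloc_value(rtype, sym_addr, addend, place)
        fmt = "<Q" if rtype == R_X86_64_64 else "<i"
        struct.pack_into(fmt, image, offset, value)
    return image


# Hypothetical image: an absolute pointer at offset 0 and a PC-relative
# reference (typical addend -4) at offset 8, both to "data" at offset 0x40.
EXAMPLE_RELOCS = [(0, R_X86_64_64, "data", 0), (8, R_X86_64_PC32, "data", -4)]
EXAMPLE_SYMBOLS = {"data": 0x40}

if __name__ == "__main__":
    for base in (0x1000, 0x200000):
        img = apply_relocs(bytearray(0x48), base, EXAMPLE_RELOCS, EXAMPLE_SYMBOLS)
        print(hex(base), hex(struct.unpack_from("<Q", img, 0)[0]),
              struct.unpack_from("<i", img, 8)[0])
```

Loading at 0x1000 and at 0x200000 gives different absolute values (0x1040 vs 0x200040) but the same PC-relative value (0x34), which is why only the absolute references depend on knowing the final load address.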
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* Re: Saveoops: Making Kexec purgatory position-independent?
From: Ahmed S. Darwish @ 2011-02-27 13:24 UTC (permalink / raw)
To: Eric W. Biederman
Cc: X86-ML, KEXEC-ML, Haren Myneni, Simon Horman, H. Peter Anvin,
Ingo Molnar, Vivek Goyal
On Sat, Feb 26, 2011 at 04:57:30PM -0800, Eric W. Biederman wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> >
> > I can't see any sane reason to *not* make kexec purgatory
> > position-independent. It is the obvious thing to do.
>
> This isn't a case of the code not being position independent. This is
> a case of where the relocations are applied.
>
> I can see a couple of ways of handling this, with different tradeoffs.
>
> 1) We teach bootloaders how to load two kernels at once. This
> completely avoids the purgatory, as it is replaced by code in the
> bootloader that already exists to load the primary kernel and set up
> its arguments.
>
This is in fact my plan. Using Syslinux, I loaded 'purgatory.ro' into RAM
thinking that it would still be needed. Re-checking the purgatory code
now after reading the above note, it seems it does five important things:
a) reset the VGA (if instructed)
b) reset the PIC to legacy mode (if instructed)
c) check the overall integrity of the second kernel image (SHA-2)
d) set up the environment for the second kernel's entry (switch back to
32-bit protected mode on x86-64, reset registers, etc.)
e) save the first 640K in a backup region
So (a) and (b) can be done elsewhere if needed; (c) isn't needed because
if the bootloader corrupts images, we have bigger problems; (d) can be
done as a stub; (e), unlike in kdump, isn't critical for my
goals.
Am I missing an important detail?
> 2) We add minimal relocation processing to purgatory, allowing us to do
> the setup for the second kernel extremely early and allow it to be
> compiled into the first kernel.
>
> 3) We come up with a scheme where we don't share code and the first
> kernel copies the firmware information to a place where the second
> kernel can get at it, and uses its own home-grown stub and not
> purgatory.
>
Sorry, but how does the third point differ from the first? I thought they
were complementary.
> I think this whole thing can be prototyped easily by getting /sbin/kexec
> to load to a fixed address and then baking that section into the primary
> kernel. ...
I'll prototype this now by loading the second kernel (bzImage), using
syslinux, without the purgatory. Let's hope I won't face many
surprises.
> ... I'm not convinced that directly using /sbin/kexec is the right
> way forward to handle the general case. This is something where the
> devil is in the details.
>
Lots of details indeed; I spent last week exploring the kexec userspace and
kernel code to understand how it does its magic.
> Eric
thanks,
--
Darwish
http://darwish.07.googlepages.com
* Re: Saveoops: Making Kexec purgatory position-independent?
From: Vivek Goyal @ 2011-02-27 14:16 UTC (permalink / raw)
To: Ahmed S. Darwish
Cc: X86-ML, KEXEC-ML, Haren Myneni, Simon Horman, Eric W. Biederman,
H. Peter Anvin, Ingo Molnar
On Sun, Feb 27, 2011 at 03:24:09PM +0200, Ahmed S. Darwish wrote:
> On Sat, Feb 26, 2011 at 04:57:30PM -0800, Eric W. Biederman wrote:
> > "H. Peter Anvin" <hpa@zytor.com> writes:
> > >
> > > I can't see any sane reason to *not* make kexec purgatory
> > > position-independent. It is the obvious thing to do.
> >
> > This isn't a case of the code not being position independent. This is
> > a case of where the relocations are applied.
> >
> > I can see a couple of ways of handling this, with different tradeoffs.
> >
> > 1) We teach bootloaders how to load two kernels at once. This
> > completely avoids the purgatory, as it is replaced by code in the
> > bootloader that already exists to load the primary kernel and set up
> > its arguments.
> >
>
> This is in fact my plan. Using Syslinux, I loaded 'purgatory.ro' into RAM
> thinking that it would still be needed. Re-checking the purgatory code
> now after reading the above note, it seems it does five important things:
>
> a) reset the VGA (if instructed)
> b) reset the PIC to legacy mode (if instructed)
> c) check the overall integrity of the second kernel image (SHA-2)
> d) set up the environment for the second kernel's entry (switch back to
> 32-bit protected mode on x86-64, reset registers, etc.)
> e) save the first 640K in a backup region
>
> So (a) and (b) can be done elsewhere if needed; (c) isn't needed because
> if the bootloader corrupts images, we have bigger problems;
Could the first kernel's boot itself corrupt the second kernel?
Secondly, once you have booted successfully, I guess the same can be used for
a regular kernel crash without reloading the kdump kernel (for people who are
looking to capture just logs). If so, it will be good to check the integrity
of the second kernel image.
Is the bootloader going to set up the kernels in such a way that the second kernel's
boot is not going to overwrite the first kernel's memory? Otherwise, how
do we make sure that the second kernel does not overwrite the oops/logs
information you are trying to save?
On a side note, does UEFI provide some functionality where the first kernel
can save a limited amount of data in a buffer and retrieve it when a new
kernel is booting? I might be completely in the weeds here. This is just
based on a discussion with somebody long ago who mentioned that UEFI
might allow us to save a kernel oops and then retrieve it when a fresh
kernel is booting.
Do the two kernels boot from mutually exclusive locations, or will you continue
to copy the new kernel to some low memory address and boot from there? Copying
will again lead to the issue of how not to overwrite memory, or to the concept of a backup
region. Without copying, we need to determine the highest address the
kernel can boot from, and also let the first kernel know not to overwrite the
second kernel.
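Whichever way the two kernels are placed, the question comes down to an interval check: nothing the second kernel loads, copies or boots into may intersect memory the first kernel still owns, and nothing the first kernel hands out may intersect the second kernel's image or the saved-log area. A minimal sketch with half-open [start, end) ranges; the addresses and function names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def conflicts(candidate, protected):
    """Return the protected (start, end) ranges that `candidate` would overwrite."""
    start, end = candidate
    return [r for r in protected if overlaps(start, end, r[0], r[1])]


# Hypothetical layout: where the second kernel sits and where saved logs live.
PROTECTED = [
    (0x1000000, 0x2000000),  # second kernel image
    (0x3000000, 0x3010000),  # saved oops/log buffer
]

if __name__ == "__main__":
    print(conflicts((0x1800000, 0x2800000), PROTECTED))  # clashes with the image
    print(conflicts((0x2000000, 0x3000000), PROTECTED))  # fits exactly in the gap
```

Using half-open ranges means two regions that merely touch (one ends where the next begins) are not reported as a conflict.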
Above are just some random thoughts without going into details of the proposal.
Thanks
Vivek
> (d) can be
> done as a stub;
> (e), unlike in kdump, isn't critical for my
> goals.
>
> Am I missing an important detail?
>
> > 2) We add minimal relocation processing to purgatory, allowing us to do
> > the setup for the second kernel extremely early and allow it to be
> > compiled into the first kernel.
> >
> > 3) We come up with a scheme where we don't share code and the first
> > kernel copies the firmware information to a place where the second
> > kernel can get at it, and uses its own home-grown stub and not
> > purgatory.
> >
>
> Sorry, but how does the third point differ from the first? I thought they
> were complementary.
>
> > I think this whole thing can be prototyped easily by getting /sbin/kexec
> > to load to a fixed address and then baking that section into the primary
> > kernel. ...
>
> I'll prototype this now by loading the second kernel (bzImage), using
> syslinux, without the purgatory. Let's hope I won't face many
> surprises.
>
> > ... I'm not convinced that directly using /sbin/kexec is the right
> > way forward to handle the general case. This is something where the
> > devil is in the details.
> >
>
> Lots of details indeed; I spent last week exploring the kexec userspace and
> kernel code to understand how it does its magic.
>
> > Eric
>
> thanks,
>
> --
> Darwish
> http://darwish.07.googlepages.com
* Re: Saveoops: Making Kexec purgatory position-independent?
From: Ahmed S. Darwish @ 2011-02-27 15:43 UTC (permalink / raw)
To: Vivek Goyal
Cc: X86-ML, KEXEC-ML, Haren Myneni, Simon Horman, Eric W. Biederman,
H. Peter Anvin, Ingo Molnar
On Sun, Feb 27, 2011 at 09:16:00AM -0500, Vivek Goyal wrote:
> On Sun, Feb 27, 2011 at 03:24:09PM +0200, Ahmed S. Darwish wrote:
> >
> > This is in fact my plan. Using Syslinux, I loaded 'purgatory.ro' into RAM
> > thinking that it would still be needed. Re-checking the purgatory code
> > now after reading the above note, it seems it does five important things:
> >
> > a) reset the VGA (if instructed)
> > b) reset the PIC to legacy mode (if instructed)
> > c) check the overall integrity of the second kernel image (SHA-2)
> > d) set up the environment for the second kernel's entry (switch back to
> > 32-bit protected mode on x86-64, reset registers, etc.)
> > e) save the first 640K in a backup region
> >
> > So (a) and (b) can be done elsewhere if needed; (c) isn't needed because
> > if the bootloader corrupts images, we have bigger problems;
>
> Could the first kernel's boot itself corrupt the second kernel?
>
Indeed; that didn't cross my mind, though. Linus was also quite nervous
about writing to disk upon panic, so some guarantee of the second
kernel image's validity will be asked for.
That means, I guess, that the purgatory will still be needed. Eric, any
thoughts?
> Secondly, once you have booted successfully, I guess the same can be used for
> a regular kernel crash without reloading the kdump kernel (for people who are
> looking to capture just logs). If so, it will be good to check the integrity
> of the second kernel image.
>
I didn't understand the above paragraph, but in any case, it seems that
checking the second kernel's integrity will be quite important.
Maybe we'll also need to check the integrity of the integrator :)
> Is the bootloader going to set up the kernels in such a way that the second kernel's
> boot is not going to overwrite the first kernel's memory? Otherwise, how
> do we make sure that the second kernel does not overwrite the oops/logs
> information you are trying to save?
>
There will be some sort of protection for the logs area, yes. How such
communication between the two kernels will occur, I don't know. For now,
I'm focused on loading and booting the second kernel.
> On a side note, does UEFI provide some functionality where the first kernel
> can save a limited amount of data in a buffer and retrieve it when a new
> kernel is booting? I might be completely in the weeds here. This is just
> based on a discussion with somebody long ago who mentioned that UEFI
> might allow us to save a kernel oops and then retrieve it when a fresh
> kernel is booting.
>
I have no experience with EFI, though there's some sort of ACPI-interfaced
nonvolatile RAM in modern high-end x86 servers.
> Do the two kernels boot from mutually exclusive locations, or will you continue
> to copy the new kernel to some low memory address and boot from there? Copying
> will again lead to the issue of how not to overwrite memory, or to the concept of a backup
> region. Without copying, we need to determine the highest address the
> kernel can boot from, and also let the first kernel know not to overwrite the
> second kernel.
>
Yes, I'll need to use the boot protocol for this, making sure the
first kernel marks the second one's loaded image (and its purgatory) as
reserved.
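Marking the second kernel's image as reserved can be pictured as an edit to an e820-style memory map: any usable range that intersects the reservation is split around it. A minimal sketch, assuming a sorted list of non-overlapping (start, end, kind) tuples with half-open ranges; it models the idea only, is not the kernel's e820 code, and the names and addresses are hypothetical.

```python
USABLE, RESERVED = "usable", "reserved"


def reserve(memmap, r_start, r_end):
    """Return a copy of `memmap` with [r_start, r_end) marked RESERVED.

    memmap is a sorted list of non-overlapping (start, end, kind) tuples.
    Only USABLE entries are split; other entries are passed through as-is.
    """
    out = []
    for start, end, kind in memmap:
        if kind != USABLE or end <= r_start or start >= r_end:
            out.append((start, end, kind))  # no intersection: keep unchanged
            continue
        if start < r_start:
            out.append((start, r_start, USABLE))  # usable part below
        out.append((max(start, r_start), min(end, r_end), RESERVED))
        if end > r_end:
            out.append((r_end, end, USABLE))  # usable part above
    return out


# Hypothetical map: low memory plus one large usable block above 1 MiB.
EXAMPLE_MAP = [(0x0, 0x9F000, USABLE), (0x100000, 0x40000000, USABLE)]

if __name__ == "__main__":
    # Reserve 16 MiB..32 MiB for the (hypothetical) second kernel image.
    for entry in reserve(EXAMPLE_MAP, 0x1000000, 0x2000000):
        print(hex(entry[0]), hex(entry[1]), entry[2])
```

Only the usable memory that actually overlaps the reservation changes; the first kernel then simply never allocates from the reserved entry.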
> Above are just some random thoughts without going into details of the
> proposal.
Appreciated! I need all input possible at this stage.
> Thanks
> Vivek
>
thanks,
--
Darwish
http://darwish.07.googlepages.com
* Re: Saveoops: Making Kexec purgatory position-independent?
From: Eric W. Biederman @ 2011-02-27 18:32 UTC (permalink / raw)
To: Ahmed S. Darwish
Cc: X86-ML, KEXEC-ML, Haren Myneni, Simon Horman, H. Peter Anvin,
Ingo Molnar, Vivek Goyal
"Ahmed S. Darwish" <darwish.07@gmail.com> writes:
> On Sat, Feb 26, 2011 at 04:57:30PM -0800, Eric W. Biederman wrote:
>> "H. Peter Anvin" <hpa@zytor.com> writes:
>> >
>> > I can't see any sane reason to *not* make kexec purgatory
>> > position-independent. It is the obvious thing to do.
>>
>> This isn't a case of the code not being position independent. This is
>> a case of where the relocations are applied.
>>
>> I can see a couple of ways of handling this, with different tradeoffs.
>>
>> 1) We teach bootloaders how to load two kernels at once. This
>> completely avoids the purgatory, as it is replaced by code in the
>> bootloader that already exists to load the primary kernel and set up
>> its arguments.
>>
>
> This is in fact my plan. Using Syslinux, I loaded 'purgatory.ro' into RAM
> thinking that it would still be needed. Re-checking the purgatory code
> now after reading the above note, it seems it does five important things:
>
> a) reset the VGA (if instructed)
> b) reset the PIC to legacy mode (if instructed)
> c) check the overall integrity of the second kernel image (SHA-2)
> d) set up the environment for the second kernel's entry (switch back to
> 32-bit protected mode on x86-64, reset registers, etc.)
> e) save the first 640K in a backup region
>
> So (a) and (b) can be done elsewhere if needed; (c) isn't needed because
> if the bootloader corrupts images, we have bigger problems; (d) can be
> done as a stub; (e), unlike in kdump, isn't critical for my
> goals.
(c) is needed somewhere on the initialization path, because we don't start
running until after a kernel has crashed. For a first prototype, it
can probably be skipped.
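The check in (c) amounts to hashing the loaded segments again just before jumping to the second kernel and comparing the result against a digest recorded at load time. A minimal sketch using hashlib's SHA-256 over (start, length) regions of a bytearray standing in for memory; the region layout and function names are hypothetical and only model the idea, not purgatory's own implementation.

```python
import hashlib


def digest_regions(memory, regions):
    """SHA-256 over the (start, length) regions of `memory`, in list order."""
    h = hashlib.sha256()
    for start, length in regions:
        h.update(memory[start:start + length])
    return h.digest()


def verify_regions(memory, regions, expected):
    """True if the regions still hash to the digest recorded at load time."""
    return digest_regions(memory, regions) == expected


if __name__ == "__main__":
    mem = bytearray(b"\x90" * 4096)          # stand-in for physical memory
    regions = [(0, 1024), (2048, 512)]       # hypothetical loaded segments
    recorded = digest_regions(mem, regions)  # computed when the image is loaded
    mem[100] ^= 0xFF                         # a stray write into a segment
    print("intact" if verify_regions(mem, regions, recorded) else "corrupted")
```

Writes outside the listed regions do not affect the result, so the region list has to cover everything the second kernel will actually execute or read.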
(e) is there because the first 640K is the only memory of the original
kernel that we use.
I suspect that copying the first 640K to somewhere reserved for it
and verifying the sha256 checksum are things we can move into the
kernel's boot path.
But seriously, prototype it and get something that works. I don't know
of a case where I have gotten a checksum failure in practice.
Saving the first 640K is somewhat important, but again, we don't do much
down there except boot secondary CPUs, so you can probably deal with that
later.
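Step (e) is essentially a copy of the first 640K into a region reserved for it, so that the second kernel can use low memory while the crashed kernel's original contents stay retrievable. A minimal model, assuming a bytearray stands in for physical memory and a hypothetical backup address; the function names are made up for illustration, and in the real flow the copy happens before the second kernel runs, not in Python.

```python
LOW_MEM_SIZE = 640 * 1024  # the "first 640K"


def backup_low_memory(memory, backup_start):
    """Copy [0, 640K) of `memory` to [backup_start, backup_start + 640K)."""
    if backup_start < LOW_MEM_SIZE:
        raise ValueError("backup region must not overlap the first 640K")
    if backup_start + LOW_MEM_SIZE > len(memory):
        raise ValueError("backup region does not fit in memory")
    memory[backup_start:backup_start + LOW_MEM_SIZE] = memory[:LOW_MEM_SIZE]


def read_original(memory, backup_start, addr, length):
    """Read the crashed kernel's bytes, taking the first 640K from the backup."""
    out = bytearray()
    for a in range(addr, addr + length):
        out.append(memory[backup_start + a] if a < LOW_MEM_SIZE else memory[a])
    return bytes(out)
```

For example, after `backup_low_memory(mem, 1024 * 1024)` the second kernel is free to overwrite `mem[0:640K]`, and `read_original` still returns the pre-crash bytes for those addresses.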
We also do some magic with ELF headers to describe memory
regions and to find the ELF notes written by the crashed kernel when it goes
down. The existing tools use those notes to find all kinds of things.
See vmcore-to-dmesg in the /sbin/kexec source tree. If you don't want
the full core, I expect you will want to be able to run that program.
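The ELF notes mentioned here follow the standard ELF note layout: a header of three 32-bit words (namesz, descsz, type), then the name and the descriptor, each padded to a 4-byte boundary. A minimal parser sketch over a raw little-endian buffer; NT_PRSTATUS = 1 with the name "CORE" is the conventional pairing for saved CPU register notes, while treating an all-zero header as the end of the list, the helper names and the example contents are assumptions made for illustration.

```python
import struct

NT_PRSTATUS = 1  # conventional note type for saved CPU register state


def _align4(n):
    return (n + 3) & ~3


def parse_notes(buf):
    """Yield (name, type, desc) for each note in `buf` (little-endian)."""
    off = 0
    while off + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from("<III", buf, off)
        if namesz == 0 and descsz == 0 and ntype == 0:
            break  # assumed terminator: an all-zero note header
        off += 12
        name = bytes(buf[off:off + namesz]).rstrip(b"\0").decode("ascii", "replace")
        off += _align4(namesz)
        desc = bytes(buf[off:off + descsz])
        off += _align4(descsz)
        yield name, ntype, desc


def make_note(name, ntype, desc):
    """Build one note in the same layout (useful for exercising the parser)."""
    raw_name = name.encode("ascii") + b"\0"
    return (struct.pack("<III", len(raw_name), len(desc), ntype)
            + raw_name.ljust(_align4(len(raw_name)), b"\0")
            + desc.ljust(_align4(len(desc)), b"\0"))


if __name__ == "__main__":
    buf = (make_note("CORE", NT_PRSTATUS, b"regs-cpu0")
           + make_note("CORE", NT_PRSTATUS, b"regs-cpu1")
           + bytes(12))
    for name, ntype, desc in parse_notes(buf):
        print(name, ntype, desc)
```

A log-only tool would walk these notes the same way to locate whatever the crashed kernel recorded, without needing the rest of the core.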
I'm not ready to change how the crash recovery kernel finds out what is
going on: the ELF header and ELF notes. That is already kernel-agnostic,
etc., but I am totally willing to change how we implement it.
Eric
* Re: Saveoops: Making Kexec purgatory position-independent?
From: Simon Horman @ 2011-02-28 1:38 UTC (permalink / raw)
To: Eric W. Biederman
Cc: X86-ML, KEXEC-ML, Ahmed S. Darwish, Haren Myneni, H. Peter Anvin,
Ingo Molnar, Vivek Goyal
On Sat, Feb 26, 2011 at 04:57:30PM -0800, Eric W. Biederman wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
> > On 02/26/2011 08:20 AM, Ahmed S. Darwish wrote:
> >> Hi,
> >>
> >> I'm continuing work on 'Saveoops', saving both early and normal Linux
> >> oops logs to disk upon panic [*], using Kexec and bootloaders this time.
> >>
> >> Purgatory, the transitional mini-kernel used by kexec, is a relocatable
> >> ELF file. Userspace kexec-tools finds the final load address of such
> >> code (by parsing /proc/iomem, etc.) and then applies the relocations
> >> itself before passing the now-ready executable image to the kernel.
> >>
> >> Since capturing early oopses is the major goal, doing such relocation
> >> in userspace won't fit my purposes. Two options remain:
> >> - relocate purgatory entries in the kernel early boot path
> >> - or compile purgatory as position-independent, thus simplifying
> >> the kernel load logic
> >>
> >> The former will add extra logic in a sensitive path (early boot), while
> >> the latter will require changes inside the purgatory code itself,
> >> especially the i386 assembly files.
> >>
> >> Does either option seem preferable to our kexec and x86 maintainers?
> >>
> >> thanks!
> >>
> >> [*] http://news.gmane.org/find-root.php?message_id=<20110125134748.GA10051@laptop>
> >> http://news.gmane.org/find-root.php?message_id=<20110126124954.GC24527@laptop>
> >>
> >
> > I can't see any sane reason to *not* make kexec purgatory
> > position-independent. It is the obvious thing to do.
>
> This isn't a case of the code not being position independent. This is
> a case of where the relocations are applied.
>
> I can see a couple of ways of handling this, with different tradeoffs.
>
> 1) We teach bootloaders how to load two kernels at once. This
> completely avoids the purgatory, as it is replaced by code in the
> bootloader that already exists to load the primary kernel and set up
> its arguments.
I can think of plenty of scenarios where that isn't entirely useful.
For example, using kexec as a boot loader, and as such not necessarily having
any idea what the second kernel will look like at the time the first
kernel is built. Am I missing something?
>
> 2) We add minimal relocation processing to purgatory, allowing us to do
> the setup for the second kernel extremely early and allow it to be
> compiled into the first kernel.
>
> 3) We come up with a scheme where we don't share code and the first
> kernel copies the firmware information to a place where the second
> kernel can get at it, and uses its own home-grown stub and not
> purgatory.
>
> I think this whole thing can be prototyped easily by getting /sbin/kexec
> to load to a fixed address and then baking that section into the primary
> kernel. I'm not convinced that directly using /sbin/kexec is the right
> way forward to handle the general case. This is something where the
> devil is in the details.
>
> Eric
>
* Re: Saveoops: Making Kexec purgatory position-independent?
From: H. Peter Anvin @ 2011-02-28 1:39 UTC (permalink / raw)
To: Eric W. Biederman
Cc: X86-ML, KEXEC-ML, Ahmed S. Darwish, Haren Myneni, Simon Horman,
Ingo Molnar, Vivek Goyal
On 02/26/2011 04:57 PM, Eric W. Biederman wrote:
>
> This isn't a case of the code not being position independent. This is
> a case of where the relocations are applied.
>
This is a generic comment and may not apply to this particular case, but
it's pretty easy on x86 to build a chunk of code which is
self-relocating, meaning that it can be jumped to at any address without
any pre-relocation and do the necessary relocations itself.
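A minimal model of the general self-relocation technique described here: the blob carries a table of offsets of absolute 64-bit pointers computed for a link-time base, and on entry it adds (actual base - link base) to each of them. Real code would first discover its run address (for example with a call/pop sequence on i386 or a RIP-relative lea on x86-64); this Python sketch simply takes that address as a parameter, and the names, table format and addresses are hypothetical.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def self_relocate(image, link_base, run_base, reloc_offsets):
    """Add (run_base - link_base) to each 64-bit pointer slot in `image`."""
    delta = run_base - link_base
    if delta == 0:
        return image  # running where it was linked: nothing to fix up
    for off in reloc_offsets:
        (ptr,) = struct.unpack_from("<Q", image, off)
        struct.pack_into("<Q", image, off, (ptr + delta) & MASK64)
    return image


if __name__ == "__main__":
    LINK_BASE = 0x100000
    img = bytearray(32)
    # Two absolute pointers into the blob itself, as linked at LINK_BASE.
    struct.pack_into("<Q", img, 0, LINK_BASE + 0x18)
    struct.pack_into("<Q", img, 16, LINK_BASE + 0x08)
    self_relocate(img, LINK_BASE, 0x7000000, [0, 16])
    print(hex(struct.unpack_from("<Q", img, 0)[0]),
          hex(struct.unpack_from("<Q", img, 16)[0]))
```

Because the fix-up only needs the delta and the offset table, no external loader has to know the final address in advance.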
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.