public inbox for linux-kernel@vger.kernel.org
* Re: [BROKEN PATCH] kexec for ia64
       [not found]   ` <20040730155504.2a51b1fa.rddunlap@osdl.org>
@ 2004-08-04 13:07     ` Eric W. Biederman
  2004-08-04 16:24       ` Jesse Barnes
                         ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Eric W. Biederman @ 2004-08-04 13:07 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: Jesse Barnes, linux-ia64, fastboot, linux-kernel

"Randy.Dunlap" <rddunlap@osdl.org> writes:

> On Mon, 26 Jul 2004 15:36:05 -0700 Jesse Barnes wrote:
> 
> | On Monday, July 26, 2004 3:24 pm, Jesse Barnes wrote:
> | >   o userspace tools need ia64 support

Correct.  But all they need are the ia64 bits of the ELF loader,
plus ia64 specific goo.  The generic part of the ELF loader is already
written.

> | >   o need to deal with in-flight DMA (see FIXME in machine_kexec)
> | 
> | After looking at it a little more, I suppose device_shutdown() should 
> | theoretically deal with this.
> | 
> | Also, it would be nice if there were a Documentation/kexec.txt or something
> | in the full patch that describes all the pieces and what the arch dependent
> | functions are responsible for.  Randy, do you have anything like that
> | written up somewhere that you could include in the next spin of the patch?
> 
> Nope, sorry, I don't have anything like that.
> 
> Eric, do you have anything like Jesse asked about (arch-dependent
> requirements)?

Sort of fundamentally they are arch dependent.  

I believe that DMA FIXME is a red herring.  Initially that patch
was targeted for a kernel without device_shutdown(), so I was
likely considering the old trick of running through all of the PCI
devices and disabling their bus master bit.
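
As a rough illustration of that trick: bus mastering is controlled by
bit 2 of the 16-bit command register at offset 0x04 of a device's PCI
configuration header.  The sketch below works on an in-memory copy of
a header only; the helper name clear_bus_master() and the sample
register value are made up for the example, and nothing here touches
real hardware or is taken from the patch.

```python
import struct

PCI_COMMAND = 0x04          # offset of the 16-bit command register
PCI_COMMAND_MASTER = 0x04   # bit 2: bus master enable


def clear_bus_master(config: bytearray) -> bool:
    """Clear the bus master enable bit in a config header copy.

    Returns True if the bit was set before the call.
    """
    (command,) = struct.unpack_from("<H", config, PCI_COMMAND)
    was_master = bool(command & PCI_COMMAND_MASTER)
    cleared = command & ~PCI_COMMAND_MASTER & 0xFFFF
    struct.pack_into("<H", config, PCI_COMMAND, cleared)
    return was_master


# Hypothetical device with I/O, memory and bus mastering enabled.
header = bytearray(64)
struct.pack_into("<H", header, PCI_COMMAND, 0x0007)
changed = clear_bus_master(header)
```

Only the one bit is cleared, so I/O and memory decoding stay enabled
and the device still answers register accesses.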

In general there are two arch specific pieces of information here.

1) What is the kernel's argument passing format, what arguments
   does the kernel need, and how do you derive those arguments
   from a running kernel.

   Usually this is at least the kernel's memory map.  But the binary
   arguments a kernel accepts/requires vary widely from architecture
   to architecture. 

(This is user space only)

2) The code itself in machine_kexec.c and relocate_kernel.S needs
   to place the machine in a state where virtual and physical addresses
   are identity mapped, and the arch specific registers are in some
   well defined state.  Usually, the less setup you have to guarantee
   to make it work, the better.

(This is the kernel side)

We should probably start capturing these pieces of information in
a kexec.3 man page.  Volunteers?

For ia64 in particular I believe the binary arguments are the
FPSWA and EFI memory map, and the firmware entry points (PAL and SAL
and EFI).

As for the physical mode transition state, I believe that
is largely defined by the current set of kernel bootloaders.

Eric


* Re: [BROKEN PATCH] kexec for ia64
  2004-08-04 13:07     ` [BROKEN PATCH] kexec for ia64 Eric W. Biederman
@ 2004-08-04 16:24       ` Jesse Barnes
  2004-08-04 23:33       ` Grant Grundler
  2004-08-05 16:45       ` Luck, Tony
  2 siblings, 0 replies; 9+ messages in thread
From: Jesse Barnes @ 2004-08-04 16:24 UTC (permalink / raw)
  To: Eric W. Biederman, khalid.aziz
  Cc: Randy.Dunlap, linux-ia64, fastboot, linux-kernel

On Wednesday, August 4, 2004 6:07 am, Eric W. Biederman wrote:
> "Randy.Dunlap" <rddunlap@osdl.org> writes:
> > On Mon, 26 Jul 2004 15:36:05 -0700 Jesse Barnes wrote:
> > | On Monday, July 26, 2004 3:24 pm, Jesse Barnes wrote:
> > | >   o userspace tools need ia64 support
>
> Correct.  But all they need are the ia64 bits of the ELF loader,
> plus ia64 specific goo.  The generic part of the ELF loader is already
> written.

I think Khalid might already have these bits done.

> Sort of fundamentally they are arch dependent.
>
> I believe that DMA FIXME is a red herring.  Initially that patch
> was targeted for a kernel without device_shutdown(), so I was
> likely considering the old trick of running through all of the PCI
> devices and disabling their bus master bit.

Yeah, I added that bit to remind me to think about it.

> 1) What is the kernel's argument passing format, what arguments

Right, and that should be pretty straightforward.

> 2) The code itself in machine_kexec.c and relocate_kernel.S needs
>    to place the machine in a state where virtual and physical addresses
>    are identity mapped, and the arch specific registers are in some
>    well defined state.  Usually, the less setup you have to guarantee
>    to make it work, the better.
>
> (This is the kernel side)
>
> We should probably start capturing these pieces of information in
> a kexec.3 man page.  Volunteers?
>
> For ia64 in particular I believe the binary arguments are the
> FPSWA and EFI memory map, and the firmware entry points (PAL and SAL
> and EFI).

With the addition of some ACPI tables and such.  I don't think those are freed 
by the kernel right now though, so it should be pretty easy to point at the 
originals from the newly kexec'd kernel, or make copies.

Jesse


* Re: [BROKEN PATCH] kexec for ia64
  2004-08-04 13:07     ` [BROKEN PATCH] kexec for ia64 Eric W. Biederman
  2004-08-04 16:24       ` Jesse Barnes
@ 2004-08-04 23:33       ` Grant Grundler
  2004-08-05  2:14         ` [Fastboot] " Eric W. Biederman
  2004-08-05 16:45       ` Luck, Tony
  2 siblings, 1 reply; 9+ messages in thread
From: Grant Grundler @ 2004-08-04 23:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Randy.Dunlap, Jesse Barnes, linux-ia64, fastboot, linux-kernel

On Wed, Aug 04, 2004 at 07:07:04AM -0600, Eric W. Biederman wrote:
> Initially that patch
> was targeted for a kernel without device_shutdown(), so I was
> likely considering the old trick of running through all of the PCI
> devices and disabling their bus master bit.

Blindly disabling all PCI bus master bits will also kill VGA/serial
console and any USB keyboard attached to the system.

I'll comment more on the "DMA is a Red Herring" when I can read
more what it is about.

grant


* Re: [Fastboot] Re: [BROKEN PATCH] kexec for ia64
  2004-08-04 23:33       ` Grant Grundler
@ 2004-08-05  2:14         ` Eric W. Biederman
  2004-08-05 15:39           ` Grant Grundler
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2004-08-05  2:14 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Randy.Dunlap, linux-ia64, Jesse Barnes, linux-kernel, fastboot

Grant Grundler <iod00d@hp.com> writes:

> On Wed, Aug 04, 2004 at 07:07:04AM -0600, Eric W. Biederman wrote:
> > Initially that patch
> > was targeted for a kernel without device_shutdown(), so I was
> > likely considering the old trick of running through all of the PCI
> > devices and disabling their bus master bit.
> 
> Blindly disabling all PCI bus master bits will also kill VGA/serial
> console and any USB keyboard attached to the system.

VGA/serial console devices rarely need to be bus masters so they
should be fine.

> I'll comment more on the "DMA is a Red Herring" when I can read
> more what it is about.

Most of those cases don't matter as the driver should always be calling
pci_set_master() on startup.  Disabling all the bus master bits on ioxapics
in pci space would likely cripple the system, as they are architectural
hardware and rarely have pci drivers that can enable them.

In the general case it appears to be overkill, incorrect and
insufficient to disable bus mastering on all PCI devices.  Which is
why device_shutdown() calls device specific code.

Eric


* Re: [Fastboot] Re: [BROKEN PATCH] kexec for ia64
  2004-08-05  2:14         ` [Fastboot] " Eric W. Biederman
@ 2004-08-05 15:39           ` Grant Grundler
  2004-08-05 16:44             ` Eric W. Biederman
  0 siblings, 1 reply; 9+ messages in thread
From: Grant Grundler @ 2004-08-05 15:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Grant Grundler, Randy.Dunlap, linux-ia64, Jesse Barnes,
	linux-kernel, fastboot

On Wed, Aug 04, 2004 at 08:14:55PM -0600, Eric W. Biederman wrote:
> VGA/serial console devices rarely need to be bus masters so they
> should be fine.

yeah - you are right. I wasn't thinking.
Can anyone comment on UGA or other console devices?

> In the general case it appears to be overkill, incorrect and
> insufficient to disable bus mastering on all PCI devices.  Which is
> why device_shutdown() calls device specific code.

Is anyone else considering using kexec() to recover from an oops/panic?
What is the risk that calling multiple device_shutdown() will expose
another panic?

While calling a device specific cleanup is best, I worry about how
much code/data gets touched in this path. I was hoping something
simple like twiddling bus master bit would be sufficient.
If it's not, oh well.

thanks,
grant


* Re: [Fastboot] Re: [BROKEN PATCH] kexec for ia64
  2004-08-05 15:39           ` Grant Grundler
@ 2004-08-05 16:44             ` Eric W. Biederman
  0 siblings, 0 replies; 9+ messages in thread
From: Eric W. Biederman @ 2004-08-05 16:44 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Randy.Dunlap, linux-ia64, fastboot, Jesse Barnes, linux-kernel

Grant Grundler <iod00d@hp.com> writes:

> On Wed, Aug 04, 2004 at 08:14:55PM -0600, Eric W. Biederman wrote:

> > In the general case it appears to be overkill, incorrect and
> > insufficient to disable bus mastering on all PCI devices.  Which is
> > why device_shutdown() calls device specific code.
> 
> Is anyone else considering using kexec() to recover from an oops/panic?

Yes.  That is what most of the recent discussion was about.  Considering
this was one of the subjects brought up at the kernel summit, I'm not
surprised a lot of people have been thinking that way.

> What is the risk that calling multiple device_shutdown() will expose
> another panic?

It has been agreed that device_shutdown() will not be called in the panic
path.  What gets called on panic or other fatal case is going to be
a streamlined code path, that is little more than a jump to the
previously loaded kernel.  

> While calling a device specific cleanup is best, I worry about how
> much code/data gets touched in this path. I was hoping something
> simple like twiddling bus master bit would be sufficient.
> If it's not, oh well.

The kernel on the other side of the kexec gets to do this.  It will
run out of memory reserved for it in the kernel that panic'd since
boot time.

That is not perfect protection but it is simple and quite good,
especially with the addition of verifying a hash of the new kernel
before it messes with the hardware.  (But that code gets to live
in /sbin/kexec and is added as a prefix to the recovery kernel.)
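
A minimal sketch of the hash check idea, under loud assumptions: the
thread does not say which hash is used or how the loaded image is laid
out, so SHA-256, the (address, bytes) segment list and the function
names digest_segments() and verify_segments() are all hypothetical.
The point is only that the digest is computed at load time and
recomputed just before the recovery kernel is allowed to run.

```python
import hashlib
import hmac


def digest_segments(segments):
    """Hash every (load_address, data) pair in address order."""
    h = hashlib.sha256()
    for addr, data in sorted(segments):
        h.update(addr.to_bytes(8, "little"))
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.digest()


def verify_segments(segments, expected):
    """Return True only if the segments still match the stored digest."""
    return hmac.compare_digest(digest_segments(segments), expected)


# Hypothetical image: digest stored when the recovery kernel is loaded.
segments = [(0x100000, b"kernel text"), (0x800000, b"initrd")]
expected = digest_segments(segments)

# The same image after a stray write has hit one byte of it.
corrupted = [(0x100000, b"kernel tEXt"), (0x800000, b"initrd")]
```

Here verify_segments(segments, expected) succeeds, while the corrupted
copy fails the check, so a damaged recovery kernel would be refused
instead of being run.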

I don't expect that is enough to give a full recovery but it
should be sufficient to take a core dump of the system or
do any number of other interesting things.  But before
running a full kernel it is expected that the entire system will
be reset, to get everything back into a sane state.

And of course all of this is largely architecture independent
so that the basic code should work on any architecture.

Eric


* RE: [BROKEN PATCH] kexec for ia64
@ 2004-08-05 16:45       ` Luck, Tony
  2004-08-05 17:05         ` [Fastboot] " Eric W. Biederman
  0 siblings, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2004-08-05 16:45 UTC (permalink / raw)
  To: Jesse Barnes, Eric W. Biederman, khalid.aziz
  Cc: Randy.Dunlap, linux-ia64, fastboot, linux-kernel

Jesse Barnes wrote:
>With the addition of some ACPI tables and such.  I don't think 
>those are freed by the kernel right now though, so it should
>be pretty easy to point at the originals from the newly kexec'd
>kernel, or make copies.

The "trim_bottom" and "trim_top" functions currently modify
the memory map in place.  But this would only make a difference
if you tried to kexec a kernel with a smaller granule size than
the originally running kernel, and even then would only
result in not seeing some memory that you might have been
able to use.

-Tony


* Re: [Fastboot] RE: [BROKEN PATCH] kexec for ia64
  2004-08-05 16:45       ` Luck, Tony
@ 2004-08-05 17:05         ` Eric W. Biederman
  2004-08-05 19:18           ` Khalid Aziz
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2004-08-05 17:05 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Jesse Barnes, khalid.aziz, Randy.Dunlap, linux-ia64, linux-kernel,
	fastboot


Hmm. Your mailer did not add any references lines.


"Luck, Tony" <tony.luck@intel.com> writes:

> Jesse Barnes wrote:
> >With the addition of some ACPI tables and such.  I don't think 
> >those are freed by the kernel right now though, so it should
> >be pretty easy to point at the originals from the newly kexec'd
> >kernel, or make copies.
> 
> The "trim_bottom" and "trim_top" functions currently modify
> the memory map in place.  But this would only make a difference
> if you tried to kexec a kernel with a smaller granule size than
> the originally running kernel, and even then would only
> result in not seeing some memory that you might have been
> able to use.

On x86 and x86-64 we can recover the memory map from /proc/iomem.

Does that work on ia64?  Can that be fixed to work on ia64?

All of that information needs to get exported to user space so
/sbin/kexec can pass it to the new kernel.
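
For illustration, recovering the RAM ranges from /proc/iomem on x86
could look roughly like the sketch below.  It assumes the usual
"start-end : name" line format with nested entries indented; the
function name parse_iomem() and the sample text are invented for the
example, and this is not the code /sbin/kexec actually uses.

```python
import re

# One top-level /proc/iomem line: "<start>-<end> : <name>" (hex addresses).
IOMEM_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+?)\s*$")


def parse_iomem(text, wanted="System RAM"):
    """Return (start, end) pairs for top-level entries named `wanted`."""
    ranges = []
    for line in text.splitlines():
        if line[:1].isspace():
            continue  # indented lines are nested inside a parent range
        m = IOMEM_LINE.match(line)
        if m and m.group(3) == wanted:
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


# Made-up sample in the style of an x86 /proc/iomem.
SAMPLE = """\
00000000-0009fbff : System RAM
000a0000-000bffff : Video RAM area
00100000-1ffeffff : System RAM
  00100000-002fffff : Kernel code
1fff0000-1fffffff : ACPI Tables
"""

ram = parse_iomem(SAMPLE)
```

On a live system the text would come from reading /proc/iomem itself,
and the resulting ranges would be translated into whatever memory map
format the new kernel expects.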

Eric


* Re: [Fastboot] RE: [BROKEN PATCH] kexec for ia64
  2004-08-05 17:05         ` [Fastboot] " Eric W. Biederman
@ 2004-08-05 19:18           ` Khalid Aziz
  0 siblings, 0 replies; 9+ messages in thread
From: Khalid Aziz @ 2004-08-05 19:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Luck, Tony, Jesse Barnes, Randy Dunlap, Linux ia64, LKML,
	fastboot

On Thu, 2004-08-05 at 11:05, Eric W. Biederman wrote:
> Hmm. Your mailer did not add any references lines.
> 
> 
> "Luck, Tony" <tony.luck@intel.com> writes:
> 
> > Jesse Barnes wrote:
> > >With the addition of some ACPI tables and such.  I don't think 
> > >those are freed by the kernel right now though, so it should
> > >be pretty easy to point at the originals from the newly kexec'd
> > >kernel, or make copies.
> > 
> > The "trim_bottom" and "trim_top" functions currently modify
> > the memory map in place.  But this would only make a difference
> > if you tried to kexec a kernel with a smaller granule size than
> > the originally running kernel, and even then would only
> > result in not seeing some memory that you might have been
> > able to use.
> 
> On x86 and x86-64 we can recover the memory map from /proc/iomem.
> 
> Does that work on ia64?  Can that be fixed to work on ia64?

No, it does not work on ia64. Once I have basic code in place to get
somewhat working kexec on ia64, I am considering looking into fixing
/proc/iomem.

> 
> All of that information needs to get exported to user space so
> /sbin/kexec can pass it to the new kernel.
> 
> Eric

-- 
Khalid

====================================================================
Khalid Aziz                                Linux and Open Source Lab
(970)898-9214                                        Hewlett-Packard
khalid_aziz@hp.com                                  Fort Collins, CO

"The Linux kernel is subject to relentless development" 
				- Alessandro Rubini


