* Re: [Fastboot] [ANNOUNCE] [PATCH] 2.6.8-rc1-kexec1 (ppc & x86)
[not found] ` <m1y8le3cto.fsf@ebiederm.dsl.xmission.com>
@ 2004-07-21 1:07 ` Mike Snitzer
2004-07-21 19:09 ` [BUG] e1000 on reboot/boot path Eric W. Biederman
2004-07-21 19:40 ` [Fastboot] [ANNOUNCE] [PATCH] 2.6.8-rc1-kexec1 (ppc & x86) Eric W. Biederman
0 siblings, 2 replies; 3+ messages in thread
From: Mike Snitzer @ 2004-07-21 1:07 UTC
To: fastboot, linux-kernel, netdev; +Cc: Eric W. Biederman
On 20 Jul 2004 09:42:11 -0600, Eric W. Biederman <ebiederm@xmission.com> wrote:
...
> I will take a look at this as soon as I finish the x86-64 port.
> Basically I am down to restructuring the generic code so it
> does not use init_mm to identity map the reboot_code_buffer,
> but instead does something architecture specific.
>
> Removing the dependence on init_mm is going to be painful
> since it touches the generic code. But I really don't see
> a choice at this point.
>
> Once everything works after that change, I think kexec should
> be much closer to kernel inclusion.
As kexec approaches kernel inclusion one thing to be mindful of is
that, prior to kexec'ing, loaded drivers must let go of their PCI
resources in order for the newly kexec'd kernel's device drivers to
work. Device drivers that don't properly let go of their resources
will likely be left in a funky state. I've been bitten by this issue
with the e1000 (<= 5.2.52-k4) driver. With an "Ethernet controller:
Intel Corp. 82546EB Gigabit Ethernet Controller", device id# 1010, I
get:
Intel(R) PRO/1000 Network Driver - version 5.2.30.1-k1
Copyright (c) 1999-2004 Intel Corporation.
PCI: Enabling device 04:05.0 (0000 -> 0003)
Uhhuh. NMI received for unknown reason 31.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
The EEPROM Checksum Is Not Valid
PCI: Enabling device 04:05.1 (0000 -> 0003)
The EEPROM Checksum Is Not Valid
The 5.2.52-k4 has toned down, yet basically the same, errors. This
message results when kexec'ing from a 2.6.7 kernel with the e1000
builtin; once I made the e1000 a module and unloaded it prior to
kexec'ing all was fine.
Looking at the e1000 source it is clear that removing the e1000 module
triggers e1000_remove() via module_exit()'s pci_unregister_driver().
Once the e1000 let go of the PCI resources the kexec'd kernel's e1000
driver was happy. Kexec looks to call all loaded modules' shutdown()
routine. The e1000 doesn't have shutdown(); but it does have a
remove().
In preparation for kexec to be viable for 2.6 inclusion it would
appear that some effort will need to go into making sure all drivers
in the kernel actually let go of their resources. Should there be an
audit to verify all device drivers have a shutdown()?
Another option is to call each module's remove() iff the module
doesn't have shutdown(). This would require changing
drivers/base/power/shutdown.c device_shutdown() to include ... else if
(dev->driver && dev->driver->remove) ... As a side-effect it would
make drivers like the e1000 safe for use with kexec.
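To make the proposed fallback concrete, here is a minimal userspace sketch of the per-device decision. It is not the actual device_shutdown() code: the two structs are cut-down, hypothetical stand-ins for the kernel's struct device and struct device_driver, and shutdown_one(), hw_active, demo_shutdown() and demo_remove() are names invented for illustration.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel structures;
 * the real ones carry many more fields. */
struct device;

struct device_driver {
    const char *name;
    void (*shutdown)(struct device *dev); /* quiesce the hardware only */
    int  (*remove)(struct device *dev);   /* full driver teardown */
};

struct device {
    const char *bus_id;
    struct device_driver *driver;
    int hw_active; /* illustration only: 1 while the "hardware" is live */
};

enum shutdown_path { PATH_NONE, PATH_SHUTDOWN, PATH_REMOVE };

/* Prefer shutdown(); fall back to remove() only when the driver
 * provides no shutdown(). Reports which path was taken. */
static enum shutdown_path shutdown_one(struct device *dev)
{
    if (dev->driver && dev->driver->shutdown) {
        dev->driver->shutdown(dev);
        return PATH_SHUTDOWN;
    } else if (dev->driver && dev->driver->remove) {
        dev->driver->remove(dev);
        return PATH_REMOVE;
    }
    return PATH_NONE;
}

/* Demo callbacks: both just mark the pretend hardware as quiesced. */
static void demo_shutdown(struct device *dev) { dev->hw_active = 0; }
static int  demo_remove(struct device *dev)   { dev->hw_active = 0; return 0; }
```

With this shape, a driver that only provides remove(), as the e1000 does today, would still be quiesced before the new kernel starts, while drivers that already have shutdown() keep their current behavior.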
thoughts?
Mike
* [BUG] e1000 on reboot/boot path.
From: Eric W. Biederman @ 2004-07-21 19:09 UTC
To: Mike Snitzer; +Cc: fastboot, linux-kernel, netdev
Mike Snitzer <snitzer@gmail.com> writes:
So you have a problem that the e1000 driver does not properly
shut down the e1000 in the reboot path (there is no code for it).
But it does properly clean up when you remove the module.
> I've been bitten by this issue
> with the e1000 (<= 5.2.52-k4) driver. With an "Ethernet controller:
> Intel Corp. 82546EB Gigabit Ethernet Controller", device id# 1010, I
> get:
>
> Intel(R) PRO/1000 Network Driver - version 5.2.30.1-k1
> Copyright (c) 1999-2004 Intel Corporation.
> PCI: Enabling device 04:05.0 (0000 -> 0003)
> Uhhuh. NMI received for unknown reason 31.
> Dazed and confused, but trying to continue
> Do you have a strange power saving mode enabled?
> The EEPROM Checksum Is Not Valid
> PCI: Enabling device 04:05.1 (0000 -> 0003)
> The EEPROM Checksum Is Not Valid
>
> The 5.2.52-k4 has toned down, yet basically the same, errors. This
> message results when kexec'ing from a 2.6.7 kernel with the e1000
> builtin; once I made the e1000 a module and unloaded it prior to
> kexec'ing all was fine.
>
> Looking at the e1000 source it is clear that removing the e1000 module
> triggers e1000_remove() via module_exit()'s pci_unregister_driver().
> Once the e1000 let go of the PCI resources the kexec'd kernel's e1000
> driver was happy. Kexec looks to call all loaded modules' shutdown()
> routine. The e1000 doesn't have shutdown(); but it does have a
> remove().
>
Eric
* Re: [Fastboot] [ANNOUNCE] [PATCH] 2.6.8-rc1-kexec1 (ppc & x86)
From: Eric W. Biederman @ 2004-07-21 19:40 UTC
To: Mike Snitzer; +Cc: fastboot, linux-kernel, netdev
Mike Snitzer <snitzer@gmail.com> writes:
> In preparation for kexec to be viable for 2.6 inclusion it would
> appear that some effort will need to go into making sure all drivers
> in the kernel actually let go of their resources. Should there be an
> audit to verify all device drivers have a shutdown()?
Right, this is a janitorial-type task that needs to be done.
- First there are a lot of devices that don't need a shutdown
method. Mostly either because they are not stateful or
their initialization method can bring them out of whatever
state they are in.
Although if the kexec stuff gets in this might just happen
with a pile of bug reports like yours.
> Another option is to call each module's remove() iff the module
> doesn't have shutdown(). This would require changing
> drivers/base/power/shutdown.c device_shutdown() to include ... else if
> (dev->driver && dev->driver->remove) ... As a side-effect it would
> make drivers like the e1000 safe for use with kexec.
Last time I had this conversation, merging shutdown() and remove()
was not wanted, because shutdown() is only required to touch the
hardware and not to clean up the kernel data structures. That could
be an advantage if your machine is already in an unstable state.
But calling shutdown on device remove is perfectly legitimate.
And it should help ensure the code path gets tested.
Ah... Looking more closely, there is a method for testing
the shutdown method and the related power management states.
Details are in Documentation/power/devices.txt.
But in short there is a "detach_state" file for each device
in sysfs. If you do "echo 4 > detach_state", the shutdown
method should be called on device removal. Other low power
states can be handled the same way.
I still think drivers will want to call their shutdown method
from their remove method if there is any work to do. But at
least there is now a way to test the code path.
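As a rough illustration of that pattern, here is a userspace sketch in which remove() calls shutdown() first and then frees the software state. struct demo_nic and both functions are hypothetical; this is not e1000 code, only the shape of the idea.

```c
#include <stdlib.h>

/* Hypothetical driver-private state, for illustration only. */
struct demo_nic {
    int irq_enabled;  /* 1 while the device may raise interrupts */
    int dma_running;  /* 1 while the device may DMA into memory */
    void *rx_ring;    /* kernel-side data structure owned by the driver */
};

/* shutdown: only touches the "hardware"; leaves data structures alone,
 * so it stays safe to call even when the system is in a shaky state. */
static void demo_nic_shutdown(struct demo_nic *nic)
{
    nic->irq_enabled = 0;
    nic->dma_running = 0;
}

/* remove: reuses shutdown() to quiesce the device, then releases
 * the driver's own data structures. */
static void demo_nic_remove(struct demo_nic *nic)
{
    demo_nic_shutdown(nic);
    free(nic->rx_ring);
    nic->rx_ring = NULL;
}
```

Because remove() goes through shutdown(), every module unload also exercises the shutdown path, which is the testing benefit mentioned above.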
Eric