* Re: [Fastboot] [ANNOUNCE] [PATCH] 2.6.8-rc1-kexec1 (ppc & x86)
From: Mike Snitzer @ 2004-07-21 1:07 UTC
To: fastboot, linux-kernel, netdev; +Cc: Eric W. Biederman

On 20 Jul 2004 09:42:11 -0600, Eric W. Biederman <ebiederm@xmission.com> wrote:
...
> I will take a look at this as soon as I finish the x86-64 port.
> Basically I am down to restructuring the generic code so it
> does not use init_mm to identity map the reboot_code_buffer,
> but instead does something architecture specific.
>
> Removing the dependence on init_mm is going to be painful
> since it touches the generic code. But I really don't see
> a choice at this point.
>
> Once everything works after that change, I think kexec should
> be much closer to kernel inclusion.

As kexec approaches kernel inclusion one thing to be mindful of is
that, prior to kexec'ing, loaded drivers must let go of their PCI
resources in order for the newly kexec'd kernel's device drivers to
work. Device drivers that don't properly let go of their resources
will likely be left in a funky state. I've been bitten by this issue
with the e1000 (<= 5.2.52-k4) driver. With an "Ethernet controller:
Intel Corp. 82546EB Gigabit Ethernet Controller", device id# 1010, I
get:

Intel(R) PRO/1000 Network Driver - version 5.2.30.1-k1
Copyright (c) 1999-2004 Intel Corporation.
PCI: Enabling device 04:05.0 (0000 -> 0003)
Uhhuh. NMI received for unknown reason 31.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
The EEPROM Checksum Is Not Valid
PCI: Enabling device 04:05.1 (0000 -> 0003)
The EEPROM Checksum Is Not Valid

The 5.2.52-k4 has toned down, yet basically the same, errors. This
message results when kexec'ing from a 2.6.7 kernel with the e1000
builtin; once I made the e1000 a module and unloaded it prior to
kexec'ing all was fine.

Looking at the e1000 source it is clear that removing the e1000 module
triggers e1000_remove() via module_exit()'s pci_unregister_driver().
Once the e1000 let go of the PCI resources the kexec'd kernel's e1000
driver was happy. Kexec looks to call all loaded modules' shutdown()
routine. The e1000 doesn't have shutdown(); but it does have a
remove().

In preparation for kexec to be viable for 2.6 inclusion it would
appear that some effort will need to go in to making sure all drivers
in the kernel actually let go of their resources. Should there be an
audit to verify all device drivers have a shutdown()?

Another option is to call each module's remove() iff the module
doesn't have shutdown(). This would require changing
drivers/base/power/shutdown.c device_shutdown() to include ... else if
(dev->driver && dev->driver->remove) ... As a side-effect it would
make drivers like the e1000 safe for use with kexec.

thoughts?
Mike
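
The remove() fallback proposed above can be illustrated with a small user-space C model. This is only a sketch under stated assumptions: `struct model_driver`, `struct model_device`, `model_quiesce()`, `model_teardown()` and `model_device_shutdown()` are hypothetical stand-ins invented for illustration, not the kernel's actual definitions or the real body of device_shutdown(). The only part taken from the message is the shape of the dispatch: call shutdown() when the driver has one, else fall back to remove().

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: simplified, hypothetical stand-ins for the
 * kernel's driver and device structures. */
struct model_device;

struct model_driver {
    const char *name;
    void (*shutdown)(struct model_device *dev); /* quiesce hardware only */
    int (*remove)(struct model_device *dev);    /* full driver teardown */
};

struct model_device {
    struct model_driver *driver; /* NULL when no driver is bound */
    int hw_quiesced;             /* set once the hardware has been stopped */
    int used_remove_fallback;    /* set when remove() stood in for shutdown() */
};

/* Hypothetical callbacks standing in for a real driver's methods. */
void model_quiesce(struct model_device *dev)
{
    dev->hw_quiesced = 1;
}

int model_teardown(struct model_device *dev)
{
    dev->hw_quiesced = 1;
    return 0;
}

/* Visit every device; prefer shutdown(), else fall back to remove().
 * Returns how many devices needed the fallback. */
int model_device_shutdown(struct model_device *devs, size_t n)
{
    int fallbacks = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct model_device *dev = &devs[i];

        if (dev->driver && dev->driver->shutdown) {
            dev->driver->shutdown(dev);
        } else if (dev->driver && dev->driver->remove) {
            /* The proposed extra branch: no shutdown(), so use remove(). */
            dev->driver->remove(dev);
            dev->used_remove_fallback = 1;
            fallbacks++;
        }
    }
    return fallbacks;
}
```

In this model a driver that only provides remove(), as the e1000 is described above, still gets its hardware quiesced, while devices with no bound driver are skipped.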
* [BUG] e1000 on reboot/boot path.
From: Eric W. Biederman @ 2004-07-21 19:09 UTC
To: Mike Snitzer; +Cc: fastboot, linux-kernel, netdev

Mike Snitzer <snitzer@gmail.com> writes:

So you have a problem that the e1000 driver does not properly shut
down the e1000 in the reboot path (no code). But it does properly
clean up when you remove the module.

> I've been bitten by this issue
> with the e1000 (<= 5.2.52-k4) driver. With an "Ethernet controller:
> Intel Corp. 82546EB Gigabit Ethernet Controller", device id# 1010, I
> get:
>
> Intel(R) PRO/1000 Network Driver - version 5.2.30.1-k1
> Copyright (c) 1999-2004 Intel Corporation.
> PCI: Enabling device 04:05.0 (0000 -> 0003)
> Uhhuh. NMI received for unknown reason 31.
> Dazed and confused, but trying to continue
> Do you have a strange power saving mode enabled?
> The EEPROM Checksum Is Not Valid
> PCI: Enabling device 04:05.1 (0000 -> 0003)
> The EEPROM Checksum Is Not Valid
>
> The 5.2.52-k4 has toned down, yet basically the same, errors. This
> message results when kexec'ing from a 2.6.7 kernel with the e1000
> builtin; once I made the e1000 a module and unloaded it prior to
> kexec'ing all was fine.
>
> Looking at the e1000 source it is clear that removing the e1000 module
> triggers e1000_remove() via module_exit()'s pci_unregister_driver().
> Once the e1000 let go of the PCI resources the kexec'd kernel's e1000
> driver was happy. Kexec looks to call all loaded modules' shutdown()
> routine. The e1000 doesn't have shutdown(); but it does have a
> remove().
>

Eric
* Re: [Fastboot] [ANNOUNCE] [PATCH] 2.6.8-rc1-kexec1 (ppc & x86)
From: Eric W. Biederman @ 2004-07-21 19:40 UTC
To: Mike Snitzer; +Cc: fastboot, linux-kernel, netdev

Mike Snitzer <snitzer@gmail.com> writes:

> In preparation for kexec to be viable for 2.6 inclusion it would
> appear that some effort will need to go in to making sure all drivers
> in the kernel actually let go of their resources. Should there be an
> audit to verify all device drivers have a shutdown()?

Right, this is a janitorial type task that needs to be done.
- First there are a lot of devices that don't need a shutdown method.
  Mostly either because they are not stateful or their initialization
  method can bring them out of whatever state they are in.

Although if the kexec stuff gets in this might just happen with
a pile of bug reports like yours.

> Another option is to call each module's remove() iff the module
> doesn't have shutdown(). This would require changing
> drivers/base/power/shutdown.c device_shutdown() to include ... else if
> (dev->driver && dev->driver->remove) ... As a side-effect it would
> make drivers like the e1000 safe for use with kexec.

Last time I had this conversation, merging shutdown() and remove()
was not wanted, because shutdown is only required to touch the
hardware and not to clean up the kernel data structures. If your
machine is already in an unstable state, that could be an advantage.

But calling shutdown on device remove is perfectly legitimate. And
it should help ensure the code path gets tested.

Ah... Looking more closely there is a method for testing the shutdown
method and the related power management states.

Details are in Documentation/power/devices.txt

But in short there is a "detach_state" file for each device in sysfs.
If you do "echo 4 > detach_state" the shutdown method should be
called on device removal. Other low power states can be handled the
same way.

I still think drivers will want to call their shutdown method from
their remove method if there is any work to do. But at least there
is now a way to test the code path.

Eric
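
The split described above, where shutdown only touches the hardware and remove additionally cleans up kernel-side state by calling shutdown first, can be sketched in user-space C. This is a minimal model, not driver code: `struct sketch_nic`, `sketch_nic_shutdown()` and `sketch_nic_remove()` are hypothetical names introduced for illustration, and the integer flags merely stand in for real hardware state and allocated resources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a network device; the flags stand in for real
 * hardware state and kernel-side allocations. */
struct sketch_nic {
    int hw_running;     /* 1 while the imaginary hardware is active */
    int resources_held; /* 1 while kernel-side resources are allocated */
    int shutdown_calls; /* how many times the shutdown path ran */
};

/* shutdown: only quiesce the hardware; leave data structures alone so
 * the path stays usable even when the machine is already unstable. */
void sketch_nic_shutdown(struct sketch_nic *nic)
{
    assert(nic != NULL);
    nic->hw_running = 0;
    nic->shutdown_calls++;
}

/* remove: reuse shutdown for the hardware side, then release the
 * kernel-side resources as well. */
void sketch_nic_remove(struct sketch_nic *nic)
{
    assert(nic != NULL);
    sketch_nic_shutdown(nic);
    nic->resources_held = 0;
}
```

With this layout every module unload also exercises the shutdown path, which is the testing benefit mentioned above, while a reboot or kexec can call the lighter shutdown routine on its own.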