* [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
From: Linas Vepstas @ 2005-06-28 23:58 UTC
To: linux-kernel, Benjamin Herrenschmidt, long
Cc: Hidetoshi Seto, Greg KH, ak, Paul Mackerras, linuxppc64-dev,
linux-pci, johnrose
[-- Attachment #1: Type: text/plain, Size: 189 bytes --]
pci-err-4-e100.patch
Adds PCI error recovery callbacks to the Intel E100 ethernet device driver.
Lightly tested on an E100 two-port card.
Signed-off-by: Linas Vepstas <linas@linas.org>
[-- Attachment #2: pci-err-4-e100.patch --]
[-- Type: text/plain, Size: 3505 bytes --]
--- linux-2.6.12-git10/drivers/net/e100.c.linas-orig 2005-06-17 14:48:29.000000000 -0500
+++ linux-2.6.12-git10/drivers/net/e100.c 2005-06-22 17:18:26.000000000 -0500
@@ -2460,6 +2460,67 @@ static void e100_shutdown(struct device
#endif
}
+#ifdef CONFIG_E100_EEH_RECOVERY
+
+/** e100_io_error_detected() is called when PCI error is detected */
+static int e100_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct nic *nic = netdev_priv(netdev);
+
+ mod_timer(&nic->watchdog, jiffies + 30*HZ);
+ e100_down(nic);
+
+ /* Request a slot reset. */
+ return PCIERR_RESULT_NEED_RESET;
+}
+
+/** e100_io_slot_reset is called after the pci bus has been reset.
+ * Restart the card from scratch. */
+static int e100_io_slot_reset (struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct nic *nic = netdev_priv(netdev);
+
+ if(pci_enable_device(pdev)) {
+ printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n");
+ return PCIERR_RESULT_DISCONNECT;
+ }
+ pci_set_master(pdev);
+
+ /* Only one device per card can do a reset */
+ if (0 != PCI_FUNC (pdev->devfn))
+ return PCIERR_RESULT_RECOVERED;
+
+ e100_hw_reset(nic);
+ e100_phy_init(nic);
+
+ if(e100_hw_init(nic)) {
+ DPRINTK(HW, ERR, "e100_hw_init failed\n");
+ return PCIERR_RESULT_DISCONNECT;
+ }
+
+ return PCIERR_RESULT_RECOVERED;
+}
+
+/** e100_io_resume is called when the error recovery driver
+ * tells us that its OK to resume normal operation.
+ */
+static void e100_io_resume (struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct nic *nic = netdev_priv(netdev);
+
+ /* ack any pending wake events, disable PME */
+ pci_enable_wake(pdev, 0, 0);
+
+ netif_device_attach(netdev);
+ if(netif_running(netdev))
+ e100_open (netdev);
+
+ mod_timer(&nic->watchdog, jiffies);
+}
+#endif /* CONFIG_E100_EEH_RECOVERY */
static struct pci_driver e100_driver = {
.name = DRV_NAME,
@@ -2470,6 +2531,13 @@ static struct pci_driver e100_driver = {
.suspend = e100_suspend,
.resume = e100_resume,
#endif
+#ifdef CONFIG_E100_EEH_RECOVERY
+ .err_handler = {
+ .error_detected = e100_io_error_detected,
+ .slot_reset = e100_io_slot_reset,
+ .resume = e100_io_resume,
+ },
+#endif /* CONFIG_E100_EEH_RECOVERY */
.driver = {
.shutdown = e100_shutdown,
--- linux-2.6.12-git10/drivers/net/Kconfig.linas-orig 2005-06-22 15:26:13.000000000 -0500
+++ linux-2.6.12-git10/drivers/net/Kconfig 2005-06-22 15:28:29.000000000 -0500
@@ -1392,6 +1392,14 @@ config E100
<file:Documentation/networking/net-modules.txt>. The module
will be called e100.
+config E100_EEH_RECOVERY
+ bool "Enable PCI bus error recovery"
+ depends on E100 && PPC_PSERIES
+ help
+ If you say Y here, the driver will be able to recover from
+ PCI bus errors on many PowerPC platforms. IBM pSeries users
+ should answer Y.
+
config LNE390
tristate "Mylex EISA LNE390A/B support (EXPERIMENTAL)"
depends on NET_PCI && EISA && EXPERIMENTAL
--- linux-2.6.12-git10/arch/ppc64/configs/pSeries_defconfig.linas-orig 2005-06-17 14:48:29.000000000 -0500
+++ linux-2.6.12-git10/arch/ppc64/configs/pSeries_defconfig 2005-06-22 15:30:33.000000000 -0500
@@ -545,6 +545,7 @@ CONFIG_PCNET32=y
# CONFIG_DGRS is not set
# CONFIG_EEPRO100 is not set
CONFIG_E100=y
+CONFIG_E100_EEH_RECOVERY=y
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
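The order in which the three callbacks are meant to run can be read off the comments in the patch: error_detected() first, then slot_reset() once the bus has been reset, then resume(). The following is a minimal userspace sketch of that sequence, not kernel code. The result names are copied from the patch; everything prefixed with model_ is hypothetical and exists only for illustration, and the real platform code may well handle more result values than this model does.

```c
/* Userspace model only: result names mirror the patch, nothing here is kernel API. */
enum pcierr_result {
    PCIERR_RESULT_NEED_RESET,
    PCIERR_RESULT_RECOVERED,
    PCIERR_RESULT_DISCONNECT
};

struct model_dev;

/* Hypothetical stand-in for the .err_handler member added by the patch. */
struct model_err_handler {
    enum pcierr_result (*error_detected)(struct model_dev *dev);
    enum pcierr_result (*slot_reset)(struct model_dev *dev);
    void (*resume)(struct model_dev *dev);
};

struct model_dev {
    int running;      /* 1 while the interface is up */
    int reset_fails;  /* set to 1 to simulate a card that does not come back */
    const struct model_err_handler *handler;
};

static enum pcierr_result model_error_detected(struct model_dev *dev)
{
    dev->running = 0;                 /* quiesce, as e100_down() does in the patch */
    return PCIERR_RESULT_NEED_RESET;  /* ask for a slot reset */
}

static enum pcierr_result model_slot_reset(struct model_dev *dev)
{
    /* The patch re-enables the device and re-initialises the hardware here. */
    return dev->reset_fails ? PCIERR_RESULT_DISCONNECT : PCIERR_RESULT_RECOVERED;
}

static void model_resume(struct model_dev *dev)
{
    dev->running = 1;                 /* reopen the interface */
}

static const struct model_err_handler model_handler = {
    .error_detected = model_error_detected,
    .slot_reset     = model_slot_reset,
    .resume         = model_resume,
};

/* Stand-in for the platform recovery code: returns 1 if recovered, 0 if disconnected. */
static int model_recover(struct model_dev *dev)
{
    const struct model_err_handler *h = dev->handler;
    enum pcierr_result r = h->error_detected(dev);

    if (r == PCIERR_RESULT_NEED_RESET)
        r = h->slot_reset(dev);       /* the platform would reset the slot first */
    if (r != PCIERR_RESULT_RECOVERED)
        return 0;

    h->resume(dev);
    return 1;
}
```

In this model a device whose slot_reset() reports PCIERR_RESULT_DISCONNECT never reaches resume(), so it stays down.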
* Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
From: Benjamin Herrenschmidt @ 2005-06-29 1:46 UTC
To: Linas Vepstas
Cc: linux-kernel, long, Hidetoshi Seto, Greg KH, ak, Paul Mackerras,
linuxppc64-dev, linux-pci, johnrose
On Tue, 2005-06-28 at 18:58 -0500, Linas Vepstas wrote:
> /** e100_io_error_detected() is called when PCI error is detected */
> +static int e100_io_error_detected (struct pci_dev *pdev, enum
> pci_channel_state state)
> +{
> + struct net_device *netdev = pci_get_drvdata(pdev);
> + struct nic *nic = netdev_priv(netdev);
> +
> + mod_timer(&nic->watchdog, jiffies + 30*HZ);
> + e100_down(nic);
> +
> + /* Request a slot reset. */
> + return PCIERR_RESULT_NEED_RESET;
> +}
I'm not sure just "pushing" the watchdog timer to 30sec in the future is
the way to go here. What about netif_stop_queue() or similar?
Ben.
* Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
From: Linas Vepstas @ 2005-06-29 15:59 UTC
To: Benjamin Herrenschmidt
Cc: linux-kernel, long, Hidetoshi Seto, Greg KH, ak, Paul Mackerras,
linuxppc64-dev, linux-pci, johnrose
On Wed, Jun 29, 2005 at 11:46:58AM +1000, Benjamin Herrenschmidt was heard to remark:
> On Tue, 2005-06-28 at 18:58 -0500, Linas Vepstas wrote:
> > /** e100_io_error_detected() is called when PCI error is detected */
> > +static int e100_io_error_detected (struct pci_dev *pdev, enum
> > pci_channel_state state)
> > +{
> > + struct net_device *netdev = pci_get_drvdata(pdev);
> > + struct nic *nic = netdev_priv(netdev);
> > +
> > + mod_timer(&nic->watchdog, jiffies + 30*HZ);
> > + e100_down(nic);
> > +
> > + /* Request a slot reset. */
> > + return PCIERR_RESULT_NEED_RESET;
> > +}
>
> I'm not sure just "pushing" the watchdog timer to 30sec in the future is
> the way to go here. What about netif_stop_queue() or so ?
Yep, OK. Pushing the timer would in fact break if the device was marked
perm disabled.
--linas
* Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
From: Andi Kleen @ 2005-06-29 16:58 UTC
To: Linas Vepstas
Cc: Benjamin Herrenschmidt, linux-kernel, long, Hidetoshi Seto,
Greg KH, Paul Mackerras, linuxppc64-dev, linux-pci, johnrose
> Yep, OK. Pushing the timer would in fact break if the device was marked
> perm disabled.
I think for network drivers you should just write a generic error handler
(perhaps in net/core/dev.c) that calls the watchdog handler.
Then all drivers could be easily converted without much code duplication.
-Andi
* Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
From: Benjamin Herrenschmidt @ 2005-06-29 23:40 UTC
To: Andi Kleen
Cc: Linas Vepstas, linux-kernel, long, Hidetoshi Seto, Greg KH,
Paul Mackerras, linuxppc64-dev, linux-pci, johnrose
On Wed, 2005-06-29 at 18:58 +0200, Andi Kleen wrote:
> > Yep, OK. Pushing the timer would in fact break if the device was marked
> > perm disabled.
>
> I think for network drivers you should just write a generic error handler
> (perhaps in net/core/dev.c) that calls the watchdog handler.
> Then all drivers could be easily converted without much code duplication.
Provided the watchdog timer completely reconfigures the device from
reset since the slot will be reset...
Ben.
* PCI Power management (was: Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
From: Linas Vepstas @ 2005-06-30 20:39 UTC
To: Andi Kleen, sfr
Cc: Benjamin Herrenschmidt, linux-kernel, long, Hidetoshi Seto,
Greg KH, Paul Mackerras, linuxppc64-dev, linux-pci, johnrose,
linux-laptop, mochel, pavel
On Wed, Jun 29, 2005 at 06:58:29PM +0200, Andi Kleen was heard to remark:
> > Yep, OK. Pushing the timer would in fact break if the device was marked
> > perm disabled.
>
> I think for network drivers you should just write a generic error handler
> (perhaps in net/core/dev.c) that calls the watchdog handler.
> Then all drivers could be easily converted without much code duplication.
Well, there's no watchdog per se in "struct net_device" -- are you
suggesting I add one?
It looks like I can almost create generic handlers for net devices;
looks like calling netdev->stop() is enough to handle the error
detection.
However, a generic bringup would need to call pci_enable_device(),
and net/core/dev.c does not include pci.h so I can't really do it
there. Other than that, a generic recovery routine looks like it might
be possible; I'll have to experiment; it's hard to tell by reading code.
This might be the wrong paradigm, though. The pci error recovery
routines are *almost identical* to the power-management suspend/resume
routines. From what I can tell, the only real difference is that
I want to not actually turn off/on the power.
Thus, the right thing to do might be to split up the
struct pci_dev->suspend() and pci_dev->resume() calls into
suspend()
poweroff()
poweron()
resume()
and then have the generic pci error recovery routines call
suspend/resume only, skipping the poweroff-on calls. Does that
sound good?
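To make the proposal concrete, here is a small userspace sketch of the four-phase split. None of these names exist in the kernel: struct split_ops, the phase enum, and both cycle functions are hypothetical, and the trace array is only there so the two call sequences can be compared. It assumes, as the paragraph above suggests, that error recovery would run the suspend and resume phases while leaving power untouched.

```c
/* Hypothetical four-phase split; not an existing kernel interface. */
enum phase { PH_SUSPEND, PH_POWEROFF, PH_POWERON, PH_RESUME };

/* Records which phases ran, in order. */
struct phase_trace {
    enum phase steps[8];
    int count;
};

struct split_ops {
    void (*suspend)(struct phase_trace *t);
    void (*poweroff)(struct phase_trace *t);
    void (*poweron)(struct phase_trace *t);
    void (*resume)(struct phase_trace *t);
};

static void record(struct phase_trace *t, enum phase p)
{
    if (t->count < 8)
        t->steps[t->count++] = p;
}

/* Demo driver: each phase only logs that it was called. */
static void demo_suspend(struct phase_trace *t)  { record(t, PH_SUSPEND); }
static void demo_poweroff(struct phase_trace *t) { record(t, PH_POWEROFF); }
static void demo_poweron(struct phase_trace *t)  { record(t, PH_POWERON); }
static void demo_resume(struct phase_trace *t)   { record(t, PH_RESUME); }

static const struct split_ops demo_ops = {
    .suspend  = demo_suspend,
    .poweroff = demo_poweroff,
    .poweron  = demo_poweron,
    .resume   = demo_resume,
};

/* Power management path: all four phases. */
static void pm_cycle(const struct split_ops *ops, struct phase_trace *t)
{
    ops->suspend(t);
    ops->poweroff(t);
    ops->poweron(t);
    ops->resume(t);
}

/* Error recovery path: quiesce and restart only; power stays on. */
static void error_recovery_cycle(const struct split_ops *ops, struct phase_trace *t)
{
    ops->suspend(t);
    /* the slot reset would happen here */
    ops->resume(t);
}
```

The sketch only shows the shape of the split; it says nothing about whether real drivers' suspend and resume paths can be divided this cleanly.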
I'm not sure I can pull this off without having someone from
the power-management world throw a brick at me.
--linas
* Re: PCI Power management (was: Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
From: Linas Vepstas @ 2005-06-30 21:07 UTC
To: Andi Kleen, sfr
Cc: Benjamin Herrenschmidt, linux-kernel, long, Hidetoshi Seto,
Greg KH, Paul Mackerras, linuxppc64-dev, linux-pci, johnrose,
linux-laptop, mochel, pavel
Hm,
Scratch the idea I outline below; it seems like it's not a good idea.
I'm reading the e100, e1000 and the ixgb power management code, and they
go through all sorts of steps I don't need to do for PCI device reset.
There's no clear abstraction that would serve both needs.
On Thu, Jun 30, 2005 at 03:39:31PM -0500, Linas Vepstas was heard to remark:
> On Wed, Jun 29, 2005 at 06:58:29PM +0200, Andi Kleen was heard to remark:
> > > Yep, OK. Pushing the timer would in fact break if the device was marked
> > > perm disabled.
> >
> > I think for network drivers you should just write a generic error handler
> > (perhaps in net/core/dev.c) that calls the watchdog handler.
> > Then all drivers could be easily converted without much code duplication.
>
> Well, there's no watchdog per se in "struct net_device" -- are you
> suggesting I add one?
>
> It looks like I can almost create generic handlers for net devices;
> looks like calling netdev->stop() is enough to handle the error
> detection.
>
> However, a generic bringup would need to call pci_enable_device(),
> and net/core/dev.c does not include pci.h so I can't really do it
> there. Other than that, a generic recovery routine looks like it might
> be possible; I'll have to experiment; it's hard to tell by reading code.
>
> This might be the wrong paradigm, though. The pci error recovery
> routines are *almost identical* to the power-management suspend/resume
> routines. From what I can tell, the only real difference is that
> I want to not actually turn off/on the power.
>
> Thus, the right thing to do might be to split up the
> struct pci_dev->suspend() and pci_dev->resume() calls into
>
> suspend()
> poweroff()
> poweron()
> resume()
>
> and then have the generic pci error recovery routines call
> suspend/resume only, skipping the poweroff-on calls. Does that
> sound good?
>
> I'm not sure I can pull this off without having someone from
> the power-management world throw a brick at me.
>
> --linas
>
>
* Re: PCI Power management (was: Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
From: Benjamin Herrenschmidt @ 2005-06-30 23:32 UTC
To: Linas Vepstas
Cc: Andi Kleen, sfr, linux-kernel, long, Hidetoshi Seto, Greg KH,
Paul Mackerras, linuxppc64-dev, linux-pci, johnrose, linux-laptop,
mochel, pavel
On Thu, 2005-06-30 at 15:39 -0500, Linas Vepstas wrote:
> Thus, the right thing to do might be to split up the
> struct pci_dev->suspend() and pci_dev->resume() calls into
>
> suspend()
> poweroff()
> poweron()
> resume()
No. There are very good reasons not to do that split at the pci_dev
level.
> and then have the generic pci error recovery routines call
> suspend/resume only, skipping the poweroff-on calls. Does that
> sound good?
>
> I'm not sure I can pull this off without having someone from
> the power-management world throw a brick at me.
Just keep the error recovery callbacks for now, and we might be able to
provide a generic "helper" doing the watchdog thing (yes, there is a
watchdog in the net core).
Ben.