* [PATCH] ixgb: add PCI Error recovery callbacks
@ 2006-06-29 16:26 Linas Vepstas
2006-07-03 5:49 ` Zhang, Yanmin
0 siblings, 1 reply; 9+ messages in thread
From: Linas Vepstas @ 2006-06-29 16:26 UTC (permalink / raw)
To: Jesse Brandeburg, Ronciak, John, bibo,mao, Rajesh Shah
Cc: Grant Grundler, akpm, linux-kernel, linux-pci, netdev
Adds PCI Error recovery callbacks to the Intel 10-gigabit ethernet
ixgb device driver. Lightly tested, works.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
drivers/net/ixgb/ixgb_main.c | 112 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 111 insertions(+), 1 deletion(-)
Index: linux-2.6.17-mm3/drivers/net/ixgb/ixgb_main.c
===================================================================
--- linux-2.6.17-mm3.orig/drivers/net/ixgb/ixgb_main.c 2006-06-27 11:39:08.000000000 -0500
+++ linux-2.6.17-mm3/drivers/net/ixgb/ixgb_main.c 2006-06-28 18:04:32.000000000 -0500
@@ -118,15 +118,26 @@ static void ixgb_restore_vlan(struct ixg
static void ixgb_netpoll(struct net_device *dev);
#endif
-/* Exported from other modules */
+static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev,
+ enum pci_channel_state state);
+static pci_ers_result_t ixgb_io_slot_reset (struct pci_dev *pdev);
+static void ixgb_io_resume (struct pci_dev *pdev);
+/* Exported from other modules */
extern void ixgb_check_options(struct ixgb_adapter *adapter);
+static struct pci_error_handlers ixgb_err_handler = {
+	.error_detected = ixgb_io_error_detected,
+	.slot_reset = ixgb_io_slot_reset,
+	.resume = ixgb_io_resume,
+};
+
static struct pci_driver ixgb_driver = {
.name = ixgb_driver_name,
.id_table = ixgb_pci_tbl,
.probe = ixgb_probe,
.remove = __devexit_p(ixgb_remove),
+ .err_handler = &ixgb_err_handler
};
MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -1543,6 +1554,11 @@ void
ixgb_update_stats(struct ixgb_adapter *adapter)
{
struct net_device *netdev = adapter->netdev;
+	struct pci_dev *pdev = adapter->pdev;
+
+	/* Prevent stats update while adapter is being reset */
+	if (pdev->error_state && pdev->error_state != pci_channel_io_normal)
+		return;
if((netdev->flags & IFF_PROMISC) || (netdev->flags & IFF_ALLMULTI) ||
(netdev->mc_count > IXGB_MAX_NUM_MULTICAST_ADDRESSES)) {
@@ -2198,4 +2214,98 @@ static void ixgb_netpoll(struct net_devi
}
#endif
+/**
+ * ixgb_io_error_detected() - called when PCI error is detected
+ * @pdev pointer to pci device with error
+ * @state pci channel state after error
+ *
+ * This callback is called by the PCI subsystem whenever
+ * a PCI bus error is detected.
+ */
+static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev,
+			enum pci_channel_state state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct ixgb_adapter *adapter = netdev->priv;
+
+	if(netif_running(netdev))
+		ixgb_down(adapter, TRUE);
+
+	pci_disable_device(pdev);
+
+	/* Request a slot reset. */
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * ixgb_io_slot_reset - called after the pci bus has been reset.
+ * @pdev pointer to pci device with error
+ *
+ * This callback is called after the PCI bus has been reset.
+ * Basically, this tries to restart the card from scratch.
+ * This is a shortened version of the device probe/discovery code,
+ * it resembles the first-half of the ixgb_probe() routine.
+ */
+static pci_ers_result_t ixgb_io_slot_reset (struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct ixgb_adapter *adapter = netdev->priv;
+
+	if(pci_enable_device(pdev)) {
+		DPRINTK(PROBE, ERR, "Cannot re-enable PCI device after reset.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	/* Perform card reset only on one instance of the card */
+	if (0 != PCI_FUNC (pdev->devfn))
+		return PCI_ERS_RESULT_RECOVERED;
+
+	pci_set_master(pdev);
+
+	netif_carrier_off(netdev);
+	netif_stop_queue(netdev);
+	ixgb_reset(adapter);
+
+	/* Make sure the EEPROM is good */
+	if(!ixgb_validate_eeprom_checksum(&adapter->hw)) {
+		DPRINTK(PROBE, ERR, "After reset, the EEPROM checksum is not valid.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+	ixgb_get_ee_mac_addr(&adapter->hw, netdev->dev_addr);
+	memcpy(netdev->perm_addr, netdev->dev_addr, netdev->addr_len);
+
+	if(!is_valid_ether_addr(netdev->perm_addr)) {
+		DPRINTK(PROBE, ERR, "After reset, invalid MAC address.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+/**
+ * ixgb_io_resume - called when it's OK to resume normal operations
+ * @pdev pointer to pci device with error
+ *
+ * The error recovery driver tells us that it's OK to resume
+ * normal operation. Implementation resembles the second-half
+ * of the ixgb_probe() routine.
+ */
+static void ixgb_io_resume (struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct ixgb_adapter *adapter = netdev->priv;
+
+	pci_set_master(pdev);
+
+	if(netif_running(netdev)) {
+		if(ixgb_up(adapter)) {
+			printk ("ixgb: can't bring device back up after reset\n");
+			return;
+		}
+	}
+
+	netif_device_attach(netdev);
+	mod_timer(&adapter->watchdog_timer, jiffies);
+}
+
/* ixgb_main.c */
* Re: [PATCH] ixgb: add PCI Error recovery callbacks
2006-06-29 16:26 [PATCH] ixgb: add PCI Error recovery callbacks Linas Vepstas
@ 2006-07-03 5:49 ` Zhang, Yanmin
2006-07-05 15:49 ` Auke Kok
0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Yanmin @ 2006-07-03 5:49 UTC (permalink / raw)
To: Linas Vepstas
Cc: Jesse Brandeburg, Ronciak, John, bibo,mao, Rajesh Shah,
Grant Grundler, akpm, LKML, linux-pci maillist, netdev
On Fri, 2006-06-30 at 00:26, Linas Vepstas wrote:
> Adds PCI Error recovery callbacks to the Intel 10-gigabit ethernet
> ixgb device driver. Lightly tested, works.
>
> Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
> +/**
> + * ixgb_io_error_detected() - called when PCI error is detected
> + * @pdev pointer to pci device with error
> + * @state pci channel state after error
> + *
> + * This callback is called by the PCI subsystem whenever
> + * a PCI bus error is detected.
> + */
> +static pci_ers_result_t ixgb_io_error_detected (struct pci_dev *pdev,
> +			enum pci_channel_state state)
> +{
> +	struct net_device *netdev = pci_get_drvdata(pdev);
> +	struct ixgb_adapter *adapter = netdev->priv;
> +
> +	if(netif_running(netdev))
> +		ixgb_down(adapter, TRUE);
> +
> +	pci_disable_device(pdev);
> +
> +	/* Request a slot reset. */
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
Both pci_disable_device and ixgb_down would access the device. It doesn't
follow Documentation/pci-error-recovery.txt that error_detected shouldn't do
any access to the device.
* Re: [PATCH] ixgb: add PCI Error recovery callbacks
2006-07-03 5:49 ` Zhang, Yanmin
@ 2006-07-05 15:49 ` Auke Kok
2006-07-05 19:44 ` Linas Vepstas
0 siblings, 1 reply; 9+ messages in thread
From: Auke Kok @ 2006-07-05 15:49 UTC (permalink / raw)
To: Linas Vepstas
Cc: Zhang, Yanmin, Jesse Brandeburg, Ronciak, John, bibo,mao,
Rajesh Shah, Grant Grundler, akpm, LKML, linux-pci maillist,
netdev
Zhang, Yanmin wrote:
> On Fri, 2006-06-30 at 00:26, Linas Vepstas wrote:
>> Adds PCI Error recovery callbacks to the Intel 10-gigabit ethernet
>> ixgb device driver. Lightly tested, works.
>
> Both pci_disable_device and ixgb_down would access the device. It doesn't
> follow Documentation/pci-error-recovery.txt that error_detected shouldn't do
> any access to the device.
Moreover, it was Linas who wrote this documentation in the first place :)
Linas, have you tried moving the e1000_down() call into the _reset part? I
suspect that the e1000_reset() in there however may already be sufficient.
Cheers,
Auke
* Re: [PATCH] ixgb: add PCI Error recovery callbacks
2006-07-05 15:49 ` Auke Kok
@ 2006-07-05 19:44 ` Linas Vepstas
2006-07-06 1:21 ` Zhang, Yanmin
0 siblings, 1 reply; 9+ messages in thread
From: Linas Vepstas @ 2006-07-05 19:44 UTC (permalink / raw)
To: Auke Kok
Cc: Zhang, Yanmin, Jesse Brandeburg, Ronciak, John, bibo,mao,
Rajesh Shah, Grant Grundler, akpm, LKML, linux-pci maillist,
netdev, wenxiong
On Wed, Jul 05, 2006 at 08:49:27AM -0700, Auke Kok wrote:
> Zhang, Yanmin wrote:
> >On Fri, 2006-06-30 at 00:26, Linas Vepstas wrote:
> >>Adds PCI Error recovery callbacks to the Intel 10-gigabit ethernet
> >>ixgb device driver. Lightly tested, works.
> >
> >Both pci_disable_device and ixgb_down would access the device. It doesn't
> >follow Documentation/pci-error-recovery.txt that error_detected shouldn't
> >do
> >any access to the device.
>
> Moreover, it was Linas who wrote this documentation in the first place :)
On the pSeries, it's harmless to try to do i/o; the i/o will be blocked.
> Linas, have you tried moving the e1000_down() call into the _reset part? I
> suspect that the e1000_reset() in there however may already be sufficient.
I wanted to perform all of the "down" type functions BEFORE the reset.
The idea is to get the device driver and the various parts of the
Linux kernel into a state that would be consistent with a reset.
I don't want to do these functions after the reset, since, at this
point, the card is a "clean slate"; it has the PCI bars set, but
nothing else. Doing random i/o to it at this point could confuse
the card; instead, one wants to bring the card up using the usual
bringup sequence.
For example, I tripped over one rather confusing bug: new code
in the -mm tree blocks a pci_enable_device() if it thinks the
card is already enabled (even if it's not). Doing I/O to a card
that is not enabled will cause either a target abort or a master
abort. Thus, I found I had to call pci_disable_device(); and it
seemed that the best time to do this would be before the reset, not
afterwards. However, I did not play at length with other possibilities.
I recently lost access to my ixgb cards, and so can't do more testing
just right now.
--linas
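The ordering argued for above (quiesce first, then reset, then bring the card back up) can be sketched in userspace. This is only an illustration under stated assumptions: every `mock_` name below is hypothetical, the result codes are stand-ins named after the `PCI_ERS_RESULT_*` values used in the patch, and the real sequencing is done by the platform's error recovery code, not by the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Mock result codes, named after the PCI_ERS_RESULT_* values in the patch. */
enum mock_ers_result {
	MOCK_ERS_RESULT_NEED_RESET,
	MOCK_ERS_RESULT_RECOVERED,
	MOCK_ERS_RESULT_DISCONNECT,
};

/* Hypothetical device state: records which callbacks ran, in order. */
struct mock_dev {
	char log[8];   /* one letter per step: D = down, R = reset, U = up */
	size_t n;      /* number of steps recorded so far */
	int reset_ok;  /* lets a caller simulate a failed re-initialisation */
};

static void mock_log(struct mock_dev *d, char step)
{
	assert(d->n < sizeof(d->log) - 1);
	d->log[d->n++] = step;
}

/* Step 1: quiesce the driver before the slot is reset. */
static enum mock_ers_result mock_error_detected(struct mock_dev *d)
{
	mock_log(d, 'D');
	return MOCK_ERS_RESULT_NEED_RESET;
}

/* Step 2: the slot has been reset; re-initialise the card from scratch. */
static enum mock_ers_result mock_slot_reset(struct mock_dev *d)
{
	mock_log(d, 'R');
	return d->reset_ok ? MOCK_ERS_RESULT_RECOVERED
			   : MOCK_ERS_RESULT_DISCONNECT;
}

/* Step 3: normal operation may resume. */
static void mock_resume(struct mock_dev *d)
{
	mock_log(d, 'U');
}

/* Stand-in for the platform recovery code that drives the callbacks. */
static enum mock_ers_result mock_recover(struct mock_dev *d)
{
	enum mock_ers_result r = mock_error_detected(d);

	if (r == MOCK_ERS_RESULT_NEED_RESET)
		r = mock_slot_reset(d);   /* the platform resets the slot first */
	if (r == MOCK_ERS_RESULT_RECOVERED)
		mock_resume(d);
	return r;
}
```

With `reset_ok` set, the recorded order is D, R, U; with it cleared, the bring-up step is skipped and the device is reported as disconnected.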
* Re: [PATCH] ixgb: add PCI Error recovery callbacks
2006-07-05 19:44 ` Linas Vepstas
@ 2006-07-06 1:21 ` Zhang, Yanmin
2006-07-06 16:16 ` Linas Vepstas
0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Yanmin @ 2006-07-06 1:21 UTC (permalink / raw)
To: Linas Vepstas
Cc: Auke Kok, Jesse Brandeburg, Ronciak, John, bibo,mao, Rajesh Shah,
Grant Grundler, akpm, LKML, linux-pci maillist, netdev, wenxiong
On Thu, 2006-07-06 at 03:44, Linas Vepstas wrote:
> On Wed, Jul 05, 2006 at 08:49:27AM -0700, Auke Kok wrote:
> > Zhang, Yanmin wrote:
> > >On Fri, 2006-06-30 at 00:26, Linas Vepstas wrote:
> > >>Adds PCI Error recovery callbacks to the Intel 10-gigabit ethernet
> > >>ixgb device driver. Lightly tested, works.
> > >
> > >Both pci_disable_device and ixgb_down would access the device. It doesn't
> > >follow Documentation/pci-error-recovery.txt that error_detected shouldn't
> > >do
> > >any access to the device.
> >
> > Moreover, it was Linas who wrote this documentation in the first place :)
>
> On the pSeries, it's harmless to try to do i/o; the i/o will be blocked.
In the future, we might move the pci error recovery codes to generic to
support other platforms which might not block I/O. So it's better to follow
Documentation/pci-error-recovery.txt when adding error recovery codes into driver.
* Re: [PATCH] ixgb: add PCI Error recovery callbacks
2006-07-06 1:21 ` Zhang, Yanmin
@ 2006-07-06 16:16 ` Linas Vepstas
2006-07-06 18:01 ` Auke Kok
0 siblings, 1 reply; 9+ messages in thread
From: Linas Vepstas @ 2006-07-06 16:16 UTC (permalink / raw)
To: Zhang, Yanmin
Cc: Auke Kok, Jesse Brandeburg, Ronciak, John, bibo,mao, Rajesh Shah,
Grant Grundler, akpm, LKML, linux-pci maillist, netdev, wenxiong
On Thu, Jul 06, 2006 at 09:21:39AM +0800, Zhang, Yanmin wrote:
> On Thu, 2006-07-06 at 03:44, Linas Vepstas wrote:
> > On Wed, Jul 05, 2006 at 08:49:27AM -0700, Auke Kok wrote:
> > > Zhang, Yanmin wrote:
> > > >On Fri, 2006-06-30 at 00:26, Linas Vepstas wrote:
> > > >>Adds PCI Error recovery callbacks to the Intel 10-gigabit ethernet
> > > >>ixgb device driver. Lightly tested, works.
> > > >
> > > >Both pci_disable_device and ixgb_down would access the device. It doesn't
> > > >follow Documentation/pci-error-recovery.txt that error_detected shouldn't
> > > >do
> > > >any access to the device.
> > >
> > > Moreover, it was Linas who wrote this documentation in the first place :)
> >
> > On the pSeries, it's harmless to try to do i/o; the i/o will be blocked.
> In the future, we might move the pci error recovery codes to generic to
> support other platforms which might not block I/O. So it's better to follow
> Documentation/pci-error-recovery.txt when adding error recovery codes into driver.
Or we could change the documentation. The point was that doing
unexpected i/o after the adapter reset is likely to wedge the adapter
again, leading to an infinite loop of resets. As a practical matter,
I found, while developing this patch and the other related
patches, that this was indeed the usual failure mode: incorrect bringup
just led to more errors.
What I really want to do is to perform as clean a shut-down as possible,
reset the adapter, and then bring it back up. I'm concerned that changing
the order to "reset"-"shutdown"-"bringup" would be inappropriate.
Perhaps the right fix is to figure out what parts of the driver do i/o
during shutdown, and then add a line "if(wedged) skip i/o;" to those
places?
--linas
* Re: [PATCH] ixgb: add PCI Error recovery callbacks
2006-07-06 16:16 ` Linas Vepstas
@ 2006-07-06 18:01 ` Auke Kok
2006-07-06 18:50 ` Linas Vepstas
0 siblings, 1 reply; 9+ messages in thread
From: Auke Kok @ 2006-07-06 18:01 UTC (permalink / raw)
To: Linas Vepstas
Cc: Zhang, Yanmin, Auke Kok, Jesse Brandeburg, Ronciak, John,
bibo,mao, Rajesh Shah, Grant Grundler, akpm, LKML,
linux-pci maillist, netdev, wenxiong
Linas Vepstas wrote:
> On Thu, Jul 06, 2006 at 09:21:39AM +0800, Zhang, Yanmin wrote:
>> On Thu, 2006-07-06 at 03:44, Linas Vepstas wrote:
>>> On Wed, Jul 05, 2006 at 08:49:27AM -0700, Auke Kok wrote:
>>>> Zhang, Yanmin wrote:
>>>>> On Fri, 2006-06-30 at 00:26, Linas Vepstas wrote:
>>>>>> Adds PCI Error recovery callbacks to the Intel 10-gigabit ethernet
>>>>>> ixgb device driver. Lightly tested, works.
>>>>> Both pci_disable_device and ixgb_down would access the device. It doesn't
>>>>> follow Documentation/pci-error-recovery.txt that error_detected shouldn't
>>>>> do
>>>>> any access to the device.
>>>> Moreover, it was Linas who wrote this documentation in the first place :)
>>> On the pSeries, it's harmless to try to do i/o; the i/o will be blocked.
>> In the future, we might move the pci error recovery codes to generic to
>> support other platforms which might not block I/O. So it's better to follow
>> Documentation/pci-error-recovery.txt when adding error recovery codes into driver.
>
> Or we could change the documentation. The point was that doing
> unexpected i/o after the adapter reset is likely to wedge the adapter
> again, leading to an infinite loop of resets. As a practical matter,
> I found, while developing this patch and the other related
> patches, that this was indeed the usual failure mode: incorrect bringup
> just led to more errors.
>
> What I really want to do is to perform as clean a shut-down as possible,
> reset the adapter, and then bring it back up. I'm concerned that changing
> the order to "reset"-"shutdown"-"bringup" would be inappropriate.
>
> Perhaps the right fix is to figure out what parts of the driver do i/o
> during shutdown, and then add a line "if(wedged) skip i/o;" to those
> places?
that would be relatively simple if we can check a flag (?) somewhere that
signifies that we've encountered a pci error. We basically only need to skip
out after e1000_reset and bypass e1000_irq_disable in e1000_down() then.
Does the pci error recovery code give us such a flag?
Auke
* Re: [PATCH] ixgb: add PCI Error recovery callbacks
2006-07-06 18:01 ` Auke Kok
@ 2006-07-06 18:50 ` Linas Vepstas
2006-07-06 21:52 ` Linas Vepstas
0 siblings, 1 reply; 9+ messages in thread
From: Linas Vepstas @ 2006-07-06 18:50 UTC (permalink / raw)
To: Auke Kok
Cc: Zhang, Yanmin, Auke Kok, Jesse Brandeburg, Ronciak, John,
bibo,mao, Rajesh Shah, Grant Grundler, akpm, LKML,
linux-pci maillist, netdev, wenxiong
On Thu, Jul 06, 2006 at 11:01:35AM -0700, Auke Kok wrote:
> Linas Vepstas wrote:
> >
> >Perhaps the right fix is to figure out what parts of the driver do i/o
> >during shutdown, and then add a line "if(wedged) skip i/o;" to those
> >places?
>
> that would be relatively simple if we can check a flag (?) somewhere that
> signifies that we've encountered a pci error. We basically only need to
> skip out after e1000_reset and bypass e1000_irq_disable in e1000_down()
> then.
>
> Does the pci error recovery code give us such a flag?
Yes, it was introduced so that drivers could view the state in
an interrupt context. (how this flag is set is platform dependent.)
struct pci_dev {
	pci_channel_state error_state;
};
enum pci_channel_state {
	/* I/O channel is in normal state */
	pci_channel_io_normal,
	/* I/O to channel is blocked */
	pci_channel_io_frozen,
	/* PCI card is dead */
	pci_channel_io_perm_failure,
};
Unless I get distracted, I'll provide an e1000 patch shortly ?
--linas
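As a rough illustration of the "if(wedged) skip i/o" idea using this flag, here is a minimal userspace sketch. It mirrors the check the patch adds to ixgb_update_stats(). Everything prefixed `mock_` is hypothetical, `struct mock_pci_dev` is a stand-in for `struct pci_dev`, and the numeric enum values (normal = 1, with 0 treated as "not set") are an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Channel states as listed above; the numeric values are assumed. */
enum pci_channel_state {
	pci_channel_io_normal = 1,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

/* Hypothetical stand-in for struct pci_dev; only what the sketch needs. */
struct mock_pci_dev {
	enum pci_channel_state error_state;
	unsigned int mmio_writes;  /* counts simulated register writes */
};

/* Same test as the ixgb_update_stats() hunk: set and not "normal". */
static bool mock_channel_wedged(const struct mock_pci_dev *pdev)
{
	assert(pdev != NULL);
	return pdev->error_state &&
	       pdev->error_state != pci_channel_io_normal;
}

/* Simulated register write, e.g. masking interrupts. */
static void mock_write_reg(struct mock_pci_dev *pdev)
{
	pdev->mmio_writes++;
}

/* Shutdown path: hardware I/O is skipped while the channel is wedged. */
static void mock_down(struct mock_pci_dev *pdev)
{
	if (!mock_channel_wedged(pdev))
		mock_write_reg(pdev);
	/* software-only teardown (queues, timers) would still run here */
}
```

The guard has to be repeated at every place in the shutdown path that touches the hardware.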
* Re: [PATCH] ixgb: add PCI Error recovery callbacks
2006-07-06 18:50 ` Linas Vepstas
@ 2006-07-06 21:52 ` Linas Vepstas
0 siblings, 0 replies; 9+ messages in thread
From: Linas Vepstas @ 2006-07-06 21:52 UTC (permalink / raw)
To: Auke Kok
Cc: Zhang, Yanmin, Auke Kok, Jesse Brandeburg, Ronciak, John,
bibo,mao, Rajesh Shah, Grant Grundler, akpm, LKML,
linux-pci maillist, netdev, wenxiong
On Thu, Jul 06, 2006 at 01:50:59PM -0500, Linas Vepstas wrote:
> On Thu, Jul 06, 2006 at 11:01:35AM -0700, Auke Kok wrote:
> > Linas Vepstas wrote:
> > >
> > >Perhaps the right fix is to figure out what parts of the driver do i/o
> > >during shutdown, and then add a line "if(wedged) skip i/o;" to those
> > >places?
> >
> > that would be relatively simple if we can check a flag (?) somewhere that
> > signifies that we've encountered a pci error. We basically only need to
> > skip out after e1000_reset and bypass e1000_irq_disable in e1000_down()
> > then.
> >
> > Does the pci error recovery code give us such a flag?
>
> Yes,
[...]
> Unless I get distracted, I'll provide an e1000 patch shortly ?
I sat down to do this and realized it was a lame idea. If a given
platform cannot tolerate PCI I/O while a PCI channel is hung, then
the platform should stub out readb()/read()/pci_read_config_word()/etc.
as needed to prevent I/O during the critical stage.
Otherwise, one is trying to chase down all the locations in the driver
that may or may not require I/O to be disabled, which is a hit-or-miss,
mistake-prone operation.
--linas
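A minimal userspace sketch of what stubbing out the accessor at the platform level might look like, so that the driver needs no per-call checks. All names here are hypothetical, and returning all-ones for a blocked read is an assumption made for the example, not a statement about any particular platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical channel states for the sketch. */
enum mock_channel_state { MOCK_IO_NORMAL, MOCK_IO_FROZEN };

/* Hypothetical I/O channel with a handful of simulated device registers. */
struct mock_channel {
	enum mock_channel_state state;
	uint8_t regs[16];          /* simulated device registers */
	unsigned int hw_accesses;  /* reads that actually reached the "device" */
};

/*
 * Platform-level read accessor: while the channel is frozen the read never
 * reaches the device and an all-ones value is returned instead (assumed).
 */
static uint8_t mock_readb(struct mock_channel *ch, unsigned int offset)
{
	assert(offset < sizeof(ch->regs));
	if (ch->state != MOCK_IO_NORMAL)
		return 0xff;
	ch->hw_accesses++;
	return ch->regs[offset];
}
```

Driver code calls `mock_readb()` unconditionally; the decision to block I/O lives in one place instead of being scattered through the driver.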