* Re: [PATCH 2/2]: e1000: avoid lockup durig error recovery
[not found] ` <20071107222446.GM4239@austin.ibm.com>
@ 2007-11-07 22:45 ` Kok, Auke
2007-11-07 23:21 ` Linas Vepstas
2007-11-08 1:15 ` Stephen Hemminger
0 siblings, 2 replies; 6+ messages in thread
From: Kok, Auke @ 2007-11-07 22:45 UTC (permalink / raw)
To: Linas Vepstas
Cc: wenxiong, e1000-devel, Brandeburg, Jesse, john.ronciak,
jeffrey.t.kirsher, Jeff Garzik, NetDev,
'Stephen Hemminger'
[adding netdev, jeff G to the Cc]
Linas Vepstas wrote:
> On Wed, Nov 07, 2007 at 01:50:17PM -0800, Kok, Auke wrote:
>> Linas Vepstas wrote:
>>> If a PCI bus error is encountered during device open, the
>>> error recovery routines will attempt to close the device.
>>> If napi has not yet been enabled, the napi disable in the
>>> close will hang.
>>>
>>> Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
>>>
>>> ----
>>> The "elegence" of this solution is arguable: one could
>>> say its "better" to perform this check in e1000_down().
>>> However, doing so will disrupt a commonly used path,
>>> whereas here, the hack is in the infrequently used
>>> error path, and thus less intrusive.
>>>
>>> drivers/net/e1000/e1000_main.c | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> Index: linux-2.6.23-rc8-mm1/drivers/net/e1000/e1000_main.c
>>> ===================================================================
>>> --- linux-2.6.23-rc8-mm1.orig/drivers/net/e1000/e1000_main.c 2007-11-07 15:04:45.000000000 -0600
>>> +++ linux-2.6.23-rc8-mm1/drivers/net/e1000/e1000_main.c 2007-11-07 15:04:52.000000000 -0600
>>> @@ -5268,8 +5268,15 @@ static pci_ers_result_t e1000_io_error_d
>>>
>>> netif_device_detach(netdev);
>>>
>>> - if (netif_running(netdev))
>>> + if (netif_running(netdev)) {
>>> +#ifdef CONFIG_E1000_NAPI
>>> + /* e1000_down will spinlock in napi_disable() if we
>>> + * catch an error before napi_enable() was called. */
>>> + if (test_bit(NAPI_STATE_SCHED, &adapter->napi.state))
>>> + napi_enable(&adapter->napi);
>>> +#endif
>>> e1000_down(adapter);
>>> + }
>>> pci_disable_device(pdev);
>>>
>>> /* Request a slot slot reset. */
>> I think this is OK, but it's quite awful looking if you ask me.
>
> Yeah, ...
>
> There are several alternatives: below are two. If you
> find one to be more appealing.. could you use it? Consider them
> to be "signed-off-by"; I have not actually compiled or tested
> either of them.
I'm not a particular fan of putting extra state tracking in the driver for
something we could extract from the napi subsystem already.
Jeff, Stephen, can't we have a generic napi_enabled() inline in netdevice.h that
tests for NAPI_STATE_SCHED ?
I wonder if there isn't something in the PCI error recovery missing the point and
we can solve this problem better for all drivers somehow.
Auke
>
>
> drivers/net/e1000/e1000_main.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.23-rc8-mm1/drivers/net/e1000/e1000_main.c
> ===================================================================
> --- linux-2.6.23-rc8-mm1.orig/drivers/net/e1000/e1000_main.c 2007-11-07 16:11:36.000000000 -0600
> +++ linux-2.6.23-rc8-mm1/drivers/net/e1000/e1000_main.c 2007-11-07 16:15:17.000000000 -0600
> @@ -633,7 +633,11 @@ e1000_down(struct e1000_adapter *adapter
> set_bit(__E1000_DOWN, &adapter->flags);
>
> #ifdef CONFIG_E1000_NAPI
> - napi_disable(&adapter->napi);
> + /* napi_disable() will spinlock if we are in the
> + * pci error recovery path, and caught a pci
> + * error before napi_enable() was called. */
> + if (!test_bit(NAPI_STATE_SCHED, &adapter->napi.state))
> + napi_disable(&adapter->napi);
> #endif
> e1000_irq_disable(adapter);
>
> and here's another:
>
> drivers/net/e1000/e1000.h | 1 +
> drivers/net/e1000/e1000_main.c | 7 ++++++-
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.23-rc8-mm1/drivers/net/e1000/e1000.h
> ===================================================================
> --- linux-2.6.23-rc8-mm1.orig/drivers/net/e1000/e1000.h 2007-09-26 15:06:56.000000000 -0500
> +++ linux-2.6.23-rc8-mm1/drivers/net/e1000/e1000.h 2007-11-07 16:19:16.000000000 -0600
> @@ -301,6 +301,7 @@ struct e1000_adapter {
> struct e1000_rx_ring *rx_ring; /* One per active queue */
> #ifdef CONFIG_E1000_NAPI
> struct napi_struct napi;
> + int napi_enabled;
> struct net_device *polling_netdev; /* One per active queue */
> #endif
> int num_tx_queues;
> Index: linux-2.6.23-rc8-mm1/drivers/net/e1000/e1000_main.c
> ===================================================================
> --- linux-2.6.23-rc8-mm1.orig/drivers/net/e1000/e1000_main.c 2007-11-07 16:16:11.000000000 -0600
> +++ linux-2.6.23-rc8-mm1/drivers/net/e1000/e1000_main.c 2007-11-07 16:22:30.000000000 -0600
> @@ -545,6 +545,7 @@ int e1000_up(struct e1000_adapter *adapt
>
> #ifdef CONFIG_E1000_NAPI
> napi_enable(&adapter->napi);
> + adapter->napi_enabled = 1;
> #endif
> e1000_irq_enable(adapter);
>
> @@ -633,7 +634,10 @@ e1000_down(struct e1000_adapter *adapter
> set_bit(__E1000_DOWN, &adapter->flags);
>
> #ifdef CONFIG_E1000_NAPI
> - napi_disable(&adapter->napi);
> + if (adapter->napi_enabled) {
> + napi_disable(&adapter->napi);
> + adapter->napi_enabled = 0;
> + }
> #endif
> e1000_irq_disable(adapter);
>
> @@ -1437,6 +1441,7 @@ e1000_open(struct net_device *netdev)
>
> #ifdef CONFIG_E1000_NAPI
> napi_enable(&adapter->napi);
> + adapter->napi_enabled = 1;
> #endif
>
> e1000_irq_enable(adapter);
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2]: e1000: avoid lockup durig error recovery
2007-11-07 22:45 ` [PATCH 2/2]: e1000: avoid lockup durig error recovery Kok, Auke
@ 2007-11-07 23:21 ` Linas Vepstas
2007-11-09 17:02 ` Ingo Oeser
2007-11-08 1:15 ` Stephen Hemminger
1 sibling, 1 reply; 6+ messages in thread
From: Linas Vepstas @ 2007-11-07 23:21 UTC (permalink / raw)
To: Kok, Auke
Cc: Jeff Garzik, e1000-devel, NetDev, Brandeburg, Jesse,
john.ronciak, jeffrey.t.kirsher, 'Stephen Hemminger',
wenxiong
On Wed, Nov 07, 2007 at 02:45:18PM -0800, Kok, Auke wrote:
> [adding netdev, jeff G to the Cc]
>
> Linas Vepstas wrote:
> > On Wed, Nov 07, 2007 at 01:50:17PM -0800, Kok, Auke wrote:
> >> Linas Vepstas wrote:
> >>> If a PCI bus error is encountered during device open, the
> >>> error recovery routines will attempt to close the device.
> >>> If napi has not yet been enabled, the napi disable in the
> >>> close will hang.
> >>>
> >>> Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
> >>>
> >>> ----
> >>> The "elegence" of this solution is arguable: one could
> >>> say its "better" to perform this check in e1000_down().
> >>> However, doing so will disrupt a commonly used path,
> >>> whereas here, the hack is in the infrequently used
> >>> error path, and thus less intrusive.
> >>>
> >>> drivers/net/e1000/e1000_main.c | 9 ++++++++-
> >>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >> I think this is OK, but it's quite awful looking if you ask me.
> >
> > Yeah, ...
> >
> > There are several alternatives: below are two. If you
> > find one to be more appealing.. could you use it? Consider them
> > to be "signed-off-by"; I have not actually compiled or tested
> > either of them.
>
> I'm not a particular fan of putting extra state tracking in the driver for
> something we could extract from the napi subsystem already.
>
> Jeff, Stephen, can't we have a generic napi_enabled() inline in netdevice.h that
> tests for NAPI_STATE_SCHED ?
Like this?
include/linux/netdevice.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
Index: linux-2.6.23-rc8-mm1/include/linux/netdevice.h
===================================================================
--- linux-2.6.23-rc8-mm1.orig/include/linux/netdevice.h 2007-09-26 15:07:05.000000000 -0500
+++ linux-2.6.23-rc8-mm1/include/linux/netdevice.h 2007-11-07 17:14:50.000000000 -0600
@@ -384,6 +384,18 @@ static inline void napi_enable(struct na
clear_bit(NAPI_STATE_SCHED, &n->state);
}
+/**
+ * napi_enabled_p - return non-zero if napi enabled
+ * @n: napi context
+ *
+ * Mnemonic: _p stands for "predicate", returning a yes/no
+ * answer to the question.
+ */
+static inline int napi_enabled_p(struct napi_struct *n)
+{
+ return !test_bit(NAPI_STATE_SCHED, &n->state);
+}
+
/*
* The DEVICE structure.
* Actually, this whole structure is a big mistake. It mixes I/O
> I wonder if there isn't something in the PCI error recovery missing the point and
> we can solve this problem better for all drivers somehow.
Well, there's also scsi, which doesn't use napi :-)
For the most part, error recovery is a fairly cut-n-paste
set of steps. However, I don't quite have enough confidence
to say "yea verily, all network adapters will use these
same steps."
--linas
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2]: e1000: avoid lockup durig error recovery
2007-11-07 22:45 ` [PATCH 2/2]: e1000: avoid lockup durig error recovery Kok, Auke
2007-11-07 23:21 ` Linas Vepstas
@ 2007-11-08 1:15 ` Stephen Hemminger
1 sibling, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2007-11-08 1:15 UTC (permalink / raw)
To: Kok, Auke
Cc: Jeff Garzik, e1000-devel, NetDev, Brandeburg, Jesse,
john.ronciak, jeffrey.t.kirsher, Linas Vepstas, wenxiong
On Wed, 07 Nov 2007 14:45:18 -0800
"Kok, Auke" <auke-jan.h.kok@intel.com> wrote:
> [adding netdev, jeff G to the Cc]
>
> Linas Vepstas wrote:
> > On Wed, Nov 07, 2007 at 01:50:17PM -0800, Kok, Auke wrote:
> >> Linas Vepstas wrote:
> >>> If a PCI bus error is encountered during device open, the
> >>> error recovery routines will attempt to close the device.
> >>> If napi has not yet been enabled, the napi disable in the
> >>> close will hang.
> >>>
> >>> Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
> >>>
> >>> ----
> >>> The "elegence" of this solution is arguable: one could
> >>> say its "better" to perform this check in e1000_down().
> >>> However, doing so will disrupt a commonly used path,
> >>> whereas here, the hack is in the infrequently used
> >>> error path, and thus less intrusive.
> >>>
Same problem might be possible in suspend/resume.
The problem with testing napi_enabled is that state can get changed
by racing irq.
--
Stephen Hemminger <shemminger@linux-foundation.org>
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2]: e1000: avoid lockup durig error recovery
2007-11-07 23:21 ` Linas Vepstas
@ 2007-11-09 17:02 ` Ingo Oeser
2007-11-09 22:40 ` Linas Vepstas
0 siblings, 1 reply; 6+ messages in thread
From: Ingo Oeser @ 2007-11-09 17:02 UTC (permalink / raw)
To: Linas Vepstas
Cc: Kok, Auke, Jeff Garzik, e1000-devel, NetDev, Brandeburg, Jesse,
john.ronciak, jeffrey.t.kirsher, 'Stephen Hemminger',
wenxiong
Hi Linas,
Linas Vepstas schrieb:
> Index: linux-2.6.23-rc8-mm1/include/linux/netdevice.h
> ===================================================================
> --- linux-2.6.23-rc8-mm1.orig/include/linux/netdevice.h 2007-09-26 15:07:05.000000000 -0500
> +++ linux-2.6.23-rc8-mm1/include/linux/netdevice.h 2007-11-07 17:14:50.000000000 -0600
> @@ -384,6 +384,18 @@ static inline void napi_enable(struct na
> clear_bit(NAPI_STATE_SCHED, &n->state);
> }
>
> +/**
> + * napi_enabled_p - return non-zero if napi enabled
> + * @n: napi context
> + *
> + * Mnemonic: _p stands for "predicate", returning a yes/no
> + * answer to the question.
Call it "is_napi_enabled()" an nobody will ask :-)
> + */
> +static inline int napi_enabled_p(struct napi_struct *n)
And please make it return "bool" instead of "int".
Best Regards
Ingo Oeser
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2]: e1000: avoid lockup durig error recovery
2007-11-09 17:02 ` Ingo Oeser
@ 2007-11-09 22:40 ` Linas Vepstas
2007-11-10 16:55 ` Stephen Hemminger
0 siblings, 1 reply; 6+ messages in thread
From: Linas Vepstas @ 2007-11-09 22:40 UTC (permalink / raw)
To: Ingo Oeser
Cc: Kok, Auke, wenxiong, e1000-devel, Brandeburg, Jesse, john.ronciak,
jeffrey.t.kirsher, Jeff Garzik, NetDev,
'Stephen Hemminger'
On Fri, Nov 09, 2007 at 06:02:34PM +0100, Ingo Oeser wrote:
> Linas Vepstas schrieb:
> > + * napi_enabled_p - return non-zero if napi enabled
> > + *
> > + * Mnemonic: _p stands for "predicate", returning a yes/no
> > + * answer to the question.
>
> Call it "is_napi_enabled()" an nobody will ask :-)
Heh. The suffix _p is standard coding style for lisp/scheme
and first-order logic interpreters. This was my lame attempt
to introduce it to the kernel. I guess that lame duck won't fly.
--linas
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2]: e1000: avoid lockup durig error recovery
2007-11-09 22:40 ` Linas Vepstas
@ 2007-11-10 16:55 ` Stephen Hemminger
0 siblings, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2007-11-10 16:55 UTC (permalink / raw)
To: Linas Vepstas
Cc: Kok, Auke, Jeff Garzik, e1000-devel, NetDev, Brandeburg, Jesse,
Ingo Oeser, john.ronciak, jeffrey.t.kirsher, wenxiong
On Fri, 9 Nov 2007 16:40:05 -0600
linas@austin.ibm.com (Linas Vepstas) wrote:
> On Fri, Nov 09, 2007 at 06:02:34PM +0100, Ingo Oeser wrote:
> > Linas Vepstas schrieb:
> > > + * napi_enabled_p - return non-zero if napi enabled
> > > + *
> > > + * Mnemonic: _p stands for "predicate", returning a yes/no
> > > + * answer to the question.
> >
> > Call it "is_napi_enabled()" an nobody will ask :-)
>
> Heh. The suffix _p is standard coding style for lisp/scheme
> and first-order logic interpreters. This was my lame attempt
> to introduce it to the kernel. I guess that lame duck won't fly.
>
> --linas
>
NO. it reeks of hungarian notation.
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-11-10 16:55 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20071107211935.GH4239@austin.ibm.com>
[not found] ` <20071107212404.GA15251@austin.ibm.com>
[not found] ` <47323319.10709@intel.com>
[not found] ` <20071107222446.GM4239@austin.ibm.com>
2007-11-07 22:45 ` [PATCH 2/2]: e1000: avoid lockup durig error recovery Kok, Auke
2007-11-07 23:21 ` Linas Vepstas
2007-11-09 17:02 ` Ingo Oeser
2007-11-09 22:40 ` Linas Vepstas
2007-11-10 16:55 ` Stephen Hemminger
2007-11-08 1:15 ` Stephen Hemminger
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).