* 2.6.29 forcedeth hang W/O NAPI enabled
@ 2009-03-24 15:28 Mr. Berkley Shands
0 siblings, 0 replies; 11+ messages in thread
From: Mr. Berkley Shands @ 2009-03-24 15:28 UTC (permalink / raw)
To: netdev; +Cc: Mark A. Bober, Lloyd, Dave
Another new kernel, another interesting lockup. CentOS 5.2, x86_64, on
an Opteron with 8GB and 4 cores (275 x 2).
If CONFIG_FORCEDETH_NAPI is not enabled, then within 60 seconds of the
console login prompt appearing, the network becomes unresponsive. Packets
are seen to appear according to ifconfig eth0 and ethtool -S eth0, but
they go nowhere. NFS stops, ping stops, logins stop, LDAP stops.
My network is class B, netmask 255.255.0.0, and the department router is
directly connected under this netmask. If I recompile forcedeth.ko with
NAPI enabled, reinstall it, and run depmod -aq, then
service network stop; rmmod forcedeth; modprobe forcedeth; service network start
brings everything back online eventually. This was not an issue with
2.6.28-8 or before.
Berkley
* Re: 2.6.29 forcedeth hang W/O NAPI enabled
@ 2009-03-25 23:24 Adam Richter
2009-03-26 0:05 ` David Miller
0 siblings, 1 reply; 11+ messages in thread
From: Adam Richter @ 2009-03-25 23:24 UTC (permalink / raw)
To: netdev; +Cc: berkley
I am experiencing what is probably the same forcedeth ethernet
hang with FORCEDETH_NAPI disabled as reported by Berkley Shands. I
want to add the following additional data (items 2-7 basically just
confirm what one would expect):
1) I can narrow where the problem was introduced. The problem
does not occur for me in 2.6.29-rc8-git6, the last git snapshot
before 2.6.29. There are no changes to forcedeth.c between
these versions.
2) The amount of time it takes to reproduce the problem seems
to depend on networking utilization. I can reproduce the
problem in about 30 seconds by doing "ping -f" to a
computer on my local ethernet for about one minute.
Otherwise, my computer, which normally does not do much
network communication, takes about an hour to exhibit the
problem.
3) I can recover by doing "rmmod forcedeth ; modprobe forcedeth"
even without recompiling with NAPI enabled, but the
problem seems to recur more quickly, until reloading the
forcedeth module no longer seems to work. (I infer from
Berkley Shands' message that reloading the module
recompiled with NAPI enabled will cause the problem not
to recur.)
4) Given that this looks like a NAPI problem, it should come
as no surprise that ethernet transmit still works when the
problem is occurring. I know this because I can run ping
from the affected machine to a target machine running
tcpdump, and the target machine sees the ping packets.
5) When the problem occurs, "ifconfig eth0" reports a gradually
increasing count of "RX packets" (I assume from random
broadcast packets originating elsewhere on the local
ethernet), and no obvious signs of trouble:
RX packets:2092 errors:0 dropped:0 overruns:0 frame:0
TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:177338 (173.1 KB) TX bytes:6732 (6.5 KB)
6) No complaints on the kernel console appear when
ethernet receive stops working.
7) When the problem occurs, the other functions of the
computer apparently continue to work fine. In particular,
I can reboot the computer from a user program without
incident.
When I can find some time, I plan to try to narrow the problem
with git bisect, but that may not be today.
Adam Richter
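The check in item 5 (RX packets still climbing while no error counter moves) can be scripted. The following is a minimal sketch in C, assuming the ifconfig line format quoted above; other ifconfig versions print different layouts. The names rx_counters, parse_rx_line() and looks_silent() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counters taken from one "RX packets:" line of ifconfig output. */
struct rx_counters {
	unsigned long packets, errors, dropped, overruns, frame;
};

/* Returns 0 on success, -1 if the line is not a complete "RX packets:" line. */
static int parse_rx_line(const char *line, struct rx_counters *out)
{
	const char *p;
	int n;

	assert(line && out);
	p = strstr(line, "RX packets:");
	if (!p)
		return -1;
	n = sscanf(p,
		   "RX packets:%lu errors:%lu dropped:%lu overruns:%lu frame:%lu",
		   &out->packets, &out->errors, &out->dropped,
		   &out->overruns, &out->frame);
	return n == 5 ? 0 : -1;
}

/*
 * The symptom described in item 5: the packet count grows between two
 * samples while every error counter stays where it was.
 */
static int looks_silent(const struct rx_counters *before,
			const struct rx_counters *after)
{
	return after->packets > before->packets &&
	       after->errors == before->errors &&
	       after->dropped == before->dropped &&
	       after->overruns == before->overruns &&
	       after->frame == before->frame;
}
```

Feeding two samples of the "RX packets:" line taken a few seconds apart into parse_rx_line() and then into looks_silent() reproduces the observation by hand: the interface counts traffic, yet nothing reports a failure.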
* Re: 2.6.29 forcedeth hang W/O NAPI enabled
2009-03-25 23:24 2.6.29 forcedeth hang W/O NAPI enabled Adam Richter
@ 2009-03-26 0:05 ` David Miller
2009-03-26 1:20 ` Adam Richter
0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2009-03-26 0:05 UTC (permalink / raw)
To: adam_richter2004; +Cc: netdev, berkley
From: Adam Richter <adam_richter2004@yahoo.com>
Date: Wed, 25 Mar 2009 16:24:47 -0700 (PDT)
> When I can find some time, I plan to try to narrow the problem
> with git bisect, but that may not be today.
We're pretty sure we know exactly what commit causes this.
Can you try playing with a patch Jarek P. just posted in
the thread where this bug is being discussed? (Subject:
Revert "gro: Fix legacy path napi_complete crash"):
diff --git a/net/core/dev.c b/net/core/dev.c
index e3fe5c7..cf53c24 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2589,7 +2589,11 @@ static int process_backlog(struct napi_struct *napi, int quota)
 		skb = __skb_dequeue(&queue->input_pkt_queue);
 		if (!skb) {
 			local_irq_enable();
-			napi_complete(napi);
+			napi_gro_flush(napi);
+			local_irq_disable();
+			if (skb_queue_empty(&queue->input_pkt_queue))
+				__napi_complete(napi);
+			local_irq_enable();
 			goto out;
 		}
 		local_irq_enable();
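The ordering in the patch above (re-check the queue with interrupts off, and complete only if it is still empty) can be illustrated with a small userspace model. This is a sketch, not kernel code: struct backlog_model, model_netif_rx() and the empty_path_*() helpers are hypothetical names, and the model assumes that the enqueue side only schedules the poller when it finds the queue empty, which this thread does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-CPU backlog state; not the kernel's types. */
struct backlog_model {
	int  qlen;       /* packets waiting in the input queue */
	bool scheduled;  /* models NAPI_STATE_SCHED */
};

/* Assumed enqueue behaviour: kick the poller only when the queue is empty. */
static void model_netif_rx(struct backlog_model *b)
{
	if (b->qlen == 0)
		b->scheduled = true;  /* no effect if already scheduled */
	b->qlen++;
}

/*
 * Both helpers model the branch taken after the poller saw an empty
 * queue. late_arrivals packets show up while interrupts are enabled.
 */
static void empty_path_original(struct backlog_model *b, int late_arrivals)
{
	assert(b->qlen == 0 && b->scheduled);
	for (int i = 0; i < late_arrivals; i++)  /* window with IRQs on */
		model_netif_rx(b);
	b->scheduled = false;                    /* unconditional completion */
}

static void empty_path_recheck(struct backlog_model *b, int late_arrivals)
{
	assert(b->qlen == 0 && b->scheduled);
	for (int i = 0; i < late_arrivals; i++)  /* window with IRQs on */
		model_netif_rx(b);
	/* IRQs off again: complete only if nothing arrived in the window */
	if (b->qlen == 0)
		b->scheduled = false;
}

/* True when packets are queued but nothing is scheduled to poll them. */
static bool is_stuck(const struct backlog_model *b)
{
	return b->qlen > 0 && !b->scheduled;
}
```

Under these assumptions, one late arrival leaves the original ordering with a queued packet and no scheduled poller, and every later packet only lengthens the queue. The re-check variant keeps the poller scheduled in that case and completes only when the queue is really empty.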
* 2.6.29 forcedeth hang W/O NAPI enabled
@ 2009-03-26 0:06 Adam Richter
2009-03-26 0:08 ` David Miller
0 siblings, 1 reply; 11+ messages in thread
From: Adam Richter @ 2009-03-26 0:06 UTC (permalink / raw)
To: netdev; +Cc: berkley
In addition to seeing the problem with CONFIG_FORCEDETH_NAPI disabled,
I have now reproduced the problem with that configuration option enabled. So, NAPI might not be the problem at all.
Adam Richter
* Re: 2.6.29 forcedeth hang W/O NAPI enabled
2009-03-26 0:06 Adam Richter
@ 2009-03-26 0:08 ` David Miller
0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2009-03-26 0:08 UTC (permalink / raw)
To: adam_richter2004; +Cc: netdev, berkley
From: Adam Richter <adam_richter2004@yahoo.com>
Date: Wed, 25 Mar 2009 17:06:03 -0700 (PDT)
> In addition to seeing the problem with CONFIG_FORCEDETH_NAPI
> disabled, I have now reproduced the problem with that configuration
> option enabled. So, NAPI might not be the problem at all.
Adam, first of all, you don't need to subscribe then unsubscribe
from this mailing list just to post your report. I just saw you
do that.
We're not nazis and you can post to this mailing list without being a
member. :-)
Second of all, please look at my reply to your original report,
there is a patch there for you to test.
* Re: 2.6.29 forcedeth hang W/O NAPI enabled
2009-03-26 0:05 ` David Miller
@ 2009-03-26 1:20 ` Adam Richter
2009-03-26 3:14 ` David Miller
2009-03-26 3:36 ` Herbert Xu
0 siblings, 2 replies; 11+ messages in thread
From: Adam Richter @ 2009-03-26 1:20 UTC (permalink / raw)
To: David Miller; +Cc: netdev, berkley
Hello David,
The patch you forwarded to me seems to work. Thank you for bringing it
to my attention.
In particular, linux-2.6.29 with your patch applied and
CONFIG_FORCEDETH_NAPI disabled has been doing "ping -f" to the same
local computer I was testing with before for more than 5 minutes with
no problem. I am able to surf the web and compose this email in the
meantime. In the past, "ping -f" with NAPI disabled would produce the
problem within about 30 seconds. With NAPI enabled, it did not seem to
reproduce the problem, but I am pretty sure that the problem did happen
once with the forcedeth module compiled with NAPI, although my quick
efforts to reproduce the problem that way did not immediately succeed.
I have appended the patch you sent, just to be clear, since there are a
few patches being discussed in the thread you referred to. If I have
anything else to add, I will follow up in that thread.
Thank you for your help!
Adam
--- On Wed, 3/25/09, David Miller <davem@davemloft.net> wrote:
> From: David Miller <davem@davemloft.net>
> Subject: Re: 2.6.29 forcedeth hang W/O NAPI enabled
> To: adam_richter2004@yahoo.com
> Cc: netdev@vger.kernel.org, berkley@cs.wustl.edu
> Date: Wednesday, March 25, 2009, 5:05 PM
> From: Adam Richter <adam_richter2004@yahoo.com>
> Date: Wed, 25 Mar 2009 16:24:47 -0700 (PDT)
>
> > When I can find some time, I plan to try to narrow the problem
> > with git bisect, but that may not be today.
>
> We're pretty sure we know exactly what commit causes this.
>
> Can you try playing with a patch Jarek P. just posted in
> the thread where this bug is being discussed? (Subject:
> Revert "gro: Fix legacy path napi_complete crash"):
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e3fe5c7..cf53c24 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2589,7 +2589,11 @@ static int process_backlog(struct napi_struct *napi, int quota)
>  		skb = __skb_dequeue(&queue->input_pkt_queue);
>  		if (!skb) {
>  			local_irq_enable();
> -			napi_complete(napi);
> +			napi_gro_flush(napi);
> +			local_irq_disable();
> +			if (skb_queue_empty(&queue->input_pkt_queue))
> +				__napi_complete(napi);
> +			local_irq_enable();
>  			goto out;
>  		}
>  		local_irq_enable();
* Re: 2.6.29 forcedeth hang W/O NAPI enabled
2009-03-26 1:20 ` Adam Richter
@ 2009-03-26 3:14 ` David Miller
2009-03-26 3:36 ` Herbert Xu
1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2009-03-26 3:14 UTC (permalink / raw)
To: adam_richter2004; +Cc: netdev, berkley
From: Adam Richter <adam_richter2004@yahoo.com>
Date: Wed, 25 Mar 2009 18:20:22 -0700 (PDT)
> The patch you forwarded to me seems to work.
Thanks for testing.
* Re: 2.6.29 forcedeth hang W/O NAPI enabled
2009-03-26 1:20 ` Adam Richter
2009-03-26 3:14 ` David Miller
@ 2009-03-26 3:36 ` Herbert Xu
2009-03-26 5:24 ` Adam Richter
1 sibling, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2009-03-26 3:36 UTC (permalink / raw)
To: adam_richter2004; +Cc: davem, netdev, berkley
Adam Richter <adam_richter2004@yahoo.com> wrote:
>
> The patch you forwarded to me seems to work. Thank you for bringing it
> to my attention.
Hi Adam:
Any chance you can test this patch instead of the previous one?
net: Fix netpoll lockup in legacy receive path
When I fixed the GRO crash in the legacy receive path I used
napi_complete to replace __napi_complete. Unfortunately they're
not the same when NETPOLL is enabled, which may result in us
not calling __napi_complete at all.
What's more, we really do need to keep the __napi_complete call
within the IRQ-off section since in theory an IRQ can occur in
between and fill up the backlog to the maximum, causing us to
lock up.
This patch fixes this by essentially open-coding __napi_complete.
Note we no longer need the memory barrier because this function
is per-cpu.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/net/core/dev.c b/net/core/dev.c
index e3fe5c7..2a7f6b3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2588,9 +2588,10 @@ static int process_backlog(struct napi_struct *napi, int quota)
 		local_irq_disable();
 		skb = __skb_dequeue(&queue->input_pkt_queue);
 		if (!skb) {
+			list_del(&napi->poll_list);
+			clear_bit(NAPI_STATE_SCHED, &napi->state);
 			local_irq_enable();
-			napi_complete(napi);
-			goto out;
+			break;
 		}
 		local_irq_enable();
 
@@ -2599,7 +2600,6 @@ static int process_backlog(struct napi_struct *napi, int quota)
 
 	napi_gro_flush(napi);
 
-out:
 	return work;
 }

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
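The commit message above says that napi_complete and __napi_complete differ when NETPOLL is enabled, and that the patch open-codes the completion instead. A toy userspace model of that difference is sketched below. It assumes the difference is an early return in the wrapper while netpoll is servicing the instance, a mechanism the message itself does not spell out. All names here (struct napi_model, netpoll_busy, the model_*() functions) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a NAPI instance; not the kernel's struct napi_struct. */
struct napi_model {
	bool scheduled;     /* models NAPI_STATE_SCHED */
	bool on_poll_list;  /* models membership of the poll list */
	bool netpoll_busy;  /* assumed: netpoll is servicing this instance */
};

/* Models the raw completion: leave the poll list and clear the flag. */
static void model_raw_complete(struct napi_model *n)
{
	assert(n->scheduled);  /* completing an unscheduled instance is a bug */
	n->on_poll_list = false;
	n->scheduled = false;
}

/* Models the wrapper: assumed to skip completion while netpoll is busy. */
static void model_wrapper_complete(struct napi_model *n)
{
	if (n->netpoll_busy)
		return;
	model_raw_complete(n);
}

/* Models the two open-coded statements added by the patch. */
static void model_open_coded_complete(struct napi_model *n)
{
	n->on_poll_list = false;  /* list_del(&napi->poll_list) */
	n->scheduled = false;     /* clear_bit(NAPI_STATE_SCHED, ...) */
}
```

In this model the wrapper leaves the instance scheduled and on the poll list whenever netpoll_busy is set, while the open-coded form always completes. In the patch those two statements also run before interrupts are re-enabled, so no packet can arrive between the empty check and the completion.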
* Re: 2.6.29 forcedeth hang W/O NAPI enabled
2009-03-26 3:36 ` Herbert Xu
@ 2009-03-26 5:24 ` Adam Richter
2009-03-26 6:58 ` Herbert Xu
0 siblings, 1 reply; 11+ messages in thread
From: Adam Richter @ 2009-03-26 5:24 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, netdev, berkley
--- On Wed, 3/25/09, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Hi Adam:
>
> Any chance you can test this patch instead of the previous one?
OK. I tried your patch. It seems fine. I was able to do "ping -f" for five minutes with no problems, and surf the web during that time.
I am preserving the rest of your message below, just to be clear about which patch I tested.
Adam
>
> net: Fix netpoll lockup in legacy receive path
>
> When I fixed the GRO crash in the legacy receive path I used
> napi_complete to replace __napi_complete. Unfortunately they're
> not the same when NETPOLL is enabled, which may result in us
> not calling __napi_complete at all.
>
> What's more, we really do need to keep the __napi_complete call
> within the IRQ-off section since in theory an IRQ can occur in
> between and fill up the backlog to the maximum, causing us to
> lock up.
>
> This patch fixes this by essentially open-coding __napi_complete.
>
> Note we no longer need the memory barrier because this function
> is per-cpu.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e3fe5c7..2a7f6b3 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2588,9 +2588,10 @@ static int process_backlog(struct napi_struct *napi, int quota)
>  		local_irq_disable();
>  		skb = __skb_dequeue(&queue->input_pkt_queue);
>  		if (!skb) {
> +			list_del(&napi->poll_list);
> +			clear_bit(NAPI_STATE_SCHED, &napi->state);
>  			local_irq_enable();
> -			napi_complete(napi);
> -			goto out;
> +			break;
>  		}
>  		local_irq_enable();
>
> @@ -2599,7 +2600,6 @@ static int process_backlog(struct napi_struct *napi, int quota)
>
>  	napi_gro_flush(napi);
>
> -out:
>  	return work;
>  }
>
> Thanks,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: 2.6.29 forcedeth hang W/O NAPI enabled
2009-03-26 5:24 ` Adam Richter
@ 2009-03-26 6:58 ` Herbert Xu
2009-03-26 23:29 ` Adam Richter
0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2009-03-26 6:58 UTC (permalink / raw)
To: Adam Richter; +Cc: davem, netdev, berkley
On Wed, Mar 25, 2009 at 10:24:42PM -0700, Adam Richter wrote:
>
> OK. I tried your patch. It seems fine. I was able to do "ping -f" for five minutes with no problems, and surf the web during that time.
Thanks for testing! Since you also have a forcedeth card, could
you try rebuilding your kernel with FORCEDETH_NAPI enabled and
see if that is stable?
> I am preserving the rest of your message below, just to be clear about which patch I tested.
Yep it's the right one.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: 2.6.29 forcedeth hang W/O NAPI enabled
2009-03-26 6:58 ` Herbert Xu
@ 2009-03-26 23:29 ` Adam Richter
0 siblings, 0 replies; 11+ messages in thread
From: Adam Richter @ 2009-03-26 23:29 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, netdev, berkley
On Wed, 3/25/09, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Thanks for testing! Since you also have a forcedeth card, could
> you try rebuilding your kernel with FORCEDETH_NAPI enabled and
> see if that is stable?
It works so far, doing "ping -f" for more than five minutes while I
browse the web, but bear in mind that I was not able to reproduce the
problem on demand earlier with FORCEDETH_NAPI enabled. I only saw an
Ethernet lockup with NAPI enabled once (with the original 2.6.29 code).
I will let you know if the lockup occurs with your code.
Just to be clear which version I am now testing, I have attached a
copy of process_backlog() from my net/core/dev.c.
Adam
static int process_backlog(struct napi_struct *napi, int quota)
{
	int work = 0;
	struct softnet_data *queue = &__get_cpu_var(softnet_data);
	unsigned long start_time = jiffies;

	napi->weight = weight_p;
	do {
		struct sk_buff *skb;

		local_irq_disable();
		skb = __skb_dequeue(&queue->input_pkt_queue);
		if (!skb) {
			list_del(&napi->poll_list);
			clear_bit(NAPI_STATE_SCHED, &napi->state);
			local_irq_enable();
			break;
		}
		local_irq_enable();

		napi_gro_receive(napi, skb);
	} while (++work < quota && jiffies == start_time);

	napi_gro_flush(napi);

	return work;
}