* [PATCH]: Fix myri10ge NAPI oops & warnings
@ 2007-10-31 21:40 Andrew Gallatin
2007-10-31 21:44 ` Stephen Hemminger
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Gallatin @ 2007-10-31 21:40 UTC (permalink / raw)
To: shemminger; +Cc: jeff, netdev
[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]
When testing the myri10ge driver with 2.6.24-rc1, I found
that the machine crashed under heavy load:
Unable to handle kernel paging request at 0000000000100108 RIP:
[<ffffffff803cc8dd>] net_rx_action+0x11b/0x184
The address corresponds to the list_move_tail() in
netif_rx_complete():
if (unlikely(work == weight))
list_move_tail(&n->poll_list, list);
Eventually, I traced the crashes to calling netif_rx_complete() with
work_done == budget. From looking at other drivers, it appears that
one should only call netif_rx_complete() when work_done < budget.
To fix it, I changed the test in myri10ge_poll() so that it refers
to to work_done rather than looking at the rx ring status. If
work_done is < budget, then that implies we have no more packets to
process. Any races will be resolved by the NIC when the write to
irq_claim is made.
In myri10ge_clean_rx_done(), if we ever exceeded our budget, it would
report a work_done one larger than was acutally done. This is because
the increment was done in the conditional, so work_done would be
incremented regardless of whether or not the test passed or failed.
This would lead to the WARN_ON_ONCE(work > weight); warning in
net_rx_action triggering. I've moved the increment of work_done
inside the loop. Note that this would only be a problem when we had
exceeded our budget.
Signed off by: Andrew Gallatin <gallatin@myri.com>
Andrew Gallatin Myricom Inc
[-- Attachment #2: myri10ge-napi-oops.diff --]
[-- Type: text/plain, Size: 1360 bytes --]
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 366e62a..0f306dd 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -1151,7 +1151,7 @@ static inline int myri10ge_clean_rx_done
u16 length;
__wsum checksum;
- while (rx_done->entry[idx].length != 0 && work_done++ < budget) {
+ while (rx_done->entry[idx].length != 0 && work_done < budget) {
length = ntohs(rx_done->entry[idx].length);
rx_done->entry[idx].length = 0;
checksum = csum_unfold(rx_done->entry[idx].checksum);
@@ -1167,6 +1167,7 @@ static inline int myri10ge_clean_rx_done
rx_bytes += rx_ok * (unsigned long)length;
cnt++;
idx = cnt & (myri10ge_max_intr_slots - 1);
+ work_done++;
}
rx_done->idx = idx;
rx_done->cnt = cnt;
@@ -1233,13 +1234,12 @@ static int myri10ge_poll(struct napi_str
struct myri10ge_priv *mgp =
container_of(napi, struct myri10ge_priv, napi);
struct net_device *netdev = mgp->dev;
- struct myri10ge_rx_done *rx_done = &mgp->rx_done;
int work_done;
/* process as many rx events as NAPI will allow */
work_done = myri10ge_clean_rx_done(mgp, budget);
- if (rx_done->entry[rx_done->idx].length == 0 || !netif_running(netdev)) {
+ if (work_done < budget || !netif_running(netdev)) {
netif_rx_complete(netdev, napi);
put_be32(htonl(3), mgp->irq_claim);
}
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH]: Fix myri10ge NAPI oops & warnings
2007-10-31 21:40 [PATCH]: Fix myri10ge NAPI oops & warnings Andrew Gallatin
@ 2007-10-31 21:44 ` Stephen Hemminger
2007-10-31 22:54 ` Andrew Gallatin
0 siblings, 1 reply; 3+ messages in thread
From: Stephen Hemminger @ 2007-10-31 21:44 UTC (permalink / raw)
To: Andrew Gallatin; +Cc: jeff, netdev
On Wed, 31 Oct 2007 17:40:06 -0400
Andrew Gallatin <gallatin@myri.com> wrote:
>
> When testing the myri10ge driver with 2.6.24-rc1, I found
> that the machine crashed under heavy load:
>
> Unable to handle kernel paging request at 0000000000100108 RIP:
> [<ffffffff803cc8dd>] net_rx_action+0x11b/0x184
>
> The address corresponds to the list_move_tail() in
> netif_rx_complete():
> if (unlikely(work == weight))
> list_move_tail(&n->poll_list, list);
>
> Eventually, I traced the crashes to calling netif_rx_complete() with
> work_done == budget. From looking at other drivers, it appears that
> one should only call netif_rx_complete() when work_done < budget.
>
> To fix it, I changed the test in myri10ge_poll() so that it refers
> to to work_done rather than looking at the rx ring status. If
> work_done is < budget, then that implies we have no more packets to
> process. Any races will be resolved by the NIC when the write to
> irq_claim is made.
>
> In myri10ge_clean_rx_done(), if we ever exceeded our budget, it would
> report a work_done one larger than was acutally done. This is because
> the increment was done in the conditional, so work_done would be
> incremented regardless of whether or not the test passed or failed.
> This would lead to the WARN_ON_ONCE(work > weight); warning in
> net_rx_action triggering. I've moved the increment of work_done
> inside the loop. Note that this would only be a problem when we had
> exceeded our budget.
>
> Signed off by: Andrew Gallatin <gallatin@myri.com>
>
> Andrew Gallatin Myricom Inc
>
>
Yes, this looks right.
How could the check in netif_rx_complete be changed to catch this better?
--
Stephen Hemminger <shemminger@linux-foundation.org>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH]: Fix myri10ge NAPI oops & warnings
2007-10-31 21:44 ` Stephen Hemminger
@ 2007-10-31 22:54 ` Andrew Gallatin
0 siblings, 0 replies; 3+ messages in thread
From: Andrew Gallatin @ 2007-10-31 22:54 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: jeff, netdev
Stephen Hemminger wrote:
> On Wed, 31 Oct 2007 17:40:06 -0400
> Andrew Gallatin <gallatin@myri.com> wrote:
>
>> When testing the myri10ge driver with 2.6.24-rc1, I found
>> that the machine crashed under heavy load:
>>
>> Unable to handle kernel paging request at 0000000000100108 RIP:
>> [<ffffffff803cc8dd>] net_rx_action+0x11b/0x184
>>
>> The address corresponds to the list_move_tail() in
>> netif_rx_complete():
>> if (unlikely(work == weight))
>> list_move_tail(&n->poll_list, list);
>>
>> Eventually, I traced the crashes to calling netif_rx_complete() with
>> work_done == budget. From looking at other drivers, it appears that
>> one should only call netif_rx_complete() when work_done < budget.
>>
>> To fix it, I changed the test in myri10ge_poll() so that it refers
>> to to work_done rather than looking at the rx ring status. If
>> work_done is < budget, then that implies we have no more packets to
>> process. Any races will be resolved by the NIC when the write to
>> irq_claim is made.
>>
>> In myri10ge_clean_rx_done(), if we ever exceeded our budget, it would
>> report a work_done one larger than was acutally done. This is because
>> the increment was done in the conditional, so work_done would be
>> incremented regardless of whether or not the test passed or failed.
>> This would lead to the WARN_ON_ONCE(work > weight); warning in
>> net_rx_action triggering. I've moved the increment of work_done
>> inside the loop. Note that this would only be a problem when we had
>> exceeded our budget.
>>
>> Signed off by: Andrew Gallatin <gallatin@myri.com>
>>
>> Andrew Gallatin Myricom Inc
>>
>>
>
> Yes, this looks right.
> How could the check in netif_rx_complete be changed to catch this better?
I'm sorry, but I don't really know. I'm afraid that I am
rather clueless about the new NAPI, and found this by
stumbling around in the dark. Naively, it seems like some
global option to override all napi weights (eg, to 1) would
be helpful so that if a driver had this problem, it could
be easily reproduced by the first packet received. At least
I found setting our weight to one made the bug rather
obvious to me.
Drew
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2007-10-31 22:54 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-31 21:40 [PATCH]: Fix myri10ge NAPI oops & warnings Andrew Gallatin
2007-10-31 21:44 ` Stephen Hemminger
2007-10-31 22:54 ` Andrew Gallatin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).