* [PATCHv1 net] xen-netfront: use napi_complete() correctly to prevent Rx stalling @ 2014-12-16 18:59 David Vrabel 2014-12-16 20:22 ` David Miller 2014-12-20 0:23 ` virtio_net: Fix napi poll list corruption Herbert Xu 0 siblings, 2 replies; 23+ messages in thread From: David Vrabel @ 2014-12-16 18:59 UTC (permalink / raw) To: netdev Cc: David Vrabel, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky, Eric Dumazet After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) the napi instance is removed from the per-cpu list prior to calling the n->poll(), and is only requeued if all of the budget was used. This inadvertently broke netfront because netfront does not use NAPI correctly. If netfront had not used all of its budget it would do a final check for any Rx responses and avoid calling napi_complete() if there were more responses. It would still return under budget so it would never be rescheduled. The final check would also not re-enable the Rx interrupt. Additionally, xenvif_poll() would also call napi_complete() /after/ enabling the interrupt. This resulted in a race between the napi_complete() and the napi_schedule() in the interrupt handler. The use of local_irq_save/restore() avoided by race iff the handler is running on the same CPU but not if it was running on a different CPU. Fix both of these by always calling napi_compete() if the budget was not all used, and then calling napi_schedule() if the final checks says there's more work. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Eric Dumazet <edumazet@google.com> --- drivers/net/xen-netfront.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 2f0a9ce..22bcb4e 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -977,7 +977,6 @@ static int xennet_poll(struct napi_struct *napi, int budget) struct sk_buff_head rxq; struct sk_buff_head errq; struct sk_buff_head tmpq; - unsigned long flags; int err; spin_lock(&queue->rx_lock); @@ -1050,15 +1049,11 @@ err: if (work_done < budget) { int more_to_do = 0; - napi_gro_flush(napi, false); - - local_irq_save(flags); + napi_complete(napi); RING_FINAL_CHECK_FOR_RESPONSES(&queue->rx, more_to_do); - if (!more_to_do) - __napi_complete(napi); - - local_irq_restore(flags); + if (more_to_do) + napi_schedule(napi); } spin_unlock(&queue->rx_lock); -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCHv1 net] xen-netfront: use napi_complete() correctly to prevent Rx stalling 2014-12-16 18:59 [PATCHv1 net] xen-netfront: use napi_complete() correctly to prevent Rx stalling David Vrabel @ 2014-12-16 20:22 ` David Miller 2014-12-20 0:23 ` virtio_net: Fix napi poll list corruption Herbert Xu 1 sibling, 0 replies; 23+ messages in thread From: David Miller @ 2014-12-16 20:22 UTC (permalink / raw) To: david.vrabel; +Cc: netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet From: David Vrabel <david.vrabel@citrix.com> Date: Tue, 16 Dec 2014 18:59:46 +0000 > After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt > masking in NAPI) the napi instance is removed from the per-cpu list > prior to calling the n->poll(), and is only requeued if all of the > budget was used. This inadvertently broke netfront because netfront > does not use NAPI correctly. > > If netfront had not used all of its budget it would do a final check > for any Rx responses and avoid calling napi_complete() if there were > more responses. It would still return under budget so it would never > be rescheduled. The final check would also not re-enable the Rx > interrupt. > > Additionally, xenvif_poll() would also call napi_complete() /after/ > enabling the interrupt. This resulted in a race between the > napi_complete() and the napi_schedule() in the interrupt handler. The > use of local_irq_save/restore() avoided by race iff the handler is > running on the same CPU but not if it was running on a different CPU. > > Fix both of these by always calling napi_compete() if the budget was > not all used, and then calling napi_schedule() if the final checks > says there's more work. > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> Applied, thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* virtio_net: Fix napi poll list corruption 2014-12-16 18:59 [PATCHv1 net] xen-netfront: use napi_complete() correctly to prevent Rx stalling David Vrabel 2014-12-16 20:22 ` David Miller @ 2014-12-20 0:23 ` Herbert Xu 2014-12-20 0:36 ` net: Detect drivers that reschedule NAPI and exhaust budget Herbert Xu ` (3 more replies) 1 sibling, 4 replies; 23+ messages in thread From: Herbert Xu @ 2014-12-20 0:23 UTC (permalink / raw) To: David Vrabel Cc: netdev, david.vrabel, xen-devel, konrad.wilk, boris.ostrovsky, edumazet, David S. Miller David Vrabel <david.vrabel@citrix.com> wrote: > After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt > masking in NAPI) the napi instance is removed from the per-cpu list > prior to calling the n->poll(), and is only requeued if all of the > budget was used. This inadvertently broke netfront because netfront > does not use NAPI correctly. A similar bug exists in virtio_net. -- >8 -- The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) breaks virtio_net in an insidious way. It is now required that if the entire budget is consumed when poll returns, the napi poll_list must remain empty. However, like some other drivers virtio_net tries to do a last-ditch check and if there is more work it will call napi_schedule and then immediately process some of this new work. Should the entire budget be consumed while processing such new work then we will violate the new caller contract. This patch fixes this by not touching any work when we reschedule in virtio_net. The worst part of this bug is that the list corruption causes other napi users to be moved off-list. In my case I was chasing a stall in IPsec (IPsec uses netif_rx) and I only belatedly realised that it was virtio_net which caused the stall even though the virtio_net poll was still functioning perfectly after IPsec stalled. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index b8bd719..5ca9771 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -760,7 +760,6 @@ static int virtnet_poll(struct napi_struct *napi, int budget) container_of(napi, struct receive_queue, napi); unsigned int r, received = 0; -again: received += virtnet_receive(rq, budget - received); /* Out of packets? */ @@ -771,7 +770,6 @@ again: napi_schedule_prep(napi)) { virtqueue_disable_cb(rq->vq); __napi_schedule(napi); - goto again; } } Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 23+ messages in thread
* net: Detect drivers that reschedule NAPI and exhaust budget 2014-12-20 0:23 ` virtio_net: Fix napi poll list corruption Herbert Xu @ 2014-12-20 0:36 ` Herbert Xu 2014-12-20 1:34 ` Eric Dumazet 2014-12-22 8:18 ` virtio_net: Fix napi poll list corruption Jason Wang ` (2 subsequent siblings) 3 siblings, 1 reply; 23+ messages in thread From: Herbert Xu @ 2014-12-20 0:36 UTC (permalink / raw) To: David Vrabel Cc: netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet, David S. Miller On Sat, Dec 20, 2014 at 11:23:27AM +1100, Herbert Xu wrote: > > A similar bug exists in virtio_net. In order to detect other drivers doing this we should add something like this. -- >8 -- The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) required drivers to leave poll_list empty if the entire budget is consumed. We have already had two broken drivers so let's add a check for this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/net/core/dev.c b/net/core/dev.c index f411c28..88f9725 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h) */ napi_gro_flush(n, HZ >= 1000); } - list_add_tail(&n->poll_list, &repoll); + /* Some drivers may have called napi_schedule + * prior to exhausting their budget. + */ + if (!WARN_ON_ONCE(!list_empty(&n->poll_list))) + list_add_tail(&n->poll_list, &repoll); } } Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: net: Detect drivers that reschedule NAPI and exhaust budget 2014-12-20 0:36 ` net: Detect drivers that reschedule NAPI and exhaust budget Herbert Xu @ 2014-12-20 1:34 ` Eric Dumazet 2014-12-20 2:40 ` David Miller 0 siblings, 1 reply; 23+ messages in thread From: Eric Dumazet @ 2014-12-20 1:34 UTC (permalink / raw) To: Herbert Xu Cc: David Vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet, David S. Miller On Sat, 2014-12-20 at 11:36 +1100, Herbert Xu wrote: > On Sat, Dec 20, 2014 at 11:23:27AM +1100, Herbert Xu wrote: > > > > A similar bug exists in virtio_net. > > In order to detect other drivers doing this we should add something > like this. > > -- >8 -- > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) required drivers to leave poll_list > empty if the entire budget is consumed. > > We have already had two broken drivers so let's add a check for > this. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/net/core/dev.c b/net/core/dev.c > index f411c28..88f9725 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h) > */ > napi_gro_flush(n, HZ >= 1000); > } > - list_add_tail(&n->poll_list, &repoll); > + /* Some drivers may have called napi_schedule > + * prior to exhausting their budget. > + */ > + if (!WARN_ON_ONCE(!list_empty(&n->poll_list))) > + list_add_tail(&n->poll_list, &repoll); > } > } > I do not think stack trace will point to the buggy driver ? IMO it would be better to print a single line with the netdev name ? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: net: Detect drivers that reschedule NAPI and exhaust budget 2014-12-20 1:34 ` Eric Dumazet @ 2014-12-20 2:40 ` David Miller 2014-12-20 6:55 ` Herbert Xu 0 siblings, 1 reply; 23+ messages in thread From: David Miller @ 2014-12-20 2:40 UTC (permalink / raw) To: eric.dumazet Cc: herbert, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 19 Dec 2014 17:34:48 -0800 >> @@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h) >> */ >> napi_gro_flush(n, HZ >= 1000); >> } >> - list_add_tail(&n->poll_list, &repoll); >> + /* Some drivers may have called napi_schedule >> + * prior to exhausting their budget. >> + */ >> + if (!WARN_ON_ONCE(!list_empty(&n->poll_list))) >> + list_add_tail(&n->poll_list, &repoll); >> } >> } >> > > I do not think stack trace will point to the buggy driver ? > > IMO it would be better to print a single line with the netdev name ? Right, we are already back from the poll routine and will just end up seeing the call trace leading to the software interrupt. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: net: Detect drivers that reschedule NAPI and exhaust budget 2014-12-20 2:40 ` David Miller @ 2014-12-20 6:55 ` Herbert Xu 2014-12-20 18:00 ` Eric Dumazet 0 siblings, 1 reply; 23+ messages in thread From: Herbert Xu @ 2014-12-20 6:55 UTC (permalink / raw) To: David Miller Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet On Fri, Dec 19, 2014 at 09:40:00PM -0500, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Fri, 19 Dec 2014 17:34:48 -0800 > > >> @@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h) > >> */ > >> napi_gro_flush(n, HZ >= 1000); > >> } > >> - list_add_tail(&n->poll_list, &repoll); > >> + /* Some drivers may have called napi_schedule > >> + * prior to exhausting their budget. > >> + */ > >> + if (!WARN_ON_ONCE(!list_empty(&n->poll_list))) > >> + list_add_tail(&n->poll_list, &repoll); > >> } > >> } > >> > > > > I do not think stack trace will point to the buggy driver ? > > > > IMO it would be better to print a single line with the netdev name ? > > Right, we are already back from the poll routine and will just end > up seeing the call trace leading to the software interrupt. Good point Eric. -- >8 -- The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) required drivers to leave poll_list empty if the entire budget is consumed. We have already had two broken drivers so let's add a check for this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/net/core/dev.c b/net/core/dev.c index f411c28..47fdc5c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4620,7 +4620,13 @@ static void net_rx_action(struct softirq_action *h) */ napi_gro_flush(n, HZ >= 1000); } - list_add_tail(&n->poll_list, &repoll); + /* Some drivers may have called napi_schedule + * prior to exhausting their budget. + */ + if (unlikely(!list_empty(&n->poll_list))) + pr_warn("%s: Budget exhausted after napi rescheduled\n", n->dev ? n->dev->name : "backlog"); + else + list_add_tail(&n->poll_list, &repoll); } } Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: net: Detect drivers that reschedule NAPI and exhaust budget 2014-12-20 6:55 ` Herbert Xu @ 2014-12-20 18:00 ` Eric Dumazet 2014-12-20 20:14 ` [0/4] net: net_rx_action fixes and clean-ups Herbert Xu 0 siblings, 1 reply; 23+ messages in thread From: Eric Dumazet @ 2014-12-20 18:00 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet On Sat, 2014-12-20 at 17:55 +1100, Herbert Xu wrote: > -- >8 -- > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) required drivers to leave poll_list > empty if the entire budget is consumed. > > We have already had two broken drivers so let's add a check for > this. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/net/core/dev.c b/net/core/dev.c > index f411c28..47fdc5c 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4620,7 +4620,13 @@ static void net_rx_action(struct softirq_action *h) > */ > napi_gro_flush(n, HZ >= 1000); > } > - list_add_tail(&n->poll_list, &repoll); > + /* Some drivers may have called napi_schedule > + * prior to exhausting their budget. > + */ > + if (unlikely(!list_empty(&n->poll_list))) > + pr_warn("%s: Budget exhausted after napi rescheduled\n", n->dev ? n->dev->name : "backlog"); > + else > + list_add_tail(&n->poll_list, &repoll); > } > } > > Thanks, OK, but could you : 1) use pr_warn_once() 2) split the too long line pr_warn_once("%s: Budget exhausted after napi rescheduled\n", n->dev ? n->dev->name : "backlog"); Thanks Herbert ! ^ permalink raw reply [flat|nested] 23+ messages in thread
* [0/4] net: net_rx_action fixes and clean-ups 2014-12-20 18:00 ` Eric Dumazet @ 2014-12-20 20:14 ` Herbert Xu 2014-12-20 20:16 ` [PATCH 1/4] net: Move napi polling code out of net_rx_action Herbert Xu ` (3 more replies) 0 siblings, 4 replies; 23+ messages in thread From: Herbert Xu @ 2014-12-20 20:14 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet On Sat, Dec 20, 2014 at 10:00:12AM -0800, Eric Dumazet wrote: > > OK, but could you : > 1) use pr_warn_once() > 2) split the too long line > pr_warn_once("%s: Budget exhausted after napi rescheduled\n", > n->dev ? n->dev->name : "backlog"); Sure, I'll clean it up a bit too. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/4] net: Move napi polling code out of net_rx_action 2014-12-20 20:14 ` [0/4] net: net_rx_action fixes and clean-ups Herbert Xu @ 2014-12-20 20:16 ` Herbert Xu 2014-12-24 4:20 ` David Miller 2014-12-20 20:16 ` [PATCH 2/4] net: Detect drivers that reschedule NAPI and exhaust budget Herbert Xu ` (2 subsequent siblings) 3 siblings, 1 reply; 23+ messages in thread From: Herbert Xu @ 2014-12-20 20:16 UTC (permalink / raw) To: Eric Dumazet, David Miller, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet This patch creates a new function napi_poll and moves the napi polling code from net_rx_action into it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- net/core/dev.c | 98 +++++++++++++++++++++++++++++++-------------------------- 1 file changed, 54 insertions(+), 44 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index f411c28..f7c4f4e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4557,6 +4557,59 @@ void netif_napi_del(struct napi_struct *napi) } EXPORT_SYMBOL(netif_napi_del); +static int napi_poll(struct napi_struct *n, struct list_head *repoll) +{ + void *have; + int work, weight; + + list_del_init(&n->poll_list); + + have = netpoll_poll_lock(n); + + weight = n->weight; + + /* This NAPI_STATE_SCHED test is for avoiding a race + * with netpoll's poll_napi(). Only the entity which + * obtains the lock and sees NAPI_STATE_SCHED set will + * actually make the ->poll() call. Therefore we avoid + * accidentally calling ->poll() when NAPI is not scheduled. + */ + work = 0; + if (test_bit(NAPI_STATE_SCHED, &n->state)) { + work = n->poll(n, weight); + trace_napi_poll(n); + } + + WARN_ON_ONCE(work > weight); + + if (likely(work < weight)) + goto out_unlock; + + /* Drivers must not modify the NAPI state if they + * consume the entire weight. In such cases this code + * still "owns" the NAPI instance and therefore can + * move the instance around on the list at-will. + */ + if (unlikely(napi_disable_pending(n))) { + napi_complete(n); + goto out_unlock; + } + + if (n->gro_list) { + /* flush too old packets + * If HZ < 1000, flush all packets. + */ + napi_gro_flush(n, HZ >= 1000); + } + + list_add_tail(&n->poll_list, repoll); + +out_unlock: + netpoll_poll_unlock(have); + + return work; +} + static void net_rx_action(struct softirq_action *h) { struct softnet_data *sd = this_cpu_ptr(&softnet_data); @@ -4564,7 +4617,6 @@ static void net_rx_action(struct softirq_action *h) int budget = netdev_budget; LIST_HEAD(list); LIST_HEAD(repoll); - void *have; local_irq_disable(); list_splice_init(&sd->poll_list, &list); @@ -4572,7 +4624,6 @@ static void net_rx_action(struct softirq_action *h) while (!list_empty(&list)) { struct napi_struct *n; - int work, weight; /* If softirq window is exhausted then punt. * Allow this to run for 2 jiffies since which will allow @@ -4583,48 +4634,7 @@ static void net_rx_action(struct softirq_action *h) n = list_first_entry(&list, struct napi_struct, poll_list); - list_del_init(&n->poll_list); - - have = netpoll_poll_lock(n); - - weight = n->weight; - - /* This NAPI_STATE_SCHED test is for avoiding a race - * with netpoll's poll_napi(). Only the entity which - * obtains the lock and sees NAPI_STATE_SCHED set will - * actually make the ->poll() call. Therefore we avoid - * accidentally calling ->poll() when NAPI is not scheduled. - */ - work = 0; - if (test_bit(NAPI_STATE_SCHED, &n->state)) { - work = n->poll(n, weight); - trace_napi_poll(n); - } - - WARN_ON_ONCE(work > weight); - - budget -= work; - - /* Drivers must not modify the NAPI state if they - * consume the entire weight. In such cases this code - * still "owns" the NAPI instance and therefore can - * move the instance around on the list at-will. - */ - if (unlikely(work == weight)) { - if (unlikely(napi_disable_pending(n))) { - napi_complete(n); - } else { - if (n->gro_list) { - /* flush too old packets - * If HZ < 1000, flush all packets. - */ - napi_gro_flush(n, HZ >= 1000); - } - list_add_tail(&n->poll_list, &repoll); - } - } - - netpoll_poll_unlock(have); + budget -= napi_poll(n, &repoll); } if (!sd_has_rps_ipi_waiting(sd) && ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/4] net: Move napi polling code out of net_rx_action 2014-12-20 20:16 ` [PATCH 1/4] net: Move napi polling code out of net_rx_action Herbert Xu @ 2014-12-24 4:20 ` David Miller 0 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2014-12-24 4:20 UTC (permalink / raw) To: herbert Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sun, 21 Dec 2014 07:16:21 +1100 > This patch creates a new function napi_poll and moves the napi > polling code from net_rx_action into it. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Applied. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/4] net: Detect drivers that reschedule NAPI and exhaust budget 2014-12-20 20:14 ` [0/4] net: net_rx_action fixes and clean-ups Herbert Xu 2014-12-20 20:16 ` [PATCH 1/4] net: Move napi polling code out of net_rx_action Herbert Xu @ 2014-12-20 20:16 ` Herbert Xu 2014-12-24 4:20 ` David Miller 2014-12-20 20:16 ` [PATCH 3/4] net: Always poll at least one device in net_rx_action Herbert Xu 2014-12-20 20:16 ` [PATCH 4/4] net: Rearrange loop " Herbert Xu 3 siblings, 1 reply; 23+ messages in thread From: Herbert Xu @ 2014-12-20 20:16 UTC (permalink / raw) To: Eric Dumazet, David Miller, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) required drivers to leave poll_list empty if the entire budget is consumed. We have already had two broken drivers so let's add a check for this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- net/core/dev.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/net/core/dev.c b/net/core/dev.c index f7c4f4e..4a9c424 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4602,6 +4602,15 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) napi_gro_flush(n, HZ >= 1000); } + /* Some drivers may have called napi_schedule + * prior to exhausting their budget. + */ + if (unlikely(!list_empty(&n->poll_list))) { + pr_warn_once("%s: Budget exhausted after napi rescheduled\n", + n->dev ? n->dev->name : "backlog"); + goto out_unlock; + } + list_add_tail(&n->poll_list, repoll); out_unlock: ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] net: Detect drivers that reschedule NAPI and exhaust budget 2014-12-20 20:16 ` [PATCH 2/4] net: Detect drivers that reschedule NAPI and exhaust budget Herbert Xu @ 2014-12-24 4:20 ` David Miller 0 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2014-12-24 4:20 UTC (permalink / raw) To: herbert Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sun, 21 Dec 2014 07:16:22 +1100 > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) required drivers to leave poll_list > empty if the entire budget is consumed. > > We have already had two broken drivers so let's add a check for > this. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Applied. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 3/4] net: Always poll at least one device in net_rx_action 2014-12-20 20:14 ` [0/4] net: net_rx_action fixes and clean-ups Herbert Xu 2014-12-20 20:16 ` [PATCH 1/4] net: Move napi polling code out of net_rx_action Herbert Xu 2014-12-20 20:16 ` [PATCH 2/4] net: Detect drivers that reschedule NAPI and exhaust budget Herbert Xu @ 2014-12-20 20:16 ` Herbert Xu 2014-12-24 4:20 ` David Miller 2014-12-20 20:16 ` [PATCH 4/4] net: Rearrange loop " Herbert Xu 3 siblings, 1 reply; 23+ messages in thread From: Herbert Xu @ 2014-12-20 20:16 UTC (permalink / raw) To: Eric Dumazet, David Miller, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet We should only perform the softnet_break check after we have polled at least one device in net_rx_action. Otherwise a zero or negative setting of netdev_budget can lock up the whole system. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- net/core/dev.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 4a9c424..22b0fb2 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4634,16 +4634,15 @@ static void net_rx_action(struct softirq_action *h) while (!list_empty(&list)) { struct napi_struct *n; + n = list_first_entry(&list, struct napi_struct, poll_list); + budget -= napi_poll(n, &repoll); + /* If softirq window is exhausted then punt. * Allow this to run for 2 jiffies since which will allow * an average latency of 1.5/HZ. */ if (unlikely(budget <= 0 || time_after_eq(jiffies, time_limit))) goto softnet_break; - - - n = list_first_entry(&list, struct napi_struct, poll_list); - budget -= napi_poll(n, &repoll); } if (!sd_has_rps_ipi_waiting(sd) && ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] net: Always poll at least one device in net_rx_action 2014-12-20 20:16 ` [PATCH 3/4] net: Always poll at least one device in net_rx_action Herbert Xu @ 2014-12-24 4:20 ` David Miller 0 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2014-12-24 4:20 UTC (permalink / raw) To: herbert Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sun, 21 Dec 2014 07:16:24 +1100 > We should only perform the softnet_break check after we have polled > at least one device in net_rx_action. Otherwise a zero or negative > setting of netdev_budget can lock up the whole system. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Applied. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 4/4] net: Rearrange loop in net_rx_action 2014-12-20 20:14 ` [0/4] net: net_rx_action fixes and clean-ups Herbert Xu ` (2 preceding siblings ...) 2014-12-20 20:16 ` [PATCH 3/4] net: Always poll at least one device in net_rx_action Herbert Xu @ 2014-12-20 20:16 ` Herbert Xu 2014-12-24 4:20 ` David Miller 3 siblings, 1 reply; 23+ messages in thread From: Herbert Xu @ 2014-12-20 20:16 UTC (permalink / raw) To: Eric Dumazet, David Miller, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet This patch rearranges the loop in net_rx_action to reduce the amount of jumping back and forth when reading the code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- net/core/dev.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 22b0fb2..dd565a5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4631,9 +4631,15 @@ static void net_rx_action(struct softirq_action *h) list_splice_init(&sd->poll_list, &list); local_irq_enable(); - while (!list_empty(&list)) { + for (;;) { struct napi_struct *n; + if (list_empty(&list)) { + if (!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll)) + return; + break; + } + n = list_first_entry(&list, struct napi_struct, poll_list); budget -= napi_poll(n, &repoll); @@ -4641,15 +4647,13 @@ static void net_rx_action(struct softirq_action *h) * Allow this to run for 2 jiffies since which will allow * an average latency of 1.5/HZ. */ - if (unlikely(budget <= 0 || time_after_eq(jiffies, time_limit))) - goto softnet_break; + if (unlikely(budget <= 0 || + time_after_eq(jiffies, time_limit))) { + sd->time_squeeze++; + break; + } } - if (!sd_has_rps_ipi_waiting(sd) && - list_empty(&list) && - list_empty(&repoll)) - return; -out: local_irq_disable(); list_splice_tail_init(&sd->poll_list, &list); @@ -4659,12 +4663,6 @@ out: __raise_softirq_irqoff(NET_RX_SOFTIRQ); net_rps_action_and_irq_enable(sd); - - return; - -softnet_break: - sd->time_squeeze++; - goto out; } struct netdev_adjacent { ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] net: Rearrange loop in net_rx_action 2014-12-20 20:16 ` [PATCH 4/4] net: Rearrange loop " Herbert Xu @ 2014-12-24 4:20 ` David Miller 0 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2014-12-24 4:20 UTC (permalink / raw) To: herbert Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sun, 21 Dec 2014 07:16:25 +1100 > This patch rearranges the loop in net_rx_action to reduce the > amount of jumping back and forth when reading the code. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Applied. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: virtio_net: Fix napi poll list corruption 2014-12-20 0:23 ` virtio_net: Fix napi poll list corruption Herbert Xu 2014-12-20 0:36 ` net: Detect drivers that reschedule NAPI and exhaust budget Herbert Xu @ 2014-12-22 8:18 ` Jason Wang 2014-12-22 9:35 ` caif: " Herbert Xu 2014-12-22 16:19 ` virtio_net: " Marcelo Ricardo Leitner 2014-12-22 21:10 ` David Miller 3 siblings, 1 reply; 23+ messages in thread From: Jason Wang @ 2014-12-22 8:18 UTC (permalink / raw) To: Herbert Xu, David Vrabel Cc: netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet, David S. Miller On 12/20/2014 08:23 AM, Herbert Xu wrote: > David Vrabel <david.vrabel@citrix.com> wrote: >> After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt >> masking in NAPI) the napi instance is removed from the per-cpu list >> prior to calling the n->poll(), and is only requeued if all of the >> budget was used. This inadvertently broke netfront because netfront >> does not use NAPI correctly. > A similar bug exists in virtio_net. > > -- >8 -- > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) breaks virtio_net in an insidious way. > > It is now required that if the entire budget is consumed when poll > returns, the napi poll_list must remain empty. However, like some > other drivers virtio_net tries to do a last-ditch check and if > there is more work it will call napi_schedule and then immediately > process some of this new work. Should the entire budget be consumed > while processing such new work then we will violate the new caller > contract. > > This patch fixes this by not touching any work when we reschedule > in virtio_net. > > The worst part of this bug is that the list corruption causes other > napi users to be moved off-list. In my case I was chasing a stall > in IPsec (IPsec uses netif_rx) and I only belatedly realised that it > was virtio_net which caused the stall even though the virtio_net > poll was still functioning perfectly after IPsec stalled. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index b8bd719..5ca9771 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -760,7 +760,6 @@ static int virtnet_poll(struct napi_struct *napi, int budget) > container_of(napi, struct receive_queue, napi); > unsigned int r, received = 0; > > -again: > received += virtnet_receive(rq, budget - received); > > /* Out of packets? */ > @@ -771,7 +770,6 @@ again: > napi_schedule_prep(napi)) { > virtqueue_disable_cb(rq->vq); > __napi_schedule(napi); > - goto again; > } > } > > Cheers, Acked-by: Jason Wang <jasowang@redhat.com> btw, looks like at least caif_virtio has the same issue. ^ permalink raw reply [flat|nested] 23+ messages in thread
* caif: Fix napi poll list corruption 2014-12-22 8:18 ` virtio_net: Fix napi poll list corruption Jason Wang @ 2014-12-22 9:35 ` Herbert Xu 2014-12-22 10:02 ` Jason Wang 2014-12-22 21:35 ` David Miller 0 siblings, 2 replies; 23+ messages in thread From: Herbert Xu @ 2014-12-22 9:35 UTC (permalink / raw) To: Jason Wang Cc: David Vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet, David S. Miller On Mon, Dec 22, 2014 at 04:18:33PM +0800, Jason Wang wrote: > > btw, looks like at least caif_virtio has the same issue. Good catch. -- >8 -- The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) breaks caif. It is now required that if the entire budget is consumed when poll returns, the napi poll_list must remain empty. However, like some other drivers caif tries to do a last-ditch check and if there is more work it will call napi_schedule and then immediately process some of this new work. Should the entire budget be consumed while processing such new work then we will violate the new caller contract. This patch fixes this by not touching any work when we reschedule in caif. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c index a5fefb9..b306210 100644 --- a/drivers/net/caif/caif_virtio.c +++ b/drivers/net/caif/caif_virtio.c @@ -257,7 +257,6 @@ static int cfv_rx_poll(struct napi_struct *napi, int quota) struct vringh_kiov *riov = &cfv->ctx.riov; unsigned int skb_len; -again: do { skb = NULL; @@ -322,7 +321,6 @@ exit: napi_schedule_prep(napi)) { vringh_notify_disable_kern(cfv->vr_rx); __napi_schedule(napi); - goto again; } break; Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: caif: Fix napi poll list corruption 2014-12-22 9:35 ` caif: " Herbert Xu @ 2014-12-22 10:02 ` Jason Wang 2014-12-22 21:35 ` David Miller 1 sibling, 0 replies; 23+ messages in thread From: Jason Wang @ 2014-12-22 10:02 UTC (permalink / raw) To: Herbert Xu Cc: David Vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet, David S. Miller On Mon, Dec 22, 2014 at 5:35 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Dec 22, 2014 at 04:18:33PM +0800, Jason Wang wrote: >> >> btw, looks like at least caif_virtio has the same issue. > > Good catch. > > -- >8 -- > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) breaks caif. > > It is now required that if the entire budget is consumed when poll > returns, the napi poll_list must remain empty. However, like some > other drivers caif tries to do a last-ditch check and if there is > more work it will call napi_schedule and then immediately process > some of this new work. Should the entire budget be consumed while > processing such new work then we will violate the new caller > contract. > > This patch fixes this by not touching any work when we reschedule > in caif. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Jason Wang <jasowang@redhat.com> Thanks. > > > diff --git a/drivers/net/caif/caif_virtio.c > b/drivers/net/caif/caif_virtio.c > index a5fefb9..b306210 100644 > --- a/drivers/net/caif/caif_virtio.c > +++ b/drivers/net/caif/caif_virtio.c > @@ -257,7 +257,6 @@ static int cfv_rx_poll(struct napi_struct *napi, > int quota) > struct vringh_kiov *riov = &cfv->ctx.riov; > unsigned int skb_len; > > -again: > do { > skb = NULL; > > @@ -322,7 +321,6 @@ exit: > napi_schedule_prep(napi)) { > vringh_notify_disable_kern(cfv->vr_rx); > __napi_schedule(napi); > - goto again; > } > break; > > Thanks, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: caif: Fix napi poll list corruption 2014-12-22 9:35 ` caif: " Herbert Xu 2014-12-22 10:02 ` Jason Wang @ 2014-12-22 21:35 ` David Miller 1 sibling, 0 replies; 23+ messages in thread From: David Miller @ 2014-12-22 21:35 UTC (permalink / raw) To: herbert Cc: jasowang, david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet From: Herbert Xu <herbert@gondor.apana.org.au> Date: Mon, 22 Dec 2014 20:35:25 +1100 > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) breaks caif. > > It is now required that if the entire budget is consumed when poll > returns, the napi poll_list must remain empty. However, like some > other drivers caif tries to do a last-ditch check and if there is > more work it will call napi_schedule and then immediately process > some of this new work. Should the entire budget be consumed while > processing such new work then we will violate the new caller > contract. > > This patch fixes this by not touching any work when we reschedule > in caif. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Applied, thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: virtio_net: Fix napi poll list corruption 2014-12-20 0:23 ` virtio_net: Fix napi poll list corruption Herbert Xu 2014-12-20 0:36 ` net: Detect drivers that reschedule NAPI and exhaust budget Herbert Xu 2014-12-22 8:18 ` virtio_net: Fix napi poll list corruption Jason Wang @ 2014-12-22 16:19 ` Marcelo Ricardo Leitner 2014-12-22 21:10 ` David Miller 3 siblings, 0 replies; 23+ messages in thread From: Marcelo Ricardo Leitner @ 2014-12-22 16:19 UTC (permalink / raw) To: Herbert Xu, David Vrabel Cc: netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet, David S. Miller On 19-12-2014 22:23, Herbert Xu wrote: > David Vrabel <david.vrabel@citrix.com> wrote: >> After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt >> masking in NAPI) the napi instance is removed from the per-cpu list >> prior to calling the n->poll(), and is only requeued if all of the >> budget was used. This inadvertently broke netfront because netfront >> does not use NAPI correctly. > > A similar bug exists in virtio_net. > > -- >8 -- > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) breaks virtio_net in an insidious way. > > It is now required that if the entire budget is consumed when poll > returns, the napi poll_list must remain empty. However, like some > other drivers virtio_net tries to do a last-ditch check and if > there is more work it will call napi_schedule and then immediately > process some of this new work. Should the entire budget be consumed > while processing such new work then we will violate the new caller > contract. > > This patch fixes this by not touching any work when we reschedule > in virtio_net. > > The worst part of this bug is that the list corruption causes other > napi users to be moved off-list. In my case I was chasing a stall > in IPsec (IPsec uses netif_rx) and I only belatedly realised that it > was virtio_net which caused the stall even though the virtio_net > poll was still functioning perfectly after IPsec stalled. Thanks for finding/fixing this, Herbert. I was debugging this one too. In my case, vxlan interface was getting stuck. Marcelo ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: virtio_net: Fix napi poll list corruption 2014-12-20 0:23 ` virtio_net: Fix napi poll list corruption Herbert Xu ` (2 preceding siblings ...) 2014-12-22 16:19 ` virtio_net: " Marcelo Ricardo Leitner @ 2014-12-22 21:10 ` David Miller 3 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2014-12-22 21:10 UTC (permalink / raw) To: herbert Cc: david.vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sat, 20 Dec 2014 11:23:27 +1100 > The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less > interrupt masking in NAPI) breaks virtio_net in an insidious way. > > It is now required that if the entire budget is consumed when poll > returns, the napi poll_list must remain empty. However, like some > other drivers virtio_net tries to do a last-ditch check and if > there is more work it will call napi_schedule and then immediately > process some of this new work. Should the entire budget be consumed > while processing such new work then we will violate the new caller > contract. > > This patch fixes this by not touching any work when we reschedule > in virtio_net. > > The worst part of this bug is that the list corruption causes other > napi users to be moved off-list. In my case I was chasing a stall > in IPsec (IPsec uses netif_rx) and I only belatedly realised that it > was virtio_net which caused the stall even though the virtio_net > poll was still functioning perfectly after IPsec stalled. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Applied, thanks Herbert. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2014-12-24 4:20 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-12-16 18:59 [PATCHv1 net] xen-netfront: use napi_complete() correctly to prevent Rx stalling David Vrabel 2014-12-16 20:22 ` David Miller 2014-12-20 0:23 ` virtio_net: Fix napi poll list corruption Herbert Xu 2014-12-20 0:36 ` net: Detect drivers that reschedule NAPI and exhaust budget Herbert Xu 2014-12-20 1:34 ` Eric Dumazet 2014-12-20 2:40 ` David Miller 2014-12-20 6:55 ` Herbert Xu 2014-12-20 18:00 ` Eric Dumazet 2014-12-20 20:14 ` [0/4] net: net_rx_action fixes and clean-ups Herbert Xu 2014-12-20 20:16 ` [PATCH 1/4] net: Move napi polling code out of net_rx_action Herbert Xu 2014-12-24 4:20 ` David Miller 2014-12-20 20:16 ` [PATCH 2/4] net: Detect drivers that reschedule NAPI and exhaust budget Herbert Xu 2014-12-24 4:20 ` David Miller 2014-12-20 20:16 ` [PATCH 3/4] net: Always poll at least one device in net_rx_action Herbert Xu 2014-12-24 4:20 ` David Miller 2014-12-20 20:16 ` [PATCH 4/4] net: Rearrange loop " Herbert Xu 2014-12-24 4:20 ` David Miller 2014-12-22 8:18 ` virtio_net: Fix napi poll list corruption Jason Wang 2014-12-22 9:35 ` caif: " Herbert Xu 2014-12-22 10:02 ` Jason Wang 2014-12-22 21:35 ` David Miller 2014-12-22 16:19 ` virtio_net: " Marcelo Ricardo Leitner 2014-12-22 21:10 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).