* [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt
@ 2014-01-08 19:24 Ma JieYue
2014-01-08 20:11 ` David Miller
2014-01-08 21:18 ` Zoltan Kiss
0 siblings, 2 replies; 4+ messages in thread
From: Ma JieYue @ 2014-01-08 19:24 UTC (permalink / raw)
To: netdev, xen-devel
Cc: Ma JieYue, Wang Yingbin, Fu Tienan, Wei Liu, Ian Campbell,
David Vrabel
From: Ma JieYue <jieyue.majy@alibaba-inc.com>
There is a race when waking up or stopping xenvif tx queue, and it leads to
unnecessary packet drop. The problem is that the rx ring still full when entering
into xenvif_start_xmit. In fact, in xenvif_rx_interrupt, the netif_wake_queue
may be called not just after the ring is not full any more, so the operation
is not atomic. Here is part of the debug log when the race scenario happened:
wake_queue: req_cons_peek 2679757 req_cons 2679586 req_prod 2679841
stop_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
[tx_queue_stopped true]
wake_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
[tx_queue_stopped false]
drop packet: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
The debug log was written, every time right after netif_wake_queue been called
in xenvif_rx_interrupt, every time after netif_stop_queue been called in
xenvif_start_xmit and every time packet drop happened in xenvif_start_xmit.
As we can see, the second wake_queue appeared in the place it should not be, and
we believed the ring had been checked before the stop_queue, but the actual
wake_queue action didn't follow, and took place after the stop_queue, so that when
entering into xenvif_start_xmit the ring was full but the queue was not stopped.
The patch fixes the race by checking if tx queue stopped, before trying to
wake it up in xenvif_rx_interrupt. It only wakes the queue when it is stopped,
as well as it is not full and schedulable.
Signed-off-by: Ma JieYue <jieyue.majy@alibaba-inc.com>
Signed-off-by: Wang Yingbin <yingbin.wangyb@alibaba-inc.com>
Signed-off-by: Fu Tienan <tienan.ftn@alibaba-inc.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
drivers/net/xen-netback/interface.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index fff8cdd..e099f62 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -105,7 +105,7 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
{
struct xenvif *vif = dev_id;
- if (xenvif_rx_schedulable(vif))
+ if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
netif_wake_queue(vif->dev);
return IRQ_HANDLED;
--
1.8.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt
2014-01-08 19:24 [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt Ma JieYue
@ 2014-01-08 20:11 ` David Miller
2014-01-09 9:35 ` Paul Durrant
2014-01-08 21:18 ` Zoltan Kiss
1 sibling, 1 reply; 4+ messages in thread
From: David Miller @ 2014-01-08 20:11 UTC (permalink / raw)
To: majieyue
Cc: netdev, xen-devel, jieyue.majy, yingbin.wangyb, tienan.ftn,
wei.liu2, ian.campbell, david.vrabel
From: Ma JieYue <majieyue@gmail.com>
Date: Thu, 9 Jan 2014 03:24:21 +0800
> - if (xenvif_rx_schedulable(vif))
> + if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
I do not see anything which prevents a netif_stop_queue() call from happening
between these two tests in another thread of control.
This therefore looks like a bandaid and not a real fix.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt
2014-01-08 19:24 [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt Ma JieYue
2014-01-08 20:11 ` David Miller
@ 2014-01-08 21:18 ` Zoltan Kiss
1 sibling, 0 replies; 4+ messages in thread
From: Zoltan Kiss @ 2014-01-08 21:18 UTC (permalink / raw)
To: Ma JieYue, netdev, xen-devel
Cc: Ma JieYue, Wang Yingbin, Fu Tienan, Wei Liu, Ian Campbell,
David Vrabel
Hi,
With Paul's recent flow control improvement I think this became invalid:
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=ca2f09f2b2c6c25047cfc545d057c4edfcfe561c
Zoli
On 08/01/14 19:24, Ma JieYue wrote:
> From: Ma JieYue <jieyue.majy@alibaba-inc.com>
>
> There is a race when waking up or stopping xenvif tx queue, and it leads to
> unnecessary packet drop. The problem is that the rx ring still full when entering
> into xenvif_start_xmit. In fact, in xenvif_rx_interrupt, the netif_wake_queue
> may be called not just after the ring is not full any more, so the operation
> is not atomic. Here is part of the debug log when the race scenario happened:
>
> wake_queue: req_cons_peek 2679757 req_cons 2679586 req_prod 2679841
> stop_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
> [tx_queue_stopped true]
> wake_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
> [tx_queue_stopped false]
> drop packet: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
>
> The debug log was written, every time right after netif_wake_queue been called
> in xenvif_rx_interrupt, every time after netif_stop_queue been called in
> xenvif_start_xmit and every time packet drop happened in xenvif_start_xmit.
> As we can see, the second wake_queue appeared in the place it should not be, and
> we believed the ring had been checked before the stop_queue, but the actual
> wake_queue action didn't follow, and took place after the stop_queue, so that when
> entering into xenvif_start_xmit the ring was full but the queue was not stopped.
>
> The patch fixes the race by checking if tx queue stopped, before trying to
> wake it up in xenvif_rx_interrupt. It only wakes the queue when it is stopped,
> as well as it is not full and schedulable.
>
> Signed-off-by: Ma JieYue <jieyue.majy@alibaba-inc.com>
> Signed-off-by: Wang Yingbin <yingbin.wangyb@alibaba-inc.com>
> Signed-off-by: Fu Tienan <tienan.ftn@alibaba-inc.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> ---
> drivers/net/xen-netback/interface.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index fff8cdd..e099f62 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -105,7 +105,7 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
> {
> struct xenvif *vif = dev_id;
>
> - if (xenvif_rx_schedulable(vif))
> + if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
> netif_wake_queue(vif->dev);
>
> return IRQ_HANDLED;
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt
2014-01-08 20:11 ` David Miller
@ 2014-01-09 9:35 ` Paul Durrant
0 siblings, 0 replies; 4+ messages in thread
From: Paul Durrant @ 2014-01-09 9:35 UTC (permalink / raw)
To: David Miller, majieyue@gmail.com
Cc: netdev@vger.kernel.org, xen-devel@lists.xen.org,
jieyue.majy@alibaba-inc.com, yingbin.wangyb@alibaba-inc.com,
tienan.ftn@alibaba-inc.com, Wei Liu, Ian Campbell, David Vrabel
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of David Miller
> Sent: 08 January 2014 20:12
> To: majieyue@gmail.com
> Cc: netdev@vger.kernel.org; xen-devel@lists.xen.org; jieyue.majy@alibaba-
> inc.com; yingbin.wangyb@alibaba-inc.com; tienan.ftn@alibaba-inc.com; Wei
> Liu; Ian Campbell; David Vrabel
> Subject: Re: [PATCH net] xen-netback: fix vif tx queue race in
> xenvif_rx_interrupt
>
> From: Ma JieYue <majieyue@gmail.com>
> Date: Thu, 9 Jan 2014 03:24:21 +0800
>
> > - if (xenvif_rx_schedulable(vif))
> > + if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
>
> I do not see anything which prevents a netif_stop_queue() call from
> happening
> between these two tests in another thread of control.
>
> This therefore looks like a bandaid and not a real fix.
My fix "xen-netback: improve guest-receive-side flow control" in net-next will make this change irrelevant as it completely removes all this somewhat fragile code.
Paul
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-01-09 9:37 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-08 19:24 [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt Ma JieYue
2014-01-08 20:11 ` David Miller
2014-01-09 9:35 ` Paul Durrant
2014-01-08 21:18 ` Zoltan Kiss
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).