netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt
@ 2014-01-08 19:24 Ma JieYue
  2014-01-08 20:11 ` David Miller
  2014-01-08 21:18 ` Zoltan Kiss
  0 siblings, 2 replies; 4+ messages in thread
From: Ma JieYue @ 2014-01-08 19:24 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: Ma JieYue, Wang Yingbin, Fu Tienan, Wei Liu, Ian Campbell,
	David Vrabel

From: Ma JieYue <jieyue.majy@alibaba-inc.com>

There is a race when waking up or stopping xenvif tx queue, and it leads to 
unnecessary packet drop. The problem is that the rx ring still full when entering 
into xenvif_start_xmit. In fact, in xenvif_rx_interrupt, the netif_wake_queue 
may be called not just after the ring is not full any more, so the operation 
is not atomic. Here is part of the debug log when the race scenario happened:

wake_queue: req_cons_peek 2679757 req_cons 2679586 req_prod 2679841
stop_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
[tx_queue_stopped true]
wake_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
[tx_queue_stopped false]
drop packet: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841

The debug log was written, every time right after netif_wake_queue been called 
in xenvif_rx_interrupt, every time after netif_stop_queue been called in 
xenvif_start_xmit and every time packet drop happened in xenvif_start_xmit. 
As we can see, the second wake_queue appeared in the place it should not be, and 
we believed the ring had been checked before the stop_queue, but the actual 
wake_queue action didn't follow, and took place after the stop_queue, so that when 
entering into xenvif_start_xmit the ring was full but the queue was not stopped.

The patch fixes the race by checking if tx queue stopped, before trying to 
wake it up in xenvif_rx_interrupt. It only wakes the queue when it is stopped, 
as well as it is not full and schedulable.

Signed-off-by: Ma JieYue <jieyue.majy@alibaba-inc.com>
Signed-off-by: Wang Yingbin <yingbin.wangyb@alibaba-inc.com>
Signed-off-by: Fu Tienan <tienan.ftn@alibaba-inc.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netback/interface.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index fff8cdd..e099f62 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -105,7 +105,7 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
 	struct xenvif *vif = dev_id;
 
-	if (xenvif_rx_schedulable(vif))
+	if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
 		netif_wake_queue(vif->dev);
 
 	return IRQ_HANDLED;
-- 
1.8.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt
  2014-01-08 19:24 [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt Ma JieYue
@ 2014-01-08 20:11 ` David Miller
  2014-01-09  9:35   ` Paul Durrant
  2014-01-08 21:18 ` Zoltan Kiss
  1 sibling, 1 reply; 4+ messages in thread
From: David Miller @ 2014-01-08 20:11 UTC (permalink / raw)
  To: majieyue
  Cc: netdev, xen-devel, jieyue.majy, yingbin.wangyb, tienan.ftn,
	wei.liu2, ian.campbell, david.vrabel

From: Ma JieYue <majieyue@gmail.com>
Date: Thu,  9 Jan 2014 03:24:21 +0800

> -	if (xenvif_rx_schedulable(vif))
> +	if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))

I do not see anything which prevents a netif_stop_queue() call from happening
between these two tests in another thread of control.

This therefore looks like a bandaid and not a real fix.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt
  2014-01-08 19:24 [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt Ma JieYue
  2014-01-08 20:11 ` David Miller
@ 2014-01-08 21:18 ` Zoltan Kiss
  1 sibling, 0 replies; 4+ messages in thread
From: Zoltan Kiss @ 2014-01-08 21:18 UTC (permalink / raw)
  To: Ma JieYue, netdev, xen-devel
  Cc: Ma JieYue, Wang Yingbin, Fu Tienan, Wei Liu, Ian Campbell,
	David Vrabel

Hi,

With Paul's recent flow control improvement I think this became invalid:

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=ca2f09f2b2c6c25047cfc545d057c4edfcfe561c

Zoli

On 08/01/14 19:24, Ma JieYue wrote:
> From: Ma JieYue <jieyue.majy@alibaba-inc.com>
>
> There is a race when waking up or stopping xenvif tx queue, and it leads to
> unnecessary packet drop. The problem is that the rx ring still full when entering
> into xenvif_start_xmit. In fact, in xenvif_rx_interrupt, the netif_wake_queue
> may be called not just after the ring is not full any more, so the operation
> is not atomic. Here is part of the debug log when the race scenario happened:
>
> wake_queue: req_cons_peek 2679757 req_cons 2679586 req_prod 2679841
> stop_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
> [tx_queue_stopped true]
> wake_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
> [tx_queue_stopped false]
> drop packet: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
>
> The debug log was written, every time right after netif_wake_queue been called
> in xenvif_rx_interrupt, every time after netif_stop_queue been called in
> xenvif_start_xmit and every time packet drop happened in xenvif_start_xmit.
> As we can see, the second wake_queue appeared in the place it should not be, and
> we believed the ring had been checked before the stop_queue, but the actual
> wake_queue action didn't follow, and took place after the stop_queue, so that when
> entering into xenvif_start_xmit the ring was full but the queue was not stopped.
>
> The patch fixes the race by checking if tx queue stopped, before trying to
> wake it up in xenvif_rx_interrupt. It only wakes the queue when it is stopped,
> as well as it is not full and schedulable.
>
> Signed-off-by: Ma JieYue <jieyue.majy@alibaba-inc.com>
> Signed-off-by: Wang Yingbin <yingbin.wangyb@alibaba-inc.com>
> Signed-off-by: Fu Tienan <tienan.ftn@alibaba-inc.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> ---
>   drivers/net/xen-netback/interface.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index fff8cdd..e099f62 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -105,7 +105,7 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
>   {
>   	struct xenvif *vif = dev_id;
>
> -	if (xenvif_rx_schedulable(vif))
> +	if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
>   		netif_wake_queue(vif->dev);
>
>   	return IRQ_HANDLED;
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt
  2014-01-08 20:11 ` David Miller
@ 2014-01-09  9:35   ` Paul Durrant
  0 siblings, 0 replies; 4+ messages in thread
From: Paul Durrant @ 2014-01-09  9:35 UTC (permalink / raw)
  To: David Miller, majieyue@gmail.com
  Cc: netdev@vger.kernel.org, xen-devel@lists.xen.org,
	jieyue.majy@alibaba-inc.com, yingbin.wangyb@alibaba-inc.com,
	tienan.ftn@alibaba-inc.com, Wei Liu, Ian Campbell, David Vrabel

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of David Miller
> Sent: 08 January 2014 20:12
> To: majieyue@gmail.com
> Cc: netdev@vger.kernel.org; xen-devel@lists.xen.org; jieyue.majy@alibaba-
> inc.com; yingbin.wangyb@alibaba-inc.com; tienan.ftn@alibaba-inc.com; Wei
> Liu; Ian Campbell; David Vrabel
> Subject: Re: [PATCH net] xen-netback: fix vif tx queue race in
> xenvif_rx_interrupt
> 
> From: Ma JieYue <majieyue@gmail.com>
> Date: Thu,  9 Jan 2014 03:24:21 +0800
> 
> > -	if (xenvif_rx_schedulable(vif))
> > +	if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
> 
> I do not see anything which prevents a netif_stop_queue() call from
> happening
> between these two tests in another thread of control.
> 
> This therefore looks like a bandaid and not a real fix.

My fix "xen-netback: improve guest-receive-side flow control" in net-next will make this change irrelevant as it completely removes all this somewhat fragile code.

  Paul

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-01-09  9:37 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-08 19:24 [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt Ma JieYue
2014-01-08 20:11 ` David Miller
2014-01-09  9:35   ` Paul Durrant
2014-01-08 21:18 ` Zoltan Kiss

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).