netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv3 net] xen-netback: fix race between napi_complete() and interrupt handler
@ 2014-05-16 11:20 David Vrabel
  2014-05-16 11:21 ` Wei Liu
  0 siblings, 1 reply; 4+ messages in thread
From: David Vrabel @ 2014-05-16 11:20 UTC (permalink / raw)
  To: netdev; +Cc: xen-devel, Ian Campbell, Wei Liu, David Vrabel

When the NAPI budget was not all used, xenvif_poll() would call
napi_complete() /after/ enabling the interrupt.  This resulted in a
race between the napi_complete() and the napi_schedule() in the
interrupt handler.  The use of local_irq_save/restore() avoided by
race iff the handler is running on the same CPU but not if it was
running on a different CPU.

Fix this properly by calling napi_complete() before reenabling
interrupts (in the xenvif_napi_schedule_or_enable_irq() call).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Lui <wei.lui2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3:
- new name is xenvif_napi_schedule_or_enable_events().

v2:
- Rename xenvif_check_rx_xenvif() to
  xenvif_napi_schedule_or_enable_irq() to make it more obvious what it
  does.
---
 drivers/net/xen-netback/common.h    |    2 +-
 drivers/net/xen-netback/interface.c |   30 +++---------------------------
 drivers/net/xen-netback/netback.c   |    4 ++--
 3 files changed, 6 insertions(+), 30 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index ae413a2..d66de9a 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -209,7 +209,7 @@ int xenvif_map_frontend_rings(struct xenvif *vif,
 			      grant_ref_t rx_ring_ref);
 
 /* Check for SKBs from frontend and schedule backend processing */
-void xenvif_check_rx_xenvif(struct xenvif *vif);
+void xenvif_napi_schedule_or_enable_events(struct xenvif *vif);
 
 /* Prevent the device from generating any further traffic. */
 void xenvif_carrier_off(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 7669d49..0200de2 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -65,32 +65,8 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 	work_done = xenvif_tx_action(vif, budget);
 
 	if (work_done < budget) {
-		int more_to_do = 0;
-		unsigned long flags;
-
-		/* It is necessary to disable IRQ before calling
-		 * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might
-		 * lose event from the frontend.
-		 *
-		 * Consider:
-		 *   RING_HAS_UNCONSUMED_REQUESTS
-		 *   <frontend generates event to trigger napi_schedule>
-		 *   __napi_complete
-		 *
-		 * This handler is still in scheduled state so the
-		 * event has no effect at all. After __napi_complete
-		 * this handler is descheduled and cannot get
-		 * scheduled again. We lose event in this case and the ring
-		 * will be completely stalled.
-		 */
-
-		local_irq_save(flags);
-
-		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
-		if (!more_to_do)
-			__napi_complete(napi);
-
-		local_irq_restore(flags);
+		napi_complete(napi);
+		xenvif_napi_schedule_or_enable_events(vif);
 	}
 
 	return work_done;
@@ -166,7 +142,7 @@ static void xenvif_up(struct xenvif *vif)
 	enable_irq(vif->tx_irq);
 	if (vif->tx_irq != vif->rx_irq)
 		enable_irq(vif->rx_irq);
-	xenvif_check_rx_xenvif(vif);
+	xenvif_napi_schedule_or_enable_events(vif);
 }
 
 static void xenvif_down(struct xenvif *vif)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index e5284bc..0efda31 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -604,7 +604,7 @@ done:
 		notify_remote_via_irq(vif->rx_irq);
 }
 
-void xenvif_check_rx_xenvif(struct xenvif *vif)
+void xenvif_napi_schedule_or_enable_events(struct xenvif *vif)
 {
 	int more_to_do;
 
@@ -638,7 +638,7 @@ static void tx_credit_callback(unsigned long data)
 {
 	struct xenvif *vif = (struct xenvif *)data;
 	tx_add_credit(vif);
-	xenvif_check_rx_xenvif(vif);
+	xenvif_napi_schedule_or_enable_events(vif);
 }
 
 static void xenvif_tx_err(struct xenvif *vif,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCHv3 net] xen-netback: fix race between napi_complete() and interrupt handler
  2014-05-16 11:20 [PATCHv3 net] xen-netback: fix race between napi_complete() and interrupt handler David Vrabel
@ 2014-05-16 11:21 ` Wei Liu
  2014-05-16 11:26   ` David Vrabel
  0 siblings, 1 reply; 4+ messages in thread
From: Wei Liu @ 2014-05-16 11:21 UTC (permalink / raw)
  To: David Vrabel; +Cc: netdev, xen-devel, Ian Campbell, Wei Liu

On Fri, May 16, 2014 at 12:20:02PM +0100, David Vrabel wrote:
> When the NAPI budget was not all used, xenvif_poll() would call
> napi_complete() /after/ enabling the interrupt.  This resulted in a
> race between the napi_complete() and the napi_schedule() in the
> interrupt handler.  The use of local_irq_save/restore() avoided by
> race iff the handler is running on the same CPU but not if it was
> running on a different CPU.
> 
> Fix this properly by calling napi_complete() before reenabling
> interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Acked-by: Wei Lui <wei.lui2@citrix.com>

It's "Liu". ;-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCHv3 net] xen-netback: fix race between napi_complete() and interrupt handler
  2014-05-16 11:21 ` Wei Liu
@ 2014-05-16 11:26   ` David Vrabel
  2014-05-16 20:27     ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: David Vrabel @ 2014-05-16 11:26 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, Ian Campbell

On 16/05/14 12:21, Wei Liu wrote:
> On Fri, May 16, 2014 at 12:20:02PM +0100, David Vrabel wrote:
>> When the NAPI budget was not all used, xenvif_poll() would call
>> napi_complete() /after/ enabling the interrupt.  This resulted in a
>> race between the napi_complete() and the napi_schedule() in the
>> interrupt handler.  The use of local_irq_save/restore() avoided by
>> race iff the handler is running on the same CPU but not if it was
>> running on a different CPU.
>>
>> Fix this properly by calling napi_complete() before reenabling
>> interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> Acked-by: Wei Lui <wei.lui2@citrix.com>
> 
> It's "Liu". ;-)

Sorry!

Dave, can you take this instead:

8<--------------
xen-netback: fix race between napi_complete() and interrupt handler

When the NAPI budget was not all used, xenvif_poll() would call
napi_complete() /after/ enabling the interrupt.  This resulted in a
race between the napi_complete() and the napi_schedule() in the
interrupt handler.  The use of local_irq_save/restore() avoided by
race iff the handler is running on the same CPU but not if it was
running on a different CPU.

Fix this properly by calling napi_complete() before reenabling
interrupts (in the xenvif_napi_schedule_or_enable_irq() call).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3:
- new name is xenvif_napi_schedule_or_enable_events().

v2:
- Rename xenvif_check_rx_xenvif() to
  xenvif_napi_schedule_or_enable_irq() to make it more obvious what it
  does.
---
 drivers/net/xen-netback/common.h    |    2 +-
 drivers/net/xen-netback/interface.c |   30 +++---------------------------
 drivers/net/xen-netback/netback.c   |    4 ++--
 3 files changed, 6 insertions(+), 30 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index ae413a2..d66de9a 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -209,7 +209,7 @@ int xenvif_map_frontend_rings(struct xenvif *vif,
 			      grant_ref_t rx_ring_ref);
 
 /* Check for SKBs from frontend and schedule backend processing */
-void xenvif_check_rx_xenvif(struct xenvif *vif);
+void xenvif_napi_schedule_or_enable_events(struct xenvif *vif);
 
 /* Prevent the device from generating any further traffic. */
 void xenvif_carrier_off(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 7669d49..0200de2 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -65,32 +65,8 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 	work_done = xenvif_tx_action(vif, budget);
 
 	if (work_done < budget) {
-		int more_to_do = 0;
-		unsigned long flags;
-
-		/* It is necessary to disable IRQ before calling
-		 * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might
-		 * lose event from the frontend.
-		 *
-		 * Consider:
-		 *   RING_HAS_UNCONSUMED_REQUESTS
-		 *   <frontend generates event to trigger napi_schedule>
-		 *   __napi_complete
-		 *
-		 * This handler is still in scheduled state so the
-		 * event has no effect at all. After __napi_complete
-		 * this handler is descheduled and cannot get
-		 * scheduled again. We lose event in this case and the ring
-		 * will be completely stalled.
-		 */
-
-		local_irq_save(flags);
-
-		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
-		if (!more_to_do)
-			__napi_complete(napi);
-
-		local_irq_restore(flags);
+		napi_complete(napi);
+		xenvif_napi_schedule_or_enable_events(vif);
 	}
 
 	return work_done;
@@ -166,7 +142,7 @@ static void xenvif_up(struct xenvif *vif)
 	enable_irq(vif->tx_irq);
 	if (vif->tx_irq != vif->rx_irq)
 		enable_irq(vif->rx_irq);
-	xenvif_check_rx_xenvif(vif);
+	xenvif_napi_schedule_or_enable_events(vif);
 }
 
 static void xenvif_down(struct xenvif *vif)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index e5284bc..0efda31 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -604,7 +604,7 @@ done:
 		notify_remote_via_irq(vif->rx_irq);
 }
 
-void xenvif_check_rx_xenvif(struct xenvif *vif)
+void xenvif_napi_schedule_or_enable_events(struct xenvif *vif)
 {
 	int more_to_do;
 
@@ -638,7 +638,7 @@ static void tx_credit_callback(unsigned long data)
 {
 	struct xenvif *vif = (struct xenvif *)data;
 	tx_add_credit(vif);
-	xenvif_check_rx_xenvif(vif);
+	xenvif_napi_schedule_or_enable_events(vif);
 }
 
 static void xenvif_tx_err(struct xenvif *vif,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCHv3 net] xen-netback: fix race between napi_complete() and interrupt handler
  2014-05-16 11:26   ` David Vrabel
@ 2014-05-16 20:27     ` David Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2014-05-16 20:27 UTC (permalink / raw)
  To: david.vrabel; +Cc: wei.liu2, netdev, xen-devel, ian.campbell

From: David Vrabel <david.vrabel@citrix.com>
Date: Fri, 16 May 2014 12:26:04 +0100

> Dave, can you take this instead:
> 
> 8<--------------
> xen-netback: fix race between napi_complete() and interrupt handler
> 
> When the NAPI budget was not all used, xenvif_poll() would call
> napi_complete() /after/ enabling the interrupt.  This resulted in a
> race between the napi_complete() and the napi_schedule() in the
> interrupt handler.  The use of local_irq_save/restore() avoided by
> race iff the handler is running on the same CPU but not if it was
> running on a different CPU.
> 
> Fix this properly by calling napi_complete() before reenabling
> interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Applied, thanks everyone.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-05-16 20:28 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-16 11:20 [PATCHv3 net] xen-netback: fix race between napi_complete() and interrupt handler David Vrabel
2014-05-16 11:21 ` Wei Liu
2014-05-16 11:26   ` David Vrabel
2014-05-16 20:27     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).