* [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler
@ 2014-05-15 17:17 David Vrabel
2014-05-15 18:07 ` Wei Liu
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: David Vrabel @ 2014-05-15 17:17 UTC (permalink / raw)
To: netdev; +Cc: xen-devel, Ian Campbell, Wei Liu, David Vrabel
When the NAPI budget was not all used, xenvif_poll() would call
napi_complete() /after/ enabling the interrupt. This resulted in a
race between the napi_complete() and the napi_schedule() in the
interrupt handler. The use of local_irq_save/restore() avoided by
race iff the handler is running on the same CPU but not if it was
running on a different CPU.
Fix this properly by calling napi_complete() before reenabling
interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
v2:
- Rename xenvif_check_rx_xenvif() to
xenvif_napi_schedule_or_enable_irq() to make it more obvious what it
does.
---
drivers/net/xen-netback/common.h | 2 +-
drivers/net/xen-netback/interface.c | 30 +++---------------------------
drivers/net/xen-netback/netback.c | 4 ++--
3 files changed, 6 insertions(+), 30 deletions(-)
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index ae413a2..243574b 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -209,7 +209,7 @@ int xenvif_map_frontend_rings(struct xenvif *vif,
grant_ref_t rx_ring_ref);
/* Check for SKBs from frontend and schedule backend processing */
-void xenvif_check_rx_xenvif(struct xenvif *vif);
+void xenvif_napi_schedule_or_enable_irq(struct xenvif *vif);
/* Prevent the device from generating any further traffic. */
void xenvif_carrier_off(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 7669d49..bfd336a 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -65,32 +65,8 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
work_done = xenvif_tx_action(vif, budget);
if (work_done < budget) {
- int more_to_do = 0;
- unsigned long flags;
-
- /* It is necessary to disable IRQ before calling
- * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might
- * lose event from the frontend.
- *
- * Consider:
- * RING_HAS_UNCONSUMED_REQUESTS
- * <frontend generates event to trigger napi_schedule>
- * __napi_complete
- *
- * This handler is still in scheduled state so the
- * event has no effect at all. After __napi_complete
- * this handler is descheduled and cannot get
- * scheduled again. We lose event in this case and the ring
- * will be completely stalled.
- */
-
- local_irq_save(flags);
-
- RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
- if (!more_to_do)
- __napi_complete(napi);
-
- local_irq_restore(flags);
+ napi_complete(napi);
+ xenvif_napi_schedule_or_enable_irq(vif);
}
return work_done;
@@ -166,7 +142,7 @@ static void xenvif_up(struct xenvif *vif)
enable_irq(vif->tx_irq);
if (vif->tx_irq != vif->rx_irq)
enable_irq(vif->rx_irq);
- xenvif_check_rx_xenvif(vif);
+ xenvif_napi_schedule_or_enable_irq(vif);
}
static void xenvif_down(struct xenvif *vif)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index e5284bc..c4e630f 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -604,7 +604,7 @@ done:
notify_remote_via_irq(vif->rx_irq);
}
-void xenvif_check_rx_xenvif(struct xenvif *vif)
+void xenvif_napi_schedule_or_enable_irq(struct xenvif *vif)
{
int more_to_do;
@@ -638,7 +638,7 @@ static void tx_credit_callback(unsigned long data)
{
struct xenvif *vif = (struct xenvif *)data;
tx_add_credit(vif);
- xenvif_check_rx_xenvif(vif);
+ xenvif_napi_schedule_or_enable_irq(vif);
}
static void xenvif_tx_err(struct xenvif *vif,
--
1.7.10.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler
2014-05-15 17:17 [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler David Vrabel
@ 2014-05-15 18:07 ` Wei Liu
2014-05-16 8:30 ` Ian Campbell
2014-05-16 9:01 ` Wei Liu
2 siblings, 0 replies; 7+ messages in thread
From: Wei Liu @ 2014-05-15 18:07 UTC (permalink / raw)
To: David Vrabel; +Cc: netdev, xen-devel, Ian Campbell, Wei Liu
On Thu, May 15, 2014 at 06:17:08PM +0100, David Vrabel wrote:
> When the NAPI budget was not all used, xenvif_poll() would call
> napi_complete() /after/ enabling the interrupt. This resulted in a
> race between the napi_complete() and the napi_schedule() in the
> interrupt handler. The use of local_irq_save/restore() avoided by
> race iff the handler is running on the same CPU but not if it was
> running on a different CPU.
>
> Fix this properly by calling napi_complete() before reenabling
> interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler
2014-05-15 17:17 [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler David Vrabel
2014-05-15 18:07 ` Wei Liu
@ 2014-05-16 8:30 ` Ian Campbell
2014-05-16 8:39 ` Wei Liu
2014-05-16 9:53 ` [Xen-devel] " David Vrabel
2014-05-16 9:01 ` Wei Liu
2 siblings, 2 replies; 7+ messages in thread
From: Ian Campbell @ 2014-05-16 8:30 UTC (permalink / raw)
To: David Vrabel; +Cc: netdev, xen-devel, Wei Liu
On Thu, 2014-05-15 at 18:17 +0100, David Vrabel wrote:
> When the NAPI budget was not all used, xenvif_poll() would call
> napi_complete() /after/ enabling the interrupt. This resulted in a
> race between the napi_complete() and the napi_schedule() in the
> interrupt handler. The use of local_irq_save/restore() avoided by
> race iff the handler is running on the same CPU but not if it was
> running on a different CPU.
>
> Fix this properly by calling napi_complete() before reenabling
> interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> v2:
> - Rename xenvif_check_rx_xenvif() to
> xenvif_napi_schedule_or_enable_irq() to make it more obvious what it
> does.
Which is to update the event pointer such that IRQs are generated again
rather than enabling IRQs as such (enable_irq makes me think of
EFLAGS.IF). But given that I can't think of a better way to describe it:
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(although in this area Wei's ack is more important than mine IMHO).
I think this is intended as a replacement to Wei's "xen-netback: don't
move event pointer in TX credit timeout callback" rather than in
addition to, correct?
Ian.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler
2014-05-16 8:30 ` Ian Campbell
@ 2014-05-16 8:39 ` Wei Liu
2014-05-16 9:53 ` [Xen-devel] " David Vrabel
1 sibling, 0 replies; 7+ messages in thread
From: Wei Liu @ 2014-05-16 8:39 UTC (permalink / raw)
To: Ian Campbell; +Cc: David Vrabel, netdev, xen-devel, Wei Liu
On Fri, May 16, 2014 at 09:30:09AM +0100, Ian Campbell wrote:
> On Thu, 2014-05-15 at 18:17 +0100, David Vrabel wrote:
> > When the NAPI budget was not all used, xenvif_poll() would call
> > napi_complete() /after/ enabling the interrupt. This resulted in a
> > race between the napi_complete() and the napi_schedule() in the
> > interrupt handler. The use of local_irq_save/restore() avoided by
> > race iff the handler is running on the same CPU but not if it was
> > running on a different CPU.
> >
> > Fix this properly by calling napi_complete() before reenabling
> > interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
> >
> > Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> > ---
> > v2:
> > - Rename xenvif_check_rx_xenvif() to
> > xenvif_napi_schedule_or_enable_irq() to make it more obvious what it
> > does.
>
> Which is to update the event pointer such that IRQs are generated again
> rather than enabling IRQs as such (enable_irq makes me think of
> EFLAGS.IF). But given that I can't think of a better way to describe it:
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> (although in this area Wei's ack is more important than mine IMHO).
>
> I think this is intended as a replacement to Wei's "xen-netback: don't
> move event pointer in TX credit timeout callback" rather than in
> addition to, correct?
>
That patch of mine is incorrect and it's not needed anymore.
Wei.
> Ian.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler
2014-05-15 17:17 [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler David Vrabel
2014-05-15 18:07 ` Wei Liu
2014-05-16 8:30 ` Ian Campbell
@ 2014-05-16 9:01 ` Wei Liu
2 siblings, 0 replies; 7+ messages in thread
From: Wei Liu @ 2014-05-16 9:01 UTC (permalink / raw)
To: David Vrabel; +Cc: netdev, xen-devel, Ian Campbell, Wei Liu
On Thu, May 15, 2014 at 06:17:08PM +0100, David Vrabel wrote:
> When the NAPI budget was not all used, xenvif_poll() would call
> napi_complete() /after/ enabling the interrupt. This resulted in a
> race between the napi_complete() and the napi_schedule() in the
> interrupt handler. The use of local_irq_save/restore() avoided by
> race iff the handler is running on the same CPU but not if it was
> running on a different CPU.
>
> Fix this properly by calling napi_complete() before reenabling
> interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
I think this patch should be applied to 'net' tree.
Wei.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Xen-devel] [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler
2014-05-16 8:30 ` Ian Campbell
2014-05-16 8:39 ` Wei Liu
@ 2014-05-16 9:53 ` David Vrabel
2014-05-16 9:58 ` Ian Campbell
1 sibling, 1 reply; 7+ messages in thread
From: David Vrabel @ 2014-05-16 9:53 UTC (permalink / raw)
To: Ian Campbell, David Vrabel; +Cc: netdev, Wei Liu, xen-devel
On 16/05/14 09:30, Ian Campbell wrote:
> On Thu, 2014-05-15 at 18:17 +0100, David Vrabel wrote:
>> When the NAPI budget was not all used, xenvif_poll() would call
>> napi_complete() /after/ enabling the interrupt. This resulted in a
>> race between the napi_complete() and the napi_schedule() in the
>> interrupt handler. The use of local_irq_save/restore() avoided by
>> race iff the handler is running on the same CPU but not if it was
>> running on a different CPU.
>>
>> Fix this properly by calling napi_complete() before reenabling
>> interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> ---
>> v2:
>> - Rename xenvif_check_rx_xenvif() to
>> xenvif_napi_schedule_or_enable_irq() to make it more obvious what it
>> does.
>
> Which is to update the event pointer such that IRQs are generated again
> rather than enabling IRQs as such (enable_irq makes me think of
> EFLAGS.IF). But given that I can't think of a better way to describe it:
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Would xenvif_napi_schedule_or_enable_event() be better?
David
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Xen-devel] [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler
2014-05-16 9:53 ` [Xen-devel] " David Vrabel
@ 2014-05-16 9:58 ` Ian Campbell
0 siblings, 0 replies; 7+ messages in thread
From: Ian Campbell @ 2014-05-16 9:58 UTC (permalink / raw)
To: David Vrabel; +Cc: netdev, Wei Liu, xen-devel
On Fri, 2014-05-16 at 10:53 +0100, David Vrabel wrote:
> On 16/05/14 09:30, Ian Campbell wrote:
> > On Thu, 2014-05-15 at 18:17 +0100, David Vrabel wrote:
> >> When the NAPI budget was not all used, xenvif_poll() would call
> >> napi_complete() /after/ enabling the interrupt. This resulted in a
> >> race between the napi_complete() and the napi_schedule() in the
> >> interrupt handler. The use of local_irq_save/restore() avoided by
> >> race iff the handler is running on the same CPU but not if it was
> >> running on a different CPU.
> >>
> >> Fix this properly by calling napi_complete() before reenabling
> >> interrupts (in the xenvif_napi_schedule_or_enable_irq() call).
> >>
> >> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> >> ---
> >> v2:
> >> - Rename xenvif_check_rx_xenvif() to
> >> xenvif_napi_schedule_or_enable_irq() to make it more obvious what it
> >> does.
> >
> > Which is to update the event pointer such that IRQs are generated again
> > rather than enabling IRQs as such (enable_irq makes me think of
> > EFLAGS.IF). But given that I can't think of a better way to describe it:
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
>
> Would xenvif_napi_schedule_or_enable_event() be better?
I think so, yes. (_events?)
Ian.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-05-16 9:58 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-15 17:17 [PATCHv2 net-next] xen-netback: fix race between napi_complete() and interrupt handler David Vrabel
2014-05-15 18:07 ` Wei Liu
2014-05-16 8:30 ` Ian Campbell
2014-05-16 8:39 ` Wei Liu
2014-05-16 9:53 ` [Xen-devel] " David Vrabel
2014-05-16 9:58 ` Ian Campbell
2014-05-16 9:01 ` Wei Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).