From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zoltan Kiss
Subject: Re: [PATCH] xen-netback: fix race between napi_complete() and interrupt handler
Date: Tue, 25 Mar 2014 14:41:58 +0000
Message-ID: <533195B6.5090305@citrix.com>
References: <1395756505-21573-1-git-send-email-david.vrabel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: , Ian Campbell , Wei Liu , Eric Dumazet , David Miller
To: David Vrabel ,
Return-path:
Received: from smtp02.citrix.com ([66.165.176.63]:32745 "EHLO SMTP02.CITRIX.COM" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752203AbaCYOmB (ORCPT );
	Tue, 25 Mar 2014 10:42:01 -0400
In-Reply-To: <1395756505-21573-1-git-send-email-david.vrabel@citrix.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

My idea was that the current code can't race with an interrupt running
on a different CPU: if the interrupt was moved since the last
napi_schedule() (which scheduled NAPI on the same CPU as the
interrupt), the kernel would make sure that the NAPI instance is moved
along with it.

However, I haven't found any trace of this in the kernel so far. That
said, the current code actually works for me, even when I used a bash
script to aggressively move the interrupts around while it was running.

I've added David and Eric to the Cc list, maybe they can quickly shed
some light on this: how does the kernel make sure that if the interrupt
is moved away from a CPU (e.g. by irqbalance), the NAPI instance
already scheduled there won't race with it?

Zoli

On 25/03/14 14:08, David Vrabel wrote:
> When the NAPI budget was not all used, xenvif_poll() would call
> napi_complete() /after/ enabling the interrupt.  This resulted in a
> race between the napi_complete() and the napi_schedule() in the
> interrupt handler.  The use of local_irq_save/restore() avoided the
> race iff the handler is running on the same CPU but not if it was
> running on a different CPU.
>
> Fix this properly by calling napi_complete() before reenabling
> interrupts (in the xenvif_check_rx_xenvif() call).
>
> Signed-off-by: David Vrabel
> ---
>  drivers/net/xen-netback/interface.c |   28 ++--------------------------
>  1 files changed, 2 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 7669d49..ee322d9 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -65,32 +65,8 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>  	work_done = xenvif_tx_action(vif, budget);
>  
>  	if (work_done < budget) {
> -		int more_to_do = 0;
> -		unsigned long flags;
> -
> -		/* It is necessary to disable IRQ before calling
> -		 * RING_HAS_UNCONSUMED_REQUESTS. Otherwise we might
> -		 * lose event from the frontend.
> -		 *
> -		 * Consider:
> -		 *   RING_HAS_UNCONSUMED_REQUESTS
> -		 *
> -		 *   __napi_complete
> -		 *
> -		 * This handler is still in scheduled state so the
> -		 * event has no effect at all. After __napi_complete
> -		 * this handler is descheduled and cannot get
> -		 * scheduled again. We lose event in this case and the ring
> -		 * will be completely stalled.
> -		 */
> -
> -		local_irq_save(flags);
> -
> -		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
> -		if (!more_to_do)
> -			__napi_complete(napi);
> -
> -		local_irq_restore(flags);
> +		napi_complete(napi);
> +		xenvif_check_rx_xenvif(vif);
>  	}
>  
>  	return work_done;
>
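
For reference, the ordering problem described in the quoted commit
message can be sketched as a small single-threaded userspace model.
This is only an illustration under stated assumptions, not kernel code:
every name below (model_napi, model_schedule, model_remote_irq,
old_order_stalls, new_order_stalls) is hypothetical, the "scheduled"
flag merely stands in for the NAPI scheduled state, and the final check
is assumed to re-schedule the poll when it finds pending requests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_napi {
	bool scheduled;	/* stands in for the NAPI "scheduled" state */
	int pending;	/* requests on the ring not yet consumed */
};

/* Models napi_schedule(): only takes effect if not already scheduled. */
static bool model_schedule(struct model_napi *n)
{
	if (n->scheduled)
		return false;
	n->scheduled = true;
	return true;
}

/* Models the frontend posting a request and the interrupt handler
 * running on another CPU, where local_irq_save() gives no protection. */
static void model_remote_irq(struct model_napi *n)
{
	n->pending++;
	model_schedule(n);
}

/* Old order: final check, remote interrupt, then complete.
 * Returns true if a request is pending with no poll scheduled. */
static bool old_order_stalls(void)
{
	struct model_napi n = { .scheduled = true, .pending = 0 };
	bool more_to_do = n.pending > 0;	/* final check: ring looks empty */

	model_remote_irq(&n);			/* arrives between check and complete */
	assert(n.scheduled);			/* still scheduled, so the event was a no-op */
	if (!more_to_do)
		n.scheduled = false;		/* complete: poll is descheduled */
	return n.pending > 0 && !n.scheduled;
}

/* New order: complete first, then the final check. The remote interrupt
 * may land before or after the check; neither case loses the event. */
static bool new_order_stalls(bool irq_before_check)
{
	struct model_napi n = { .scheduled = true, .pending = 0 };

	n.scheduled = false;			/* complete before re-enabling events */
	if (irq_before_check)
		model_remote_irq(&n);
	if (n.pending > 0)			/* final check re-schedules if needed */
		model_schedule(&n);
	if (!irq_before_check)
		model_remote_irq(&n);
	return n.pending > 0 && !n.scheduled;
}
```

In this model old_order_stalls() reports a stall, while
new_order_stalls() does not for either placement of the interrupt. It
says nothing about whether the kernel migrates a scheduled NAPI
instance together with the interrupt, which is the open question above.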