From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?iso-8859-1?Q?Ga=EBtan?= Rivet Subject: Re: [PATCH] net/failsafe: fix Rx clean race Date: Thu, 26 Oct 2017 18:20:29 +0200 Message-ID: <20171026162029.GC10890@bidouze.vm.6wind.com> References: <1508651468-31866-1-git-send-email-matan@mellanox.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Cc: dev@dpdk.org, stable@dpdk.org To: Matan Azrad Return-path: Received: from mail-wm0-f67.google.com (mail-wm0-f67.google.com [74.125.82.67]) by dpdk.org (Postfix) with ESMTP id 3CC741B1C3 for ; Thu, 26 Oct 2017 18:20:43 +0200 (CEST) Received: by mail-wm0-f67.google.com with SMTP id m72so9067177wmc.1 for ; Thu, 26 Oct 2017 09:20:43 -0700 (PDT) Content-Disposition: inline In-Reply-To: <1508651468-31866-1-git-send-email-matan@mellanox.com> List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hello Matan, I think the commit log could be shorter. Proposing this, feel free to expand it if you prefer. ---8<--- When removing a device, the fail-safe checks that it is not within its datapath before cleaning it. When checking whether an Rx burst should be performed on a device, the remove flag is not checked. Thus the port could still enter its datapath and miss a removal round. Furthermore, there is a race between the thread removing the device and the polling thread. Check the remove flag before entering a sub-device Rx burst when in safe mode. This check mitigates the aforementioned race condition. --->8--- Otherwise, On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote: > In case of plug out, the RMV interrupt callback sets the remove flag of > the removed sub-device. The next hotplug alarm cycle should read this > flag and if the data path are clean it should remove the sub-device. > > In case of fail-safe RX burst calling from application, fail-afe tries > to call to all STARTED sub-device rx_burst functions. The remove flag > is not checked here and fail-safe may call to the removed sub-device > rx_burst function. > > The above 2 cases run in different threads and there is a race between > the removed sub-device RX clean check to the removed sub-device > rx_burst call makes the sub device RX unclean. > > If the application calls to rx_burst in loop, the probability to get RX > clean is not enough, especially when there are few sub-devices or if the > rx_burst function of the removed sub-device takes a lot of time. > > Each time the sub-device data path is unclean, the second oportunity to > check it again should be only in the hotplug alarm next cycle; the > default time between cycles is 2 seconds. > > In this loop when fail-safe tries to remove the sub-device, the > sub-device may appear back and fail-safe cannot plug it in back until > the removal process is completted. In this time fail-safe may lose the > primary sub-device services and may hurt application performance. > > This patch adds a remove flag check in safe rx_burst function. > By this way, at most one more hotplug alarm cycle is necessary > to get the sub-device clean for actual removal. > > Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions") > Cc: stable@dpdk.org > > Signed-off-by: Matan Azrad Acked-by: Gaetan Rivet > --- > drivers/net/failsafe/failsafe_rxtx.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c > index 7311421..70157c8 100644 > --- a/drivers/net/failsafe/failsafe_rxtx.c > +++ b/drivers/net/failsafe/failsafe_rxtx.c > @@ -43,7 +43,8 @@ > { > return (ETH(sdev) == NULL) || > (ETH(sdev)->rx_pkt_burst == NULL) || > - (sdev->state != DEV_STARTED); > + (sdev->state != DEV_STARTED) || > + (sdev->remove != 0); > } > > static inline int > -- > 1.8.3.1 > -- Gaëtan Rivet 6WIND