From: Olivier MATZ
Subject: Re: [PATCH 3/4] ixgbe: automatic link recovery on VF
Date: Tue, 17 May 2016 09:50:49 +0200
Message-ID: <573ACD59.3010806@6wind.com>
References: <1462396246-26517-1-git-send-email-wenzhuo.lu@intel.com> <1462396246-26517-4-git-send-email-wenzhuo.lu@intel.com> <5739B698.8010909@6wind.com> <6A0DE07E22DDAD4C9103DF62FEBC090903468932@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <6A0DE07E22DDAD4C9103DF62FEBC090903468932@shsmsx102.ccr.corp.intel.com>
To: "Lu, Wenzhuo", "dev@dpdk.org"

Hi Wenzhuo,

On 05/17/2016 03:11 AM, Lu, Wenzhuo wrote:
>> -----Original Message-----
>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>> If I understand well, ixgbevf_dev_link_up_down_handler() is called by
>> ixgbevf_recv_pkts_fake() on a dataplane core. It means that the core
>> that acquired the lock will loop for at least 100us + 1 second.
>> If this core is also in charge of polling other queues of other ports,
>> or timers, many packets will be dropped (even with a 100us loop). I
>> don't think it is acceptable to actively wait inside an rx function.
>>
>> I think it would avoid many issues to delegate this work to the
>> application, maybe by notifying it that the port is in a bad state and
>> must be restarted. The application could then properly stop polling
>> the queues, and stop and restart the port in a separate thread,
>> without bothering the dataplane cores.
> Thanks for the comments.
> Yes, you're right. I had wrongly assumed that every queue is handled by
> exactly one core. That is clearly not guaranteed; we cannot tell how
> users will deploy their system.
>
> I plan to update this patch set. The solution now is: first, let users
> choose whether they want this auto-reset feature. If they do, we will
> provide another set of rx/tx functions that take a lock, so we can stop
> the rx/tx of the bad ports.
> We will also provide a reset API for users. The application should call
> this API from its management thread (or similar), which means the
> application must guarantee thread safety for that API.
> So there are two parts:
> 1. Lock the rx/tx paths so they can be stopped on behalf of the user.
> 2. Provide a reset API, so every NIC can do its own job and the
>    application does not need to worry about differences between NICs.
>
> Admittedly it is not *automatic* anymore. The reason is that DPDK does
> not guarantee thread safety, so these operations have to be left to the
> application, which has to guarantee thread safety itself.
>
> And if users choose not to use the auto-reset feature, we leave this
> work to the application. :)

Yes, I think having 2 modes is a good approach:

- the first mode would let the application know a reset has to be
  performed, without any active loop or lock in the rx/tx functions.
- the second mode would transparently manage the reset in the driver,
  but may lock the core for some time.

By the way, you mention a reset API: why not just use the usual
stop/start functions? I think it would work the same.

Regards,
Olivier
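
A minimal sketch of the first (application-driven) mode discussed above,
assuming the application has learned by some notification that the port
is in a bad state and restarts it from a management thread with the
usual rte_eth_dev_stop()/rte_eth_dev_start() calls. The names
poll_port(), mgmt_reset_port() and the port_stopped flag are
hypothetical, and the synchronization between the dataplane lcores and
the management thread is reduced to a bare minimum:

/*
 * Sketch only: the dataplane side merely tests a flag and never blocks;
 * all the waiting and the stop/start sequence happen in the management
 * thread.
 */
#include <rte_atomic.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static rte_atomic32_t port_stopped = RTE_ATOMIC32_INIT(0);

/* Dataplane lcore: skip the port while it is being reset,
 * never wait or take a lock in the rx path. */
static void
poll_port(uint8_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[32];
	uint16_t nb, i;

	if (rte_atomic32_read(&port_stopped))
		return;

	nb = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
	for (i = 0; i < nb; i++)
		rte_pktmbuf_free(pkts[i]); /* a real app would process them */
}

/* Management thread: restart the port with the usual stop/start calls,
 * without bothering the dataplane cores. */
static void
mgmt_reset_port(uint8_t port_id)
{
	/* tell the lcores to stop polling this port */
	rte_atomic32_set(&port_stopped, 1);

	/* a real application must also wait here until every lcore has
	 * acknowledged that it no longer touches the port's queues */

	rte_eth_dev_stop(port_id);
	if (rte_eth_dev_start(port_id) == 0)
		rte_atomic32_set(&port_stopped, 0);
}

The point of this mode is that the rx/tx fast path stays free of locks
and active waits: the only cost on the dataplane cores is reading a
flag, while the potentially long reset sequence runs entirely in the
management thread.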