From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stephen Hemminger Subject: Re: [PATCH net] failover: eliminate callback hell Date: Tue, 5 Jun 2018 23:00:38 -0700 Message-ID: <20180605230038.521d5c18@xeon-e3> References: <20180605034231.31610-1-sthemmin@microsoft.com> <20180605211927-mutt-send-email-mst@kernel.org> <20180605115305.502a7ebb@xeon-e3> <20180605221049-mutt-send-email-mst@kernel.org> <20180605145222.477e5ae8@xeon-e3> <20180605205118.439a7873@xeon-e3> <72cf1d79-f311-98e5-23a0-6ee23dc8fd6b@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: "Michael S. Tsirkin" , kys@microsoft.com, haiyangz@microsoft.com, davem@davemloft.net, netdev@vger.kernel.org, Stephen Hemminger To: "Samudrala, Sridhar" Return-path: Received: from mail-pg0-f65.google.com ([74.125.83.65]:42261 "EHLO mail-pg0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751330AbeFFGAm (ORCPT ); Wed, 6 Jun 2018 02:00:42 -0400 Received: by mail-pg0-f65.google.com with SMTP id p9-v6so2417226pgc.9 for ; Tue, 05 Jun 2018 23:00:41 -0700 (PDT) In-Reply-To: <72cf1d79-f311-98e5-23a0-6ee23dc8fd6b@intel.com> Sender: netdev-owner@vger.kernel.org List-ID: On Tue, 5 Jun 2018 22:39:12 -0700 "Samudrala, Sridhar" wrote: > On 6/5/2018 8:51 PM, Stephen Hemminger wrote: > > On Tue, 5 Jun 2018 16:52:22 -0700 > > "Samudrala, Sridhar" wrote: > > > >> On 6/5/2018 2:52 PM, Stephen Hemminger wrote: > >>> On Tue, 5 Jun 2018 22:38:43 +0300 > >>> "Michael S. Tsirkin" wrote: > >>> > >>>>> See: > >>>>> https://patchwork.ozlabs.org/patch/851711/ > >>>> Let me try to summarize that: > >>>> > >>>> You wanted to speed up the delayed link up. You had an idea to > >>>> additionally take link up when userspace renames the interface (standby > >>>> one which is also the failover for netvsc). > >>>> > >>>> But userspace might not do any renames, in which case there will > >>>> still be the delay, and so this never got applied. > >>>> > >>>> Is this a good summary? > >>>> > >>>> Davem said delay should go away completely as it's not robust, and I > >>>> think I agree. So I don't think we should make all failover users use > >>>> delay. IIUC failover kept a delay option especially for netvsc to > >>>> minimize the surprise factor. Hopefully we can come up with > >>>> something more robust and drop that option completely. > >>> The timeout was the original solution to how to complete setup after > >>> userspace has had a chance to rename the device. Unfortunately, the whole network > >>> device initialization (cooperation with udev and userspace) is a a mess because > >>> there is no well defined specification, and there are multiple ways userspace > >>> does this in old and new distributions. The timeout has its own issues > >>> (how long, handling errors during that window, what if userspace modifies other > >>> device state); and open to finding a better solution. > >>> > >>> My point was that if name change can not be relied on (or used) by netvsc, > >>> then we can't allow it for net_failover either. > >> I think the push back was with the usage of the delay, not bringing up the primary/standby > >> device in the name change event handler. > >> Can't netvsc use this mechanism instead of depending on the delay? > >> > >> > > The patch that was rejected for netvsc was about using name change. > > Also, you can't depend on name change; you still need a timer. Not all distributions > > change name of devices. Or user has blocked that by udev rules. > > In the net_failover_slave_register() we do a dev_open() and ignore any failure due to > EBUSY and do another dev_open() in the name change event handler. > If the name is not expected to change, i would think the dev_open() at the time of > register will succeed. The problem is your first dev_open will bring device up and lockout udev from changing the network device name.