All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: si-wei liu <si-wei.liu@oracle.com>
Cc: Sridhar Samudrala <sridhar.samudrala@intel.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Jakub Kicinski <kubakici@wp.pl>, Jiri Pirko <jiri@resnulli.us>,
	David Miller <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>,
	virtualization@lists.linux-foundation.org, liran.alon@oracle.com,
	boris.ostrovsky@oracle.com, vijay.balakrishna@oracle.com
Subject: Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces
Date: Tue, 5 Mar 2019 15:28:20 -0500	[thread overview]
Message-ID: <20190305152740-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <e3b34115-39ca-ac5f-7d53-8bf9e69c910d@oracle.com>

On Tue, Mar 05, 2019 at 11:19:32AM -0800, si-wei liu wrote:
> 
> 
> On 3/4/2019 6:33 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 04, 2019 at 07:50:59PM -0500, Si-Wei Liu wrote:
> > > When a netdev appears through hot plug then gets enslaved by a failover
> > > master that is already up and running, the slave will be opened
> > > right away after getting enslaved. Today there's a race that userspace
> > > (udev) may fail to rename the slave if the kernel (net_failover)
> > > opens the slave earlier than when the userspace rename happens.
> > > Unlike bond or team, the primary slave of failover can't be renamed by
> > > userspace ahead of time, since the kernel initiated auto-enslavement is
> > > unable to, or rather, is never meant to be synchronized with the rename
> > > request from userspace.
> > > 
> > > As the failover slave interfaces are not designed to be operated
> > > directly by userspace apps: IP configuration, filter rules with
> > > regard to network traffic passing and etc., should all be done on master
> > > interface. In general, userspace apps only care about the
> > > name of master interface, while slave names are less important as long
> > > as admin users can see reliable names that may carry
> > > other information describing the netdev. For e.g., they can infer that
> > > "ens3nsby" is a standby slave of "ens3", while for a
> > > name like "eth0" they can't tell which master it belongs to.
> > > 
> > > Historically the name of IFF_UP interface can't be changed because
> > > there might be admin script or management software that is already
> > > relying on such behavior and assumes that the slave name can't be
> > > changed once UP. But failover is special: with the in-kernel
> > > auto-enslavement mechanism, the userspace expectation for device
> > > enumeration and bring-up order is already broken. Previously initramfs
> > > and various userspace config tools were modified to bypass failover
> > > slaves because of auto-enslavement and duplicate MAC address. Similarly,
> > > in case that users care about seeing reliable slave name, the new type
> > > of failover slaves needs to be taken care of specifically in userspace
> > > anyway.
> > > 
> > > For that to work, now introduce a module-level tunable,
> > > "slave_rename_ok" that allows users to lift up the rename restriction on
> > > failover slave which is already UP. Although it's possible this change
> > > potentially break userspace component (most likely configuration scripts
> > > or management software) that assumes slave name can't be changed while
> > > UP, it's relatively a limited and controllable set among all userspace
> > > components, which can be fixed specifically to work with the new naming
> > > behavior of the failover slave. Userspace component interacting with
> > > slaves should be changed to operate on failover master instead, as the
> > > failover slave is dynamic in nature which may come and go at any point.
> > > The goal is to make the role of failover slaves less relevant, and
> > > all userspace should only deal with master in the long run. The default
> > > for the "slave_rename_ok" is set to true(1). If userspace doesn't have
> > > the right support in place meanwhile users don't care about reliable
> > > userspace naming, the value can be set to false(0).
> > > 
> > > Signed-off-by: Si-Wei.Liu@oracle.com
> > > Reviewed-by: Liran Alon <liran.alon@oracle.com>
> > Not sure which of the versions I should reply to.
> Sorry for multiple copies sent. It's fine to reply to this one.
> 
> > 
> > I have a vague idea: would it work to *not* set
> > IFF_UP on slave devices at all?
> Hmm, I ever thought about this option, and it appears this solution is more
> invasive than required to convert existing scripts, despite the controversy
> of introducing internal netdev state to differentiate user visible state.
> Either we disallow slave to be brought up by user, or to not set IFF_UP flag
> but instead use the internal one, could end up with substantial behavioral
> change that breaks scripts. Consider any admin script that does `ip link set
> dev ... up' successfully just assumes the link is up and subsequent
> operation can be done as usual. While it *may* work for dracut (yet to be
> verified), I'm a bit concerned that there are more scripts to be converted
> than those that don't follow volatile failover slave names. It's technically
> doable, but may not worth the effort (in terms of porting existing
> scripts/apps).
> 
> Thanks
> -Siwei


Right. Advantage could be that we prevent all kind of
misconfigurations e.g. when one has a route on a slave.

> > 
> > Would this reduce the chances of existing scripts such as dracut being
> > confused?
> > 
> > And this leaves open the option for scripts to address
> > slaves by checking some custom attribute.
> > 
> > > ---
> > >   include/linux/netdevice.h |  3 +++
> > >   net/core/dev.c            |  3 ++-
> > >   net/core/failover.c       | 11 +++++++++--
> > >   3 files changed, 14 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 857f8ab..6d9e4e0 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -1487,6 +1487,7 @@ struct net_device_ops {
> > >    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
> > >    * @IFF_FAILOVER: device is a failover master device
> > >    * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> > > + * @IFF_SLAVE_RENAME_OK: rename is allowed while slave device is running
> > >    */
> > >   enum netdev_priv_flags {
> > >   	IFF_802_1Q_VLAN			= 1<<0,
> > > @@ -1518,6 +1519,7 @@ enum netdev_priv_flags {
> > >   	IFF_NO_RX_HANDLER		= 1<<26,
> > >   	IFF_FAILOVER			= 1<<27,
> > >   	IFF_FAILOVER_SLAVE		= 1<<28,
> > > +	IFF_SLAVE_RENAME_OK		= 1<<29,
> > >   };
> > >   #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
> > > @@ -1548,6 +1550,7 @@ enum netdev_priv_flags {
> > >   #define IFF_NO_RX_HANDLER		IFF_NO_RX_HANDLER
> > >   #define IFF_FAILOVER			IFF_FAILOVER
> > >   #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
> > > +#define IFF_SLAVE_RENAME_OK		IFF_SLAVE_RENAME_OK
> > >   /**
> > >    *	struct net_device - The DEVICE structure.
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 722d50d..ae070de 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -1180,7 +1180,8 @@ int dev_change_name(struct net_device *dev, const char *newname)
> > >   	BUG_ON(!dev_net(dev));
> > >   	net = dev_net(dev);
> > > -	if (dev->flags & IFF_UP)
> > > +	if (dev->flags & IFF_UP &&
> > > +	    !(dev->priv_flags & IFF_SLAVE_RENAME_OK))
> > >   		return -EBUSY;
> > >   	write_seqcount_begin(&devnet_rename_seq);
> > > diff --git a/net/core/failover.c b/net/core/failover.c
> > > index 4a92a98..1fd8bbb 100644
> > > --- a/net/core/failover.c
> > > +++ b/net/core/failover.c
> > > @@ -16,6 +16,11 @@
> > >   static LIST_HEAD(failover_list);
> > >   static DEFINE_SPINLOCK(failover_lock);
> > > +static bool slave_rename_ok = true;
> > > +
> > > +module_param(slave_rename_ok, bool, (S_IRUGO | S_IWUSR));
> > > +MODULE_PARM_DESC(slave_rename_ok,
> > > +		 "If set allow renaming the slave when failover master is up");
> > >   static struct net_device *failover_get_bymac(u8 *mac, struct failover_ops **ops)
> > >   {
> > > @@ -81,13 +86,15 @@ static int failover_slave_register(struct net_device *slave_dev)
> > >   	}
> > >   	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
> > > +	if (slave_rename_ok)
> > > +		slave_dev->priv_flags |= IFF_SLAVE_RENAME_OK;
> > >   	if (fops && fops->slave_register &&
> > >   	    !fops->slave_register(slave_dev, failover_dev))
> > >   		return NOTIFY_OK;
> > >   	netdev_upper_dev_unlink(slave_dev, failover_dev);
> > > -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> > > +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> > >   err_upper_link:
> > >   	netdev_rx_handler_unregister(slave_dev);
> > >   done:
> > > @@ -121,7 +128,7 @@ int failover_slave_unregister(struct net_device *slave_dev)
> > >   	netdev_rx_handler_unregister(slave_dev);
> > >   	netdev_upper_dev_unlink(slave_dev, failover_dev);
> > > -	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
> > > +	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_SLAVE_RENAME_OK);
> > >   	if (fops && fops->slave_unregister &&
> > >   	    !fops->slave_unregister(slave_dev, failover_dev))
> > > -- 
> > > 1.8.3.1

  parent reply	other threads:[~2019-03-05 20:28 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-05  0:50 [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces Si-Wei Liu
2019-03-05  2:33 ` Michael S. Tsirkin
2019-03-05 19:19   ` si-wei liu
2019-03-05 19:24     ` Stephen Hemminger
2019-03-05 19:35       ` si-wei liu
2019-03-06  0:06         ` Michael S. Tsirkin
2019-03-06  0:06           ` Michael S. Tsirkin
2019-03-06  0:20           ` si-wei liu
2019-03-06  0:36             ` Michael S. Tsirkin
2019-03-06  0:51               ` si-wei liu
2019-03-06  6:43                 ` Michael S. Tsirkin
2019-03-06  6:43                   ` Michael S. Tsirkin
2019-03-06  7:15                   ` si-wei liu
2019-03-06  7:23                     ` Michael S. Tsirkin
2019-03-06  7:23                     ` Michael S. Tsirkin
2019-03-06  8:20                       ` si-wei liu
2019-03-06  0:36             ` Michael S. Tsirkin
2019-03-05 19:24     ` Stephen Hemminger
2019-03-05 20:28     ` Michael S. Tsirkin
2019-03-05 20:28     ` Michael S. Tsirkin [this message]
2019-03-05 22:49       ` si-wei liu
2019-03-05  2:33 ` Michael S. Tsirkin
2019-03-06 12:04 ` Jiri Pirko
     [not found]   ` <7d1e79f6-01ff-413d-dac0-ee34258aafec@oracle.com>
2019-03-06 21:36     ` Samudrala, Sridhar
     [not found]       ` <7bc9dc90-6597-4223-c192-55a314ff079f@oracle.com>
2019-03-06 23:36         ` Liran Alon
2019-03-06 23:36         ` Liran Alon
2019-03-06 12:04 ` Jiri Pirko
  -- strict thread matches above, loose matches on Subject: below --
2019-03-05  0:36 Si-Wei Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190305152740-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=davem@davemloft.net \
    --cc=jiri@resnulli.us \
    --cc=kubakici@wp.pl \
    --cc=liran.alon@oracle.com \
    --cc=netdev@vger.kernel.org \
    --cc=si-wei.liu@oracle.com \
    --cc=sridhar.samudrala@intel.com \
    --cc=stephen@networkplumber.org \
    --cc=vijay.balakrishna@oracle.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.