virtualization.lists.linux-foundation.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Liran Alon <liran.alon@oracle.com>
Cc: vuhuong@mellanox.com, Jiri Pirko <jiri@resnulli.us>,
	Jakub Kicinski <kubakici@wp.pl>,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	virtualization@lists.linux-foundation.org,
	Netdev <netdev@vger.kernel.org>,
	Si-Wei Liu <si-wei.liu@oracle.com>,
	boris.ostrovsky@oracle.com, David Miller <davem@davemloft.net>,
	ogerlitz@mellanox.com
Subject: Re: [summary] virtio network device failover writeup
Date: Wed, 20 Mar 2019 18:10:37 -0400
Message-ID: <20190320180641-mutt-send-email-mst__13317.1582290501$1553119862$gmane$org@kernel.org>
In-Reply-To: <36772E22-7A8F-4C42-A731-398E3204B418@oracle.com>

On Wed, Mar 20, 2019 at 11:43:41PM +0200, Liran Alon wrote:
> 
> 
> > On 20 Mar 2019, at 16:09, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Wed, Mar 20, 2019 at 02:23:36PM +0200, Liran Alon wrote:
> >> 
> >> 
> >>> On 20 Mar 2019, at 12:25, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> 
> >>> On Wed, Mar 20, 2019 at 01:25:58AM +0200, Liran Alon wrote:
> >>>> 
> >>>> 
> >>>>> On 19 Mar 2019, at 23:19, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>> 
> >>>>> On Tue, Mar 19, 2019 at 08:46:47AM -0700, Stephen Hemminger wrote:
> >>>>>> On Tue, 19 Mar 2019 14:38:06 +0200
> >>>>>> Liran Alon <liran.alon@oracle.com> wrote:
> >>>>>> 
> >>>>>>> b.3) cloud-init: If configured to perform network configuration, it attempts to configure all available netdevs. It should, however, avoid doing so on net-failover slaves.
> >>>>>>> (Microsoft has handled this by adding a mechanism in cloud-init to blacklist a netdev from being configured if it is owned by a specific PCI driver. Specifically, they blacklist the Mellanox VF driver. However, this technique doesn’t work for the net-failover mechanism because both the net-failover netdev and the virtio-net netdev are owned by the virtio-net PCI driver.)
> >>>>>> 
> >>>>>> Cloud-init should really just ignore all devices that have a master device.
> >>>>>> That would have been more general, and safer for other use cases.
> >>>>> 
> >>>>> Given lots of userspace doesn't do this, I wonder whether it would be
> >>>>> safer to just somehow pretend to userspace that the slave links are
> >>>>> down? And add a special attribute for the actual link state.
> >>>> 
> >>>> I think this may be problematic as it would also break the legitimate
> >>>> use case of userspace attempting to set various config on the VF slave.
> >>>> In general, lying to userspace usually leads to problems.
> >>> 
> >>> I hear you on this. So how about instead of lying,
> >>> we basically just fail some accesses to slaves
> >>> unless a flag is set e.g. in ethtool.
> >>> 
> >>> Some userspace will need to change to set it but in a minor way.
> >>> Arguably/hopefully failure to set config would generally be a safer
> >>> failure.
> >> 
> >> Once userspace sets this new flag via ethtool, all operations performed by other userspace components will still work.
> > 
> > Sorry about being unclear, the idea would be to require the flag on each ethtool operation.
> 
> Oh. I have indeed misunderstood your previous email then. :)
> Thanks for clarifying.
> 
> > 
> >> E.g. running dhclient without parameters after this flag was set will still attempt to perform DHCP on the slave, and will now succeed.
> > 
> > I think sending/receiving should probably just fail unconditionally.
> 
> You mean that you want the kernel to somehow prevent Tx on a net-failover slave netdev
> unless the skb is marked with some flag indicating it was sent via the net-failover master?

We can maybe avoid binding a protocol socket to the device?

> This indeed resolves the group of userspace issues around performing DHCP directly on net-failover slaves (by dracut/initramfs, dhclient, etc.).
> 
> However, I see a couple of downsides to it:
> 1) It doesn’t resolve all the userspace issues listed in this email thread. For example, cloud-init will still attempt to perform network config on net-failover slaves.
> It also doesn’t help with Ubuntu’s netplan issue, where netplan creates udev rules that match only by MAC.


How about we fail to retrieve the MAC from the slave?

> 2) It results in a non-intuitive customer experience. For example, a customer may attempt to analyse a connectivity issue by checking the connectivity
> on a net-failover slave (e.g. the VF) but will see no connectivity, when in fact checking the connectivity on the net-failover master netdev shows correct connectivity.
> 
> The set of changes I envision to fix our issues are:
> 1) Hide net-failover slaves in a different netns created and managed by the kernel. A user can still enter it and manage the netdevs there if they explicitly wish to do so
> (e.g. to configure the net-failover VF slave in some special way).
> 2) Match the virtio-net and the VF based on a PV attribute instead of MAC (similar to what is done in NetVSC). E.g. provide a virtio-net interface to get the PCI slot where the matching VF will be hot-plugged by the hypervisor.
> 3) Have an explicit virtio-net control message to command the hypervisor to switch the data-path from virtio-net to VF and vice versa, instead of relying on intercepting the PCI master enable-bit
> as an indicator of when the VF is about to be set up (similar to what is done in NetVSC).
> 
> Is there any clear issue we see regarding the above suggestion?
> 
> -Liran

The issue would be this: how do we avoid conflicting with namespaces
created by users?

> > 
> >> Therefore, this proposal effectively just delays the point at which the net-failover slave can be operated on by userspace.
> >> But what we actually want is to never allow a net-failover slave to be operated on by userspace unless userspace explicitly states
> >> that it wishes to perform a set of actions on the net-failover slave.
> >> 
> >> This would be achieved if, for example, the net-failover slaves were in a different netns than the default netns.
> >> This also aligns with the expected customer experience: most customers just want to see a 1:1 mapping between a vNIC and a visible netdev.
> >> But of course there may be other ideas that achieve similar behaviour.
> >> 
> >> -Liran
> >> 
> >>> 
> >>> Which things to fail? Probably sending/receiving packets?  Getting MAC?
> >>> More?
> >>> 
> >>>> If we reach
> >>>> a scenario where we try to avoid userspace issues generically and
> >>>> not on a per-component basis, I believe the right path is
> >>>> to hide the net-failover slaves such that explicit action is required
> >>>> to actually manipulate them (as described in the blog post), e.g.
> >>>> automatically have the kernel move net-failover slaves to a different netns.
> >>>> 
> >>>> -Liran
> >>>> 
> >>>>> 
> >>>>> -- 
> >>>>> MST
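
For illustration only, the suggestion quoted above (userspace ignoring every
device that has a master) could be sketched roughly as below. This is not
cloud-init code. It assumes the kernel exposes a "master" symlink under
/sys/class/net/<ifname>/ for an enslaved netdev, and that net-failover slaves
are linked to their master this way. The function names and the sysfs_net
parameter are hypothetical, introduced only for this sketch.

```python
import os


def has_master(ifname, sysfs_net="/sys/class/net"):
    """Return True if the netdev appears to be enslaved to a master device.

    Assumption: an enslaved netdev has a "master" symlink in its sysfs
    directory. lexists() is used so that a dangling link still counts.
    """
    return os.path.lexists(os.path.join(sysfs_net, ifname, "master"))


def configurable_interfaces(sysfs_net="/sys/class/net"):
    """List netdevs that a configuration tool might safely touch.

    Hypothetical helper: skips every netdev that has a master, and skips
    non-directory entries (e.g. the bonding_masters control file).
    """
    result = []
    for name in sorted(os.listdir(sysfs_net)):
        if not os.path.isdir(os.path.join(sysfs_net, name)):
            continue
        if has_master(name, sysfs_net):
            continue
        result.append(name)
    return result
```

Calling configurable_interfaces() with no arguments reads the live sysfs
tree. Note that a check like this does nothing for tools that match devices
by MAC only, which is a separate issue raised in this thread.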
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
