From: si-wei liu <si-wei.liu@oracle.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Samudrala, Sridhar" <sridhar.samudrala@intel.com>,
Siwei Liu <loseweigh@gmail.com>, Jiri Pirko <jiri@resnulli.us>,
Stephen Hemminger <stephen@networkplumber.org>,
David Miller <davem@davemloft.net>,
Netdev <netdev@vger.kernel.org>,
virtualization@lists.linux-foundation.org,
virtio-dev <virtio-dev@lists.oasis-open.org>,
"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
Alexander Duyck <alexander.h.duyck@intel.com>,
Jakub Kicinski <kubakici@wp.pl>, Jason Wang <jasowang@redhat.com>,
liran.alon@oracle.com
Subject: Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework)
Date: Tue, 26 Feb 2019 16:17:21 -0800
Message-ID: <d1060c75-eaba-ab6f-ff31-38cb3a47c711@oracle.com>
In-Reply-To: <20190225210529-mutt-send-email-mst@kernel.org>
On 2/25/2019 6:08 PM, Michael S. Tsirkin wrote:
> On Mon, Feb 25, 2019 at 04:58:07PM -0800, si-wei liu wrote:
>>
>> On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
>>> On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
>>>> On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
>>>>> On 2/21/2019 7:33 PM, si-wei liu wrote:
>>>>>> On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
>>>>>>>> Sorry for replying to this ancient thread. There is a remaining
>>>>>>>> issue that I don't think the initial net_failover patch addressed
>>>>>>>> cleanly, see:
>>>>>>>>
>>>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
>>>>>>>>
>>>>>>>> The renaming of 'eth0' to 'ens4' fails because the udev userspace was
>>>>>>>> not specifically written for such automatic in-kernel enslavement.
>>>>>>>> Specifically, if it is a bond or team, the slave would typically get
>>>>>>>> renamed *before* the virtual device gets created. That's what udev can
>>>>>>>> control (without the netdev getting opened early by another part of
>>>>>>>> the kernel), and other userspace components, e.g. initramfs and
>>>>>>>> init-scripts, can coordinate well in between. The in-kernel
>>>>>>>> auto-enslavement of net_failover breaks this userspace convention
>>>>>>>> and provides no solution if the user cares about consistent naming
>>>>>>>> on the slave netdevs specifically.
>>>>>>>>
>>>>>>>> Previously this issue was specifically called out when IFF_HIDDEN
>>>>>>>> and the 1-netdev model were proposed, but no one has offered a
>>>>>>>> solution to this problem since. Please share your thoughts on how to
>>>>>>>> proceed and solve this userspace issue if netdev does not welcome a
>>>>>>>> 1-netdev model.
>>>>>>> Above says:
>>>>>>>
>>>>>>> there's no motivation in the systemd/udevd community at
>>>>>>> this point to refactor the rename logic and make it work well with
>>>>>>> 3-netdev.
>>>>>>>
>>>>>>> What would the fix be? Skip slave devices?
>>>>>>>
>>>>>> The user gains nothing from just skipping slave devices - the
>>>>>> name is still unchanged and unpredictable, e.g. eth0, or eth1 on the
>>>>>> next reboot, while the rest may conform to the naming scheme (ens3
>>>>>> and such). There's no way one can fix this in userspace alone - when
>>>>>> the failover is created, the enslaved netdev is opened by the kernel
>>>>>> before userspace is made aware of it, and there's no
>>>>>> negotiation protocol for the kernel to know when userspace has done
>>>>>> the initial renaming of the interface. I would expect the netdev list
>>>>>> to at least provide general direction on how this can be
>>>>>> solved...
>>> I was just wondering what did you mean when you said
>>> "refactor the rename logic and make it work well with 3-netdev" -
>>> was there a proposal udev rejected?
>> No. I never believed this particular issue can be fixed in userspace alone.
>> Previously someone said it could be, but I have never seen any work or
>> relevant discussion happen in the various userspace communities (e.g.
>> dracut, initramfs-tools, systemd, udev, and NetworkManager). IMHO the root
>> of the issue lies in the kernel, so it makes more sense to start from
>> netdev and decide on a solution: see what can be done in the
>> kernel to fix it, then engage the userspace community on
>> the feasibility...
>>
>>> Anyway, can we write a time diagram for what happens in which order that
>>> leads to failure? That would help look for triggers that we can tie
>>> into, or add new ones.
>>>
>> See attached diagram.
>>
>>>
>>>
>>>
>>>>> Is there an issue if slave device names are not predictable? The user/admin scripts are expected
>>>>> to only work with the master failover device.
>>>> Where does this expectation come from?
>>>>
>>>> Admin users may have ethtool or tc configurations that need to deal with
>>>> predictable interface names. Third-party apps built around specifying
>>>> a certain interface name can't be modified to chase dynamic names.
>>>>
>>>> Specifically, we have pre-canned images that use ethtool to fine-tune VF
>>>> offload settings post boot for specific workloads. Those images won't work
>>>> well if the name keeps changing after just a couple of rounds of live
>>>> migration.
>>> It should be possible to specify the ethtool configuration on the
>>> master and have it automatically propagated to the slave.
>>>
>>> BTW this is something we should look at IMHO.
>> I was giving a few examples showing that the expectation and assumption that
>> user/admin scripts only deal with the master failover device is incorrect. It
>> has never been taken good care of, although I did try to emphasize it from
>> the very beginning.
>>
>> Basically, what you said about propagating the ethtool configuration down to
>> the slave is the key pursuit of the 1-netdev model. However, what I am seeking
>> now is any alternative that can also fix the specific udev rename problem,
>> before concluding that 1-netdev is the only solution. Generally a 1-netdev
>> scheme would take time to implement, while I'm trying to find a way to
>> fix this particular naming problem under 3-netdev.
>>
>>>>> Moreover, you were suggesting hiding the lower slave devices anyway. There was some discussion
>>>>> about moving them to a hidden network namespace so that they are not visible from the default namespace.
>>>>> I looked into this sometime back, but did not find the right kernel api to create a network namespace within
>>>>> kernel. If so, we could use this mechanism to simulate a 1-netdev model.
>>>> Yes, that's one possible implementation (IMHO the key is to make the
>>>> 1-netdev model look as much like a real NIC as possible, while a hidden
>>>> netns is just the vehicle). However, I recall there was resistance around
>>>> this discussion, to the point that even the concept of hiding is taboo for
>>>> Linux netdev. I would like to gather potential alternatives before
>>>> prematurely concluding that 1-netdev is the only solution.
>>>>
>>>> Thanks,
>>>> -Siwei
>>> Your scripts would not work at all then, right?
>> At this point we don't claim images with such usage are SR-IOV live
>> migrate-able. We would not flag them as live migrate-able until this ethtool
>> config issue is fully addressed and a transparent live migration solution
>> eventually emerges upstream.
>>
>>
>> Thanks,
>> -Siwei
>>>
>>>>>> -Siwei
>>>>>>
>>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>
>> net_failover(kernel)                          | network.service (user)   | systemd-udevd (user)
>> ----------------------------------------------+--------------------------+--------------------------------------
>> (standby virtio-net and net_failover          |                          |
>> devices created and initialized,              |                          |
>> i.e. virtnet_probe()->                        |                          |
>> net_failover_create()                         |                          |
>> was done.)                                    |                          |
>>                                               |                          |
>>                                               | runs `ifup ens3' ->      |
>>                                               | ip link set dev ens3 up  |
>> net_failover_open()                           |                          |
>> dev_open(virtnet_dev)                         |                          |
>> virtnet_open(virtnet_dev)                     |                          |
>> netif_carrier_on(failover_dev)                |                          |
>> ...                                           |                          |
>>                                               |                          |
>> (VF hot plugged in)                           |                          |
>> ixgbevf_probe()                               |                          |
>> register_netdev(ixgbevf_netdev)               |                          |
>> netdev_register_kobject(ixgbevf_netdev)       |                          |
>> kobject_add(ixgbevf_dev)                      |                          |
>> device_add(ixgbevf_dev)                       |                          |
>> kobject_uevent(&ixgbevf_dev->kobj, KOBJ_ADD)  |                          |
>> netlink_broadcast()                           |                          |
>> ...                                           |                          |
>> call_netdevice_notifiers(NETDEV_REGISTER)     |                          |
>> failover_event(..., NETDEV_REGISTER, ...)     |                          |
>> failover_slave_register(ixgbevf_netdev)       |                          |
>> net_failover_slave_register(ixgbevf_netdev)   |                          |
>> dev_open(ixgbevf_netdev)                      |                          |
>>                                               |                          |
>>                                               |                          |
>>                                               |                          | received ADD uevent from netlink fd
>>                                               |                          | ...
>>                                               |                          | udev-builtin-net_id.c:dev_pci_slot()
>>                                               |                          | (decided to rename 'eth0')
>>                                               |                          | ip link set dev eth0 name ens4
>> (dev_change_name() returns -EBUSY as          |                          |
>> ixgbevf_netdev->flags has IFF_UP)             |                          |
>>                                               |                          |
>>
> Given renaming slaves does not work anyway:
I was actually wondering: what if we relax the rename restriction just
for the failover slave? What would the impact be? I don't think users
care if a slave gets renamed while it is in use, especially for the
initial rename. Thoughts?
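
To make the question concrete, here is a toy userspace model (Python, not kernel code) of the check that fails at the end of the diagram, together with the relaxed variant I am asking about. NetDev, change_name(), hotplug_vf() and the allow_up_slave_rename switch are made-up names for illustration only; the one thing taken from the diagram is that a rename is refused with -EBUSY while the device has IFF_UP set.

```python
import errno
from dataclasses import dataclass

IFF_UP = 0x1  # same bit value as the kernel's IFF_UP flag


@dataclass
class NetDev:
    """Toy stand-in for struct net_device (hypothetical, not a kernel API)."""
    name: str
    flags: int = 0
    failover_slave: bool = False


def change_name(dev, newname, allow_up_slave_rename=False):
    """Model of the rename check; returns 0 or a negative errno."""
    # Assumption: the relaxation applies only to failover slaves.
    relaxed = allow_up_slave_rename and dev.failover_slave
    if dev.flags & IFF_UP and not relaxed:
        return -errno.EBUSY
    dev.name = newname
    return 0


def hotplug_vf():
    """VF shows up as eth0 and is enslaved and opened before udev reacts."""
    vf = NetDev("eth0")
    vf.failover_slave = True  # models net_failover_slave_register()
    vf.flags |= IFF_UP        # models the in-kernel dev_open()
    return vf


if __name__ == "__main__":
    vf = hotplug_vf()
    print(change_name(vf, "ens4"), vf.name)  # today: refused, name unchanged
    print(change_name(vf, "ens4", allow_up_slave_rename=True), vf.name)
```

In this model a regular netdev that is up still gets -EBUSY; only the device marked as a failover slave may be renamed while up. Whether that is acceptable in the real kernel is exactly the open question.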
> would it work if we just
> hard-coded slave names instead?
>
> E.g.
> 1. fail slave renames
> 2. rename of failover to XX automatically renames standby to XXnsby
> and primary to XXnpry
That wouldn't help. At the time the failover master gets renamed, the
VF may not be present yet. I don't like the idea of delaying exposure of
the failover master until the VF is hot plugged in later (which is
itself subject to various failures).
Thanks,
-Siwei
>
>