From: Stephen Hemminger <stephen@networkplumber.org>
To: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>,
"sashal@kernel.org" <sashal@kernel.org>,
"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
KY Srinivasan <kys@microsoft.com>,
Stephen Hemminger <sthemmin@microsoft.com>,
"olaf@aepfle.de" <olaf@aepfle.de>, vkuznets <vkuznets@redhat.com>,
"davem@davemloft.net" <davem@davemloft.net>
Subject: Re: [PATCH V2,net-next, 3/3] hv_netvsc: Name NICs based on vmbus offer sequence and use async probe
Date: Tue, 31 Dec 2019 09:36:14 -0800 [thread overview]
Message-ID: <20191231093614.75da9bea@hermes.lan> (raw)
In-Reply-To: <MN2PR21MB1375D41039A8A68A2117DDFCCA260@MN2PR21MB1375.namprd21.prod.outlook.com>
On Tue, 31 Dec 2019 16:12:36 +0000
Haiyang Zhang <haiyangz@microsoft.com> wrote:
> > -----Original Message-----
> > From: Roman Kagan <rkagan@virtuozzo.com>
> > Sent: Tuesday, December 31, 2019 6:35 AM
> > To: Haiyang Zhang <haiyangz@microsoft.com>
> > Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> > KY Srinivasan <kys@microsoft.com>; Stephen Hemminger
> > <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets
> > <vkuznets@redhat.com>; davem@davemloft.net; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH V2,net-next, 3/3] hv_netvsc: Name NICs based on vmbus
> > offer sequence and use async probe
> >
> > On Mon, Dec 30, 2019 at 12:13:34PM -0800, Haiyang Zhang wrote:
> > > The dev_num field in vmbus channel structure is assigned to the first
> > > available number when the channel is offered. So netvsc driver uses it
> > > for NIC naming based on channel offer sequence. Now re-enable the
> > > async probing mode for faster probing.
> > >
> > > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > > ---
> > > drivers/net/hyperv/netvsc_drv.c | 18 +++++++++++++++---
> > > 1 file changed, 15 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/hyperv/netvsc_drv.c
> > > b/drivers/net/hyperv/netvsc_drv.c index f3f9eb8..39c412f 100644
> > > --- a/drivers/net/hyperv/netvsc_drv.c
> > > +++ b/drivers/net/hyperv/netvsc_drv.c
> > > @@ -2267,10 +2267,14 @@ static int netvsc_probe(struct hv_device *dev,
> > > struct net_device_context *net_device_ctx;
> > > struct netvsc_device_info *device_info = NULL;
> > > struct netvsc_device *nvdev;
> > > + char name[IFNAMSIZ];
> > > int ret = -ENOMEM;
> > >
> > > - net = alloc_etherdev_mq(sizeof(struct net_device_context),
> > > - VRSS_CHANNEL_MAX);
> > > + snprintf(name, IFNAMSIZ, "eth%d", dev->channel->dev_num);
> >
> > How is this supposed to work when there are other ethernet device types on the
> > system, which may claim the same device names?
> >
> > > + net = alloc_netdev_mqs(sizeof(struct net_device_context), name,
> > > + NET_NAME_ENUM, ether_setup,
> > > + VRSS_CHANNEL_MAX, VRSS_CHANNEL_MAX);
> > > +
> > > if (!net)
> > > goto no_net;
> > >
> > > @@ -2355,6 +2359,14 @@ static int netvsc_probe(struct hv_device *dev,
> > > net->max_mtu = ETH_DATA_LEN;
> > >
> > > ret = register_netdevice(net);
> > > +
> > > + if (ret == -EEXIST) {
> > > + pr_info("NIC name %s exists, request another name.\n",
> > > + net->name);
> > > + strlcpy(net->name, "eth%d", IFNAMSIZ);
> > > + ret = register_netdevice(net);
> > > + }
> >
> > IOW you want the device naming to be predictable, but don't guarantee this?
> >
> > I think the problem this patchset is trying to solve is much better solved with a
> > udev rule, similar to how it's done for PCI net devices.
> > And IMO the primary channel number, being a device's "hardware"
> > property, is more suited to be used in the device name, than this completely
> > ephemeral device number.
>
> The vmbus number can be affected by other types of devices and/or subchannel
> offerings. They are not stable either. That's why this patch set keeps track of the
> offering sequence within the same device type in a new variable "dev_num".
>
> As in my earlier email, to avoid impact by other types of NICs, we should put them
> into different naming formats, like "vf*", "enP*", etc. And yes, these can be done in
> udev.
>
> But for netvsc (synthetic) NICs, we still want the default naming format "eth*". And
> the variable "dev_num" gives them the basis for stable naming with Async probing.
>
> Thanks,
> - Haiyang
>
The primary requirements for network naming are:
1. Network names must be repeatable on each boot. This was the original problem
that PCI devices discovered back years ago when parallel probing was enabled.
2. Network names must be predictable. If new VM is created, the names should
match a similar VM config.
3. Names must be persistent. If a NIC is added or deleted, the existing names
must not change.
The other things which are important (and this proposal breaks):
1. Don't break it principle: an existing VM should not suddenly get interfaces
renamed if kernel is upgraded. A corrallary is that a lot of current userspace
code expects eth0. It doesn't look like first interface would be guaranteed
to be eth0.
2. No snowflakes principle: a device driver should follow the current practice
of other devices. For netvsc, this means VMBus should act like PCI as much
as possible. Is there another driver doing this already?
3. Userspace policy principle: Every distribution has its own policy by now.
The solution must make netvsc work reliably on Redhat (udev), Ubuntu (netplan), SuSE (Yast)
doing something in the kernel violates #2.
My recommendation would be to take a multi-phase approach:
1. Expose persistent value in sysfs now.
2. Work with udev/netplan/... to use that value.
3. Make parallel VMBus probing an option. So that when distributions have picked up
the udev changes they can enable parallel probe. Some will be quick to adopt
and the enterprise laggards can get to it when they feel the heat.
Long term wish list (requires host side changes):
1. The interface index could be a host side property; the host networking
already has the virtual device table and it is persistent.
2. The Azure NIC name should be visible as a property in guest.
Then userspace could do rename based on that property.
Having multiple disconnected names is leads to confusion.
next prev parent reply other threads:[~2019-12-31 17:36 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-30 20:13 [PATCH V2,net-next, 0/3] Add vmbus dev_num and enable netvsc async probing Haiyang Zhang
2019-12-30 20:13 ` [PATCH V2,net-next, 1/3] Drivers: hv: vmbus: Add a dev_num variable based on channel offer sequence Haiyang Zhang
2019-12-31 1:34 ` Michael Kelley
2019-12-31 15:48 ` Haiyang Zhang
2019-12-30 20:13 ` [PATCH V2,net-next, 2/3] Drivers: hv: vmbus: Add dev_num to sysfs Haiyang Zhang
2019-12-31 1:34 ` Michael Kelley
2019-12-31 15:50 ` Haiyang Zhang
2019-12-30 20:13 ` [PATCH V2,net-next, 3/3] hv_netvsc: Name NICs based on vmbus offer sequence and use async probe Haiyang Zhang
2019-12-31 1:35 ` Michael Kelley
2019-12-31 16:01 ` Haiyang Zhang
2019-12-31 11:34 ` Roman Kagan
2019-12-31 16:12 ` Haiyang Zhang
2019-12-31 17:36 ` Stephen Hemminger [this message]
2019-12-31 17:51 ` Haiyang Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191231093614.75da9bea@hermes.lan \
--to=stephen@networkplumber.org \
--cc=davem@davemloft.net \
--cc=haiyangz@microsoft.com \
--cc=kys@microsoft.com \
--cc=linux-hyperv@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=olaf@aepfle.de \
--cc=rkagan@virtuozzo.com \
--cc=sashal@kernel.org \
--cc=sthemmin@microsoft.com \
--cc=vkuznets@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).