From: "Michael S. Tsirkin" <mst@redhat.com>
To: Siwei Liu <loseweigh@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
	Jiri Pirko <jiri@resnulli.us>,
	Sridhar Samudrala <sridhar.samudrala@intel.com>,
	David Miller <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>,
	virtualization@lists.linux-foundation.org,
	virtio-dev@lists.oasis-open.org,
	"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
	Alexander Duyck <alexander.h.duyck@intel.com>,
	Jakub Kicinski <kubakici@wp.pl>, Jason Wang <jasowang@redhat.com>
Subject: Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
Date: Thu, 26 Apr 2018 05:28:47 +0300
Message-ID: <20180426050934-mutt-send-email-mst@kernel.org>
In-Reply-To: <CADGSJ20vck5V8JCoF0Tq9PWfBu7QYPDvg0yAZ_8Xkig7TKU7Lw@mail.gmail.com>

On Wed, Apr 25, 2018 at 03:57:57PM -0700, Siwei Liu wrote:
> On Wed, Apr 25, 2018 at 3:22 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Wed, Apr 25, 2018 at 02:38:57PM -0700, Siwei Liu wrote:
> >> On Mon, Apr 23, 2018 at 1:06 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Mon, Apr 23, 2018 at 12:44:39PM -0700, Siwei Liu wrote:
> >> >> On Mon, Apr 23, 2018 at 10:56 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> > On Mon, Apr 23, 2018 at 10:44:40AM -0700, Stephen Hemminger wrote:
> >> >> >> On Mon, 23 Apr 2018 20:24:56 +0300
> >> >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> >>
> >> >> >> > On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:
> >> >> >> > > > >
> >> >> >> > > > >I will NAK patches to change to common code for netvsc especially the
> >> >> >> > > > >three device model.  MS worked hard with distro vendors to support transparent
> >> >> >> > > > >mode, and we really can't have a new model; or do backport.
> >> >> >> > > > >
> >> >> >> > > > >Plus, DPDK is now dependent on existing model.
> >> >> >> > > >
> >> >> >> > > > Sorry, but nobody here cares about dpdk or other similar oddities.
> >> >> >> > >
> >> >> >> > > The network device model is a userspace API, and DPDK is a userspace application.
> >> >> >> >
> >> >> >> > It is userspace but are you sure dpdk is actually poking at netdevs?
> >> >> >> > AFAIK it's normally banging device registers directly.
> >> >> >> >
> >> >> >> > > You can't go breaking userspace even if you don't like the application.
> >> >> >> >
> >> >> >> > Could you please explain how is the proposed patchset breaking
> >> >> >> > userspace? Ignoring DPDK for now, I don't think it changes the userspace
> >> >> >> > API at all.
> >> >> >> >
> >> >> >>
> >> >> >> The DPDK has a device driver vdev_netvsc which scans the Linux network devices
> >> >> >> to look for Linux netvsc device and the paired VF device and setup the
> >> >> >> DPDK environment.  This setup creates a DPDK failsafe (bondingish) instance
> >> >> >> and sets up TAP support over the Linux netvsc device as well as the Mellanox
> >> >> >> VF device.
> >> >> >>
> >> >> >> So it depends on existing 2 device model. You can't go to a 3 device model
> >> >> >> or start hiding devices from userspace.
> >> >> >
> >> >> > Okay so how does the existing patch break that? IIUC it does not go to
> >> >> > a 3 device model since netvsc calls failover_register directly.
> >> >> >
> >> >> >> Also, I am working on associating netvsc and VF device based on serial number
> >> >> >> rather than MAC address. The serial number is how Windows works now, and it makes
> >> >> >> sense for Linux and Windows to use the same mechanism if possible.
> >> >> >
> >> >> > Maybe we should support same for virtio ...
> >> >> > Which serial do you mean? From vpd?
> >> >> >
> >> >> > I guess you will want to keep supporting MAC for old hypervisors?
> >> >> >
> >> >> > It all seems like a reasonable thing to support in the generic core.
> >> >>
> >> >> That's the reason why I chose explicit identifier rather than rely on
> >> >> MAC address to bind/pair a device. MAC address can change. Even if it
> >> >> can't, malicious guest user can fake MAC address to skip binding.
> >> >>
> >> >> -Siwei
> >> >
> >> > Address should be sampled at device creation to prevent this
> >> > kind of hack. Not that it buys the malicious user much:
> >> > if you can poke at MAC addresses you probably already can
> >> > break networking.
> >>
> >> I don't understand why poking at MAC address may potentially break
> >> networking.
> >
> > Set a MAC address to match another device on the same LAN,
> > packets will stop reaching that MAC.
> 
> What I meant was guest users may create a virtual link, say veth that
> has exactly the same MAC address as that for the VF, which can easily
> get around the binding procedure.

This patchset limits binding to PCI devices so it won't be affected
by any hacks around virtual devices.
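
To make the point concrete, here is a minimal userspace sketch of that
rule. It is a model only, not the patchset's code: the NetDev type, the
bus field and can_bind() are hypothetical names, and the assumption is
that a candidate is paired only if it is a PCI device and its MAC
matches the standby's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetDev:
    """Hypothetical stand-in for a network device seen by the guest."""
    name: str
    mac: str
    bus: str  # assumed labels: "pci" for VF/passthrough, "virtual" for veth etc.


def can_bind(standby: NetDev, candidate: NetDev) -> bool:
    """Model of the pairing rule: PCI devices only, then compare MACs."""
    if candidate.bus != "pci":
        # A veth or other software device never qualifies,
        # regardless of what MAC the guest user assigned to it.
        return False
    return candidate.mac.lower() == standby.mac.lower()


standby = NetDev("eth0", "52:54:00:12:34:56", "virtio")
vf = NetDev("ens4", "52:54:00:12:34:56", "pci")
veth = NetDev("veth0", "52:54:00:12:34:56", "virtual")

print(can_bind(standby, vf))
print(can_bind(standby, veth))
```

In this model a veth with a copied MAC is rejected before the MAC is
even compared, which is the sense in which virtual-device tricks do not
affect binding.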

> There's no explicit flag to
> identify a VF or pass-through device AFAIK. And sometimes this happens
> maybe due to user misconfiguring the link. This process should be
> hardened against potential configuration errors.

They are still PCI devices though.

> >
> >> Unlike VF, passthrough PCI endpoint device has its freedom
> >> to change the MAC address. Even on a VF setup it's not necessarily
> >> always safe to assume the VF's MAC address cannot or shouldn't be
> >> changed. That depends on the specific need whether the host admin
> >> wants to restrict guest from changing the MAC address, although in
> >> most cases it's true.
> >>
> >> I understand we can use the perm_addr to distinguish. But as said,
> >> this will pose limitation of flexible configuration where one can
> >> assign VFs with identical MAC address at all while each VF belongs to
> >> different PF and/or different subnet for e.g. load balancing. And
> >> furthermore, the QEMU device model never uses MAC address to be
> >> interpreted as an identifier, which requires to be unique per VM
> >> instance. Why we're introducing this inconsistency?
> >>
> >> -Siwei
> >
> > Because it addresses most of the issues and is simple.  That's already
> > much better than what we have now which is nothing unless guest
> > configures things manually.
> 
> Did you see my QEMU patch for using BDF as the grouping identifier?

Yes. And I don't think it can work because bus numbers are
guest specified.

> And there can be others like what you suggested, but the point is that
> it's required to support an explicit grouping mechanism from day one,
> before the backup property is cast in stone.

Let's start with addressing simple configs with just two NICs.

Down the road I can see possible extensions that can work: for example,
require that the devices are on the same PCI bridge. Or we could even make
the virtio device itself include a PCI bridge (as part of the same function
or a child function), and the PT device would have to be behind it.
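
As a rough illustration of the same-bridge idea, here is a sketch of
how such a grouping check could look. Everything in it is an assumption
made for illustration: the PciNic type, the parent_bridge identifier
and same_group() are hypothetical, and nothing like this exists in the
patchset today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PciNic:
    """Hypothetical view of a NIC and the PCI bridge it sits behind."""
    name: str
    mac: str
    parent_bridge: str  # assumed opaque identifier of the upstream bridge


def same_group(standby: PciNic, candidate: PciNic) -> bool:
    """Pair only devices behind the same bridge that also share a MAC."""
    if standby.parent_bridge != candidate.parent_bridge:
        return False
    return standby.mac.lower() == candidate.mac.lower()


virtio = PciNic("eth0", "52:54:00:12:34:56", "bridge-a")
pt_same = PciNic("ens4", "52:54:00:12:34:56", "bridge-a")
pt_other = PciNic("ens5", "52:54:00:12:34:56", "bridge-b")

print(same_group(virtio, pt_same))
print(same_group(virtio, pt_other))
```

With a rule like this, two passthrough devices with identical MACs
could coexist as long as they sit behind different bridges.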

As long as we are not breaking anything, adding more flags to fix
non-working configurations is always fair game.

> This is orthogonal to
> device model being proposed, be it 1-netdev or not. Delaying it would
> just mean support and compatibility burden, appearing more like a
> design flaw rather than a feature to add later on.

Well it's mostly myself who gets to support it, and I see the device
model as much more fundamental as userspace will come to depend
on it. So I'm not too worried, let's take this one step at a time.

> >
> > I think ideally the infrastructure should support flexible matching of
> > NICs - netvsc is already reported to be moving to some kind of serial
> > address.
> >
> As Stephen said, Hyper-V supports the serial UUID thing from day-one.
> It's just the Linux netvsc guest driver itself does not leverage that
> ID from the very beginning.
> 
> Regards,
> -Siwei

We could add something like this, too. For example,
we could add a virtual VPD capability with a UUID.

Do you know exactly how Hyper-V passes the UUID for NICs?

> >
> >> >
> >> >
> >> >
> >> >
> >> >>
> >> >> >
> >> >> > --
> >> >> > MST
