From: "Michael S. Tsirkin" <mst@redhat.com>
To: Siwei Liu <loseweigh@gmail.com>
Cc: Venu Busireddy <venu.busireddy@oracle.com>,
Cornelia Huck <cohuck@redhat.com>,
"Samudrala, Sridhar" <sridhar.samudrala@intel.com>,
virtio-dev <virtio-dev@lists.oasis-open.org>
Subject: Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
Date: Thu, 27 Sep 2018 12:32:31 -0400 [thread overview]
Message-ID: <20180927121739-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CADGSJ23GRrt25k1G9wD3JhA6en4MuXLE9WA24Ep8PLgkyk_4=A@mail.gmail.com>
On Wed, Sep 26, 2018 at 05:18:38PM -0700, Siwei Liu wrote:
> On Thu, Sep 20, 2018 at 7:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote:
> >> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote:
> >> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
> >> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
> >> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> >> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400
> >> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> >> > >
> >> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> >> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> >> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> >> >> >> > > > > > > device to act as a standby for another device with the same MAC address.
> >> >> >> > > > > > >
> >> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> >> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> >> >> >> > > > > > Applied but when do you plan to add documentation as pointed
> >> >> >> > > > > > out by Jan and Halil?
> >> >> >> > > > >
> >> >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement
> >> >> >> > > > > patches and i hope someone in RH is looking into it.
> >> >> >> > > > >
> >> >> >> > > > > Does it make sense to add a link to to the kernel documentation of this feature in
> >> >> >> > > > > the spec
> >> >> >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > > I do not think this will address the comments posted. Specifically we
> >> >> >> > > > should probably include documentation for what is a standby and primary:
> >> >> >> > > > what is expected of driver (maintain configuration on standby, support
> >> >> >> > > > primary coming and going, transmit on standby only if there is no
> >> >> >> > > > primary) and of device (have same mac for standby as for standby).
> >> >> >> > >
> >> >> >> > > Yes, we need some definitive statements of what a driver and a device
> >> >> >> > > is supposed to do in order to conform; it might make sense to discuss
> >> >> >> > > this in conjunction with discussion on any QEMU patches (have not
> >> >> >> > > checked whether anything has been posted, just returned from vacation).
> >> >> >> > >
> >> >> >> > > I assume that we still stick with the plan to implement/document
> >> >> >> > > MAC-based handling first and then enhance with other methods later?
> >> >> >> >
> >> >> >> > I'm fine with that at least. If someone wants to work on
> >> >> >> > other methods straight away, that's also fine by me.
> >> >> >>
> >> >> >> Patch set [1] implements the failover-group-id mechanism. Are you
> >> >> >> thinking of some other method?
> >> >> >>
> >> >> >> Venu
> >> >> >>
> >> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
> >> >> >>
> >> >> >
> >> >> > Yes, the grouping mechanism seems fine to me (I don't remember
> >> >> > about the implementation, it's been a while).
> >> >> >
> >> >> > It is not by itself sufficient though, is it?
> >> >>
> >> >> I do understand that the group ID patch is incomplete though it's a
> >> >> base patch for the real work.
> >> >>
> >> >> >
> >> >> > MAC is assumed to be shared to avoid things like ARP/neighboor
> >> >> > rediscovery, right?
> >> >>
> >> >> True, but does this really need to be part of the guest-host
> >> >> interface? Or rather, I don't see how MAC based matching can be done
> >> >> on the host part.
> >> >
> >> > mac address matching does not need to affect host side.
> >>
> >> Did you realize that the host side can't have duplicate MAC address
> >> filters for both PV and VF at the same time?
> >>
> >> If hot adding a VF with duplicate MAC address filter programmed in
> >> prior, the PV path for virtio in the host side is effectively
> >> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt
> >> does not mean it's ready and usable in the guest. You end up with
> >> unusable guest networking, *temporarily only when VF is successfully
> >> probed and properly enslabed*. As of now, no guest-host handshake was
> >> defined in the spec to make virtio driver aware of hotplug event thus
> >> VF's exposure, and zero handshake was done to switch the datapath when
> >> VF driver is ready and usable in guest. The current implementation
> >> relies on the lucky side that all the entire hot plug process will be
> >> successul in the guest.
> >
> > I think it's a PF bug then. PF driver should ignore filters
> > for VFs which have not been enabled by guest since reset.
>
> Even so, the fact is that if the design is tied to MAC based matching
> you end up with relying on that MAC address to pair device, which
> loses the flexibility to move MAC filter at some point later after
> assigning VF to guest.
Whatever you use for pairing, you still need to reuse same MAC
to avoid redoing neighbour disovery/arp, right?
> >
> >> BTW netvsc mitigate potential failure in the hotplug and driver
> >> probing by acknowledging the hypervisor through a DATAPATH_SWITCH
> >> hypercall (VMbus message) when VF driver is enslaved and ready, only
> >> then hypervisor will kick off datapath switching by moving the MAC
> >> address filter.
> >
> > We can do it without need for PV. We can detect e.g. bus master enable.
>
> I'm not sure if it's valid to assume master enable/disable is the
> right point to move the filter, although it improves a bit than do
> nothing. The thing is that from device (QEMU) perspective it knows
> nothing and should not assume too much about guest implementation -
> the time to move the filter around means the VF driver is fully ready
> in guest and properly handled by the bond driver (net_failover), so
> the primary can take over the datapath going forward. While the bus
> master enable usually happens earlier than that, which does not
> indicate anything about readiness on the control side that the
> bond/failver driver can actually see this VF and "manage" it. This
> strictly does not form any guest-host handshake to me. Think about
> what if VM user changes VF to a different netns, or rebind it to DPDK
> PMD?
OK. What then?
> These just demostrate a few things that can get well covered by
> this design, and I suspect the errors in the real life would be much
> more complex.
What's missing is actual design though. The only thing that I saw so far
is bridge group identifier for qemu, which is a start but doesn't
actually solve any problems by itself. You even said "forget about the
grouping mechanism" yourself below.
> > Move the filter when enabled, move it back when disabled e.g. by
> > VF reset. Or maybe MSE, or both.
>
> MSE is on the PF and shared by all VFs, why it's relevant?
Oh, right. Just FLR then.
> >
> >> >
> >> >> Are you going to expose MAC address to VFIO?
> >> >
> >> > If mac of a VF is programmed by libvirt through the PF
> >> > (that's already the case), VFIO does not need to care about it.
> >> >
> >> >>
> >> >> The thing is the current MAC based implementation has intrinsic flaw
> >> >> that doesn't propagate errors to hypervisor, or there's no back
> >> >> channel for guest to unwind the hot plug action upon failure in
> >> >> probing or enslaving the primary.
> >> >
> >> > I guess you can eject the primary if you like. But
> >> > why does hypervisor need to know? On error, just don't use primary,
> >> > use standby.
> >>
> >> Forget about the grouping mechanism first.
> >
> > OK :)
> >
> >> What guest kernel change do
> >> you propose to make virtio driver know every possible error, think
> >> about how many moving targets it needs to specifically track with or
> >> has to depend on during the hot plug and driver probing process? If
> >> someone starts to implement the code and think about various error
> >> cases as a whole, I bet it would be more clear why grouping is
> >> relevant in the first place.
> >>
> >> -Siwei
> >
> > It just seems that no one's been motivated to do it so far.
>
> It's just that the MAC matching design is simply too broken.
Too broken to even bother coding up any alternatives?
> We have
> root disk hosted on networked storage, i.e. iSCSI, that can't tolerate
> any potential network failure if the design itself is not error proof.
> IOW our criteria for network downtime and errors is super rigorous..
>
> -Siwei
IMO that's a very interesting usecase to address! I'll be happy to merge
patches that help reduce downtime, spec-wise I'll be happy to propose
them for TC vote.
> >
> >> >
> >> >> If you think about a more robust
> >> >> implementation, another grouping mechanism rather than MAC is pretty
> >> >> much required.
> >> >>
> >> >> Thanks,
> >> >> -Siwei
> >> >
> >> > I don't really know what is the flaw, or how is it fixed by a grouping
> >> > mechanism. All this motivation was never described as part of work on
> >> > an alternate grouping.
> >> >
> >> >> > If true that implies that to avoid guest confusion visibility of the
> >> >> > primary needs to be controlled by standby's driver.
> >> >> > This makes this patchset incomplete.
> >> >> >
> >> >> > For this work to be complete what is needed is:
> >> >> > - hypervisor: add control of primary's visibility to guest
> >> >> > - guest: add support for this grouping to the failover driver
> >> >> >
> >> >> > We also need
> >> >> > - spec: document matching rules based on the pci bridge
> >> >> >
> >> >> > and it's helpful to have a spec proposal with implementation, but I
> >> >> > would say at least proposed patches to one of the above 2 would be
> >> >> > helpful before we include this in spec.
> >> >> >
> >> >> > --
> >> >> > MST
> >> >> >
> >> >> > ---------------------------------------------------------------------
> >> >> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> >> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >> >> >
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
next prev parent reply other threads:[~2018-09-27 16:32 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-15 18:49 [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature Sridhar Samudrala
2018-08-27 8:40 ` [virtio-dev] " Cornelia Huck
2018-08-27 12:34 ` Michael S. Tsirkin
2018-08-27 16:50 ` Samudrala, Sridhar
2018-08-28 12:13 ` Michael S. Tsirkin
2018-09-07 21:34 ` [virtio-dev] " Michael S. Tsirkin
2018-09-12 15:17 ` Samudrala, Sridhar
2018-09-12 15:22 ` Michael S. Tsirkin
2018-09-18 10:20 ` Cornelia Huck
2018-09-18 10:37 ` Sameeh Jubran
2018-09-18 13:25 ` Michael S. Tsirkin
2018-09-18 18:30 ` Siwei Liu
2018-09-18 18:39 ` Michael S. Tsirkin
2018-09-18 19:10 ` Siwei Liu
2018-09-20 3:04 ` Michael S. Tsirkin
2018-09-19 5:03 ` Samudrala, Sridhar
2018-09-20 5:51 ` Sameeh Jubran
2018-09-18 13:35 ` Michael S. Tsirkin
2018-09-18 15:13 ` Venu Busireddy
2018-09-18 15:31 ` Michael S. Tsirkin
2018-09-18 18:48 ` Siwei Liu
2018-09-20 3:11 ` Michael S. Tsirkin
2018-09-20 23:57 ` Siwei Liu
2018-09-21 2:23 ` Michael S. Tsirkin
2018-09-21 2:34 ` Michael S. Tsirkin
2018-09-27 0:18 ` Siwei Liu
2018-09-27 7:17 ` Sameeh Jubran
2018-09-27 16:17 ` Michael S. Tsirkin
2018-09-27 17:23 ` Samudrala, Sridhar
2018-09-27 23:45 ` Michael S. Tsirkin
2018-09-30 9:17 ` Sameeh Jubran
2018-09-30 13:50 ` Sameeh Jubran
2018-09-27 16:32 ` Michael S. Tsirkin [this message]
2018-10-02 8:42 ` Siwei Liu
2018-10-02 12:43 ` Michael S. Tsirkin
2018-10-05 0:03 ` Siwei Liu
2018-10-05 5:17 ` Samudrala, Sridhar
2018-10-10 14:40 ` Michael S. Tsirkin
2018-10-11 0:16 ` Samudrala, Sridhar
2018-10-05 19:18 ` Michael S. Tsirkin
2018-10-08 22:06 ` Sameeh Jubran
2018-10-10 14:43 ` Michael S. Tsirkin
2018-10-11 1:26 ` Siwei Liu
2018-10-18 23:20 ` Siwei Liu
2018-10-18 23:40 ` Michael S. Tsirkin
2018-10-19 3:45 ` Michael S. Tsirkin
2018-11-21 15:39 ` Sameeh Jubran
2018-11-21 18:41 ` Michael S. Tsirkin
2018-11-21 20:04 ` Sameeh Jubran
2018-11-21 23:51 ` Samudrala, Sridhar
2018-11-22 13:55 ` Sameeh Jubran
2018-11-22 18:27 ` Michael S. Tsirkin
2018-11-26 15:13 ` Sameeh Jubran
2018-11-26 15:43 ` Sameeh Jubran
2018-11-26 20:22 ` Samudrala, Sridhar
2018-11-27 11:24 ` Sameeh Jubran
2018-11-28 17:08 ` Michael S. Tsirkin
2018-11-28 17:31 ` Samudrala, Sridhar
2018-11-28 17:35 ` Michael S. Tsirkin
2018-11-28 18:39 ` Samudrala, Sridhar
2018-11-28 18:51 ` Michael S. Tsirkin
2018-11-29 6:29 ` Samudrala, Sridhar
2018-11-28 20:06 ` Michael S. Tsirkin
2018-11-28 20:28 ` si-wei liu
2018-11-28 20:43 ` Michael S. Tsirkin
2018-11-28 20:47 ` si-wei liu
2018-11-29 1:15 ` Michael S. Tsirkin
2018-11-29 6:37 ` Samudrala, Sridhar
2018-11-29 20:14 ` si-wei liu
2018-11-29 21:17 ` Michael S. Tsirkin
2018-11-29 22:53 ` si-wei liu
2018-11-29 23:53 ` Samudrala, Sridhar
2018-11-30 0:24 ` si-wei liu
2018-11-30 3:08 ` Samudrala, Sridhar
2018-11-30 4:46 ` si-wei liu
2018-11-30 6:21 ` Michael S. Tsirkin
2018-12-04 2:09 ` si-wei liu
2018-12-04 3:59 ` Michael S. Tsirkin
2018-12-05 16:18 ` Sameeh Jubran
2018-12-05 17:18 ` Michael S. Tsirkin
2018-12-08 1:54 ` si-wei liu
2018-12-10 15:13 ` Sameeh Jubran
2018-12-10 15:34 ` Sameeh Jubran
2018-12-10 17:46 ` Michael S. Tsirkin
2018-12-11 15:50 ` Sameeh Jubran
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180927121739-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=cohuck@redhat.com \
--cc=loseweigh@gmail.com \
--cc=sridhar.samudrala@intel.com \
--cc=venu.busireddy@oracle.com \
--cc=virtio-dev@lists.oasis-open.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox