* [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature @ 2018-08-15 18:49 Sridhar Samudrala 2018-08-27 8:40 ` [virtio-dev] " Cornelia Huck 2018-09-07 21:34 ` [virtio-dev] " Michael S. Tsirkin 0 siblings, 2 replies; 85+ messages in thread From: Sridhar Samudrala @ 2018-08-15 18:49 UTC (permalink / raw) To: mst, cohuck, virtio-dev VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net device to act as a standby for another device with the same MAC address. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 --- content.tex | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/content.tex b/content.tex index be18234..42a0e7e 100644 --- a/content.tex +++ b/content.tex @@ -2525,6 +2525,9 @@ features. \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. + +\item[VIRTIO_NET_F_STANDBY(62)] Device may act as a standby for a primary + device with the same MAC address. \end{description} \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements} @@ -2614,6 +2617,9 @@ level ethernet header length) size with \field{gso_type} NONE or ECN, and do so without fragmentation, after VIRTIO_NET_F_MTU has been successfully negotiated. +If the driver negotiates the VIRTIO_NET_F_STANDBY feature, the device MAY act +as a standby device for a primary device with the same MAC address. + \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout} A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it. @@ -2636,6 +2642,8 @@ If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of size exceeding the value of \field{mtu} (plus low level ethernet header length) with \field{gso_type} NONE or ECN. 
+A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it. + \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout} \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout} When using the legacy interface, transitional devices and drivers -- 2.14.4 --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply related [flat|nested] 85+ messages in thread
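For readers following the patch above: bit 62 sits in the upper half of the 64-bit feature set, so any test for it needs a 64-bit mask. The following is a minimal sketch in C, not code from the spec or from any driver. The bit number 62 comes from the patch; VIRTIO_NET_F_MAC is bit 5 in the virtio spec; the helpers feature_has() and negotiate() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number taken from the patch above. */
#define VIRTIO_NET_F_STANDBY 62
/* Existing virtio-net feature bit, used here only for contrast. */
#define VIRTIO_NET_F_MAC 5

/* Hypothetical helper: true if feature bit `bit` is set in `features`. */
static bool feature_has(uint64_t features, unsigned int bit)
{
    return ((features >> bit) & 1ULL) != 0;
}

/*
 * Hypothetical helper: the negotiated set is the intersection of what
 * the device offers and what the driver supports.
 */
static uint64_t negotiate(uint64_t device_features, uint64_t driver_supported)
{
    return device_features & driver_supported;
}
```

Note the 1ULL: shifting a plain int by 62 is undefined behaviour in C, which is why masks for this bit have to be built as (1ULL << VIRTIO_NET_F_STANDBY).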
* [virtio-dev] Re: [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-08-15 18:49 [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature Sridhar Samudrala @ 2018-08-27 8:40 ` Cornelia Huck 2018-08-27 12:34 ` Michael S. Tsirkin 2018-09-07 21:34 ` [virtio-dev] " Michael S. Tsirkin 1 sibling, 1 reply; 85+ messages in thread From: Cornelia Huck @ 2018-08-27 8:40 UTC (permalink / raw) To: Sridhar Samudrala, mst; +Cc: virtio-dev On Wed, 15 Aug 2018 11:49:15 -0700 Sridhar Samudrala <sridhar.samudrala@intel.com> wrote: > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > device to act as a standby for another device with the same MAC address. > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > Acked-by: Cornelia Huck <cohuck@redhat.com> > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 I think you need to update the github issue to point to this (v4) patch. > --- > content.tex | 8 ++++++++ > 1 file changed, 8 insertions(+) Other than that, I'd vote (...) to start voting on this issue, but AFAIK I can't do that. Michael? --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
* [virtio-dev] Re: [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-08-27 8:40 ` [virtio-dev] " Cornelia Huck @ 2018-08-27 12:34 ` Michael S. Tsirkin 2018-08-27 16:50 ` Samudrala, Sridhar 0 siblings, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-08-27 12:34 UTC (permalink / raw) To: Cornelia Huck; +Cc: Sridhar Samudrala, virtio-dev On Mon, Aug 27, 2018 at 10:40:35AM +0200, Cornelia Huck wrote: > On Wed, 15 Aug 2018 11:49:15 -0700 > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote: > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > > device to act as a standby for another device with the same MAC address. > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > Acked-by: Cornelia Huck <cohuck@redhat.com> > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > > I think you need to update the github issue to point to this (v4) patch. > > > --- > > content.tex | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > Other than that, I'd vote (...) to start voting on this issue, but > AFAIK I can't do that. Michael? OK, ballot started. --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
* [virtio-dev] Re: [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-08-27 12:34 ` Michael S. Tsirkin @ 2018-08-27 16:50 ` Samudrala, Sridhar 2018-08-28 12:13 ` Michael S. Tsirkin 0 siblings, 1 reply; 85+ messages in thread From: Samudrala, Sridhar @ 2018-08-27 16:50 UTC (permalink / raw) To: Michael S. Tsirkin, Cornelia Huck; +Cc: virtio-dev On 8/27/2018 5:34 AM, Michael S. Tsirkin wrote: > On Mon, Aug 27, 2018 at 10:40:35AM +0200, Cornelia Huck wrote: >> On Wed, 15 Aug 2018 11:49:15 -0700 >> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote: >> >>> VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net >>> device to act as a standby for another device with the same MAC address. >>> >>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>> Acked-by: Cornelia Huck <cohuck@redhat.com> >>> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 >> I think you need to update the github issue to point to this (v4) patch. Updated the github issue with link to v4 patch. >> >>> --- >>> content.tex | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >> Other than that, I'd vote (...) to start voting on this issue, but >> AFAIK I can't do that. Michael? > OK, ballot started. --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [virtio-dev] Re: [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-08-27 16:50 ` Samudrala, Sridhar @ 2018-08-28 12:13 ` Michael S. Tsirkin 0 siblings, 0 replies; 85+ messages in thread From: Michael S. Tsirkin @ 2018-08-28 12:13 UTC (permalink / raw) To: Samudrala, Sridhar; +Cc: Cornelia Huck, virtio-dev On Mon, Aug 27, 2018 at 09:50:33AM -0700, Samudrala, Sridhar wrote: > On 8/27/2018 5:34 AM, Michael S. Tsirkin wrote: > > On Mon, Aug 27, 2018 at 10:40:35AM +0200, Cornelia Huck wrote: > > > On Wed, 15 Aug 2018 11:49:15 -0700 > > > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote: > > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > > > > device to act as a standby for another device with the same MAC address. > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > > > I think you need to update the github issue to point to this (v4) patch. > > Updated the github issue with link to v4 patch. Please verify it's the same link that was put in the ballot: https://www.oasis-open.org/committees/ballot.php?id=3240 > > > > > > > --- > > > > content.tex | 8 ++++++++ > > > > 1 file changed, 8 insertions(+) > > > Other than that, I'd vote (...) to start voting on this issue, but > > > AFAIK I can't do that. Michael? > > OK, ballot started. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-08-15 18:49 [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature Sridhar Samudrala 2018-08-27 8:40 ` [virtio-dev] " Cornelia Huck @ 2018-09-07 21:34 ` Michael S. Tsirkin 2018-09-12 15:17 ` Samudrala, Sridhar 1 sibling, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-07 21:34 UTC (permalink / raw) To: Sridhar Samudrala; +Cc: cohuck, virtio-dev On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > device to act as a standby for another device with the same MAC address. > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > Acked-by: Cornelia Huck <cohuck@redhat.com> > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 Applied but when do you plan to add documentation as pointed out by Jan and Halil? > --- > content.tex | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/content.tex b/content.tex > index be18234..42a0e7e 100644 > --- a/content.tex > +++ b/content.tex > @@ -2525,6 +2525,9 @@ features. > > \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control > channel. > + > +\item[VIRTIO_NET_F_STANDBY(62)] Device may act as a standby for a primary > + device with the same MAC address. > \end{description} > > \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements} > @@ -2614,6 +2617,9 @@ level ethernet header length) size with \field{gso_type} NONE or ECN, and do > so without fragmentation, after VIRTIO_NET_F_MTU has been successfully > negotiated. > > +If the driver negotiates the VIRTIO_NET_F_STANDBY feature, the device MAY act > +as a standby device for a primary device with the same MAC address. 
> + > \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout} > > A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it. > @@ -2636,6 +2642,8 @@ If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of > size exceeding the value of \field{mtu} (plus low level ethernet header length) > with \field{gso_type} NONE or ECN. > > +A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it. > + > \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout} > \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout} > When using the legacy interface, transitional devices and drivers > -- > 2.14.4 > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-07 21:34 ` [virtio-dev] " Michael S. Tsirkin
@ 2018-09-12 15:17   ` Samudrala, Sridhar
  2018-09-12 15:22     ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-09-12 15:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: cohuck, virtio-dev

On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
>> VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
>> device to act as a standby for another device with the same MAC address.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> Acked-by: Cornelia Huck <cohuck@redhat.com>
>> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> Applied but when do you plan to add documentation as pointed
> out by Jan and Halil?

I thought additional documentation would be done as part of the QEMU enablement
patches and I hope someone in RH is looking into it.

Does it make sense to add a link to the kernel documentation of this feature in
the spec?
https://www.kernel.org/doc/html/latest/networking/net_failover.html

>
>> ---
>> content.tex | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index be18234..42a0e7e 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -2525,6 +2525,9 @@ features.
>>
>> \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>> channel.
>> +
>> +\item[VIRTIO_NET_F_STANDBY(62)] Device may act as a standby for a primary
>> + device with the same MAC address.
>> \end{description}
>>
>> \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements}
>> @@ -2614,6 +2617,9 @@ level ethernet header length) size with \field{gso_type} NONE or ECN, and do
>> so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
>> negotiated.
>> >> +If the driver negotiates the VIRTIO_NET_F_STANDBY feature, the device MAY act >> +as a standby device for a primary device with the same MAC address. >> + >> \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout} >> >> A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it. >> @@ -2636,6 +2642,8 @@ If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of >> size exceeding the value of \field{mtu} (plus low level ethernet header length) >> with \field{gso_type} NONE or ECN. >> >> +A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it. >> + >> \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout} >> \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout} >> When using the legacy interface, transitional devices and drivers >> -- >> 2.14.4 >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org >> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-12 15:17   ` Samudrala, Sridhar
@ 2018-09-12 15:22     ` Michael S. Tsirkin
  2018-09-18 10:20       ` Cornelia Huck
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-12 15:22 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: cohuck, virtio-dev

On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
>
>
> On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > device to act as a standby for another device with the same MAC address.
> > >
> > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > Applied but when do you plan to add documentation as pointed
> > out by Jan and Halil?
>
> I thought additional documentation would be done as part of the QEMU enablement
> patches and I hope someone in RH is looking into it.
>
> Does it make sense to add a link to the kernel documentation of this feature in
> the spec?
> https://www.kernel.org/doc/html/latest/networking/net_failover.html

I do not think this will address the comments posted. Specifically, we
should probably include documentation for what a standby and a primary are:
what is expected of the driver (maintain configuration on standby, support
primary coming and going, transmit on standby only if there is no
primary) and of the device (have the same MAC for standby as for primary).

> >
> > > ---
> > > content.tex | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/content.tex b/content.tex
> > > index be18234..42a0e7e 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -2525,6 +2525,9 @@ features.
> > > \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > channel.
> > > + > > > +\item[VIRTIO_NET_F_STANDBY(62)] Device may act as a standby for a primary > > > + device with the same MAC address. > > > \end{description} > > > \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements} > > > @@ -2614,6 +2617,9 @@ level ethernet header length) size with \field{gso_type} NONE or ECN, and do > > > so without fragmentation, after VIRTIO_NET_F_MTU has been successfully > > > negotiated. > > > +If the driver negotiates the VIRTIO_NET_F_STANDBY feature, the device MAY act > > > +as a standby device for a primary device with the same MAC address. > > > + > > > \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout} > > > A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it. > > > @@ -2636,6 +2642,8 @@ If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of > > > size exceeding the value of \field{mtu} (plus low level ethernet header length) > > > with \field{gso_type} NONE or ECN. > > > +A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it. 
> > > + > > > \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout} > > > \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout} > > > When using the legacy interface, transitional devices and drivers > > > -- > > > 2.14.4 > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
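The driver expectations listed in the message above (keep the standby configured, tolerate the primary coming and going, transmit on the standby only when there is no primary) can be sketched as a small selection routine. This is a minimal illustration under stated assumptions, not the Linux net_failover implementation and not normative spec text: struct netdev, struct failover and select_tx() are hypothetical names, and "present" stands for whatever condition a real driver would use to decide that a device is usable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one guest-visible network device. */
struct netdev {
    const char *name;
    bool present;            /* currently plugged in and usable */
};

/* Hypothetical failover pair; both devices share one MAC address. */
struct failover {
    struct netdev *primary;  /* e.g. a passthrough device; may be NULL or unplugged */
    struct netdev *standby;  /* the virtio-net device; expected to stay around */
};

/*
 * Pick the device to transmit on: the primary whenever it is there,
 * the standby only if there is no primary, NULL if neither is usable.
 */
static struct netdev *select_tx(const struct failover *f)
{
    if (f->primary != NULL && f->primary->present) {
        return f->primary;
    }
    if (f->standby != NULL && f->standby->present) {
        return f->standby;
    }
    return NULL;
}
```

Because the choice is re-evaluated on every call, unplugging the primary simply makes traffic fall back to the standby, and plugging it back in moves traffic to the primary again.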
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-12 15:22     ` Michael S. Tsirkin
@ 2018-09-18 10:20       ` Cornelia Huck
  2018-09-18 10:37         ` Sameeh Jubran
  2018-09-18 13:35         ` Michael S. Tsirkin
  0 siblings, 2 replies; 85+ messages in thread
From: Cornelia Huck @ 2018-09-18 10:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Samudrala, Sridhar, virtio-dev

On Wed, 12 Sep 2018 11:22:12 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> >
> >
> > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > device to act as a standby for another device with the same MAC address.
> > > >
> > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > > Applied but when do you plan to add documentation as pointed
> > > out by Jan and Halil?
> >
> > I thought additional documentation would be done as part of the QEMU enablement
> > patches and I hope someone in RH is looking into it.
> >
> > Does it make sense to add a link to the kernel documentation of this feature in
> > the spec?
> > https://www.kernel.org/doc/html/latest/networking/net_failover.html
>
>
> I do not think this will address the comments posted. Specifically, we
> should probably include documentation for what a standby and a primary are:
> what is expected of the driver (maintain configuration on standby, support
> primary coming and going, transmit on standby only if there is no
> primary) and of the device (have the same MAC for standby as for primary).

Yes, we need some definitive statements of what a driver and a device
are supposed to do in order to conform; it might make sense to discuss
this in conjunction with the discussion of any QEMU patches (I have not
checked whether anything has been posted, having just returned from vacation).

I assume that we are still sticking with the plan to implement/document
MAC-based handling first and then enhance it with other methods later?
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 10:20       ` Cornelia Huck
@ 2018-09-18 10:37         ` Sameeh Jubran
  2018-09-18 13:25           ` Michael S. Tsirkin
  2018-09-18 13:35         ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-09-18 10:37 UTC (permalink / raw)
  To: cohuck; +Cc: Michael S. Tsirkin, sridhar.samudrala, virtio-dev

On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Wed, 12 Sep 2018 11:22:12 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > >
> > >
> > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > > device to act as a standby for another device with the same MAC address.
> > > > >
> > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > > > Applied but when do you plan to add documentation as pointed
> > > > out by Jan and Halil?
> > >
> > > I thought additional documentation would be done as part of the QEMU enablement
> > > patches and I hope someone in RH is looking into it.
> > >
> > > Does it make sense to add a link to the kernel documentation of this feature in
> > > the spec?
> > > https://www.kernel.org/doc/html/latest/networking/net_failover.html
> >
> >
> > I do not think this will address the comments posted. Specifically, we
> > should probably include documentation for what a standby and a primary are:
> > what is expected of the driver (maintain configuration on standby, support
> > primary coming and going, transmit on standby only if there is no
> > primary) and of the device (have the same MAC for standby as for primary).
>
> Yes, we need some definitive statements of what a driver and a device
> are supposed to do in order to conform; it might make sense to discuss
> this in conjunction with the discussion of any QEMU patches (I have not
> checked whether anything has been posted, having just returned from vacation).
>
> I assume that we are still sticking with the plan to implement/document
> MAC-based handling first and then enhance it with other methods later?

I am currently in the process of writing the patches for this feature.
I have thought about how the feature should be implemented
and decided to go with a different approach: the id
of the vfio attached device will be specified in the virtio-net
arguments as follows:

-device virtio-net,standby=<device_id_of_vfio_device>
-vfio #address,id=<device_id_of_vfio_device>

This approach makes minimal changes to the current infrastructure and
does so elegantly, without adding unnecessary ids to the bridges.

The MAC address approach seems to be very complicated: there is no
standard way to find the MAC address of a given device, and it is
vendor dependent, which makes identifying the target standby device by its
MAC address a very tough task.

Please share your thoughts so I can move forward with the patches.

An initial patch which implements hiding the device from pci bus before the feature is acked is provided below: commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover) Author: Sameeh Jubran <sjubran@redhat.com> Date: Sun Sep 16 13:21:41 2018 +0300 virtio-net: Implement standby feature Signed-off-by: Sameeh Jubran <sjubran@redhat.com> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index f154756e85..46386c0e1b 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -26,7 +26,9 @@ #include "qapi/qapi-events-net.h" #include "hw/virtio/virtio-access.h" #include "migration/misc.h" +#include "hw/pci/pci.h" #include "standard-headers/linux/ethtool.h" +#include "hw/vfio/vfio-common.h" #define VIRTIO_NET_VM_VERSION 11 @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name, n->netclient_type = g_strdup(type); } +static bool standby_device_present(VirtIONet *n, const char *id, + struct PCIDevice **pdev) +{ + return pci_qdev_find_device(id, pdev) >= 0 && pdev && + vfio_is_vfio_pci(*pdev); +} + static void virtio_net_device_realize(DeviceState *dev, Error **errp) { VirtIODevice *vdev = VIRTIO_DEVICE(dev); @@ -1976,6 +1985,21 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX); } + if (n->net_conf.standby_id_str && standby_device_present(n, + n->net_conf.standby_id_str, &n->standby_pdev)) { + DeviceState *dev = DEVICE(n->standby_pdev); + DeviceClass *klass = DEVICE_GET_CLASS(dev); + /* Hide standby from pci till the feature is acked */ + if (klass->hotpluggable) + { + qdev_unplug(dev, errp); + if (errp == NULL) + { + n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY); + } + } + } + virtio_net_set_config_size(n, n->host_features); virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size); @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = { true), DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), 
DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), + DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 866f0deeb7..593debe56e 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev) #endif } +bool vfio_is_vfio_pci(PCIDevice* pdev) +{ + VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); + return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI; +} + static void vfio_intx_update(PCIDevice *pdev) { VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 821def0565..26dfde805f 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container, hwaddr *pgsize); int vfio_spapr_remove_window(VFIOContainer *container, hwaddr offset_within_address_space); +bool vfio_is_vfio_pci(PCIDevice* pdev); #endif /* HW_VFIO_VFIO_COMMON_H */ diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index 4d7f3c82ca..94388b40cb 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -42,6 +42,7 @@ typedef struct virtio_net_conf int32_t speed; char *duplex_str; uint8_t duplex; + char *standby_id_str; } virtio_net_conf; /* Maximum packet size we can receive from tap device: header + 64k */ @@ -103,6 +104,7 @@ typedef struct VirtIONet { int announce_counter; bool needs_vnet_hdr_swap; bool mtu_bypass_backend; + PCIDevice *standby_pdev; } VirtIONet; void virtio_net_set_netclient_name(VirtIONet *n, const char *name, (END) > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > -- Respectfully, Sameeh Jubran Linkedin Software Engineer @ Daynix. 
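One detail in the patch above may be worth spelling out: after qdev_unplug(dev, errp) it tests errp == NULL. Under the usual QEMU convention, errp is the caller's out-parameter and a callee reports failure by storing an error object through it, so comparing the pointer itself with NULL tells you whether the caller asked for errors, not whether the call failed. The sketch below shows the common "local error" pattern in self-contained C. It is an assumption-laden illustration: Error here is a stand-in struct, not QEMU's Error type, and try_unplug() and offer_standby() are hypothetical names, not QEMU functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an error object; NOT QEMU's real Error type. */
typedef struct Error {
    const char *msg;
} Error;

/* Hypothetical operation that reports failure by storing into *errp. */
static void try_unplug(bool should_fail, Error **errp)
{
    static Error unplug_failed = { "unplug failed" };

    if (should_fail && errp != NULL) {
        *errp = &unplug_failed;
    }
}

/*
 * Hypothetical caller: returns true only if the operation succeeded.
 * Failure is detected through a local error variable, so the result
 * does not depend on whether our own caller passed a non-NULL errp.
 */
static bool offer_standby(bool should_fail, Error **errp)
{
    Error *local_err = NULL;

    try_unplug(should_fail, &local_err);
    if (local_err != NULL) {
        if (errp != NULL) {
            *errp = local_err;   /* hand the error to our caller */
        }
        return false;
    }
    return true;                 /* here the feature bit would be offered */
}
```

With this shape, the success path (where the patch sets VIRTIO_NET_F_STANDBY in host_features) is taken exactly when no error was reported.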
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 10:37 ` Sameeh Jubran @ 2018-09-18 13:25 ` Michael S. Tsirkin 2018-09-18 18:30 ` Siwei Liu ` (2 more replies) 0 siblings, 3 replies; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-18 13:25 UTC (permalink / raw) To: Sameeh Jubran; +Cc: cohuck, sridhar.samudrala, virtio-dev On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote: > On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote: > > > > On Wed, 12 Sep 2018 11:22:12 -0400 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > > > > > > > > > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > > > > > > device to act as a standby for another device with the same MAC address. > > > > > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > > > > > Applied but when do you plan to add documentation as pointed > > > > > out by Jan and Halil? > > > > > > > > I thought additional documentation will be done as part of the Qemu enablement > > > > patches and i hope someone in RH is looking into it. > > > > > > > > Does it make sense to add a link to to the kernel documentation of this feature in > > > > the spec > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > I do not think this will address the comments posted. 
Specifically, we
> > > should probably include documentation for what a standby and a primary are:
> > > what is expected of the driver (maintain configuration on standby, support
> > > primary coming and going, transmit on standby only if there is no
> > > primary) and of the device (have the same MAC for standby as for primary).
> >
> > Yes, we need some definitive statements of what a driver and a device
> > are supposed to do in order to conform; it might make sense to discuss
> > this in conjunction with the discussion of any QEMU patches (I have not
> > checked whether anything has been posted, having just returned from vacation).
> >
> > I assume that we are still sticking with the plan to implement/document
> > MAC-based handling first and then enhance it with other methods later?
>
> I am currently in the process of writing the patches for this feature.
> I have thought about how the feature should be implemented
> and decided to go with a different approach: the id
> of the vfio attached device will be specified in the virtio-net
> arguments as follows:
>
> -device virtio-net,standby=<device_id_of_vfio_device>
> -vfio #address,id=<device_id_of_vfio_device>
>
> This approach makes minimal changes to the current infrastructure and
> does so elegantly, without adding unnecessary ids to the bridges.
>
> The MAC address approach seems to be very complicated: there is no
> standard way to find the MAC address of a given device, and it is
> vendor dependent, which makes identifying the target standby device by its
> MAC address a very tough task.

Oh, the MAC address is used by the guest. I agree it's not a great QEMU
interface. The idea was basically to have

-vfio #address,primary=<id>

> Please share your thoughts so I can move forward with the patches.

Can this actually support hotplug add and remove of the vfio device,
though? E.g. hot-adding the vfio device while the VM is already running?
With primary=<id> it works, because the standby must always exist even
when the primary isn't there.

> An initial patch which implements hiding the device from pci bus > before the feature is acked is provided below: > > commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover) > Author: Sameeh Jubran <sjubran@redhat.com> > Date: Sun Sep 16 13:21:41 2018 +0300 > > virtio-net: Implement standby feature > > Signed-off-by: Sameeh Jubran <sjubran@redhat.com> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index f154756e85..46386c0e1b 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -26,7 +26,9 @@ > #include "qapi/qapi-events-net.h" > #include "hw/virtio/virtio-access.h" > #include "migration/misc.h" > +#include "hw/pci/pci.h" > #include "standard-headers/linux/ethtool.h" > +#include "hw/vfio/vfio-common.h" > > #define VIRTIO_NET_VM_VERSION 11 > > @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet > *n, const char *name, > n->netclient_type = g_strdup(type); > } > > +static bool standby_device_present(VirtIONet *n, const char *id, > + struct PCIDevice **pdev) > +{ > + return pci_qdev_find_device(id, pdev) >= 0 && pdev && > + vfio_is_vfio_pci(*pdev); > +} > + > static void virtio_net_device_realize(DeviceState *dev, Error **errp) > { > VirtIODevice *vdev = VIRTIO_DEVICE(dev); > @@ -1976,6 +1985,21 @@ static void > virtio_net_device_realize(DeviceState *dev, Error **errp) > n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX); > } > > + if (n->net_conf.standby_id_str && standby_device_present(n, > + n->net_conf.standby_id_str, &n->standby_pdev)) { > + DeviceState *dev = DEVICE(n->standby_pdev); > + DeviceClass *klass = DEVICE_GET_CLASS(dev); > + /* Hide standby from pci till the feature is acked */ > + if (klass->hotpluggable) > + { > + qdev_unplug(dev, errp); Does this really hide the device? 
I see: hdc = HOTPLUG_HANDLER_GET_CLASS(hotplug_ctrl); if (hdc->unplug_request) { hotplug_handler_unplug_request(hotplug_ctrl, dev, errp); } else { hotplug_handler_unplug(hotplug_ctrl, dev, errp); } which seems to just send an eject request to the guest - the reverse of what we want to do. > + if (errp == NULL) > + { > + n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY); > + } I'm not sure how this error handling is supposed to work. > + } > + } > + > virtio_net_set_config_size(n, n->host_features); > virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size); > > @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = { > true), > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > + DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str), > DEFINE_PROP_END_OF_LIST(), > }; > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > index 866f0deeb7..593debe56e 100644 > --- a/hw/vfio/pci.c > +++ b/hw/vfio/pci.c > @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev) > #endif > } > > +bool vfio_is_vfio_pci(PCIDevice* pdev) > +{ > + VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); > + return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI; > +} > + > static void vfio_intx_update(PCIDevice *pdev) > { > VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h > index 821def0565..26dfde805f 100644 > --- a/include/hw/vfio/vfio-common.h > +++ b/include/hw/vfio/vfio-common.h > @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container, > hwaddr *pgsize); > int vfio_spapr_remove_window(VFIOContainer *container, > hwaddr offset_within_address_space); > +bool vfio_is_vfio_pci(PCIDevice* pdev); > > #endif /* HW_VFIO_VFIO_COMMON_H */ > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h > index 4d7f3c82ca..94388b40cb 100644 > --- 
a/include/hw/virtio/virtio-net.h > +++ b/include/hw/virtio/virtio-net.h > @@ -42,6 +42,7 @@ typedef struct virtio_net_conf > int32_t speed; > char *duplex_str; > uint8_t duplex; > + char *standby_id_str; > } virtio_net_conf; > > /* Maximum packet size we can receive from tap device: header + 64k */ > @@ -103,6 +104,7 @@ typedef struct VirtIONet { > int announce_counter; > bool needs_vnet_hdr_swap; > bool mtu_bypass_backend; > + PCIDevice *standby_pdev; > } VirtIONet; > > void virtio_net_set_netclient_name(VirtIONet *n, const char *name, > > -- > Respectfully, > Sameeh Jubran > Linkedin > Software Engineer @ Daynix.
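Editorial note on the error-handling question raised in the review above. In QEMU, errp is an out-parameter of type Error **, so a test like "errp == NULL" only says whether the caller supplied somewhere to store an error, not whether the unplug call failed. A common idiom is to collect the error in a local variable, check that, and then hand it on to the caller. The block below is a minimal, self-contained sketch of that idiom only. It does not use the real QEMU API: the Error struct, set_error(), mock_unplug() and offer_standby() are hypothetical stand-ins invented for illustration, and the real patch might well be restructured differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Error object (illustration only). */
typedef struct Error {
    const char *msg;
} Error;

/* Hypothetical helper: store an error only if the caller asked for one. */
static void set_error(Error **errp, Error *err)
{
    if (errp != NULL) {
        *errp = err;
    }
}

/* Hypothetical operation that may fail, standing in for the unplug call. */
static void mock_unplug(bool should_fail, Error **errp)
{
    static Error unplug_failed = { "unplug failed" };

    if (should_fail) {
        set_error(errp, &unplug_failed);
    }
}

/*
 * Returns true if the standby feature bit should be offered.
 * The decision is based on a local error variable, so it works whether
 * or not the caller passed a non-NULL errp.
 */
static bool offer_standby(bool unplug_fails, Error **errp)
{
    Error *local_err = NULL;

    mock_unplug(unplug_fails, &local_err);
    if (local_err != NULL) {
        /* Hand the failure to the caller and do not offer the feature. */
        set_error(errp, local_err);
        return false;
    }
    return true;
}
```

With this shape, the feature bit would be set only when offer_standby() returns true, and a caller that passes NULL for errp still gets the correct result.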
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 13:25 ` Michael S. Tsirkin @ 2018-09-18 18:30 ` Siwei Liu 2018-09-18 18:39 ` Michael S. Tsirkin 2018-09-19 5:03 ` Samudrala, Sridhar 2018-09-20 5:51 ` Sameeh Jubran 2 siblings, 1 reply; 85+ messages in thread From: Siwei Liu @ 2018-09-18 18:30 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Sameeh Jubran, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Tue, Sep 18, 2018 at 6:25 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote: >> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote: >> > >> > On Wed, 12 Sep 2018 11:22:12 -0400 >> > "Michael S. Tsirkin" <mst@redhat.com> wrote: >> > >> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: >> > > > >> > > > >> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: >> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: >> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net >> > > > > > device to act as a standby for another device with the same MAC address. >> > > > > > >> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> >> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 >> > > > > Applied but when do you plan to add documentation as pointed >> > > > > out by Jan and Halil? >> > > > >> > > > I thought additional documentation will be done as part of the Qemu enablement >> > > > patches and i hope someone in RH is looking into it. >> > > > >> > > > Does it make sense to add a link to to the kernel documentation of this feature in >> > > > the spec >> > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html >> > > >> > > >> > > I do not think this will address the comments posted. 
Specifically we >> > > should probably include documentation for what is a standby and primary: >> > > what is expected of driver (maintain configuration on standby, support >> > > primary coming and going, transmit on standby only if there is no >> > > primary) and of device (have same mac for standby as for standby). >> > >> > Yes, we need some definitive statements of what a driver and a device >> > is supposed to do in order to conform; it might make sense to discuss >> > this in conjunction with discussion on any QEMU patches (have not >> > checked whether anything has been posted, just returned from vacation). >> > >> > I assume that we still stick with the plan to implement/document >> > MAC-based handling first and then enhance with other methods later? >> >> I am currently in the process of writing the patches for this feature, >> I have thought about how the feature should be implemented >> and decided to go with a different approach. I've decided that the id >> of the vfio attached device will be specified in the virtio-net >> arguments as follows: >> >> -device virtio-net,standby=<device_id_of_vfio_device> >> -vfio #address,id=<device_id_of_vfio_device> >> >> This approach makes minimal changes to the current infrastructure and >> does so elegantly without adding unnecessary ids to the bridges. >> >> The mac address approach seems to be very complicated as there is no >> standard way to find the mac address of a given device and it is >> vendor dependent, >> which makes the task of identifying the target standby device by it's >> mac address a very tough one. > > Oh mac address is used by guest. I agree it's not a great qemu > interface. > The idea was basically to have -vfio #address,primary=<id> Interesting... How do you make sure the MAC address are same (grouped) between vfio and virtio-net-pci (from QEMU side)? I thought the spec meant to make this a guest-host interface, right? 
-Siwei
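Editorial note on the driver-side expectations quoted earlier in this thread (the standby always exists, the primary may come and go, the two share a MAC address, and the driver transmits on the standby only when there is no primary). The block below is a rough sketch of that pairing logic under those stated expectations. All names in it (struct netdev, struct failover, failover_attach_primary, failover_detach_primary, failover_tx_dev) are hypothetical and are not taken from the virtio spec, QEMU, or the Linux net_failover code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical guest-side view of a network device (illustration only). */
struct netdev {
    unsigned char mac[MAC_LEN];
};

/* Hypothetical failover pair: the standby is always there, the primary is optional. */
struct failover {
    struct netdev *standby;  /* virtio-net device negotiated with the STANDBY feature */
    struct netdev *primary;  /* passthrough device, NULL while it is unplugged */
};

static bool same_mac(const struct netdev *a, const struct netdev *b)
{
    return memcmp(a->mac, b->mac, MAC_LEN) == 0;
}

/* Pair a newly appeared device as primary only if its MAC matches the standby. */
static bool failover_attach_primary(struct failover *f, struct netdev *cand)
{
    if (f->standby == NULL || cand == NULL || !same_mac(f->standby, cand)) {
        return false;
    }
    f->primary = cand;
    return true;
}

/* The primary was removed (for example, hot-unplugged before migration). */
static void failover_detach_primary(struct failover *f)
{
    f->primary = NULL;
}

/* Transmit on the primary when it exists, otherwise fall back to the standby. */
static struct netdev *failover_tx_dev(const struct failover *f)
{
    return f->primary != NULL ? f->primary : f->standby;
}
```

How the hypervisor guarantees that both devices carry the same MAC address is exactly the open question in this thread; the sketch simply assumes it has been arranged by whoever configures the two devices.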
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 18:30 ` Siwei Liu @ 2018-09-18 18:39 ` Michael S. Tsirkin 2018-09-18 19:10 ` Siwei Liu 0 siblings, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-18 18:39 UTC (permalink / raw) To: Siwei Liu; +Cc: Sameeh Jubran, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Tue, Sep 18, 2018 at 11:30:27AM -0700, Siwei Liu wrote: > On Tue, Sep 18, 2018 at 6:25 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote: > >> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote: > >> > > >> > On Wed, 12 Sep 2018 11:22:12 -0400 > >> > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> > > >> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > >> > > > > >> > > > > >> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > >> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > >> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > >> > > > > > device to act as a standby for another device with the same MAC address. > >> > > > > > > >> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > >> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > >> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > >> > > > > Applied but when do you plan to add documentation as pointed > >> > > > > out by Jan and Halil? > >> > > > > >> > > > I thought additional documentation will be done as part of the Qemu enablement > >> > > > patches and i hope someone in RH is looking into it. > >> > > > > >> > > > Does it make sense to add a link to to the kernel documentation of this feature in > >> > > > the spec > >> > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > >> > > > >> > > > >> > > I do not think this will address the comments posted. 
Specifically we > >> > > should probably include documentation for what is a standby and primary: > >> > > what is expected of driver (maintain configuration on standby, support > >> > > primary coming and going, transmit on standby only if there is no > >> > > primary) and of device (have same mac for standby as for standby). > >> > > >> > Yes, we need some definitive statements of what a driver and a device > >> > is supposed to do in order to conform; it might make sense to discuss > >> > this in conjunction with discussion on any QEMU patches (have not > >> > checked whether anything has been posted, just returned from vacation). > >> > > >> > I assume that we still stick with the plan to implement/document > >> > MAC-based handling first and then enhance with other methods later? > >> > >> I am currently in the process of writing the patches for this feature, > >> I have thought about how the feature should be implemented > >> and decided to go with a different approach. I've decided that the id > >> of the vfio attached device will be specified in the virtio-net > >> arguments as follows: > >> > >> -device virtio-net,standby=<device_id_of_vfio_device> > >> -vfio #address,id=<device_id_of_vfio_device> > >> > >> This approach makes minimal changes to the current infrastructure and > >> does so elegantly without adding unnecessary ids to the bridges. > >> > >> The mac address approach seems to be very complicated as there is no > >> standard way to find the mac address of a given device and it is > >> vendor dependent, > >> which makes the task of identifying the target standby device by it's > >> mac address a very tough one. > > > > Oh mac address is used by guest. I agree it's not a great qemu > > interface. > > The idea was basically to have -vfio #address,primary=<id> > > Interesting... How do you make sure the MAC address are same (grouped) > between vfio and virtio-net-pci (from QEMU side)? 
I thought the spec > meant to make this a guest-host interface, right? > > -Siwei I guess at this point that can be up to the management tool.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 18:39 ` Michael S. Tsirkin @ 2018-09-18 19:10 ` Siwei Liu 2018-09-20 3:04 ` Michael S. Tsirkin 0 siblings, 1 reply; 85+ messages in thread From: Siwei Liu @ 2018-09-18 19:10 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Sameeh Jubran, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Tue, Sep 18, 2018 at 11:39 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Sep 18, 2018 at 11:30:27AM -0700, Siwei Liu wrote: >> On Tue, Sep 18, 2018 at 6:25 AM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote: >> >> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote: >> >> > >> >> > On Wed, 12 Sep 2018 11:22:12 -0400 >> >> > "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >> > >> >> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: >> >> > > > >> >> > > > >> >> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: >> >> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: >> >> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net >> >> > > > > > device to act as a standby for another device with the same MAC address. >> >> > > > > > >> >> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> >> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> >> >> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 >> >> > > > > Applied but when do you plan to add documentation as pointed >> >> > > > > out by Jan and Halil? >> >> > > > >> >> > > > I thought additional documentation will be done as part of the Qemu enablement >> >> > > > patches and i hope someone in RH is looking into it. 
>> >> > > > >> >> > > > Does it make sense to add a link to to the kernel documentation of this feature in >> >> > > > the spec >> >> > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html >> >> > > >> >> > > >> >> > > I do not think this will address the comments posted. Specifically we >> >> > > should probably include documentation for what is a standby and primary: >> >> > > what is expected of driver (maintain configuration on standby, support >> >> > > primary coming and going, transmit on standby only if there is no >> >> > > primary) and of device (have same mac for standby as for standby). >> >> > >> >> > Yes, we need some definitive statements of what a driver and a device >> >> > is supposed to do in order to conform; it might make sense to discuss >> >> > this in conjunction with discussion on any QEMU patches (have not >> >> > checked whether anything has been posted, just returned from vacation). >> >> > >> >> > I assume that we still stick with the plan to implement/document >> >> > MAC-based handling first and then enhance with other methods later? >> >> >> >> I am currently in the process of writing the patches for this feature, >> >> I have thought about how the feature should be implemented >> >> and decided to go with a different approach. I've decided that the id >> >> of the vfio attached device will be specified in the virtio-net >> >> arguments as follows: >> >> >> >> -device virtio-net,standby=<device_id_of_vfio_device> >> >> -vfio #address,id=<device_id_of_vfio_device> >> >> >> >> This approach makes minimal changes to the current infrastructure and >> >> does so elegantly without adding unnecessary ids to the bridges. >> >> >> >> The mac address approach seems to be very complicated as there is no >> >> standard way to find the mac address of a given device and it is >> >> vendor dependent, >> >> which makes the task of identifying the target standby device by it's >> >> mac address a very tough one. 
>> > >> > Oh mac address is used by guest. I agree it's not a great qemu >> > interface. >> > The idea was basically to have -vfio #address,primary=<id> >> >> Interesting... How do you make sure the MAC address are same (grouped) >> between vfio and virtio-net-pci (from QEMU side)? I thought the spec >> meant to make this a guest-host interface, right? >> >> -Siwei > > I guess at this point that can be up to the management tool. Although still a guest-host interface, moving this device-driver virtio requirement to management toolstack is poor engineering practice IMO. -Siwei
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 19:10 ` Siwei Liu @ 2018-09-20 3:04 ` Michael S. Tsirkin 0 siblings, 0 replies; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-20 3:04 UTC (permalink / raw) To: Siwei Liu; +Cc: Sameeh Jubran, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Tue, Sep 18, 2018 at 12:10:54PM -0700, Siwei Liu wrote: > On Tue, Sep 18, 2018 at 11:39 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Sep 18, 2018 at 11:30:27AM -0700, Siwei Liu wrote: > >> On Tue, Sep 18, 2018 at 6:25 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote: > >> >> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote: > >> >> > > >> >> > On Wed, 12 Sep 2018 11:22:12 -0400 > >> >> > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> >> > > >> >> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > >> >> > > > > >> >> > > > > >> >> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > >> >> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > >> >> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > >> >> > > > > > device to act as a standby for another device with the same MAC address. > >> >> > > > > > > >> >> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > >> >> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > >> >> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > >> >> > > > > Applied but when do you plan to add documentation as pointed > >> >> > > > > out by Jan and Halil? > >> >> > > > > >> >> > > > I thought additional documentation will be done as part of the Qemu enablement > >> >> > > > patches and i hope someone in RH is looking into it. 
> >> >> > > > > >> >> > > > Does it make sense to add a link to to the kernel documentation of this feature in > >> >> > > > the spec > >> >> > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > >> >> > > > >> >> > > > >> >> > > I do not think this will address the comments posted. Specifically we > >> >> > > should probably include documentation for what is a standby and primary: > >> >> > > what is expected of driver (maintain configuration on standby, support > >> >> > > primary coming and going, transmit on standby only if there is no > >> >> > > primary) and of device (have same mac for standby as for standby). > >> >> > > >> >> > Yes, we need some definitive statements of what a driver and a device > >> >> > is supposed to do in order to conform; it might make sense to discuss > >> >> > this in conjunction with discussion on any QEMU patches (have not > >> >> > checked whether anything has been posted, just returned from vacation). > >> >> > > >> >> > I assume that we still stick with the plan to implement/document > >> >> > MAC-based handling first and then enhance with other methods later? > >> >> > >> >> I am currently in the process of writing the patches for this feature, > >> >> I have thought about how the feature should be implemented > >> >> and decided to go with a different approach. I've decided that the id > >> >> of the vfio attached device will be specified in the virtio-net > >> >> arguments as follows: > >> >> > >> >> -device virtio-net,standby=<device_id_of_vfio_device> > >> >> -vfio #address,id=<device_id_of_vfio_device> > >> >> > >> >> This approach makes minimal changes to the current infrastructure and > >> >> does so elegantly without adding unnecessary ids to the bridges. 
> >> >>
> >> >> The mac address approach seems to be very complicated as there is no
> >> >> standard way to find the mac address of a given device and it is
> >> >> vendor dependent,
> >> >> which makes the task of identifying the target standby device by it's
> >> >> mac address a very tough one.
> >> >
> >> > Oh mac address is used by guest. I agree it's not a great qemu
> >> > interface.
> >> > The idea was basically to have -vfio #address,primary=<id>
> >>
> >> Interesting... How do you make sure the MAC address are same (grouped)
> >> between vfio and virtio-net-pci (from QEMU side)? I thought the spec
> >> meant to make this a guest-host interface, right?
> >>
> >> -Siwei
> >
> > I guess at this point that can be up to the management tool.
>
> Although still a guest-host interface, moving this device-driver
> virtio requirement to management toolstack is poor engineering
> practice IMO.
>
> -Siwei

There are advantages to doing it outside QEMU, such as security
(libvirt has access to netlink, QEMU doesn't). It doesn't look like
such an important detail to me - these details are going to be up to
whoever implements it.

Anyway, we are discussing this on the wrong list. Where the code
belongs (QEMU or libvirt) is a question to be discussed on the qemu
and libvirt lists; the virtio spec does not care which host-side
module does what.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 13:25 ` Michael S. Tsirkin 2018-09-18 18:30 ` Siwei Liu @ 2018-09-19 5:03 ` Samudrala, Sridhar 2018-09-20 5:51 ` Sameeh Jubran 2 siblings, 0 replies; 85+ messages in thread From: Samudrala, Sridhar @ 2018-09-19 5:03 UTC (permalink / raw) To: Michael S. Tsirkin, Sameeh Jubran; +Cc: cohuck, virtio-dev On 9/18/2018 6:25 AM, Michael S. Tsirkin wrote: > On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote: >> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote: >>> On Wed, 12 Sep 2018 11:22:12 -0400 >>> "Michael S. Tsirkin" <mst@redhat.com> wrote: >>> >>>> On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: >>>>> >>>>> On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: >>>>>> On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: >>>>>>> VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net >>>>>>> device to act as a standby for another device with the same MAC address. >>>>>>> >>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>>>>>> Acked-by: Cornelia Huck <cohuck@redhat.com> >>>>>>> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 >>>>>> Applied but when do you plan to add documentation as pointed >>>>>> out by Jan and Halil? >>>>> I thought additional documentation will be done as part of the Qemu enablement >>>>> patches and i hope someone in RH is looking into it. >>>>> >>>>> Does it make sense to add a link to to the kernel documentation of this feature in >>>>> the spec >>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html >>>> >>>> I do not think this will address the comments posted. 
Specifically we >>>> should probably include documentation for what is a standby and primary: >>>> what is expected of driver (maintain configuration on standby, support >>>> primary coming and going, transmit on standby only if there is no >>>> primary) and of device (have same mac for standby as for standby). >>> Yes, we need some definitive statements of what a driver and a device >>> is supposed to do in order to conform; it might make sense to discuss >>> this in conjunction with discussion on any QEMU patches (have not >>> checked whether anything has been posted, just returned from vacation). >>> >>> I assume that we still stick with the plan to implement/document >>> MAC-based handling first and then enhance with other methods later? >> I am currently in the process of writing the patches for this feature, >> I have thought about how the feature should be implemented >> and decided to go with a different approach. I've decided that the id >> of the vfio attached device will be specified in the virtio-net >> arguments as follows: >> >> -device virtio-net,standby=<device_id_of_vfio_device> >> -vfio #address,id=<device_id_of_vfio_device> >> >> This approach makes minimal changes to the current infrastructure and >> does so elegantly without adding unnecessary ids to the bridges. >> >> The mac address approach seems to be very complicated as there is no >> standard way to find the mac address of a given device and it is >> vendor dependent, >> which makes the task of identifying the target standby device by it's >> mac address a very tough one. > Oh mac address is used by guest. I agree it's not a great qemu > interface. > The idea was basically to have -vfio #address,primary=<id> > > >> Please share your thoughts so I'll move forward with the patches. > Can this actually support hotplug add and remove of the vfio device though? > E.g. hotplug add vfio device while VM is already running? 
> With the primary=<> it works because standby must always exist > even when primary isn't there. Also, how do we want to handle a scenario where a VM has a direct attached VF device and virtio-net in standby mode is hotplugged/unplugged? What should be the behavior if guest unloads a virtio-net driver that is acting as a standby? Do we want qemu to unplug VF device too? > > >> An initial patch which implements hiding the device from pci bus >> before the feature is acked is provided below: >> >> commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover) >> Author: Sameeh Jubran <sjubran@redhat.com> >> Date: Sun Sep 16 13:21:41 2018 +0300 >> >> virtio-net: Implement standby feature >> >> Signed-off-by: Sameeh Jubran <sjubran@redhat.com> >> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >> index f154756e85..46386c0e1b 100644 >> --- a/hw/net/virtio-net.c >> +++ b/hw/net/virtio-net.c >> @@ -26,7 +26,9 @@ >> #include "qapi/qapi-events-net.h" >> #include "hw/virtio/virtio-access.h" >> #include "migration/misc.h" >> +#include "hw/pci/pci.h" >> #include "standard-headers/linux/ethtool.h" >> +#include "hw/vfio/vfio-common.h" >> >> #define VIRTIO_NET_VM_VERSION 11 >> >> @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet >> *n, const char *name, >> n->netclient_type = g_strdup(type); >> } >> >> +static bool standby_device_present(VirtIONet *n, const char *id, >> + struct PCIDevice **pdev) >> +{ >> + return pci_qdev_find_device(id, pdev) >= 0 && pdev && >> + vfio_is_vfio_pci(*pdev); >> +} >> + >> static void virtio_net_device_realize(DeviceState *dev, Error **errp) >> { >> VirtIODevice *vdev = VIRTIO_DEVICE(dev); >> @@ -1976,6 +1985,21 @@ static void >> virtio_net_device_realize(DeviceState *dev, Error **errp) >> n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX); >> } >> >> + if (n->net_conf.standby_id_str && standby_device_present(n, >> + n->net_conf.standby_id_str, &n->standby_pdev)) { >> + DeviceState *dev = 
DEVICE(n->standby_pdev); >> + DeviceClass *klass = DEVICE_GET_CLASS(dev); >> + /* Hide standby from pci till the feature is acked */ >> + if (klass->hotpluggable) >> + { >> + qdev_unplug(dev, errp); > > Does this really hide the device? > I see: > hdc = HOTPLUG_HANDLER_GET_CLASS(hotplug_ctrl); > if (hdc->unplug_request) { > hotplug_handler_unplug_request(hotplug_ctrl, dev, errp); > } else { > hotplug_handler_unplug(hotplug_ctrl, dev, errp); > } > > which seems to just send an eject request to guest - the reverse of > what we want to do. > >> + if (errp == NULL) >> + { >> + n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY); >> + } > I'm not sure how is this error handling supposed to work. > >> + } >> + } >> + >> virtio_net_set_config_size(n, n->host_features); >> virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size); >> >> @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = { >> true), >> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >> + DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str), >> DEFINE_PROP_END_OF_LIST(), >> }; >> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c >> index 866f0deeb7..593debe56e 100644 >> --- a/hw/vfio/pci.c >> +++ b/hw/vfio/pci.c >> @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev) >> #endif >> } >> >> +bool vfio_is_vfio_pci(PCIDevice* pdev) >> +{ >> + VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); >> + return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI; >> +} >> + >> static void vfio_intx_update(PCIDevice *pdev) >> { >> VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h >> index 821def0565..26dfde805f 100644 >> --- a/include/hw/vfio/vfio-common.h >> +++ b/include/hw/vfio/vfio-common.h >> @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container, >> hwaddr *pgsize); >> int 
vfio_spapr_remove_window(VFIOContainer *container, >> hwaddr offset_within_address_space); >> +bool vfio_is_vfio_pci(PCIDevice* pdev); >> >> #endif /* HW_VFIO_VFIO_COMMON_H */ >> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h >> index 4d7f3c82ca..94388b40cb 100644 >> --- a/include/hw/virtio/virtio-net.h >> +++ b/include/hw/virtio/virtio-net.h >> @@ -42,6 +42,7 @@ typedef struct virtio_net_conf >> int32_t speed; >> char *duplex_str; >> uint8_t duplex; >> + char *standby_id_str; >> } virtio_net_conf; >> >> /* Maximum packet size we can receive from tap device: header + 64k */ >> @@ -103,6 +104,7 @@ typedef struct VirtIONet { >> int announce_counter; >> bool needs_vnet_hdr_swap; >> bool mtu_bypass_backend; >> + PCIDevice *standby_pdev; >> } VirtIONet; >> >> void virtio_net_set_netclient_name(VirtIONet *n, const char *name, >> (END) >> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org >>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org >>> >> >> -- >> Respectfully, >> Sameeh Jubran >> Linkedin >> Software Engineer @ Daynix. --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 13:25 ` Michael S. Tsirkin 2018-09-18 18:30 ` Siwei Liu 2018-09-19 5:03 ` Samudrala, Sridhar @ 2018-09-20 5:51 ` Sameeh Jubran 2 siblings, 0 replies; 85+ messages in thread From: Sameeh Jubran @ 2018-09-20 5:51 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: cohuck, sridhar.samudrala, virtio-dev On Tue, Sep 18, 2018 at 4:25 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote: > > On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > On Wed, 12 Sep 2018 11:22:12 -0400 > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > > > > > > > > > > > > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > > > > > > > device to act as a standby for another device with the same MAC address. > > > > > > > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > > > > > > Applied but when do you plan to add documentation as pointed > > > > > > out by Jan and Halil? > > > > > > > > > > I thought additional documentation will be done as part of the Qemu enablement > > > > > patches and i hope someone in RH is looking into it. > > > > > > > > > > Does it make sense to add a link to to the kernel documentation of this feature in > > > > > the spec > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > > > > I do not think this will address the comments posted. 
Specifically we > > > > should probably include documentation for what is a standby and primary: > > > > what is expected of driver (maintain configuration on standby, support > > > > primary coming and going, transmit on standby only if there is no > > > > primary) and of device (have same mac for standby as for standby). > > > > > > Yes, we need some definitive statements of what a driver and a device > > > is supposed to do in order to conform; it might make sense to discuss > > > this in conjunction with discussion on any QEMU patches (have not > > > checked whether anything has been posted, just returned from vacation). > > > > > > I assume that we still stick with the plan to implement/document > > > MAC-based handling first and then enhance with other methods later? > > > > I am currently in the process of writing the patches for this feature, > > I have thought about how the feature should be implemented > > and decided to go with a different approach. I've decided that the id > > of the vfio attached device will be specified in the virtio-net > > arguments as follows: > > > > -device virtio-net,standby=<device_id_of_vfio_device> > > -vfio #address,id=<device_id_of_vfio_device> > > > > This approach makes minimal changes to the current infrastructure and > > does so elegantly without adding unnecessary ids to the bridges. > > > > The mac address approach seems to be very complicated as there is no > > standard way to find the mac address of a given device and it is > > vendor dependent, > > which makes the task of identifying the target standby device by it's > > mac address a very tough one. > > Oh mac address is used by guest. I agree it's not a great qemu > interface. > The idea was basically to have -vfio #address,primary=<id> > > > > Please share your thoughts so I'll move forward with the patches. > > Can this actually support hotplug add and remove of the vfio device though? > E.g. hotplug add vfio device while VM is already running? 
> With the primary=<> it works because standby must always exist
> even when primary isn't there.

Oh, I get what you are saying; what you are suggesting can be easily
done too. The primary searches for the standby device and, if it
exists, hides itself and somehow registers itself with the standby.
Now the idea of a group id starts to make sense to me, as it makes
the identification of the paired device accessible from both devices
without any additions. However, this can also be done easily by
exposing an API from virtio-net for the primary device to announce
itself. I don't really like the idea of the group id, as it seems to
me to be unneeded logic, but I think I'm missing something. Can
someone explain the motive behind the group id? Is it necessary?

> >
> > An initial patch which implements hiding the device from pci bus
> > before the feature is acked is provided below:
> >
> > commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover)
> > Author: Sameeh Jubran <sjubran@redhat.com>
> > Date: Sun Sep 16 13:21:41 2018 +0300
> >
> >     virtio-net: Implement standby feature
> >
> >     Signed-off-by: Sameeh Jubran <sjubran@redhat.com>
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index f154756e85..46386c0e1b 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -26,7 +26,9 @@
> >  #include "qapi/qapi-events-net.h"
> >  #include "hw/virtio/virtio-access.h"
> >  #include "migration/misc.h"
> > +#include "hw/pci/pci.h"
> >  #include "standard-headers/linux/ethtool.h"
> > +#include "hw/vfio/vfio-common.h"
> >
> >  #define VIRTIO_NET_VM_VERSION 11
> >
> > @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet
> > *n, const char *name,
> >      n->netclient_type = g_strdup(type);
> >  }
> >
> > +static bool standby_device_present(VirtIONet *n, const char *id,
> > +        struct PCIDevice **pdev)
> > +{
> > +    return pci_qdev_find_device(id, pdev) >= 0 && pdev &&
> > +           vfio_is_vfio_pci(*pdev);
> > +}
> > +
> >  static void
virtio_net_device_realize(DeviceState *dev, Error **errp)
> >  {
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > @@ -1976,6 +1985,21 @@ static void
> > virtio_net_device_realize(DeviceState *dev, Error **errp)
> >          n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
> >      }
> >
> > +    if (n->net_conf.standby_id_str && standby_device_present(n,
> > +            n->net_conf.standby_id_str, &n->standby_pdev)) {
> > +        DeviceState *dev = DEVICE(n->standby_pdev);
> > +        DeviceClass *klass = DEVICE_GET_CLASS(dev);
> > +        /* Hide standby from pci till the feature is acked */
> > +        if (klass->hotpluggable)
> > +        {
> > +            qdev_unplug(dev, errp);
> >
> Does this really hide the device?
> I see:
>     hdc = HOTPLUG_HANDLER_GET_CLASS(hotplug_ctrl);
>     if (hdc->unplug_request) {
>         hotplug_handler_unplug_request(hotplug_ctrl, dev, errp);
>     } else {
>         hotplug_handler_unplug(hotplug_ctrl, dev, errp);
>     }
>
> which seems to just send an eject request to guest - the reverse of
> what we want to do.

You are right, it doesn't hide the device. I thought about
registering a pre-plug callback, called before the device is
realized, that would detach it from its parent or set the realize
callback in the device state to null. This should hide the device
from the PCI bus as far as I understand.

>
> > +            if (errp == NULL)
> > +            {
> > +                n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
> > +            }
>
> I'm not sure how is this error handling supposed to work.
> > > + } > > + } > > + > > virtio_net_set_config_size(n, n->host_features); > > virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size); > > > > @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = { > > true), > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > > + DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str), > > DEFINE_PROP_END_OF_LIST(), > > }; > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > > index 866f0deeb7..593debe56e 100644 > > --- a/hw/vfio/pci.c > > +++ b/hw/vfio/pci.c > > @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev) > > #endif > > } > > > > +bool vfio_is_vfio_pci(PCIDevice* pdev) > > +{ > > + VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); > > + return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI; > > +} > > + > > static void vfio_intx_update(PCIDevice *pdev) > > { > > VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); > > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h > > index 821def0565..26dfde805f 100644 > > --- a/include/hw/vfio/vfio-common.h > > +++ b/include/hw/vfio/vfio-common.h > > @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container, > > hwaddr *pgsize); > > int vfio_spapr_remove_window(VFIOContainer *container, > > hwaddr offset_within_address_space); > > +bool vfio_is_vfio_pci(PCIDevice* pdev); > > > > #endif /* HW_VFIO_VFIO_COMMON_H */ > > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h > > index 4d7f3c82ca..94388b40cb 100644 > > --- a/include/hw/virtio/virtio-net.h > > +++ b/include/hw/virtio/virtio-net.h > > @@ -42,6 +42,7 @@ typedef struct virtio_net_conf > > int32_t speed; > > char *duplex_str; > > uint8_t duplex; > > + char *standby_id_str; > > } virtio_net_conf; > > > > /* Maximum packet size we can receive from tap device: header + 64k */ > > @@ -103,6 +104,7 @@ typedef 
struct VirtIONet { > > int announce_counter; > > bool needs_vnet_hdr_swap; > > bool mtu_bypass_backend; > > + PCIDevice *standby_pdev; > > } VirtIONet; > > > > void virtio_net_set_netclient_name(VirtIONet *n, const char *name, > > (END) > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > > > > > > > > > -- > > Respectfully, > > Sameeh Jubran > > Linkedin > > Software Engineer @ Daynix. -- Respectfully, Sameeh Jubran Linkedin Software Engineer @ Daynix. --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
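A note on the error-handling question raised in the review above: in the quoted patch, `errp` is the pointer the caller passed in, so `errp == NULL` tells you whether the caller wants error reports, not whether `qdev_unplug()` failed. A minimal sketch of the usual "local error" pattern follows. The `Error` struct, `mock_unplug()` and `offer_standby()` are hypothetical stand-ins written for this illustration only; they are not QEMU's real types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_NET_F_STANDBY 62 /* bit number from the spec patch */

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct Error {
    const char *msg;
} Error;

/* Hypothetical operation that may fail and reports through errp. */
static void mock_unplug(bool should_fail, Error **errp)
{
    static Error unplug_failed = { "unplug failed" };

    /* In this sketch the callee always receives a local error slot. */
    assert(errp != NULL);
    if (should_fail) {
        *errp = &unplug_failed;
    }
}

/*
 * Offer the standby feature only if the preceding step succeeded.
 * Success is judged from a local error variable, so the result does
 * not depend on whether the caller passed a NULL errp.
 */
static uint64_t offer_standby(uint64_t host_features, bool should_fail,
                              Error **errp)
{
    Error *local_err = NULL;

    mock_unplug(should_fail, &local_err);
    if (local_err) {
        if (errp) {
            *errp = local_err; /* hand the error to the caller, if wanted */
        }
        return host_features;  /* feature bit is not offered on failure */
    }
    return host_features | (1ULL << VIRTIO_NET_F_STANDBY);
}
```

With this shape the feature bit is set exactly when the step succeeded, including for callers that pass NULL because they do not care about the error details.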
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 10:20 ` Cornelia Huck 2018-09-18 10:37 ` Sameeh Jubran @ 2018-09-18 13:35 ` Michael S. Tsirkin 2018-09-18 15:13 ` Venu Busireddy 1 sibling, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-18 13:35 UTC (permalink / raw) To: Cornelia Huck; +Cc: Samudrala, Sridhar, virtio-dev On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: > On Wed, 12 Sep 2018 11:22:12 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > > > > > > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > > > > > device to act as a standby for another device with the same MAC address. > > > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > > > > Applied but when do you plan to add documentation as pointed > > > > out by Jan and Halil? > > > > > > I thought additional documentation will be done as part of the Qemu enablement > > > patches and i hope someone in RH is looking into it. > > > > > > Does it make sense to add a link to to the kernel documentation of this feature in > > > the spec > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > I do not think this will address the comments posted. Specifically we > > should probably include documentation for what is a standby and primary: > > what is expected of driver (maintain configuration on standby, support > > primary coming and going, transmit on standby only if there is no > > primary) and of device (have same mac for standby as for standby). 
>
> Yes, we need some definitive statements of what a driver and a device
> is supposed to do in order to conform; it might make sense to discuss
> this in conjunction with discussion on any QEMU patches (have not
> checked whether anything has been posted, just returned from vacation).
>
> I assume that we still stick with the plan to implement/document
> MAC-based handling first and then enhance with other methods later?

I'm fine with that at least. If someone wants to work on
other methods straight away, that's also fine by me.

--
MST
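For readers following the "what is expected of driver" point quoted above (transmit on standby only if there is no primary), here is a minimal sketch of that rule. It assumes the feature bit number 62 from the spec patch; `struct failover_pair`, `standby_negotiated()` and `select_tx_path()` are hypothetical names made up for this illustration and do not correspond to any existing driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_NET_F_STANDBY 62 /* bit number from the spec patch */

/* Hypothetical guest-side view of a standby/primary pair. */
struct failover_pair {
    uint64_t negotiated_features; /* features acked on the virtio-net device */
    bool primary_present;         /* primary is currently plugged in and usable */
};

enum tx_path { TX_PATH_STANDBY, TX_PATH_PRIMARY };

/* True if the driver and device agreed on VIRTIO_NET_F_STANDBY. */
static bool standby_negotiated(uint64_t features)
{
    return (features & (1ULL << VIRTIO_NET_F_STANDBY)) != 0;
}

/*
 * Use the primary while it is present; fall back to the standby when
 * the primary is gone or the feature was never negotiated.
 */
static enum tx_path select_tx_path(const struct failover_pair *p)
{
    assert(p != NULL);
    if (standby_negotiated(p->negotiated_features) && p->primary_present) {
        return TX_PATH_PRIMARY;
    }
    return TX_PATH_STANDBY;
}
```

The standby stays configured the whole time in this sketch, so the primary can come and go without the guest losing its network configuration.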
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 13:35 ` Michael S. Tsirkin @ 2018-09-18 15:13 ` Venu Busireddy 2018-09-18 15:31 ` Michael S. Tsirkin 0 siblings, 1 reply; 85+ messages in thread From: Venu Busireddy @ 2018-09-18 15:13 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Cornelia Huck, Samudrala, Sridhar, virtio-dev On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: > > On Wed, 12 Sep 2018 11:22:12 -0400 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > > > > > > > > > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > > > > > > device to act as a standby for another device with the same MAC address. > > > > > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > > > > > Applied but when do you plan to add documentation as pointed > > > > > out by Jan and Halil? > > > > > > > > I thought additional documentation will be done as part of the Qemu enablement > > > > patches and i hope someone in RH is looking into it. > > > > > > > > Does it make sense to add a link to to the kernel documentation of this feature in > > > > the spec > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > I do not think this will address the comments posted. 
Specifically we
> > > should probably include documentation for what is a standby and primary:
> > > what is expected of driver (maintain configuration on standby, support
> > > primary coming and going, transmit on standby only if there is no
> > > primary) and of device (have same mac for standby as for standby).
> >
> > Yes, we need some definitive statements of what a driver and a device
> > is supposed to do in order to conform; it might make sense to discuss
> > this in conjunction with discussion on any QEMU patches (have not
> > checked whether anything has been posted, just returned from vacation).
> >
> > I assume that we still stick with the plan to implement/document
> > MAC-based handling first and then enhance with other methods later?
>
> I'm fine with that at least. If someone wants to work on
> other methods straight away, that's also fine by me.

Patch set [1] implements the failover-group-id mechanism. Are you
thinking of some other method?

Venu

[1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 15:13 ` Venu Busireddy @ 2018-09-18 15:31 ` Michael S. Tsirkin 2018-09-18 18:48 ` Siwei Liu 0 siblings, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-18 15:31 UTC (permalink / raw) To: Venu Busireddy; +Cc: Cornelia Huck, Samudrala, Sridhar, virtio-dev On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: > On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: > > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: > > > On Wed, 12 Sep 2018 11:22:12 -0400 > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > > > > > > > > > > > > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > > > > > > > device to act as a standby for another device with the same MAC address. > > > > > > > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > > > > > > Applied but when do you plan to add documentation as pointed > > > > > > out by Jan and Halil? > > > > > > > > > > I thought additional documentation will be done as part of the Qemu enablement > > > > > patches and i hope someone in RH is looking into it. > > > > > > > > > > Does it make sense to add a link to to the kernel documentation of this feature in > > > > > the spec > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > > > > I do not think this will address the comments posted. 
Specifically we > > > > should probably include documentation for what is a standby and primary: > > > > what is expected of driver (maintain configuration on standby, support > > > > primary coming and going, transmit on standby only if there is no > > > > primary) and of device (have same mac for standby as for standby). > > > > > > Yes, we need some definitive statements of what a driver and a device > > > is supposed to do in order to conform; it might make sense to discuss > > > this in conjunction with discussion on any QEMU patches (have not > > > checked whether anything has been posted, just returned from vacation). > > > > > > I assume that we still stick with the plan to implement/document > > > MAC-based handling first and then enhance with other methods later? > > > > I'm fine with that at least. If someone wants to work on > > other methods straight away, that's also fine by me. > > Patch set [1] implements the failover-group-id mechanism. Are you > thinking of some other method? > > Venu > > [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html >

Yes, the grouping mechanism seems fine to me (I don't remember the implementation details, it's been a while).

It is not by itself sufficient though, is it?

MAC is assumed to be shared to avoid things like ARP/neighbor rediscovery, right? If true, that implies that, to avoid guest confusion, visibility of the primary needs to be controlled by the standby's driver. This makes this patchset incomplete.

For this work to be complete, what is needed is:
- hypervisor: add control of primary's visibility to guest
- guest: add support for this grouping to the failover driver

We also need
- spec: document matching rules based on the PCI bridge

It's helpful to have a spec proposal with an implementation, but I would say that proposed patches for at least one of the two items above would be helpful before we include this in the spec.
-- MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 15:31 ` Michael S. Tsirkin @ 2018-09-18 18:48 ` Siwei Liu 2018-09-20 3:11 ` Michael S. Tsirkin 0 siblings, 1 reply; 85+ messages in thread From: Siwei Liu @ 2018-09-18 18:48 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: >> > > On Wed, 12 Sep 2018 11:22:12 -0400 >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote: >> > > >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: >> > > > > >> > > > > >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net >> > > > > > > device to act as a standby for another device with the same MAC address. >> > > > > > > >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 >> > > > > > Applied but when do you plan to add documentation as pointed >> > > > > > out by Jan and Halil? >> > > > > >> > > > > I thought additional documentation will be done as part of the Qemu enablement >> > > > > patches and i hope someone in RH is looking into it. >> > > > > >> > > > > Does it make sense to add a link to to the kernel documentation of this feature in >> > > > > the spec >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html >> > > > >> > > > >> > > > I do not think this will address the comments posted. 
Specifically we >> > > > should probably include documentation for what is a standby and primary: >> > > > what is expected of driver (maintain configuration on standby, support >> > > > primary coming and going, transmit on standby only if there is no >> > > > primary) and of device (have same mac for standby as for standby). >> > > >> > > Yes, we need some definitive statements of what a driver and a device >> > > is supposed to do in order to conform; it might make sense to discuss >> > > this in conjunction with discussion on any QEMU patches (have not >> > > checked whether anything has been posted, just returned from vacation). >> > > >> > > I assume that we still stick with the plan to implement/document >> > > MAC-based handling first and then enhance with other methods later? >> > >> > I'm fine with that at least. If someone wants to work on >> > other methods straight away, that's also fine by me. >> >> Patch set [1] implements the failover-group-id mechanism. Are you >> thinking of some other method? >> >> Venu >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html >> > > Yes, the grouping mechanism seems fine to me (I don't remember > about the implementation, it's been a while). > > It is not by itself sufficient though, is it? I do understand that the group ID patch is incomplete though it's a base patch for the real work. > > MAC is assumed to be shared to avoid things like ARP/neighboor > rediscovery, right? True, but does this really need to be part of the guest-host interface? Or rather, I don't see how MAC based matching can be done on the host part. Are you going to expose MAC address to VFIO? The thing is the current MAC based implementation has intrinsic flaw that doesn't propagate errors to hypervisor, or there's no back channel for guest to unwind the hot plug action upon failure in probing or enslaving the primary. 
If you think about a more robust implementation, another grouping mechanism rather than MAC is pretty much required. Thanks, -Siwei > If true that implies that to avoid guest confusion visibility of the > primary needs to be controlled by standby's driver. > This makes this patchset incomplete. > > For this work to be complete what is needed is: > - hypervisor: add control of primary's visibility to guest > - guest: add support for this grouping to the failover driver > > We also need > - spec: document matching rules based on the pci bridge > > and it's helpful to have a spec proposal with implementation, but I > would say at least proposed patches to one of the above 2 would be > helpful before we include this in spec. > > -- > MST > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-18 18:48 ` Siwei Liu @ 2018-09-20 3:11 ` Michael S. Tsirkin 2018-09-20 23:57 ` Siwei Liu 0 siblings, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-20 3:11 UTC (permalink / raw) To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote: > On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: > >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: > >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: > >> > > On Wed, 12 Sep 2018 11:22:12 -0400 > >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> > > > >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > >> > > > > > >> > > > > > >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > >> > > > > > > device to act as a standby for another device with the same MAC address. > >> > > > > > > > >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > >> > > > > > Applied but when do you plan to add documentation as pointed > >> > > > > > out by Jan and Halil? > >> > > > > > >> > > > > I thought additional documentation will be done as part of the Qemu enablement > >> > > > > patches and i hope someone in RH is looking into it. 
> >> > > > > > >> > > > > Does it make sense to add a link to to the kernel documentation of this feature in > >> > > > > the spec > >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > >> > > > > >> > > > > >> > > > I do not think this will address the comments posted. Specifically we > >> > > > should probably include documentation for what is a standby and primary: > >> > > > what is expected of driver (maintain configuration on standby, support > >> > > > primary coming and going, transmit on standby only if there is no > >> > > > primary) and of device (have same mac for standby as for standby). > >> > > > >> > > Yes, we need some definitive statements of what a driver and a device > >> > > is supposed to do in order to conform; it might make sense to discuss > >> > > this in conjunction with discussion on any QEMU patches (have not > >> > > checked whether anything has been posted, just returned from vacation). > >> > > > >> > > I assume that we still stick with the plan to implement/document > >> > > MAC-based handling first and then enhance with other methods later? > >> > > >> > I'm fine with that at least. If someone wants to work on > >> > other methods straight away, that's also fine by me. > >> > >> Patch set [1] implements the failover-group-id mechanism. Are you > >> thinking of some other method? > >> > >> Venu > >> > >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html > >> > > > > Yes, the grouping mechanism seems fine to me (I don't remember > > about the implementation, it's been a while). > > > > It is not by itself sufficient though, is it? > > I do understand that the group ID patch is incomplete though it's a > base patch for the real work. > > > > > MAC is assumed to be shared to avoid things like ARP/neighboor > > rediscovery, right? > > True, but does this really need to be part of the guest-host > interface? Or rather, I don't see how MAC based matching can be done > on the host part. 
mac address matching does not need to affect host side. > Are you going to expose MAC address to VFIO? If mac of a VF is programmed by libvirt through the PF (that's already the case), VFIO does not need to care about it. > > The thing is the current MAC based implementation has intrinsic flaw > that doesn't propagate errors to hypervisor, or there's no back > channel for guest to unwind the hot plug action upon failure in > probing or enslaving the primary. I guess you can eject the primary if you like. But why does hypervisor need to know? On error, just don't use primary, use standby. > If you think about a more robust > implementation, another grouping mechanism rather than MAC is pretty > much required. > > Thanks, > -Siwei I don't really know what is the flaw, or how is it fixed by a grouping mechanism. All this motivation was never described as part of work on an alternate grouping. > > If true that implies that to avoid guest confusion visibility of the > > primary needs to be controlled by standby's driver. > > This makes this patchset incomplete. > > > > For this work to be complete what is needed is: > > - hypervisor: add control of primary's visibility to guest > > - guest: add support for this grouping to the failover driver > > > > We also need > > - spec: document matching rules based on the pci bridge > > > > and it's helpful to have a spec proposal with implementation, but I > > would say at least proposed patches to one of the above 2 would be > > helpful before we include this in spec. 
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-20 3:11 ` Michael S. Tsirkin @ 2018-09-20 23:57 ` Siwei Liu 2018-09-21 2:23 ` Michael S. Tsirkin 0 siblings, 1 reply; 85+ messages in thread From: Siwei Liu @ 2018-09-20 23:57 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote: >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400 >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >> > > >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: >> >> > > > > >> >> > > > > >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net >> >> > > > > > > device to act as a standby for another device with the same MAC address. >> >> > > > > > > >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 >> >> > > > > > Applied but when do you plan to add documentation as pointed >> >> > > > > > out by Jan and Halil? >> >> > > > > >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement >> >> > > > > patches and i hope someone in RH is looking into it. 
>> >> > > > > >> >> > > > > Does it make sense to add a link to to the kernel documentation of this feature in >> >> > > > > the spec >> >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html >> >> > > > >> >> > > > >> >> > > > I do not think this will address the comments posted. Specifically we >> >> > > > should probably include documentation for what is a standby and primary: >> >> > > > what is expected of driver (maintain configuration on standby, support >> >> > > > primary coming and going, transmit on standby only if there is no >> >> > > > primary) and of device (have same mac for standby as for standby). >> >> > > >> >> > > Yes, we need some definitive statements of what a driver and a device >> >> > > is supposed to do in order to conform; it might make sense to discuss >> >> > > this in conjunction with discussion on any QEMU patches (have not >> >> > > checked whether anything has been posted, just returned from vacation). >> >> > > >> >> > > I assume that we still stick with the plan to implement/document >> >> > > MAC-based handling first and then enhance with other methods later? >> >> > >> >> > I'm fine with that at least. If someone wants to work on >> >> > other methods straight away, that's also fine by me. >> >> >> >> Patch set [1] implements the failover-group-id mechanism. Are you >> >> thinking of some other method? >> >> >> >> Venu >> >> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html >> >> >> > >> > Yes, the grouping mechanism seems fine to me (I don't remember >> > about the implementation, it's been a while). >> > >> > It is not by itself sufficient though, is it? >> >> I do understand that the group ID patch is incomplete though it's a >> base patch for the real work. >> >> > >> > MAC is assumed to be shared to avoid things like ARP/neighboor >> > rediscovery, right? >> >> True, but does this really need to be part of the guest-host >> interface? 
Or rather, I don't see how MAC based matching can be done >> on the host part. > > mac address matching does not need to affect host side.

Did you realize that the host side can't have duplicate MAC address filters for both PV and VF at the same time?

If a VF is hot added with a duplicate MAC address filter already programmed, the PV path for virtio on the host side is effectively disabled. However, the fact that the VF gets hot plugged by QEMU/libvirt does not mean it's ready and usable in the guest. You end up with unusable guest networking, *which is temporary only if the VF is successfully probed and properly enslaved*. As of now, no guest-host handshake is defined in the spec to make the virtio driver aware of the hotplug event and thus of the VF's exposure, and no handshake is done to switch the datapath when the VF driver is ready and usable in the guest. The current implementation relies on luck: it assumes the entire hot plug process will be successful in the guest.

BTW netvsc mitigates potential failure in the hotplug and driver probing by notifying the hypervisor through a DATAPATH_SWITCH hypercall (VMbus message) when the VF driver is enslaved and ready; only then will the hypervisor kick off datapath switching by moving the MAC address filter.

> >> Are you going to expose MAC address to VFIO? > > If mac of a VF is programmed by libvirt through the PF > (that's already the case), VFIO does not need to care about it. > >> >> The thing is the current MAC based implementation has intrinsic flaw >> that doesn't propagate errors to hypervisor, or there's no back >> channel for guest to unwind the hot plug action upon failure in >> probing or enslaving the primary. > > I guess you can eject the primary if you like. But > why does hypervisor need to know? On error, just don't use primary, > use standby.

Forget about the grouping mechanism first.
What guest kernel change do you propose to make the virtio driver aware of every possible error? Think about how many moving targets it needs to specifically track or depend on during the hot plug and driver probing process. If someone starts to implement the code and thinks about the various error cases as a whole, I bet it would become clearer why grouping is relevant in the first place.

-Siwei

> >> If you think about a more robust >> implementation, another grouping mechanism rather than MAC is pretty >> much required. >> >> Thanks, >> -Siwei > > I don't really know what is the flaw, or how is it fixed by a grouping > mechanism. All this motivation was never described as part of work on > an alternate grouping. > >> > If true that implies that to avoid guest confusion visibility of the >> > primary needs to be controlled by standby's driver. >> > This makes this patchset incomplete. >> > >> > For this work to be complete what is needed is: >> > - hypervisor: add control of primary's visibility to guest >> > - guest: add support for this grouping to the failover driver >> > >> > We also need >> > - spec: document matching rules based on the pci bridge >> > >> > and it's helpful to have a spec proposal with implementation, but I >> > would say at least proposed patches to one of the above 2 would be >> > helpful before we include this in spec.
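The handshake argued for in the message above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and method names (`Hypervisor`, `guest_ack_datapath_switch`) are hypothetical, no such message is defined in the virtio spec, and the behaviour is only loosely modelled on the netvsc DATAPATH_SWITCH description in the thread. The point it shows is that hot plugging the VF does not by itself move the single MAC filter; the filter moves only after the guest reports that the VF is enslaved and ready.

```python
from enum import Enum


class Path(Enum):
    STANDBY = "standby"  # paravirtual virtio-net path
    PRIMARY = "primary"  # passthrough VF path


class Hypervisor:
    """Toy model: owns the single MAC filter and decides which path holds it."""

    def __init__(self):
        self.filter_owner = Path.STANDBY
        self.vf_plugged = False

    def hotplug_vf(self):
        # Plugging the VF does NOT move the filter: the guest may still
        # fail to probe or enslave the device.
        self.vf_plugged = True

    def guest_ack_datapath_switch(self, vf_ready: bool):
        # Hypothetical guest-to-host message (an assumption, not a spec'd
        # virtio command). The filter moves only on a positive ack.
        if self.vf_plugged and vf_ready:
            self.filter_owner = Path.PRIMARY
        else:
            self.filter_owner = Path.STANDBY

    def unplug_vf(self):
        # Removing the VF always returns traffic to the standby path.
        self.vf_plugged = False
        self.filter_owner = Path.STANDBY
```

With this ordering, a guest that fails to probe the VF simply never sends the ack, and traffic keeps flowing through the standby.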
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-20 23:57 ` Siwei Liu @ 2018-09-21 2:23 ` Michael S. Tsirkin 2018-09-21 2:34 ` Michael S. Tsirkin 2018-09-27 0:18 ` Siwei Liu 0 siblings, 2 replies; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-21 2:23 UTC (permalink / raw) To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote: > On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote: > >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: > >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: > >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: > >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400 > >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> >> > > > >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > >> >> > > > > > >> >> > > > > > >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > >> >> > > > > > > device to act as a standby for another device with the same MAC address. > >> >> > > > > > > > >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > >> >> > > > > > Applied but when do you plan to add documentation as pointed > >> >> > > > > > out by Jan and Halil? > >> >> > > > > > >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement > >> >> > > > > patches and i hope someone in RH is looking into it. 
> >> >> > > > > > >> >> > > > > Does it make sense to add a link to to the kernel documentation of this feature in > >> >> > > > > the spec > >> >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > >> >> > > > > >> >> > > > > >> >> > > > I do not think this will address the comments posted. Specifically we > >> >> > > > should probably include documentation for what is a standby and primary: > >> >> > > > what is expected of driver (maintain configuration on standby, support > >> >> > > > primary coming and going, transmit on standby only if there is no > >> >> > > > primary) and of device (have same mac for standby as for standby). > >> >> > > > >> >> > > Yes, we need some definitive statements of what a driver and a device > >> >> > > is supposed to do in order to conform; it might make sense to discuss > >> >> > > this in conjunction with discussion on any QEMU patches (have not > >> >> > > checked whether anything has been posted, just returned from vacation). > >> >> > > > >> >> > > I assume that we still stick with the plan to implement/document > >> >> > > MAC-based handling first and then enhance with other methods later? > >> >> > > >> >> > I'm fine with that at least. If someone wants to work on > >> >> > other methods straight away, that's also fine by me. > >> >> > >> >> Patch set [1] implements the failover-group-id mechanism. Are you > >> >> thinking of some other method? > >> >> > >> >> Venu > >> >> > >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html > >> >> > >> > > >> > Yes, the grouping mechanism seems fine to me (I don't remember > >> > about the implementation, it's been a while). > >> > > >> > It is not by itself sufficient though, is it? > >> > >> I do understand that the group ID patch is incomplete though it's a > >> base patch for the real work. > >> > >> > > >> > MAC is assumed to be shared to avoid things like ARP/neighboor > >> > rediscovery, right? 
> >> > >> True, but does this really need to be part of the guest-host > >> interface? Or rather, I don't see how MAC based matching can be done > >> on the host part. > > > > mac address matching does not need to affect host side. > > Did you realize that the host side can't have duplicate MAC address > filters for both PV and VF at the same time? > > If hot adding a VF with duplicate MAC address filter programmed in > prior, the PV path for virtio in the host side is effectively > disabled. However, the fact that VF gets hot plugged by QEMU/libvirt > does not mean it's ready and usable in the guest. You end up with > unusable guest networking, *temporarily only when VF is successfully > probed and properly enslaved*. As of now, no guest-host handshake was > defined in the spec to make virtio driver aware of hotplug event thus > VF's exposure, and zero handshake was done to switch the datapath when > VF driver is ready and usable in guest. The current implementation > relies on the lucky side that all the entire hot plug process will be > successful in the guest.

I think it's a PF bug then. The PF driver should ignore filters for VFs which have not been enabled by the guest since reset.

> BTW netvsc mitigate potential failure in the hotplug and driver > probing by acknowledging the hypervisor through a DATAPATH_SWITCH > hypercall (VMbus message) when VF driver is enslaved and ready, only > then hypervisor will kick off datapath switching by moving the MAC > address filter.

We can do it without the need for PV. We can detect e.g. bus master enable: move the filter when it is enabled, and move it back when it is disabled, e.g. by VF reset. Or maybe use MSE (memory space enable), or both.

> > > >> Are you going to expose MAC address to VFIO? > > > > If mac of a VF is programmed by libvirt through the PF > > (that's already the case), VFIO does not need to care about it.
> > > >> > >> The thing is the current MAC based implementation has intrinsic flaw > >> that doesn't propagate errors to hypervisor, or there's no back > >> channel for guest to unwind the hot plug action upon failure in > >> probing or enslaving the primary. > > > > I guess you can eject the primary if you like. But > > why does hypervisor need to know? On error, just don't use primary, > > use standby. > > Forget about the grouping mechanism first. OK :) > What guest kernel change do > you propose to make virtio driver know every possible error, think > about how many moving targets it needs to specifically track with or > has to depend on during the hot plug and driver probing process? If > someone starts to implement the code and think about various error > cases as a whole, I bet it would be more clear why grouping is > relevant in the first place. > > -Siwei It just seems that no one's been motivated to do it so far. > > > >> If you think about a more robust > >> implementation, another grouping mechanism rather than MAC is pretty > >> much required. > >> > >> Thanks, > >> -Siwei > > > > I don't really know what is the flaw, or how is it fixed by a grouping > > mechanism. All this motivation was never described as part of work on > > an alternate grouping. > > > >> > If true that implies that to avoid guest confusion visibility of the > >> > primary needs to be controlled by standby's driver. > >> > This makes this patchset incomplete. > >> > > >> > For this work to be complete what is needed is: > >> > - hypervisor: add control of primary's visibility to guest > >> > - guest: add support for this grouping to the failover driver > >> > > >> > We also need > >> > - spec: document matching rules based on the pci bridge > >> > > >> > and it's helpful to have a spec proposal with implementation, but I > >> > would say at least proposed patches to one of the above 2 would be > >> > helpful before we include this in spec. 
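The bus master enable idea from the message above can be sketched as a small decision function. This is a minimal illustration, not an existing QEMU or PF driver interface: `filter_owner` and its `require_mse` flag are hypothetical names, and the input is assumed to be the guest-visible PCI command register value of the VF as the hypervisor observes it. The two bit masks are the standard PCI command register bits.

```python
PCI_COMMAND_MEMORY = 0x2  # Memory Space Enable (MSE), bit 1 of the command register
PCI_COMMAND_MASTER = 0x4  # Bus Master Enable, bit 2 of the command register


def filter_owner(vf_command: int, require_mse: bool = False) -> str:
    """Return which device should hold the MAC filter for a given VF command value.

    The filter goes to the primary (VF) only while the guest has enabled bus
    mastering, optionally also requiring memory space enable ("or both").
    A VF reset clears these bits, which hands the filter back to the standby.
    """
    enabled = bool(vf_command & PCI_COMMAND_MASTER)
    if require_mse:
        enabled = enabled and bool(vf_command & PCI_COMMAND_MEMORY)
    return "primary" if enabled else "standby"
```

A hypervisor following this scheme would re-evaluate the function on every guest write to the command register and on VF reset, so no paravirtual message is needed.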
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-21 2:23 ` Michael S. Tsirkin @ 2018-09-21 2:34 ` Michael S. Tsirkin 2018-09-27 0:18 ` Siwei Liu 1 sibling, 0 replies; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-21 2:34 UTC (permalink / raw) To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Thu, Sep 20, 2018 at 10:23:22PM -0400, Michael S. Tsirkin wrote: > On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote: > > On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote: > > >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: > > >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: > > >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: > > >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400 > > >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > >> >> > > > > >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > > >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > > >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > > >> >> > > > > > > device to act as a standby for another device with the same MAC address. > > >> >> > > > > > > > > >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > > >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > > >> >> > > > > > Applied but when do you plan to add documentation as pointed > > >> >> > > > > > out by Jan and Halil? 
> > >> >> > > > > > > >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement > > >> >> > > > > patches and i hope someone in RH is looking into it. > > >> >> > > > > > > >> >> > > > > Does it make sense to add a link to to the kernel documentation of this feature in > > >> >> > > > > the spec > > >> >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > >> >> > > > > > >> >> > > > > > >> >> > > > I do not think this will address the comments posted. Specifically we > > >> >> > > > should probably include documentation for what is a standby and primary: > > >> >> > > > what is expected of driver (maintain configuration on standby, support > > >> >> > > > primary coming and going, transmit on standby only if there is no > > >> >> > > > primary) and of device (have same mac for standby as for standby). > > >> >> > > > > >> >> > > Yes, we need some definitive statements of what a driver and a device > > >> >> > > is supposed to do in order to conform; it might make sense to discuss > > >> >> > > this in conjunction with discussion on any QEMU patches (have not > > >> >> > > checked whether anything has been posted, just returned from vacation). > > >> >> > > > > >> >> > > I assume that we still stick with the plan to implement/document > > >> >> > > MAC-based handling first and then enhance with other methods later? > > >> >> > > > >> >> > I'm fine with that at least. If someone wants to work on > > >> >> > other methods straight away, that's also fine by me. > > >> >> > > >> >> Patch set [1] implements the failover-group-id mechanism. Are you > > >> >> thinking of some other method? > > >> >> > > >> >> Venu > > >> >> > > >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html > > >> >> > > >> > > > >> > Yes, the grouping mechanism seems fine to me (I don't remember > > >> > about the implementation, it's been a while). 
> > >> > > > >> > It is not by itself sufficient though, is it? > > >> > > >> I do understand that the group ID patch is incomplete though it's a > > >> base patch for the real work. > > >> > > >> > > > >> > MAC is assumed to be shared to avoid things like ARP/neighboor > > >> > rediscovery, right? > > >> > > >> True, but does this really need to be part of the guest-host > > >> interface? Or rather, I don't see how MAC based matching can be done > > >> on the host part. > > > > > > mac address matching does not need to affect host side. > > > > Did you realize that the host side can't have duplicate MAC address > > filters for both PV and VF at the same time? > > > > If hot adding a VF with duplicate MAC address filter programmed in > > prior, the PV path for virtio in the host side is effectively > > disabled. However, the fact that VF gets hot plugged by QEMU/libvirt > > does not mean it's ready and usable in the guest. You end up with > > unusable guest networking, *temporarily only when VF is successfully > > probed and properly enslabed*. As of now, no guest-host handshake was > > defined in the spec to make virtio driver aware of hotplug event thus > > VF's exposure, and zero handshake was done to switch the datapath when > > VF driver is ready and usable in guest. The current implementation > > relies on the lucky side that all the entire hot plug process will be > > successul in the guest. > > I think it's a PF bug then. PF driver should ignore filters > for VFs which have not been enabled by guest since reset. > > > BTW netvsc mitigate potential failure in the hotplug and driver > > probing by acknowledging the hypervisor through a DATAPATH_SWITCH > > hypercall (VMbus message) when VF driver is enslaved and ready, only > > then hypervisor will kick off datapath switching by moving the MAC > > address filter. > > We can do it without need for PV. We can detect e.g. bus master enable. > Move the filter when enabled, move it back when disabled e.g. 
by > VF reset. Or maybe MSE, or both. One other issue, which I think netvsc will also have, is the moving of the MAC address. We need to reserve resources for the filter; otherwise we risk that an attempt to install a filter will fail. Maybe we can start the VF with a temporary MAC, then change it to the final one when the guest tries to use it. That would work, but we run into the fact that MACs are currently programmed by management - in many setups QEMU does not have the rights to do it. I'll try to ask some management folks about the feasibility of this. I'm less worried about errors and more worried about downtime - hotplug on PCIe takes a while to complete (which maybe we should fix for Linux with some PV mechanism, but it would be tricky to fix for Windows). > > > > > >> Are you going to expose MAC address to VFIO? > > > > > > If mac of a VF is programmed by libvirt through the PF > > > (that's already the case), VFIO does not need to care about it. > > > > > >> > > >> The thing is the current MAC based implementation has intrinsic flaw > > >> that doesn't propagate errors to hypervisor, or there's no back > > >> channel for guest to unwind the hot plug action upon failure in > > >> probing or enslaving the primary. > > > > > > I guess you can eject the primary if you like. But > > > why does hypervisor need to know? On error, just don't use primary, > > > use standby. > > > > Forget about the grouping mechanism first. > > OK :) > > > What guest kernel change do > > you propose to make virtio driver know every possible error, think > > about how many moving targets it needs to specifically track with or > > has to depend on during the hot plug and driver probing process? If > > someone starts to implement the code and think about various error > > cases as a whole, I bet it would be more clear why grouping is > > relevant in the first place. > > > > -Siwei > > It just seems that no one's been motivated to do it so far.
> > > > > > >> If you think about a more robust > > >> implementation, another grouping mechanism rather than MAC is pretty > > >> much required. > > >> > > >> Thanks, > > >> -Siwei > > > > > > I don't really know what is the flaw, or how is it fixed by a grouping > > > mechanism. All this motivation was never described as part of work on > > > an alternate grouping. > > > > > >> > If true that implies that to avoid guest confusion visibility of the > > >> > primary needs to be controlled by standby's driver. > > >> > This makes this patchset incomplete. > > >> > > > >> > For this work to be complete what is needed is: > > >> > - hypervisor: add control of primary's visibility to guest > > >> > - guest: add support for this grouping to the failover driver > > >> > > > >> > We also need > > >> > - spec: document matching rules based on the pci bridge > > >> > > > >> > and it's helpful to have a spec proposal with implementation, but I > > >> > would say at least proposed patches to one of the above 2 would be > > >> > helpful before we include this in spec. > > >> > > > >> > -- > > >> > MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-21 2:23 ` Michael S. Tsirkin 2018-09-21 2:34 ` Michael S. Tsirkin @ 2018-09-27 0:18 ` Siwei Liu 2018-09-27 7:17 ` Sameeh Jubran 2018-09-27 16:32 ` Michael S. Tsirkin 1 sibling, 2 replies; 85+ messages in thread From: Siwei Liu @ 2018-09-27 0:18 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Thu, Sep 20, 2018 at 7:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote: >> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote: >> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote: >> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: >> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: >> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: >> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400 >> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >> >> > > >> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: >> >> >> > > > > >> >> >> > > > > >> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: >> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: >> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net >> >> >> > > > > > > device to act as a standby for another device with the same MAC address. >> >> >> > > > > > > >> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> >> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 >> >> >> > > > > > Applied but when do you plan to add documentation as pointed >> >> >> > > > > > out by Jan and Halil? 
>> >> >> > > > > >> >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement >> >> >> > > > > patches and i hope someone in RH is looking into it. >> >> >> > > > > >> >> >> > > > > Does it make sense to add a link to to the kernel documentation of this feature in >> >> >> > > > > the spec >> >> >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html >> >> >> > > > >> >> >> > > > >> >> >> > > > I do not think this will address the comments posted. Specifically we >> >> >> > > > should probably include documentation for what is a standby and primary: >> >> >> > > > what is expected of driver (maintain configuration on standby, support >> >> >> > > > primary coming and going, transmit on standby only if there is no >> >> >> > > > primary) and of device (have same mac for standby as for standby). >> >> >> > > >> >> >> > > Yes, we need some definitive statements of what a driver and a device >> >> >> > > is supposed to do in order to conform; it might make sense to discuss >> >> >> > > this in conjunction with discussion on any QEMU patches (have not >> >> >> > > checked whether anything has been posted, just returned from vacation). >> >> >> > > >> >> >> > > I assume that we still stick with the plan to implement/document >> >> >> > > MAC-based handling first and then enhance with other methods later? >> >> >> > >> >> >> > I'm fine with that at least. If someone wants to work on >> >> >> > other methods straight away, that's also fine by me. >> >> >> >> >> >> Patch set [1] implements the failover-group-id mechanism. Are you >> >> >> thinking of some other method? >> >> >> >> >> >> Venu >> >> >> >> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html >> >> >> >> >> > >> >> > Yes, the grouping mechanism seems fine to me (I don't remember >> >> > about the implementation, it's been a while). >> >> > >> >> > It is not by itself sufficient though, is it? 
>> >> >> >> I do understand that the group ID patch is incomplete though it's a >> >> base patch for the real work. >> >> >> >> > >> >> > MAC is assumed to be shared to avoid things like ARP/neighboor >> >> > rediscovery, right? >> >> >> >> True, but does this really need to be part of the guest-host >> >> interface? Or rather, I don't see how MAC based matching can be done >> >> on the host part. >> > >> > mac address matching does not need to affect host side. >> >> Did you realize that the host side can't have duplicate MAC address >> filters for both PV and VF at the same time? >> >> If hot adding a VF with duplicate MAC address filter programmed in >> prior, the PV path for virtio in the host side is effectively >> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt >> does not mean it's ready and usable in the guest. You end up with >> unusable guest networking, *temporarily only when VF is successfully >> probed and properly enslabed*. As of now, no guest-host handshake was >> defined in the spec to make virtio driver aware of hotplug event thus >> VF's exposure, and zero handshake was done to switch the datapath when >> VF driver is ready and usable in guest. The current implementation >> relies on the lucky side that all the entire hot plug process will be >> successul in the guest. > > I think it's a PF bug then. PF driver should ignore filters > for VFs which have not been enabled by guest since reset. Even so, the fact is that if the design is tied to MAC-based matching you end up relying on that MAC address to pair the devices, which loses the flexibility to move the MAC filter at some later point after assigning the VF to the guest. > >> BTW netvsc mitigate potential failure in the hotplug and driver >> probing by acknowledging the hypervisor through a DATAPATH_SWITCH >> hypercall (VMbus message) when VF driver is enslaved and ready, only >> then hypervisor will kick off datapath switching by moving the MAC >> address filter.
> > We can do it without need for PV. We can detect e.g. bus master enable. I'm not sure it's valid to assume bus master enable/disable is the right point to move the filter, although it is somewhat better than doing nothing. The thing is that the device (QEMU) knows nothing about the guest implementation and should not assume too much about it: the right time to move the filter is when the VF driver is fully ready in the guest and properly handled by the bond driver (net_failover), so that the primary can take over the datapath going forward. Bus master enable usually happens earlier than that, and it indicates nothing about readiness on the control side, i.e. whether the bond/failover driver can actually see this VF and "manage" it. To me this does not strictly form any guest-host handshake. What if the VM user moves the VF to a different netns, or rebinds it to a DPDK PMD? These just demonstrate a few things that can get well covered by this design, and I suspect the errors in real life would be much more complex. > Move the filter when enabled, move it back when disabled e.g. by > VF reset. Or maybe MSE, or both. MSE is on the PF and shared by all VFs, so why is it relevant? > >> > >> >> Are you going to expose MAC address to VFIO? >> > >> > If mac of a VF is programmed by libvirt through the PF >> > (that's already the case), VFIO does not need to care about it. >> > >> >> >> >> The thing is the current MAC based implementation has intrinsic flaw >> >> that doesn't propagate errors to hypervisor, or there's no back >> >> channel for guest to unwind the hot plug action upon failure in >> >> probing or enslaving the primary. >> > >> > I guess you can eject the primary if you like. But >> > why does hypervisor need to know? On error, just don't use primary, >> > use standby. >> >> Forget about the grouping mechanism first.
> > OK :) > >> What guest kernel change do >> you propose to make virtio driver know every possible error, think >> about how many moving targets it needs to specifically track with or >> has to depend on during the hot plug and driver probing process? If >> someone starts to implement the code and think about various error >> cases as a whole, I bet it would be more clear why grouping is >> relevant in the first place. >> >> -Siwei > > It just seems that no one's been motivated to do it so far. It's just that the MAC matching design is simply too broken. We have root disks hosted on networked storage, i.e. iSCSI, that can't tolerate any potential network failure if the design itself is not error proof. IOW our criteria for network downtime and errors are super rigorous. -Siwei > >> > >> >> If you think about a more robust >> >> implementation, another grouping mechanism rather than MAC is pretty >> >> much required. >> >> >> >> Thanks, >> >> -Siwei >> > >> > I don't really know what is the flaw, or how is it fixed by a grouping >> > mechanism. All this motivation was never described as part of work on >> > an alternate grouping. >> > >> >> > If true that implies that to avoid guest confusion visibility of the >> >> > primary needs to be controlled by standby's driver. >> >> > This makes this patchset incomplete. >> >> > >> >> > For this work to be complete what is needed is: >> >> > - hypervisor: add control of primary's visibility to guest >> >> > - guest: add support for this grouping to the failover driver >> >> > >> >> > We also need >> >> > - spec: document matching rules based on the pci bridge >> >> > >> >> > and it's helpful to have a spec proposal with implementation, but I >> >> > would say at least proposed patches to one of the above 2 would be >> >> > helpful before we include this in spec.
>> >> > >> >> > -- >> >> > MST
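The two triggers debated above for moving the MAC filter (bus master enable versus an explicit guest acknowledgement once the VF is enslaved) can be compared with a small state machine. This is only a sketch under stated assumptions: the event names, the policy enum, and the helper functions are hypothetical and do not correspond to any QEMU, virtio, or netvsc interface. In particular, VF_GUEST_ACK stands in for a guest-host handshake that the spec does not define.

```c
#include <assert.h>
#include <stddef.h>

/* Which device currently owns the host-side MAC filter (hypothetical model). */
enum filter_owner { FILTER_ON_STANDBY, FILTER_ON_PRIMARY };

/* Events the hypervisor could observe for the VF (hypothetical names). */
enum vf_event {
    VF_HOTPLUGGED,     /* VF exposed to the guest, driver state unknown */
    VF_BUS_MASTER_ON,  /* guest set Bus Master Enable on the VF */
    VF_GUEST_ACK,      /* guest reports the VF is enslaved and ready (assumed handshake) */
    VF_RESET,          /* VF reset, e.g. driver unbind */
    VF_UNPLUGGED       /* VF removed from the guest */
};

/* The two switching policies discussed in the thread. */
enum switch_policy { SWITCH_ON_BUS_MASTER, SWITCH_ON_GUEST_ACK };

/* Compute the filter owner after one event under the given policy. */
static enum filter_owner
next_owner(enum switch_policy policy, enum filter_owner cur, enum vf_event ev)
{
    switch (ev) {
    case VF_RESET:
    case VF_UNPLUGGED:
        /* Either policy falls back to the standby when the VF goes away. */
        return FILTER_ON_STANDBY;
    case VF_BUS_MASTER_ON:
        return policy == SWITCH_ON_BUS_MASTER ? FILTER_ON_PRIMARY : cur;
    case VF_GUEST_ACK:
        return policy == SWITCH_ON_GUEST_ACK ? FILTER_ON_PRIMARY : cur;
    case VF_HOTPLUGGED:
    default:
        /* Hotplug alone never moves the filter in this model. */
        return cur;
    }
}

/* Replay a sequence of events, starting with the filter on the standby. */
static enum filter_owner
run_events(enum switch_policy policy, const enum vf_event *evs, size_t n)
{
    enum filter_owner owner = FILTER_ON_STANDBY;

    assert(evs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        owner = next_owner(policy, owner, evs[i]);
    }
    return owner;
}
```

With a VF that is hot plugged and gets bus mastering enabled but is never enslaved (for example because it was rebound to a DPDK PMD), the bus-master policy moves the filter to the primary while the acknowledgement policy leaves traffic on the standby. That is the difference in behaviour the two positions above are arguing about.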
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-27 0:18 ` Siwei Liu @ 2018-09-27 7:17 ` Sameeh Jubran 2018-09-27 16:17 ` Michael S. Tsirkin 2018-09-27 16:32 ` Michael S. Tsirkin 1 sibling, 1 reply; 85+ messages in thread From: Sameeh Jubran @ 2018-09-27 7:17 UTC (permalink / raw) To: loseweigh Cc: Michael S. Tsirkin, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev What do you think about the following alternative implementation which uses cross id validation. -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device> -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device> On Thu, Sep 27, 2018 at 3:19 AM Siwei Liu <loseweigh@gmail.com> wrote: > > On Thu, Sep 20, 2018 at 7:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote: > >> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote: > >> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: > >> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: > >> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: > >> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400 > >> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> >> >> > > > >> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > >> >> >> > > > > > >> >> >> > > > > > >> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > >> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > >> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > >> >> >> > > > > > > device to act as a standby for another device with the same MAC address. 
> >> >> >> > > > > > > > >> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > >> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > >> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > >> >> >> > > > > > Applied but when do you plan to add documentation as pointed > >> >> >> > > > > > out by Jan and Halil? > >> >> >> > > > > > >> >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement > >> >> >> > > > > patches and i hope someone in RH is looking into it. > >> >> >> > > > > > >> >> >> > > > > Does it make sense to add a link to to the kernel documentation of this feature in > >> >> >> > > > > the spec > >> >> >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > >> >> >> > > > > >> >> >> > > > > >> >> >> > > > I do not think this will address the comments posted. Specifically we > >> >> >> > > > should probably include documentation for what is a standby and primary: > >> >> >> > > > what is expected of driver (maintain configuration on standby, support > >> >> >> > > > primary coming and going, transmit on standby only if there is no > >> >> >> > > > primary) and of device (have same mac for standby as for standby). > >> >> >> > > > >> >> >> > > Yes, we need some definitive statements of what a driver and a device > >> >> >> > > is supposed to do in order to conform; it might make sense to discuss > >> >> >> > > this in conjunction with discussion on any QEMU patches (have not > >> >> >> > > checked whether anything has been posted, just returned from vacation). > >> >> >> > > > >> >> >> > > I assume that we still stick with the plan to implement/document > >> >> >> > > MAC-based handling first and then enhance with other methods later? > >> >> >> > > >> >> >> > I'm fine with that at least. If someone wants to work on > >> >> >> > other methods straight away, that's also fine by me. 
> >> >> >> > >> >> >> Patch set [1] implements the failover-group-id mechanism. Are you > >> >> >> thinking of some other method? > >> >> >> > >> >> >> Venu > >> >> >> > >> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html > >> >> >> > >> >> > > >> >> > Yes, the grouping mechanism seems fine to me (I don't remember > >> >> > about the implementation, it's been a while). > >> >> > > >> >> > It is not by itself sufficient though, is it? > >> >> > >> >> I do understand that the group ID patch is incomplete though it's a > >> >> base patch for the real work. > >> >> > >> >> > > >> >> > MAC is assumed to be shared to avoid things like ARP/neighboor > >> >> > rediscovery, right? > >> >> > >> >> True, but does this really need to be part of the guest-host > >> >> interface? Or rather, I don't see how MAC based matching can be done > >> >> on the host part. > >> > > >> > mac address matching does not need to affect host side. > >> > >> Did you realize that the host side can't have duplicate MAC address > >> filters for both PV and VF at the same time? > >> > >> If hot adding a VF with duplicate MAC address filter programmed in > >> prior, the PV path for virtio in the host side is effectively > >> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt > >> does not mean it's ready and usable in the guest. You end up with > >> unusable guest networking, *temporarily only when VF is successfully > >> probed and properly enslabed*. As of now, no guest-host handshake was > >> defined in the spec to make virtio driver aware of hotplug event thus > >> VF's exposure, and zero handshake was done to switch the datapath when > >> VF driver is ready and usable in guest. The current implementation > >> relies on the lucky side that all the entire hot plug process will be > >> successul in the guest. > > > > I think it's a PF bug then. PF driver should ignore filters > > for VFs which have not been enabled by guest since reset. 
> > Even so, the fact is that if the design is tied to MAC based matching > you end up with relying on that MAC address to pair device, which > loses the flexibility to move MAC filter at some point later after > assigning VF to guest. > > > > >> BTW netvsc mitigate potential failure in the hotplug and driver > >> probing by acknowledging the hypervisor through a DATAPATH_SWITCH > >> hypercall (VMbus message) when VF driver is enslaved and ready, only > >> then hypervisor will kick off datapath switching by moving the MAC > >> address filter. > > > > We can do it without need for PV. We can detect e.g. bus master enable. > > I'm not sure if it's valid to assume master enable/disable is the > right point to move the filter, although it improves a bit than do > nothing. The thing is that from device (QEMU) perspective it knows > nothing and should not assume too much about guest implementation - > the time to move the filter around means the VF driver is fully ready > in guest and properly handled by the bond driver (net_failover), so > the primary can take over the datapath going forward. While the bus > master enable usually happens earlier than that, which does not > indicate anything about readiness on the control side that the > bond/failver driver can actually see this VF and "manage" it. This > strictly does not form any guest-host handshake to me. Think about > what if VM user changes VF to a different netns, or rebind it to DPDK > PMD? These just demostrate a few things that can get well covered by > this design, and I suspect the errors in the real life would be much > more complex. > > > Move the filter when enabled, move it back when disabled e.g. by > > VF reset. Or maybe MSE, or both. > > MSE is on the PF and shared by all VFs, why it's relevant? > > > > >> > > >> >> Are you going to expose MAC address to VFIO? > >> > > >> > If mac of a VF is programmed by libvirt through the PF > >> > (that's already the case), VFIO does not need to care about it. 
> >> > > >> >> > >> >> The thing is the current MAC based implementation has intrinsic flaw > >> >> that doesn't propagate errors to hypervisor, or there's no back > >> >> channel for guest to unwind the hot plug action upon failure in > >> >> probing or enslaving the primary. > >> > > >> > I guess you can eject the primary if you like. But > >> > why does hypervisor need to know? On error, just don't use primary, > >> > use standby. > >> > >> Forget about the grouping mechanism first. > > > > OK :) > > > >> What guest kernel change do > >> you propose to make virtio driver know every possible error, think > >> about how many moving targets it needs to specifically track with or > >> has to depend on during the hot plug and driver probing process? If > >> someone starts to implement the code and think about various error > >> cases as a whole, I bet it would be more clear why grouping is > >> relevant in the first place. > >> > >> -Siwei > > > > It just seems that no one's been motivated to do it so far. > > It's just that the MAC matching design is simply too broken. We have > root disk hosted on networked storage, i.e. iSCSI, that can't tolerate > any potential network failure if the design itself is not error proof. > IOW our criteria for network downtime and errors is super rigorous.. > > -Siwei > > > > >> > > >> >> If you think about a more robust > >> >> implementation, another grouping mechanism rather than MAC is pretty > >> >> much required. > >> >> > >> >> Thanks, > >> >> -Siwei > >> > > >> > I don't really know what is the flaw, or how is it fixed by a grouping > >> > mechanism. All this motivation was never described as part of work on > >> > an alternate grouping. > >> > > >> >> > If true that implies that to avoid guest confusion visibility of the > >> >> > primary needs to be controlled by standby's driver. > >> >> > This makes this patchset incomplete. 
> >> >> > > >> >> > For this work to be complete what is needed is: > >> >> > - hypervisor: add control of primary's visibility to guest > >> >> > - guest: add support for this grouping to the failover driver > >> >> > > >> >> > We also need > >> >> > - spec: document matching rules based on the pci bridge > >> >> > > >> >> > and it's helpful to have a spec proposal with implementation, but I > >> >> > would say at least proposed patches to one of the above 2 would be > >> >> > helpful before we include this in spec. > >> >> > > >> >> > -- > >> >> > MST > -- Respectfully, Sameeh Jubran Software Engineer @ Daynix.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27  7:17 ` Sameeh Jubran
@ 2018-09-27 16:17 ` Michael S. Tsirkin
  2018-09-27 17:23 ` Samudrala, Sridhar
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-27 16:17 UTC
To: Sameeh Jubran
Cc: loseweigh, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev

On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote:
> What do you think about the following alternative implementation which
> uses cross id validation.
>
> -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device>

virtio is a standby device, isn't it?

Besides that I don't see issues with this API.

--
MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27 16:17 ` Michael S. Tsirkin
@ 2018-09-27 17:23 ` Samudrala, Sridhar
  2018-09-27 23:45 ` Michael S. Tsirkin
  2018-09-30  9:17 ` Sameeh Jubran
  0 siblings, 2 replies; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-09-27 17:23 UTC
To: Michael S. Tsirkin, Sameeh Jubran
Cc: loseweigh, venu.busireddy, cohuck, virtio-dev

On 9/27/2018 9:17 AM, Michael S. Tsirkin wrote:
> On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote:
>> What do you think about the following alternative implementation which
>> uses cross id validation.
>>
>> -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
>> -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device>
> virtio is a standby device, isn't it?
>
> Besides that I don't see issues with this API.

Yes. I think 'standby' and 'primary' are reversed in the above suggestion. This should work.

-device virtio-net,primary=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
-vfio #address,id=<device_id_of_vfio_device>,standby=<device_id_of_virtio_net_device>

It should be OK to have virtio-net in standby mode without an associated vfio primary device,
but a vfio primary device should not be allowed without a virtio-net standby device.

Will it be possible to remove vfio device when virtio_net driver is unloaded in the VM?

Thanks
Sridhar
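The cross-id rule described in this message can be sketched as a small configuration check. This is a minimal illustration, not QEMU code: struct failover_dev, failover_config_valid() and the role enum are hypothetical names, and peer_id models the proposed primary= (on the standby) and standby= (on the primary) options. The sketch also assumes that a standby may name a primary that is not present yet, since the primary can be hot plugged later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the proposed options; these are not QEMU types. */
enum failover_role { ROLE_STANDBY, ROLE_PRIMARY };

struct failover_dev {
    const char *id;           /* value of id= */
    const char *peer_id;      /* primary= on a standby, standby= on a primary; NULL if unset */
    enum failover_role role;
};

/* Look a device up by id; returns NULL when no such device is configured. */
static const struct failover_dev *
find_dev(const struct failover_dev *devs, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(devs[i].id, id) == 0) {
            return &devs[i];
        }
    }
    return NULL;
}

/* True when 'peer' has the opposite role and names 'dev' back. */
static bool
points_back(const struct failover_dev *dev, const struct failover_dev *peer)
{
    return peer->role != dev->role &&
           peer->peer_id != NULL &&
           strcmp(peer->peer_id, dev->id) == 0;
}

/* Validate the cross references between standby and primary devices. */
static bool
failover_config_valid(const struct failover_dev *devs, size_t n)
{
    assert(devs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct failover_dev *dev = &devs[i];
        const struct failover_dev *peer =
            dev->peer_id ? find_dev(devs, n, dev->peer_id) : NULL;

        if (dev->role == ROLE_PRIMARY) {
            /* A primary is never allowed without a matching standby. */
            if (peer == NULL || !points_back(dev, peer)) {
                return false;
            }
        } else if (peer != NULL && !points_back(dev, peer)) {
            /* A standby may be alone, but a peer that is present must agree. */
            return false;
        }
    }
    return true;
}
```

A matched pair (standby "net0" naming "vf0", primary "vf0" naming "net0") passes, a lone standby passes, and a primary without a standby that names it back is rejected.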
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27 17:23 ` Samudrala, Sridhar
@ 2018-09-27 23:45 ` Michael S. Tsirkin
  2018-09-30  9:17 ` Sameeh Jubran
  1 sibling, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-27 23:45 UTC
To: Samudrala, Sridhar
Cc: Sameeh Jubran, loseweigh, venu.busireddy, cohuck, virtio-dev

On Thu, Sep 27, 2018 at 10:23:17AM -0700, Samudrala, Sridhar wrote:
>
> On 9/27/2018 9:17 AM, Michael S. Tsirkin wrote:
> > On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote:
> > > What do you think about the following alternative implementation which
> > > uses cross id validation.
> > >
> > > -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> > > -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device>
> > virtio is a standby device, isn't it?
> >
> > Besides that I don't see issues with this API.
>
> Yes. I think 'standby' and 'primary' are reversed in the above suggestion. This should work.
>
> -device virtio-net,primary=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> -vfio #address,id=<device_id_of_vfio_device>,standby=<device_id_of_virtio_net_device>
>
> It should be OK to have virtio-net in standby mode without an associated vfio primary device,
> but a vfio primary device should not allowed without a virtio-net standby device.
>
> Will it be possible to remove vfio device when virtio_net driver is unloaded in the VM?
>
> Thanks
> Sridhar
>
>

We can request removal, yes.

--
MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-27 17:23 ` Samudrala, Sridhar 2018-09-27 23:45 ` Michael S. Tsirkin @ 2018-09-30 9:17 ` Sameeh Jubran 2018-09-30 13:50 ` Sameeh Jubran 1 sibling, 1 reply; 85+ messages in thread From: Sameeh Jubran @ 2018-09-30 9:17 UTC (permalink / raw) To: sridhar.samudrala Cc: Michael S. Tsirkin, Siwei Liu, venu.busireddy, cohuck, virtio-dev On Thu, Sep 27, 2018 at 8:25 PM Samudrala, Sridhar <sridhar.samudrala@intel.com> wrote: > > > On 9/27/2018 9:17 AM, Michael S. Tsirkin wrote: > > On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote: > >> What do you think about the following alternative implementation which > >> uses cross id validation. > >> > >> -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device> > >> -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device> > > virtio is a standby device, isn't it? > > > > Besides that I don't see issues with this API. > > Yes. I think 'standby' and 'primary' are reversed in the above suggestion. This should work. hmm, I thought about standby being a property that virtio-net has, and this property is that it is a standby for the primary device and vice versa for the vfio. However this can be viewed from different aspects and isn't a deal breaker :) > > -device virtio-net,primary=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device> > -vfio #address,id=<device_id_of_vfio_device>,standby=<device_id_of_virtio_net_device> > > It should be OK to have virtio-net in standby mode without an associated vfio primary device, > but a vfio primary device should not allowed without a virtio-net standby device. > > Will it be possible to remove vfio device when virtio_net driver is unloaded in the VM? 
> > Thanks > Sridhar > -- Respectfully, Sameeh Jubran Software Engineer @ Daynix.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-30 9:17 ` Sameeh Jubran @ 2018-09-30 13:50 ` Sameeh Jubran 0 siblings, 0 replies; 85+ messages in thread From: Sameeh Jubran @ 2018-09-30 13:50 UTC (permalink / raw) To: sridhar.samudrala Cc: Michael S. Tsirkin, Siwei Liu, venu.busireddy, cohuck, virtio-dev I have created the following patch, which implements the basic functionality of hiding and plugging the primary device upon acking the standby feature. It is currently implemented for e1000 rather than for vfio devices, as I don't have access to a vfio device on my current setup; however, the implementation should be similar. I am facing an issue with the hotplug handler being NULL when calling qdev_get_hotplug_handler, even though it did work for me before. If anyone has any idea what the issue could be, it would be much appreciated. I think there should be a structure which describes the state of the primary device in order to implement the migration feature. Please share your thoughts and insights. commit 39a350ee65a26ab6ede4c08b3ca3b9e945fcf305 (HEAD -> failover) Author: Sameeh Jubran <sjubran@redhat.com> Date: Sun Sep 16 13:21:41 2018 +0300 virtio-net: Implement standby feature Signed-off-by: Sameeh Jubran <sjubran@redhat.com> diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 13a9494a8d..026b8631ed 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -36,6 +36,8 @@ #include "qemu/range.h" #include "e1000x_common.h" +#include "hw/virtio/virtio-net.h" +#include "hw/virtio/virtio-pci.h" static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; @@ -118,6 +120,7 @@ typedef struct E1000State_st { bool mit_timer_on; /* Mitigation timer is running. */ bool mit_irq_level; /* Tracks interrupt pin level. */ uint32_t mit_ide; /* Tracks E1000_TXD_CMD_IDE bit. 
*/ + char *primary_id_str; /* Compatibility flags for migration to/from qemu 1.3.0 and older */ #define E1000_FLAG_AUTONEG_BIT 0 @@ -1652,9 +1655,16 @@ static void e1000_write_config(PCIDevice *pci_dev, uint32_t address, } } +static bool standby_device_present(const char *id, + struct PCIDevice **pdev) +{ + return pci_qdev_find_device(id, pdev) >= 0; +} + static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp) { DeviceState *dev = DEVICE(pci_dev); + PCIDevice *standby_pci_dev; E1000State *d = E1000(pci_dev); uint8_t *pci_conf; uint8_t *macaddr; @@ -1690,6 +1700,12 @@ static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp) d->autoneg_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, e1000_autoneg_timer, d); d->mit_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, e1000_mit_timer, d); + if (d->primary_id_str && standby_device_present( + d->primary_id_str, &standby_pci_dev) && standby_pci_dev) { + VirtIOPCIProxy *proxy = VIRTIO_PCI(standby_pci_dev); + VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus); + virtio_net_register_primary_device(DEVICE(vdev), dev); + } } static void qdev_e1000_reset(DeviceState *dev) @@ -1708,6 +1724,7 @@ static Property e1000_properties[] = { compat_flags, E1000_FLAG_MAC_BIT, true), DEFINE_PROP_BIT("migrate_tso_props", E1000State, compat_flags, E1000_FLAG_TSO_BIT, true), + DEFINE_PROP_STRING("primary", E1000State, primary_id_str), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index f154756e85..fbe10f4fe1 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -26,7 +26,9 @@ #include "qapi/qapi-events-net.h" #include "hw/virtio/virtio-access.h" #include "migration/misc.h" +#include "hw/pci/pci.h" #include "standard-headers/linux/ethtool.h" +#include "hw/vfio/vfio-common.h" #define VIRTIO_NET_VM_VERSION 11 @@ -312,9 +314,14 @@ static void virtio_net_set_link_status(NetClientState *nc) uint16_t old_status = n->status; if (nc->link_down) + { n->status &= ~VIRTIO_NET_S_LINK_UP; + } else + { + 
n->status |= VIRTIO_NET_S_LINK_UP; + } if (n->status != old_status) virtio_notify_config(vdev); @@ -721,6 +728,16 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features) } else { memset(n->vlans, 0xff, MAX_VLAN >> 3); } + + if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) { + Error * errp; + DeviceState *pdev = DEVICE(n->primary_pdev); + DeviceClass *klass = DEVICE_GET_CLASS(pdev); + if (klass->hotpluggable && n->primary_hph) + { + hotplug_handler_plug(n->primary_hph, pdev, &errp); + } + } } static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd, @@ -1946,6 +1963,41 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name, n->netclient_type = g_strdup(type); } +static bool primary_device_present(const char *id, struct PCIDevice **pdev) +{ + return pci_qdev_find_device(id, pdev) >= 0 && + vfio_is_vfio_pci(*pdev); +} + +bool virtio_net_register_primary_device(DeviceState *dev, DeviceState *primary_dev) +{ + bool ret = false; + VirtIONet *n = VIRTIO_NET(dev); + Error *errp; + DeviceClass *klass = DEVICE_GET_CLASS(primary_dev); + + if (n->primary_pdev == NULL) + { + n->primary_pdev = PCI_DEVICE(primary_dev); + } + + if (n->primary_hph == NULL) + { + n->primary_hph = qdev_get_hotplug_handler(primary_dev); + } + + /* Hide standby from pci till the feature is acked */ + if (klass->hotpluggable && n->primary_hph) + { + object_ref(OBJECT(primary_dev)); + qdev_simple_device_unplug_cb(n->primary_hph, primary_dev, &errp); + n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY); + ret = true; + } + + return ret; +} + static void virtio_net_device_realize(DeviceState *dev, Error **errp) { VirtIODevice *vdev = VIRTIO_DEVICE(dev); @@ -1976,6 +2028,11 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX); } + if (n->net_conf.standby_id_str && primary_device_present( + n->net_conf.standby_id_str, &n->primary_pdev)) { + virtio_net_register_primary_device(dev, 
DEVICE(n->primary_pdev)); + } + virtio_net_set_config_size(n, n->host_features); virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size); @@ -2198,6 +2255,7 @@ static Property virtio_net_properties[] = { true), DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), + DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 866f0deeb7..593debe56e 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev) #endif } +bool vfio_is_vfio_pci(PCIDevice* pdev) +{ + VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); + return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI; +} + static void vfio_intx_update(PCIDevice *pdev) { VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 821def0565..26dfde805f 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container, hwaddr *pgsize); int vfio_spapr_remove_window(VFIOContainer *container, hwaddr offset_within_address_space); +bool vfio_is_vfio_pci(PCIDevice* pdev); #endif /* HW_VFIO_VFIO_COMMON_H */ diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index 4d7f3c82ca..3b86f17805 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -42,6 +42,7 @@ typedef struct virtio_net_conf int32_t speed; char *duplex_str; uint8_t duplex; + char *standby_id_str; } virtio_net_conf; /* Maximum packet size we can receive from tap device: header + 64k */ @@ -103,9 +104,13 @@ typedef struct VirtIONet { int announce_counter; bool needs_vnet_hdr_swap; bool mtu_bypass_backend; + PCIDevice *primary_pdev; + HotplugHandler *primary_hph; } VirtIONet; void 
virtio_net_set_netclient_name(VirtIONet *n, const char *name, const char *type); +bool virtio_net_register_primary_device(DeviceState *vdev, DeviceState *pdev); + #endif On Sun, Sep 30, 2018 at 12:17 PM Sameeh Jubran <sameeh@daynix.com> wrote: > > On Thu, Sep 27, 2018 at 8:25 PM Samudrala, Sridhar > <sridhar.samudrala@intel.com> wrote: > > > > > > On 9/27/2018 9:17 AM, Michael S. Tsirkin wrote: > > > On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote: > > >> What do you think about the following alternative implementation which > > >> uses cross id validation? > > >> > > >> -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device> > > >> -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device> > > > virtio is a standby device, isn't it? > > > > > > Besides that I don't see issues with this API. > > > > Yes. I think 'standby' and 'primary' are reversed in the above suggestion. This should work. > Hmm, I thought about standby being a property that virtio-net has, and > this property is that it is a standby for the > primary device and vice versa for the vfio. However, this can be viewed > from different aspects and isn't a deal breaker :) > > > > -device virtio-net,primary=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device> > > -vfio #address,id=<device_id_of_vfio_device>,standby=<device_id_of_virtio_net_device> > > > > It should be OK to have virtio-net in standby mode without an associated vfio primary device, > > but a vfio primary device should not be allowed without a virtio-net standby device. > > > > Will it be possible to remove vfio device when virtio_net driver is unloaded in the VM? 
> > > > Thanks > > Sridhar > > -- > Respectfully, > Sameeh Jubran > Software Engineer @ Daynix. -- Respectfully, Sameeh Jubran Software Engineer @ Daynix.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-27 0:18 ` Siwei Liu 2018-09-27 7:17 ` Sameeh Jubran @ 2018-09-27 16:32 ` Michael S. Tsirkin 2018-10-02 8:42 ` Siwei Liu 1 sibling, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-09-27 16:32 UTC (permalink / raw) To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Wed, Sep 26, 2018 at 05:18:38PM -0700, Siwei Liu wrote: > On Thu, Sep 20, 2018 at 7:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote: > >> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote: > >> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: > >> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: > >> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: > >> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400 > >> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> >> >> > > > >> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > >> >> >> > > > > > >> >> >> > > > > > >> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > >> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > >> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > >> >> >> > > > > > > device to act as a standby for another device with the same MAC address. 
> >> >> >> > > > > > > > >> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > >> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > >> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > >> >> >> > > > > > Applied but when do you plan to add documentation as pointed > >> >> >> > > > > > out by Jan and Halil? > >> >> >> > > > > > >> >> >> > > > > I thought additional documentation will be done as part of the QEMU enablement > >> >> >> > > > > patches and I hope someone in RH is looking into it. > >> >> >> > > > > > >> >> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in > >> >> >> > > > > the spec > >> >> >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > >> >> >> > > > > >> >> >> > > > > >> >> >> > > > I do not think this will address the comments posted. Specifically we > >> >> >> > > > should probably include documentation for what is a standby and primary: > >> >> >> > > > what is expected of driver (maintain configuration on standby, support > >> >> >> > > > primary coming and going, transmit on standby only if there is no > >> >> >> > > > primary) and of device (have same mac for standby as for primary). > >> >> >> > > > >> >> >> > > Yes, we need some definitive statements of what a driver and a device > >> >> >> > > are supposed to do in order to conform; it might make sense to discuss > >> >> >> > > this in conjunction with discussion on any QEMU patches (have not > >> >> >> > > checked whether anything has been posted, just returned from vacation). > >> >> >> > > > >> >> >> > > I assume that we still stick with the plan to implement/document > >> >> >> > > MAC-based handling first and then enhance with other methods later? > >> >> >> > > >> >> >> > I'm fine with that at least. If someone wants to work on > >> >> >> > other methods straight away, that's also fine by me. 
> >> >> >> > >> >> >> Patch set [1] implements the failover-group-id mechanism. Are you > >> >> >> thinking of some other method? > >> >> >> > >> >> >> Venu > >> >> >> > >> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html > >> >> >> > >> >> > > >> >> > Yes, the grouping mechanism seems fine to me (I don't remember > >> >> > about the implementation, it's been a while). > >> >> > > >> >> > It is not by itself sufficient though, is it? > >> >> > >> >> I do understand that the group ID patch is incomplete though it's a > >> >> base patch for the real work. > >> >> > >> >> > > >> >> > MAC is assumed to be shared to avoid things like ARP/neighbour > >> >> > rediscovery, right? > >> >> > >> >> True, but does this really need to be part of the guest-host > >> >> interface? Or rather, I don't see how MAC based matching can be done > >> >> on the host part. > >> > > >> > mac address matching does not need to affect host side. > >> > >> Did you realize that the host side can't have duplicate MAC address > >> filters for both PV and VF at the same time? > >> > >> If hot adding a VF with duplicate MAC address filter programmed in > >> prior, the PV path for virtio in the host side is effectively > >> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt > >> does not mean it's ready and usable in the guest. You end up with > >> unusable guest networking, *temporarily only when VF is successfully > >> probed and properly enslaved*. As of now, no guest-host handshake was > >> defined in the spec to make virtio driver aware of hotplug event thus > >> VF's exposure, and zero handshake was done to switch the datapath when > >> VF driver is ready and usable in guest. The current implementation > >> relies on the lucky case that the entire hot plug process will be > >> successful in the guest. > > > > I think it's a PF bug then. PF driver should ignore filters > > for VFs which have not been enabled by guest since reset. 
> > Even so, the fact is that if the design is tied to MAC based matching > you end up relying on that MAC address to pair device, which > loses the flexibility to move MAC filter at some point later after > assigning VF to guest. Whatever you use for pairing, you still need to reuse same MAC to avoid redoing neighbour discovery/ARP, right? > > > >> BTW netvsc mitigates potential failure in the hotplug and driver > >> probing by acknowledging the hypervisor through a DATAPATH_SWITCH > >> hypercall (VMbus message) when VF driver is enslaved and ready, only > >> then hypervisor will kick off datapath switching by moving the MAC > >> address filter. > > > > We can do it without need for PV. We can detect e.g. bus master enable. > > I'm not sure if it's valid to assume master enable/disable is the > right point to move the filter, although it is a bit better than doing > nothing. The thing is that from device (QEMU) perspective it knows > nothing and should not assume too much about guest implementation - > the time to move the filter around means the VF driver is fully ready > in guest and properly handled by the bond driver (net_failover), so > the primary can take over the datapath going forward. Bus master > enable, however, usually happens earlier than that, and it does not > indicate anything about readiness on the control side, i.e. that the > bond/failover driver can actually see this VF and "manage" it. This > strictly does not form any guest-host handshake to me. Think about > what if VM user changes VF to a different netns, or rebinds it to DPDK > PMD? OK. What then? > These just demonstrate a few things that can get well covered by > this design, and I suspect the errors in the real life would be much > more complex. What's missing is actual design though. The only thing that I saw so far is bridge group identifier for QEMU, which is a start but doesn't actually solve any problems by itself. You even said "forget about the grouping mechanism" yourself below. 
> > Move the filter when enabled, move it back when disabled e.g. by > > VF reset. Or maybe MSE, or both. > > MSE is on the PF and shared by all VFs, why is it relevant? Oh, right. Just FLR then. > > > >> > > >> >> Are you going to expose MAC address to VFIO? > >> > > >> > If mac of a VF is programmed by libvirt through the PF > >> > (that's already the case), VFIO does not need to care about it. > >> > > >> >> > >> >> The thing is the current MAC based implementation has an intrinsic flaw > >> >> that doesn't propagate errors to hypervisor, or there's no back > >> >> channel for guest to unwind the hot plug action upon failure in > >> >> probing or enslaving the primary. > >> > > >> > I guess you can eject the primary if you like. But > >> > why does hypervisor need to know? On error, just don't use primary, > >> > use standby. > >> > >> Forget about the grouping mechanism first. > > > > OK :) > > > >> What guest kernel change do > >> you propose to make virtio driver know every possible error, think > >> about how many moving targets it needs to specifically track with or > >> has to depend on during the hot plug and driver probing process? If > >> someone starts to implement the code and think about various error > >> cases as a whole, I bet it would be more clear why grouping is > >> relevant in the first place. > >> > >> -Siwei > > > > It just seems that no one's been motivated to do it so far. > > It's just that the MAC matching design is simply too broken. Too broken to even bother coding up any alternatives? > We have > root disk hosted on networked storage, i.e. iSCSI, that can't tolerate > any potential network failure if the design itself is not error proof. > IOW our criteria for network downtime and errors are super rigorous. > > -Siwei IMO that's a very interesting use case to address! I'll be happy to merge patches that help reduce downtime, spec-wise I'll be happy to propose them for TC vote. 
> > > >> > > >> >> If you think about a more robust > >> >> implementation, another grouping mechanism rather than MAC is pretty > >> >> much required. > >> >> > >> >> Thanks, > >> >> -Siwei > >> > > >> > I don't really know what is the flaw, or how is it fixed by a grouping > >> > mechanism. All this motivation was never described as part of work on > >> > an alternate grouping. > >> > > >> >> > If true that implies that to avoid guest confusion visibility of the > >> >> > primary needs to be controlled by standby's driver. > >> >> > This makes this patchset incomplete. > >> >> > > >> >> > For this work to be complete what is needed is: > >> >> > - hypervisor: add control of primary's visibility to guest > >> >> > - guest: add support for this grouping to the failover driver > >> >> > > >> >> > We also need > >> >> > - spec: document matching rules based on the pci bridge > >> >> > > >> >> > and it's helpful to have a spec proposal with implementation, but I > >> >> > would say at least proposed patches to one of the above 2 would be > >> >> > helpful before we include this in spec. > >> >> > > >> >> > -- > >> >> > MST
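The disagreement in this message is about which event should move the MAC filter between the standby (PV) path and the primary (VF) path. One way to picture the handshake-based variant is a small state machine: hot plugging the VF alone changes nothing, an explicit guest acknowledgement moves the filter to the VF, and a VF reset or unplug moves it back. The sketch below is purely illustrative. The event names are hypothetical, `EV_GUEST_VF_READY` is only modelled on the netvsc DATAPATH_SWITCH message mentioned above, and nothing here is defined by the virtio spec.

```c
#include <assert.h>

/* Which path currently owns the MAC filter on the host side. */
enum datapath { DATAPATH_STANDBY, DATAPATH_PRIMARY };

/* Hypothetical events; names are illustrative only. */
enum failover_event {
    EV_VF_HOTPLUGGED,   /* host exposed the VF to the guest */
    EV_GUEST_VF_READY,  /* guest acked that the VF is enslaved and usable */
    EV_VF_RESET,        /* VF was reset, e.g. by FLR */
    EV_VF_UNPLUGGED     /* VF was removed from the guest */
};

/* Returns the path that should own the MAC filter after the event. */
static enum datapath next_datapath(enum datapath cur, enum failover_event ev)
{
    assert(cur == DATAPATH_STANDBY || cur == DATAPATH_PRIMARY);

    switch (ev) {
    case EV_GUEST_VF_READY:
        return DATAPATH_PRIMARY;   /* switch only on an explicit ack */
    case EV_VF_RESET:
    case EV_VF_UNPLUGGED:
        return DATAPATH_STANDBY;   /* fall back to the PV path */
    case EV_VF_HOTPLUGGED:
    default:
        return cur;                /* hotplug alone does not move the filter */
    }
}
```

The alternative raised in the thread (switching on bus master enable instead of a guest ack) would amount to replacing `EV_GUEST_VF_READY` with a device-observable event. The transition table itself would stay the same.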
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-09-27 16:32 ` Michael S. Tsirkin @ 2018-10-02 8:42 ` Siwei Liu 2018-10-02 12:43 ` Michael S. Tsirkin 0 siblings, 1 reply; 85+ messages in thread From: Siwei Liu @ 2018-10-02 8:42 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev On Thu, Sep 27, 2018 at 9:32 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Wed, Sep 26, 2018 at 05:18:38PM -0700, Siwei Liu wrote: > > On Thu, Sep 20, 2018 at 7:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > > On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote: > > >> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > >> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote: > > >> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > >> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote: > > >> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote: > > >> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote: > > >> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400 > > >> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > >> >> >> > > > > >> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote: > > >> >> >> > > > > > > >> >> >> > > > > > > >> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote: > > >> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote: > > >> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net > > >> >> >> > > > > > > device to act as a standby for another device with the same MAC address. 
> > >> >> >> > > > > > > > > >> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > >> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com> > > >> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18 > > >> >> >> > > > > > Applied but when do you plan to add documentation as pointed > > >> >> >> > > > > > out by Jan and Halil? > > >> >> >> > > > > > > >> >> >> > > > > I thought additional documentation will be done as part of the QEMU enablement > > >> >> >> > > > > patches and I hope someone in RH is looking into it. > > >> >> >> > > > > > > >> >> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in > > >> >> >> > > > > the spec > > >> >> >> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > >> >> >> > > > > > >> >> >> > > > > > >> >> >> > > > I do not think this will address the comments posted. Specifically we > > >> >> >> > > > should probably include documentation for what is a standby and primary: > > >> >> >> > > > what is expected of driver (maintain configuration on standby, support > > >> >> >> > > > primary coming and going, transmit on standby only if there is no > > >> >> >> > > > primary) and of device (have same mac for standby as for primary). > > >> >> >> > > > > >> >> >> > > Yes, we need some definitive statements of what a driver and a device > > >> >> >> > > are supposed to do in order to conform; it might make sense to discuss > > >> >> >> > > this in conjunction with discussion on any QEMU patches (have not > > >> >> >> > > checked whether anything has been posted, just returned from vacation). > > >> >> >> > > > > >> >> >> > > I assume that we still stick with the plan to implement/document > > >> >> >> > > MAC-based handling first and then enhance with other methods later? > > >> >> >> > > > >> >> >> > I'm fine with that at least. 
If someone wants to work on > > >> >> >> > other methods straight away, that's also fine by me. > > >> >> >> > > >> >> >> Patch set [1] implements the failover-group-id mechanism. Are you > > >> >> >> thinking of some other method? > > >> >> >> > > >> >> >> Venu > > >> >> >> > > >> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html > > >> >> >> > > >> >> > > > >> >> > Yes, the grouping mechanism seems fine to me (I don't remember > > >> >> > about the implementation, it's been a while). > > >> >> > > > >> >> > It is not by itself sufficient though, is it? > > >> >> > > >> >> I do understand that the group ID patch is incomplete though it's a > > >> >> base patch for the real work. > > >> >> > > >> >> > > > >> >> > MAC is assumed to be shared to avoid things like ARP/neighbour > > >> >> > rediscovery, right? > > >> >> > > >> >> True, but does this really need to be part of the guest-host > > >> >> interface? Or rather, I don't see how MAC based matching can be done > > >> >> on the host part. > > >> > > > >> > mac address matching does not need to affect host side. > > >> > > >> Did you realize that the host side can't have duplicate MAC address > > >> filters for both PV and VF at the same time? > > >> > > >> If hot adding a VF with duplicate MAC address filter programmed in > > >> prior, the PV path for virtio in the host side is effectively > > >> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt > > >> does not mean it's ready and usable in the guest. You end up with > > >> unusable guest networking, *temporarily only when VF is successfully > > >> probed and properly enslaved*. As of now, no guest-host handshake was > > >> defined in the spec to make virtio driver aware of hotplug event thus > > >> VF's exposure, and zero handshake was done to switch the datapath when > > >> VF driver is ready and usable in guest. 
The current implementation > > >> relies on the lucky side that all the entire hot plug process will be > > >> successul in the guest. > > > > > > I think it's a PF bug then. PF driver should ignore filters > > > for VFs which have not been enabled by guest since reset. > > > > Even so, the fact is that if the design is tied to MAC based matching > > you end up with relying on that MAC address to pair device, which > > loses the flexibility to move MAC filter at some point later after > > assigning VF to guest. > > Whatever you use for pairing, you still need to reuse same MAC > to avoid redoing neighbour disovery/arp, right? The VF's MAC can be updated by PF/host on the fly at any time. One can start with a random MAC but use group ID to pair device instead. And only update MAC address to the real one when moving MAC filter around after PV says OK to switch datapath. Do you see any problem with this design? > > > > > > >> BTW netvsc mitigate potential failure in the hotplug and driver > > >> probing by acknowledging the hypervisor through a DATAPATH_SWITCH > > >> hypercall (VMbus message) when VF driver is enslaved and ready, only > > >> then hypervisor will kick off datapath switching by moving the MAC > > >> address filter. > > > > > > We can do it without need for PV. We can detect e.g. bus master enable. > > > > I'm not sure if it's valid to assume master enable/disable is the > > right point to move the filter, although it improves a bit than do > > nothing. The thing is that from device (QEMU) perspective it knows > > nothing and should not assume too much about guest implementation - > > the time to move the filter around means the VF driver is fully ready > > in guest and properly handled by the bond driver (net_failover), so > > the primary can take over the datapath going forward. 
While the bus
> > master enable usually happens earlier than that, which does not
> > indicate anything about readiness on the control side that the
> > bond/failover driver can actually see this VF and "manage" it. This
> > strictly does not form any guest-host handshake to me. Think about
> > what if VM user changes VF to a different netns, or rebind it to DPDK
> > PMD?
>
> OK. What then?

The guest should have liberty to switch datapath on its own. Host never
knows when VF will be ready and useful in guest. The assumption that
MAC based matching can blindly switch host datapath at the time of hot
plugging is pretty fragile. There's no guarantee of the time or
availability for a useful VF path within the VM. I think the number one
goal of live migration is to ensure the connections are alive rather
than migrate anyway without caring about guest activity.

>
> > These just demonstrate a few things that can get well covered by
> > this design, and I suspect the errors in the real life would be much
> > more complex.
>
> What's missing is actual design though. The only thing that I saw so far
> is bridge group identifier for qemu, which is a start but doesn't
> actually solve any problems by itself. You even said "forget about the
> grouping mechanism" yourself below.

Then please come up with a more robust design sticking to MAC based
matching. The current one does not seem appealing at all to run in
production.

>
> > > Move the filter when enabled, move it back when disabled e.g. by
> > > VF reset. Or maybe MSE, or both.
> >
> > MSE is on the PF and shared by all VFs, why it's relevant?
>
> Oh, right. Just FLR then.
>
> > >
> > >> >
> > >> >> Are you going to expose MAC address to VFIO?
> > >> >
> > >> > If mac of a VF is programmed by libvirt through the PF
> > >> > (that's already the case), VFIO does not need to care about it.
> > >> >
> > >> >>
> > >> >> The thing is the current MAC based implementation has intrinsic flaw
> > >> >> that doesn't propagate errors to hypervisor, or there's no back
> > >> >> channel for guest to unwind the hot plug action upon failure in
> > >> >> probing or enslaving the primary.
> > >> >
> > >> > I guess you can eject the primary if you like. But
> > >> > why does hypervisor need to know? On error, just don't use primary,
> > >> > use standby.
> > >>
> > >> Forget about the grouping mechanism first.
> > >
> > > OK :)
> > >
> > >> What guest kernel change do
> > >> you propose to make virtio driver know every possible error, think
> > >> about how many moving targets it needs to specifically track with or
> > >> has to depend on during the hot plug and driver probing process? If
> > >> someone starts to implement the code and think about various error
> > >> cases as a whole, I bet it would be more clear why grouping is
> > >> relevant in the first place.
> > >>
> > >> -Siwei
> > >
> > > It just seems that no one's been motivated to do it so far.
> >
> > It's just that the MAC matching design is simply too broken.
>
> Too broken to even bother coding up any alternatives?

No, but we need to make sure everything works for our iSCSI setup
before posting patches back. And we've been in active discussions
internally for some interesting scenarios and requirements.

>
> > We have
> > root disk hosted on networked storage, i.e. iSCSI, that can't tolerate
> > any potential network failure if the design itself is not error proof.
> > IOW our criteria for network downtime and errors is super rigorous..
> >
> > -Siwei
>
> IMO that's a very interesting usecase to address! I'll be happy to merge
> patches that help reduce downtime, spec-wise I'll be happy to propose
> them for TC vote.

Good. Hopefully we'll come back soon. :)

-Siwei

>
> > >
> > >> >
> > >> >> If you think about a more robust
> > >> >> implementation, another grouping mechanism rather than MAC is pretty
> > >> >> much required.
> > >> >>
> > >> >> Thanks,
> > >> >> -Siwei
> > >> >
> > >> > I don't really know what is the flaw, or how is it fixed by a grouping
> > >> > mechanism. All this motivation was never described as part of work on
> > >> > an alternate grouping.
> > >> >
> > >> >> > If true that implies that to avoid guest confusion visibility of the
> > >> >> > primary needs to be controlled by standby's driver.
> > >> >> > This makes this patchset incomplete.
> > >> >> >
> > >> >> > For this work to be complete what is needed is:
> > >> >> > - hypervisor: add control of primary's visibility to guest
> > >> >> > - guest: add support for this grouping to the failover driver
> > >> >> >
> > >> >> > We also need
> > >> >> > - spec: document matching rules based on the pci bridge
> > >> >> >
> > >> >> > and it's helpful to have a spec proposal with implementation, but I
> > >> >> > would say at least proposed patches to one of the above 2 would be
> > >> >> > helpful before we include this in spec.
> > >> >> >
> > >> >> > --
> > >> >> > MST
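
A minimal sketch of the guest-host handshake argued for in the message above, loosely modelled on the netvsc behaviour described there: the host keeps the MAC filter on the standby until the guest reports that the VF is enslaved and ready, and moves it back when the VF fails or goes away. Every name below (`failover_host`, `guest_event`, `EV_VF_READY`, and so on) is hypothetical; nothing here is defined by the virtio spec or taken from an existing driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: which device currently owns the MAC filter on the host. */
enum datapath { DATAPATH_STANDBY, DATAPATH_PRIMARY };

/* Hypothetical guest-side events the host could be told about. */
enum guest_event {
    EV_VF_HOTPLUGGED,   /* VF is visible in the guest, driver not ready yet */
    EV_VF_READY,        /* VF driver probed and enslaved by the failover driver */
    EV_VF_FAILED,       /* probe or enslave failed */
    EV_VF_REMOVED       /* VF unplugged, e.g. before migration */
};

struct failover_host {
    enum datapath active;   /* current owner of the MAC filter */
    bool vf_present;        /* VF has been hot plugged into the guest */
};

static void failover_host_init(struct failover_host *h)
{
    h->active = DATAPATH_STANDBY;
    h->vf_present = false;
}

/* Returns the datapath that owns the MAC filter after handling the event. */
static enum datapath failover_host_handle(struct failover_host *h,
                                          enum guest_event ev)
{
    switch (ev) {
    case EV_VF_HOTPLUGGED:
        h->vf_present = true;           /* filter stays on the standby */
        break;
    case EV_VF_READY:
        if (h->vf_present)
            h->active = DATAPATH_PRIMARY;
        break;
    case EV_VF_FAILED:
        h->active = DATAPATH_STANDBY;   /* fall back, VF may still be present */
        break;
    case EV_VF_REMOVED:
        h->vf_present = false;
        h->active = DATAPATH_STANDBY;
        break;
    }
    /* The primary can only own the filter while the VF is present. */
    assert(h->vf_present || h->active == DATAPATH_STANDBY);
    return h->active;
}
```

The point of the model is only that hot plugging alone never moves the filter; an explicit readiness event does.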
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
From: Michael S. Tsirkin @ 2018-10-02 12:43 UTC
To: Siwei Liu
Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> The VF's MAC can be updated by PF/host on the fly at any time. One can
> start with a random MAC but use group ID to pair device instead. And
> only update MAC address to the real one when moving MAC filter around
> after PV says OK to switch datapath.
>
> Do you see any problem with this design?

Isn't this what I proposed:

    Maybe we can
    start VF with a temporary MAC, then change it to a final one when guest
    tries to use it. It will work but we run into the fact that MACs are
    currently programmed by management - in many setups qemu does not have the
    rights to do it.

?

If yes I don't see a problem with the interface design, even though
implementation wise it's more work as it will have to include management
changes.

--
MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
From: Siwei Liu @ 2018-10-05 0:03 UTC
To: Michael S. Tsirkin
Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > start with a random MAC but use group ID to pair device instead. And
> > only update MAC address to the real one when moving MAC filter around
> > after PV says OK to switch datapath.
> >
> > Do you see any problem with this design?
>
> Isn't this what I proposed:
>
>     Maybe we can
>     start VF with a temporary MAC, then change it to a final one when guest
>     tries to use it. It will work but we run into the fact that MACs are
>     currently programmed by management - in many setups qemu does not have the
>     rights to do it.
>
> ?
>
> If yes I don't see a problem with the interface design, even though
> implementation wise it's more work as it will have to include management
> changes.

I thought we discussed this design a while back:
https://www.spinics.net/lists/netdev/msg512232.html

... plug in a VF with a random MAC filter programmed in prior, and
initially use that random MAC within guest. This would require:
a) not relying on permanent MAC address to do pairing during the
   initial discovery, e.g. use the failover group ID as in this
   discussion
b) host to toggle the MAC address filter: which includes taking down
   the tap device to return the MAC back to PF, followed by assigning
   that MAC to VF using "ip link ... set vf ..."
c) notify guest to reload/reset VF driver for the change of hardware MAC address
d) until VF reloads the driver it won't be able to use the datapath,
   so very short period of network outage is (still) expected

though I still don't think this design can eliminate downtime. However,
it looks like as of today the MAC matching still hasn't addressed the
datapath switching and error handling in a clean way. As said, for
SR-IOV live migration on iSCSI root disk there will be a lot of
dancing parts going along the way; reliable network connectivity and
dedicated handshakes are critical to this kind of setup.

-Siwei

>
> --
> MST
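
As an illustration of steps a) and b) above, here is a small sketch that contrasts MAC-based pairing with pairing by a group identifier, and then lets the VF adopt the real MAC once the switch is agreed. The `struct nic` layout, the `group_id` field and all function names are assumptions made for this example; they are not part of the spec or of the net_failover driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical description of a guest-visible NIC; not a spec structure. */
struct nic {
    uint64_t group_id;       /* assumed failover group identifier */
    uint8_t  mac[MAC_LEN];   /* MAC currently programmed on the device */
};

/* MAC-based pairing, as discussed in the thread: both devices share one MAC. */
static bool pair_by_mac(const struct nic *standby, const struct nic *primary)
{
    return memcmp(standby->mac, primary->mac, MAC_LEN) == 0;
}

/* Group-based pairing: the VF may carry a random MAC until the switch. */
static bool pair_by_group(const struct nic *standby, const struct nic *primary)
{
    return standby->group_id == primary->group_id;
}

/* Step b): once the switch is agreed, the VF takes over the real MAC. */
static void adopt_standby_mac(const struct nic *standby, struct nic *primary)
{
    assert(pair_by_group(standby, primary));
    memcpy(primary->mac, standby->mac, MAC_LEN);
}
```

With group-based pairing the two devices are matched even while their MACs differ, which is what allows the VF to start out with a random address.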
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
From: Samudrala, Sridhar @ 2018-10-05 5:17 UTC
To: Siwei Liu, Michael S. Tsirkin
Cc: Venu Busireddy, Cornelia Huck, virtio-dev

On 10/4/2018 5:03 PM, Siwei Liu wrote:
> On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
>>> The VF's MAC can be updated by PF/host on the fly at any time. One can
>>> start with a random MAC but use group ID to pair device instead. And
>>> only update MAC address to the real one when moving MAC filter around
>>> after PV says OK to switch datapath.
>>>
>>> Do you see any problem with this design?
>> Isn't this what I proposed:
>>
>>     Maybe we can
>>     start VF with a temporary MAC, then change it to a final one when guest
>>     tries to use it. It will work but we run into the fact that MACs are
>>     currently programmed by management - in many setups qemu does not have the
>>     rights to do it.
>>
>> ?
>>
>> If yes I don't see a problem with the interface design, even though
>> implementation wise it's more work as it will have to include management
>> changes.
> I thought we discussed this design a while back:
> https://www.spinics.net/lists/netdev/msg512232.html
>
> ... plug in a VF with a random MAC filter programmed in prior, and
> initially use that random MAC within guest. This would require:
> a) not relying on permanent MAC address to do pairing during the
>    initial discovery, e.g. use the failover group ID as in this
>    discussion
> b) host to toggle the MAC address filter: which includes taking down
>    the tap device to return the MAC back to PF, followed by assigning
>    that MAC to VF using "ip link ... set vf ..."
> c) notify guest to reload/reset VF driver for the change of hardware MAC address
> d) until VF reloads the driver it won't be able to use the datapath,
>    so very short period of network outage is (still) expected
>
> though I still don't think this design can eliminate downtime. However,
> it looks like as of today the MAC matching still hasn't addressed the
> datapath switching and error handling in a clean way.

I am not sure what the issue is with datapath switching in the
net_failover solution.

Do you see any issues with having the migration management layer automate
the steps that are listed in the example script in the documentation?
https://www.kernel.org/doc/html/latest/networking/net_failover.html

Now that we are considering making the VF visible only when the standby
negotiation is completed, I am not sure why we need a random MAC.

> As said, for
> SR-IOV live migration on iSCSI root disk there will be a lot of
> dancing parts going along the way; reliable network connectivity and
> dedicated handshakes are critical to this kind of setup.
>
> -Siwei
>
>> --
>> MST
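
A minimal sketch of the policy mentioned above, where the VF becomes visible only once the standby negotiation has completed. The bit number 62 comes from the patch under discussion; the helper names and the `driver_ok` flag are assumptions for illustration and do not correspond to an existing QEMU or kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number taken from the patch under discussion. */
#define VIRTIO_NET_F_STANDBY 62

/* True if the driver acknowledged VIRTIO_NET_F_STANDBY. */
static bool standby_negotiated(uint64_t acked_features)
{
    return (acked_features >> VIRTIO_NET_F_STANDBY) & 1;
}

/*
 * Hypothetical hypervisor policy: expose the primary (VF) to the guest only
 * after the standby feature has been negotiated and the virtio driver is up.
 */
static bool primary_may_be_visible(uint64_t acked_features, bool driver_ok)
{
    return driver_ok && standby_negotiated(acked_features);
}
```

A guest that never acknowledges the feature never sees the VF under this policy and simply keeps using the virtio-net device.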
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
From: Michael S. Tsirkin @ 2018-10-10 14:40 UTC
To: Samudrala, Sridhar
Cc: Siwei Liu, Venu Busireddy, Cornelia Huck, virtio-dev

On Thu, Oct 04, 2018 at 10:17:04PM -0700, Samudrala, Sridhar wrote:
> On 10/4/2018 5:03 PM, Siwei Liu wrote:
> > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > start with a random MAC but use group ID to pair device instead. And
> > > > only update MAC address to the real one when moving MAC filter around
> > > > after PV says OK to switch datapath.
> > > >
> > > > Do you see any problem with this design?
> > > Isn't this what I proposed:
> > >
> > >     Maybe we can
> > >     start VF with a temporary MAC, then change it to a final one when guest
> > >     tries to use it. It will work but we run into the fact that MACs are
> > >     currently programmed by management - in many setups qemu does not have the
> > >     rights to do it.
> > >
> > > ?
> > >
> > > If yes I don't see a problem with the interface design, even though
> > > implementation wise it's more work as it will have to include management
> > > changes.
> > I thought we discussed this design a while back:
> > https://www.spinics.net/lists/netdev/msg512232.html
> >
> > ... plug in a VF with a random MAC filter programmed in prior, and
> > initially use that random MAC within guest. This would require:
> > a) not relying on permanent MAC address to do pairing during the
> >    initial discovery, e.g. use the failover group ID as in this
> >    discussion
> > b) host to toggle the MAC address filter: which includes taking down
> >    the tap device to return the MAC back to PF, followed by assigning
> >    that MAC to VF using "ip link ... set vf ..."
> > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > d) until VF reloads the driver it won't be able to use the datapath,
> >    so very short period of network outage is (still) expected
> >
> > though I still don't think this design can eliminate downtime. However,
> > it looks like as of today the MAC matching still hasn't addressed the
> > datapath switching and error handling in a clean way.
>
> I am not sure what the issue is with datapath switching in the
> net_failover solution.
>
> Do you see any issues with having the migration management layer automate
> the steps that are listed in the example script in the documentation?
> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>
> Now that we are considering making the VF visible only when the standby
> negotiation is completed, I am not sure why we need a random MAC.
>

The claim is that some PFs update the MAC RX filter as soon as the VF is
created, not when its driver attaches. That would mean that on hot-plug
there is downtime until the device is guest visible and its driver is
initialized.

Can you confirm that isn't the case for Intel cards?

> > As said, for
> > SR-IOV live migration on iSCSI root disk there will be a lot of
> > dancing parts going along the way; reliable network connectivity and
> > dedicated handshakes are critical to this kind of setup.
> >
> > -Siwei
> >
> > > --
> > > MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
From: Samudrala, Sridhar @ 2018-10-11 0:16 UTC
To: Michael S. Tsirkin
Cc: Siwei Liu, Venu Busireddy, Cornelia Huck, virtio-dev

On 10/10/2018 7:40 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 04, 2018 at 10:17:04PM -0700, Samudrala, Sridhar wrote:
>> On 10/4/2018 5:03 PM, Siwei Liu wrote:
>>> On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
>>>>> The VF's MAC can be updated by PF/host on the fly at any time. One can
>>>>> start with a random MAC but use group ID to pair device instead. And
>>>>> only update MAC address to the real one when moving MAC filter around
>>>>> after PV says OK to switch datapath.
>>>>>
>>>>> Do you see any problem with this design?
>>>> Isn't this what I proposed:
>>>>
>>>>     Maybe we can
>>>>     start VF with a temporary MAC, then change it to a final one when guest
>>>>     tries to use it. It will work but we run into the fact that MACs are
>>>>     currently programmed by management - in many setups qemu does not have the
>>>>     rights to do it.
>>>>
>>>> ?
>>>>
>>>> If yes I don't see a problem with the interface design, even though
>>>> implementation wise it's more work as it will have to include management
>>>> changes.
>>> I thought we discussed this design a while back:
>>> https://www.spinics.net/lists/netdev/msg512232.html
>>>
>>> ... plug in a VF with a random MAC filter programmed in prior, and
>>> initially use that random MAC within guest. This would require:
>>> a) not relying on permanent MAC address to do pairing during the
>>>    initial discovery, e.g. use the failover group ID as in this
>>>    discussion
>>> b) host to toggle the MAC address filter: which includes taking down
>>>    the tap device to return the MAC back to PF, followed by assigning
>>>    that MAC to VF using "ip link ... set vf ..."
>>> c) notify guest to reload/reset VF driver for the change of hardware MAC address
>>> d) until VF reloads the driver it won't be able to use the datapath,
>>>    so very short period of network outage is (still) expected
>>>
>>> though I still don't think this design can eliminate downtime. However,
>>> it looks like as of today the MAC matching still hasn't addressed the
>>> datapath switching and error handling in a clean way.
>> I am not sure what the issue is with datapath switching in the
>> net_failover solution.
>>
>> Do you see any issues with having the migration management layer automate
>> the steps that are listed in the example script in the documentation?
>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>
>> Now that we are considering making the VF visible only when the standby
>> negotiation is completed, I am not sure why we need a random MAC.
>>
> The claim is that some PFs update the MAC RX filter as soon as the VF is
> created, not when its driver attaches. That would mean that on hot-plug
> there is downtime until the device is guest visible and its driver is
> initialized.
>
> Can you confirm that isn't the case for Intel cards?

For an untrusted VF, the MAC address is assigned by the management layer
and is set via an ndo_set_vf_mac() call to the PF on the hypervisor. This
does cause the MAC RX filter to be programmed immediately.

If possible we could delay setting the MAC RX filter until the device is
guest visible, but before the driver is loaded. If the VF driver comes up
with a random MAC before the MAC address is set via the PF, it will
require a VF reset to get the right MAC, which would also result in
downtime.
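
To make the "delay setting the MAC RX filter" idea above concrete, here is a small model in which the PF records the MAC requested by management but only activates the filter when the VF becomes usable, and drops it again on VF reset. This is a sketch under stated assumptions: `struct vf_state` and the three functions are hypothetical and are not taken from any real PF driver, and the event that should trigger activation is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical per-VF state kept by a PF driver; not from any real driver. */
struct vf_state {
    uint8_t pending_mac[MAC_LEN];  /* MAC requested by management */
    bool    has_pending;           /* a MAC was requested for this VF */
    bool    filter_active;         /* RX filter currently steers traffic to the VF */
};

/* Management assigns the MAC: remember it, but do not touch the RX filter. */
static void vf_set_mac_deferred(struct vf_state *vf, const uint8_t mac[MAC_LEN])
{
    memcpy(vf->pending_mac, mac, MAC_LEN);
    vf->has_pending = true;
}

/* Called when the VF becomes usable (the trigger itself is an open question). */
static void vf_activate(struct vf_state *vf)
{
    if (vf->has_pending)
        vf->filter_active = true;
}

/* VF reset hands the traffic back to the standby path; the MAC is kept. */
static void vf_reset(struct vf_state *vf)
{
    vf->filter_active = false;
}
```

Until `vf_activate()` runs, traffic for that MAC would continue to reach the standby path, which is the property the deferred approach is after.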
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
From: Michael S. Tsirkin @ 2018-10-05 19:18 UTC
To: Siwei Liu
Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > start with a random MAC but use group ID to pair device instead. And
> > > only update MAC address to the real one when moving MAC filter around
> > > after PV says OK to switch datapath.
> > >
> > > Do you see any problem with this design?
> >
> > Isn't this what I proposed:
> >
> >     Maybe we can
> >     start VF with a temporary MAC, then change it to a final one when guest
> >     tries to use it. It will work but we run into the fact that MACs are
> >     currently programmed by management - in many setups qemu does not have the
> >     rights to do it.
> >
> > ?
> >
> > If yes I don't see a problem with the interface design, even though
> > implementation wise it's more work as it will have to include management
> > changes.
>
> I thought we discussed this design a while back:
> https://www.spinics.net/lists/netdev/msg512232.html
>
> ... plug in a VF with a random MAC filter programmed in prior, and
> initially use that random MAC within guest. This would require:
> a) not relying on permanent MAC address to do pairing during the
>    initial discovery, e.g. use the failover group ID as in this
>    discussion
> b) host to toggle the MAC address filter: which includes taking down
>    the tap device to return the MAC back to PF, followed by assigning
>    that MAC to VF using "ip link ... set vf ..."
> c) notify guest to reload/reset VF driver for the change of hardware MAC address
> d) until VF reloads the driver it won't be able to use the datapath,
>    so very short period of network outage is (still) expected
>
> though I still don't think this design can eliminate downtime.

No, my idea is somewhat different. As you say there is a problem
of delay at point (c). Further, the need to poke at PF filters
with set vf does not match the current security model where
any security related configuration such as MAC filtering is done
upfront.

So I have two suggestions:

1. Teach the PF driver not to program the filter until the VF driver
   actually goes up.

   How do we know it went up? For example, it is highly likely
   that the driver will send some kind of command on init.
   E.g. Linux seems to always try to set the MAC address during init.
   We can have any kind of command received by the PF enable the
   filter, until reset.

   In absence of an appropriate command, QEMU can detect bus master
   enable and do that.

2. Create a variant of trusted VF where it starts out without a valid
   MAC; the guest can set a soft MAC but can only set it to the specific
   value that matches virtio.
   Alternatively - if it's preferred for some reason - allow the
   guest to program just two MACs, the original one and the virtio one.
   Any other value is denied.

> However,
> it looks like as of today the MAC matching still hasn't addressed the
> datapath switching and error handling in a clean way. As said, for
> SR-IOV live migration on iSCSI root disk there will be a lot of
> dancing parts going along the way; reliable network connectivity and
> dedicated handshakes are critical to this kind of setup.
>
> -Siwei

I think MAC matching removes downtime when the device is removed
but not when it's re-added, yes. It has the advantage of
already present Linux driver support, but if you are prepared
to work on adding e.g. bridge based matching, that advantage
will go away.

> >
> > --
> > MST
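
The second suggestion above (a restricted trusted VF that may only program the original MAC or the virtio MAC) can be sketched as a simple allow-list check on the PF side. The structure and function names are hypothetical and shown only to illustrate the rule "any other value is denied"; no existing PF driver is being described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical PF-side policy for a restricted "trusted" VF. */
struct vf_mac_policy {
    uint8_t original_mac[MAC_LEN]; /* MAC the VF starts out with */
    uint8_t virtio_mac[MAC_LEN];   /* MAC of the paired virtio-net standby */
};

/* Allow the guest to program only the original MAC or the virtio MAC. */
static bool vf_mac_request_allowed(const struct vf_mac_policy *p,
                                   const uint8_t requested[MAC_LEN])
{
    return memcmp(requested, p->original_mac, MAC_LEN) == 0 ||
           memcmp(requested, p->virtio_mac, MAC_LEN) == 0;
}
```

Because both permitted values are fixed by the host up front, this keeps the MAC filtering decision in line with the "configured upfront" security model mentioned in the message.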
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
From: Sameeh Jubran @ 2018-10-08 22:06 UTC
To: Michael S. Tsirkin
Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev, ehabkost

Hi All,

I have been busy trying to figure out how to implement the feature and
got very confused by the current open questions and their impact.

As I have stated earlier, I thought about doing the following:
1 - Have an id for virtio-net (the standby device) and one for the
    vfio device (primary).
2 - On realize of virtio-net check for the existence of the primary
    device and hide the standby feature.
3 - Once the feature is acked by the guest, the device would be
    plugged back by virtio-net.

I've faced a few issues when I tried to implement this, which I
overcame. At the end of the email I've included a prototype which
implements this basic functionality using e1000 instead of a vfio net
device. I'm sharing this as a draft only (it has many flaws) as parts
of it are valid for the actual implementation.

Issues that I've faced:
* I've used device_listener callbacks to get the device to register
  itself with virtio-net. This makes virtio-net listen to the
  realization of the device. I don't think this approach is right, as it
  makes virtio-net listen to every device, which can be avoided by
  extending the current implementation of the device listener. Moreover,
  this doesn't solve the migration issues: as far as I understand, the
  realize function doesn't get called after the migration process, which
  means this doesn't work. (Correct me if I'm wrong.)
* When testing with the PC machine type, which uses PIIX4 as the
  hotplug handler, the hotplug handler gets set after the virtio-net and
  e1000 devices have been realized.
  This means that I can't save the hotplug handler before detaching the
  device, which means I can't plug it back, because when the device is
  unplugged it is detached from its parent bus. This was resolved by
  saving a pointer to the parent bus instead; when attempting to replug
  the device, the parent can be used to get the hotplug handler. Note
  that unplugging the device using "qdev_simple_device_unplug_cb"
  doesn't require the hotplug handler, as this function simply detaches
  the device object from its parent object (the pci bus).

I've talked to Eduardo and he mentioned that he and Michael had
discussed the following approach: using a property (for pci devices
currently and maybe for others in the future?) which tells Qemu to
hide the device from the bus upon init. This approach leaves the
responsibility of managing the failover device to the management. The
management can send commands to plug the hidden device or hide it back
as well. I think that I like this approach better as it is proof
against issues that can come up when trying to handle the failure of
unplug/plug requests to the guest.

Please share your thoughts on this approach versus the draft implementation.

_____________________________________________________________________________________________________

commit 06afc24a613b2cb31c064859e89b709ec54fecdc (HEAD -> failover)
Author: Sameeh Jubran <sjubran@redhat.com>
Date:   Sun Sep 16 13:21:41 2018 +0300

    virtio-net: Implement standby feature

    Signed-off-by: Sameeh Jubran <sjubran@redhat.com>

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 13a9494a8d..387d8856c0 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -36,6 +36,8 @@
 #include "qemu/range.h"
 
 #include "e1000x_common.h"
+#include "hw/virtio/virtio-net.h"
+#include "hw/virtio/virtio-pci.h"
 
 static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
 
@@ -118,6 +120,7 @@ typedef struct E1000State_st {
     bool mit_timer_on;         /* Mitigation timer is running. */
     bool mit_irq_level;        /* Tracks interrupt pin level. */
     uint32_t mit_ide;          /* Tracks E1000_TXD_CMD_IDE bit. */
+    char *primary_id_str;
 
 /* Compatibility flags for migration to/from qemu 1.3.0 and older */
 #define E1000_FLAG_AUTONEG_BIT 0
@@ -1652,9 +1655,16 @@ static void e1000_write_config(PCIDevice *pci_dev, uint32_t address,
     }
 }
 
+static bool standby_device_present(const char *id,
+                                   struct PCIDevice **pdev)
+{
+    return pci_qdev_find_device(id, pdev) >= 0;
+}
+
 static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp)
 {
     DeviceState *dev = DEVICE(pci_dev);
+    PCIDevice *standby_pci_dev;
     E1000State *d = E1000(pci_dev);
     uint8_t *pci_conf;
     uint8_t *macaddr;
@@ -1690,6 +1700,12 @@ static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp)
 
     d->autoneg_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, e1000_autoneg_timer, d);
     d->mit_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, e1000_mit_timer, d);
+    if (d->primary_id_str && standby_device_present(
+        d->primary_id_str, &standby_pci_dev) && standby_pci_dev) {
+        VirtIOPCIProxy *proxy = VIRTIO_PCI(standby_pci_dev);
+        VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+        virtio_net_register_primary_device(DEVICE(vdev));
+    }
 }
 
 static void qdev_e1000_reset(DeviceState *dev)
@@ -1708,6 +1724,7 @@ static Property e1000_properties[] = {
                     compat_flags, E1000_FLAG_MAC_BIT, true),
     DEFINE_PROP_BIT("migrate_tso_props", E1000State,
                     compat_flags, E1000_FLAG_TSO_BIT, true),
+    DEFINE_PROP_STRING("primary", E1000State, primary_id_str),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f154756e85..b831ba438b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -26,7 +26,9 @@
 #include "qapi/qapi-events-net.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "hw/pci/pci.h"
 #include "standard-headers/linux/ethtool.h"
+#include "hw/vfio/vfio-common.h"
 
 #define VIRTIO_NET_VM_VERSION 11
 
@@ -721,6 +723,20 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
     } else {
         memset(n->vlans, 0xff, MAX_VLAN >> 3);
     }
+
+    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
+        Error * errp;
+        DeviceState *pdev = DEVICE(n->primary_pdev);
+        DeviceClass *klass = DEVICE_GET_CLASS(pdev);
+
+        /* Plug the primary device back to the pci bus */
+        if (klass->hotpluggable && n->primary_parent_bus)
+        {
+            BusState *qbus = BUS(n->primary_parent_bus);
+            hotplug_handler_plug(qbus->hotplug_handler, pdev,
+                                 &errp);
+        }
+    }
 }
 
 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
@@ -1946,6 +1962,52 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
     n->netclient_type = g_strdup(type);
 }
 
+static bool primary_device_present(const char *id, struct PCIDevice **pdev)
+{
+    return pci_qdev_find_device(id, pdev) >= 0 &&
+        vfio_is_vfio_pci(*pdev);
+}
+
+
+static void primary_device_realize(DeviceListener *listener,
+                                   DeviceState *dev)
+{
+    VirtIONet *n = container_of(listener, VirtIONet, primary_listener);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE) && dev->id
+        && !strcmp(dev->id, n->net_conf.standby_id_str))
+    {
+        Error *errp;
+        DeviceClass *klass = DEVICE_GET_CLASS(dev);
+
+        if (n->primary_pdev == NULL)
+        {
+            n->primary_pdev = PCI_DEVICE(dev);
+        }
+
+        if (n->primary_parent_bus == NULL)
+        {
+            n->primary_parent_bus = qdev_get_parent_bus(dev);
+        }
+
+        /* Hide standby from pci till the feature is acked */
+        if (klass->hotpluggable && n->primary_parent_bus)
+        {
+            object_ref(OBJECT(dev));
+            qdev_simple_device_unplug_cb(NULL ,dev, &errp);
+            n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
+        }
+    }
+}
+
+void virtio_net_register_primary_device(DeviceState *dev)
+{
+    VirtIONet *n = VIRTIO_NET(dev);
+    n->primary_listener.realize = primary_device_realize;
+    n->primary_listener.unrealize = NULL;
+    device_listener_register(&n->primary_listener);
+}
+
 static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -1976,6 +2038,11 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
     }
 
+    if (n->net_conf.standby_id_str && primary_device_present(
+        n->net_conf.standby_id_str, &n->primary_pdev)) {
+        virtio_net_register_primary_device(dev);
+    }
+
     virtio_net_set_config_size(n, n->host_features);
     virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
 
@@ -2198,6 +2265,7 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 866f0deeb7..593debe56e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
 #endif
 }
 
+bool vfio_is_vfio_pci(PCIDevice* pdev)
+{
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
+}
+
 static void vfio_intx_update(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 821def0565..26dfde805f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
                              hwaddr *pgsize);
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space);
+bool vfio_is_vfio_pci(PCIDevice* pdev);
 
 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c82ca..cfb8843a77 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -42,6 +42,7 @@ typedef struct virtio_net_conf
     int32_t speed;
     char *duplex_str;
     uint8_t duplex;
+    char *standby_id_str;
 } virtio_net_conf;
 
 /* Maximum packet size we can receive from tap device:
header + 64k */ @@ -103,9 +104,14 @@ typedef struct VirtIONet { int announce_counter; bool needs_vnet_hdr_swap; bool mtu_bypass_backend; + PCIDevice *primary_pdev; + BusState *primary_parent_bus; + DeviceListener primary_listener; } VirtIONet; void virtio_net_set_netclient_name(VirtIONet *n, const char *name, const char *type); +void virtio_net_register_primary_device(DeviceState *vdev); + #endif (END) On Fri, Oct 5, 2018 at 10:18 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote: > > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote: > > > > The VF's MAC can be updated by PF/host on the fly at any time. One can > > > > start with a random MAC but use group ID to pair device instead. And > > > > only update MAC address to the real one when moving MAC filter around > > > > after PV says OK to switch datapath. > > > > > > > > Do you see any problem with this design? > > > > > > Isn't this what I proposed: > > > Maybe we can > > > start VF with a temporary MAC, then change it to a final one when guest > > > tries to use it. It will work but we run into fact that MACs are > > > currently programmed by mgmnt - in many setups qemu does not have the > > > rights to do it. > > > > > > ? > > > > > > If yes I don't see a problem with the interface design, even though > > > implementation wise it's more work as it will have to include management > > > changes. > > > > I thought we discussed this design a while back: > > https://www.spinics.net/lists/netdev/msg512232.html > > > > ... plug in a VF with a random MAC filter programmed in prior, and > > initially use that random MAC within guest. This would require: > > a) not relying on permanent MAC address to do pairing during the > > initial discovery, e.g. 
use the failover group ID as in this > > discussion > > b) host to toggle the MAC address filter: which includes taking down > > the tap device to return the MAC back to PF, followed by assigning > > that MAC to VF using "ip link ... set vf ..." > > c) notify guest to reload/reset VF driver for the change of hardware MAC address > > d) until VF reloads the driver it won't be able to use the datapath, > > so very short period of network outage is (still) expected > > > > though I still don't think this design can elimnate downtime. > > > No, my idea is somewhat different. As you say there is a problem > of delay at point (c). Further, the need to poke at PF filters > with set vf does not match the current security model where > any security related configuration such as MAC filtering is done upfront. > > > So I have two suggestions: > > 1. Teach pf driver not to program the filter until vf driver actually goes up. > > How do we know it went up? For example, it is highly likely > that driver will send some kind of command on init. > E.g. linux seems to always try to set the mac address during init. > We can have any kind of command received by the PF enable > the filter, until reset. > > In absence of an appropriate command, QEMU can detect bus master > enable and do that. > > 2. Create a variant of trusted VF where it starts out without a valid > MAC, guest can set a softmac MAC but only can set it to the specific > value that matches virtio. > Alternatively - if it's preferred for some reason - allow > guest to program just two MACs, the original one and the virtio one. > Any other value is denied. > > > > > However, > > it looks like as of today the MAC matching still haven't addressed the > > datapath switching and error handling in a clean way. As said, for > > SR-IOV live migration on iSCSI root disk there will be a lot of > > dancing parts going along the way, reliable network connectity and > > dedicated handshakes are critical to this kind of setup. 
> > > > -Siwei > > I think MAC matching removes downtime when device is removed but not > when it's re-added, yes. It has the advantage of an already present > linux driver support, but if you are prepared to work on > adding e.g. bridge based matching, that will go away. > > > > > > > > -- > > > MST > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > -- Respectfully, Sameeh Jubran Linkedin Software Engineer @ Daynix. --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply related [flat|nested] 85+ messages in thread
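The flow that the draft patch in this message implements (hide the primary when it is realized, offer VIRTIO_NET_F_STANDBY, plug the primary back once the guest acks the feature) can be modelled in isolation. The following is only a minimal sketch in plain C, not QEMU code: struct failover_pair, primary_realized() and guest_set_features() are hypothetical names chosen for illustration, and only the feature bit number (62) is taken from the spec patch at the top of this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit number from the spec patch in this thread. */
#define VIRTIO_NET_F_STANDBY 62
#define STANDBY_BIT (1ULL << VIRTIO_NET_F_STANDBY)

/* Hypothetical model of a standby/primary pair; these are not QEMU types. */
struct failover_pair {
    uint64_t host_features;  /* features offered by the standby device */
    bool primary_present;    /* a primary with a matching id was realized */
    bool primary_visible;    /* primary is attached to the guest-visible bus */
};

/* The primary was realized: hide it from the guest and offer STANDBY. */
static void primary_realized(struct failover_pair *p)
{
    assert(p != NULL);
    p->primary_present = true;
    p->primary_visible = false;
    p->host_features |= STANDBY_BIT;
}

/*
 * The guest driver wrote the features it accepts.
 * Returns true if this call caused the primary to be plugged back.
 */
static bool guest_set_features(struct failover_pair *p, uint64_t guest_features)
{
    assert(p != NULL);

    /* A guest can only ack what the device offered. */
    uint64_t acked = guest_features & p->host_features;
    bool standby_acked = (acked & STANDBY_BIT) != 0;

    if (!standby_acked || !p->primary_present || p->primary_visible) {
        return false;
    }
    p->primary_visible = true;
    return true;
}
```

In this model a guest that never acks the feature keeps running on the standby alone, and the primary stays hidden. The sketch deliberately leaves out the error handling of the plug request and the migration case, which are the open problems described in the message.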
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-08 22:06 ` Sameeh Jubran
@ 2018-10-10 14:43 ` Michael S. Tsirkin
  0 siblings, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-10-10 14:43 UTC (permalink / raw)
  To: Sameeh Jubran
  Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev, ehabkost

On Tue, Oct 09, 2018 at 01:06:59AM +0300, Sameeh Jubran wrote:
> Hi All,
>
> I have been busy trying to figure out how to implement the feature and
> got very confused with the current open questions and their impact.
>
> As I have stated earlier, I thought about doing the following:
>
> 1 - Have an id for virtio-net (the standby device) and one for the
> vfio device (primary).
> 2 - On realize of virtio-net check for the existence of the primary
> device and hide the standby feature.
> 3 - Once the feature is acked by the guest, the device would be
> plugged back by virtio-net.
>
> I've faced a few issues when I tried to implement this which I overcame.
> At the end of the email I've included a prototype which implements
> this basic functionality using e1000 instead of vfio net device, I'm
> sharing this as a draft only (it has many flaws) as parts of it are
> valid for the actual implementation.
>
> Issues that I've faced:
>
> * I've used device_listener callbacks to get the device to register
> itself for virtio-net. This makes virtio-net listen to the realization
> of the device. I don't think this approach is right, as it makes
> virtio-net listen to every device, which can be avoided by extending
> the current implementation of the device listener. Moreover, this
> doesn't solve the migration issues: as far as I understand, the
> realize function doesn't get called after the migration process, which
> means this doesn't work. (correct me if I'm wrong)
>
> * When testing with PC machine type which uses the PIIX4 as the
> hotplug handler, the hotplug handler gets set after the virtio-net
> and e1000 devices have been realized. This means that I can't save the
> hotplug handler before detaching the device, which means I can't plug
> it back, as when the device is unplugged it is detached from its
> parent bus. This was resolved by saving a pointer to the parent bus
> instead and when attempting to replug the device then the parent can
> be used to get the hotplug handler. Note that unplugging the device
> using "qdev_simple_device_unplug_cb" doesn't require the hotplug
> handler as this function simply detaches the device object from its
> parent object (the pci bus).
>
> I've talked to Eduardo and he mentioned that he and Michael had
> discussed the following approach: using a property (for pci devices
> currently and maybe for others in the future?) which tells Qemu to
> hide the device from the bus upon init. This approach leaves the
> responsibility of managing the failover device to the management. The
> management can send commands to plug the hidden device or hide it back
> as well. I think that I like this approach better as it is proof
> against issues that can come up when trying to handle the failure of
> unplug/plug requests to the guest.
>
> Please share your thoughts on this approach versus the draft implementation.

I would think just an internal flag on the pci device that controls
whether it's guest visible would be enough. Not sure why management
would need to be involved.

--
MST

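The "internal flag" idea in the reply above can be illustrated with a small model. This is a hedged sketch, not QEMU code: struct pci_dev_model, guest_read_vendor_id() and guest_scan_bus() are hypothetical names. The only fact relied on is that a PCI configuration read of the vendor ID returns 0xFFFF when no device responds at a slot, so a hidden device can be made to look like an empty slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Vendor ID value a guest reads when no device responds at a slot. */
#define PCI_VENDOR_ID_NONE 0xFFFFu

/* Hypothetical device model; field names are illustrative only. */
struct pci_dev_model {
    uint16_t vendor_id;
    bool guest_visible;  /* internal flag, toggled by the emulator itself */
};

/* Vendor ID as the guest would see it through config space. */
static uint16_t guest_read_vendor_id(const struct pci_dev_model *slot)
{
    if (slot == NULL || !slot->guest_visible) {
        return PCI_VENDOR_ID_NONE;  /* hidden devices look like empty slots */
    }
    return slot->vendor_id;
}

/* Count the devices that a guest bus scan would discover. */
static size_t guest_scan_bus(const struct pci_dev_model *slots, size_t n)
{
    assert(slots != NULL || n == 0);

    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (guest_read_vendor_id(&slots[i]) != PCI_VENDOR_ID_NONE) {
            found++;
        }
    }
    return found;
}
```

With such a flag the device object stays attached to its bus the whole time, so the hotplug-handler ordering problem described in the quoted message would not arise in this model; flipping guest_visible and notifying the guest is all that changes.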
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-05 19:18 ` Michael S. Tsirkin
  2018-10-08 22:06 ` Sameeh Jubran
@ 2018-10-11  1:26 ` Siwei Liu
  2018-10-18 23:20 ` Siwei Liu
  2018-10-19  3:45 ` Michael S. Tsirkin
  1 sibling, 2 replies; 85+ messages in thread
From: Siwei Liu @ 2018-10-11 1:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Fri, Oct 5, 2018 at 12:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > start with a random MAC but use group ID to pair device instead. And
> > > > only update MAC address to the real one when moving MAC filter around
> > > > after PV says OK to switch datapath.
> > > >
> > > > Do you see any problem with this design?
> > >
> > > Isn't this what I proposed:
> > >         Maybe we can
> > >         start VF with a temporary MAC, then change it to a final one when guest
> > >         tries to use it. It will work but we run into fact that MACs are
> > >         currently programmed by mgmnt - in many setups qemu does not have the
> > >         rights to do it.
> > >
> > > ?
> > >
> > > If yes I don't see a problem with the interface design, even though
> > > implementation wise it's more work as it will have to include management
> > > changes.
> >
> > I thought we discussed this design a while back:
> > https://www.spinics.net/lists/netdev/msg512232.html
> >
> > ... plug in a VF with a random MAC filter programmed in prior, and
> > initially use that random MAC within guest. This would require:
> > a) not relying on permanent MAC address to do pairing during the
> > initial discovery, e.g. use the failover group ID as in this
> > discussion
> > b) host to toggle the MAC address filter: which includes taking down
> > the tap device to return the MAC back to PF, followed by assigning
> > that MAC to VF using "ip link ... set vf ..."
> > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > d) until VF reloads the driver it won't be able to use the datapath,
> > so very short period of network outage is (still) expected
> >
> > though I still don't think this design can eliminate downtime.
>
>
> No, my idea is somewhat different. As you say there is a problem
> of delay at point (c).

That's true, I never said the downtime can be avoided, because of this
delay on the guest side. But with this the downtime gets to the bare
minimum and in most situations packets won't be lost on reception as
long as the PF sets up the filter in a timely manner.

> Further, the need to poke at PF filters
> with set vf does not match the current security model where
> any security related configuration such as MAC filtering is done upfront.

The security model belongs to the VM policy not the VF, right? I think
the same MAC address will always be used on the VM as it starts with
virtio. Why is it a security issue that the VF starts with an unused MAC
before it's able to be used in the guest?

>
>
> So I have two suggestions:
>
> 1. Teach pf driver not to program the filter until vf driver actually goes up.
>
>    How do we know it went up? For example, it is highly likely
>    that driver will send some kind of command on init.
>    E.g. linux seems to always try to set the mac address during init.
>    We can have any kind of command received by the PF enable
>    the filter, until reset.

I'm not sure it's a valid assumption for any guest, say Windows. The
VF can start with the MAC address advertised from PF in the first
reset, and the MAC filter generally will be activated at that point.
Some other PF/VF variants enable the filter after that until the VF is
brought up in guest, while some others enable the filter even before
the VF gets assigned to guest. Trying to assume the behaviour on
specific guest or specific NIC device is a slippery slope. The only
thing that's reliable is the semantics of ndo_vf_xxx interface for the
PF. You seem to assume too much about the specific PF behaviour,
which is not defined in the interface itself.

>
>    In absence of an appropriate command, QEMU can detect bus master
>    enable and do that.
>
> 2. Create a variant of trusted VF where it starts out without a valid
>    MAC, guest can set a softmac MAC but only can set it to the specific
>    value that matches virtio.
>    Alternatively - if it's preferred for some reason - allow
>    guest to program just two MACs, the original one and the virtio one.
>    Any other value is denied.

I am getting confused, I don't know why that's even needed. The
management tool can set any predefined MAC that is deemed safe for VF
to start with. Why does it need to be that complicated? What is the
purpose of another model for trusted VF and softmac? It's the PF that
changes the MAC not the VF.

>
>
> > However,
> > it looks like as of today the MAC matching still hasn't addressed the
> > datapath switching and error handling in a clean way. As said, for
> > SR-IOV live migration on iSCSI root disk there will be a lot of
> > dancing parts going along the way, reliable network connectivity and
> > dedicated handshakes are critical to this kind of setup.
> >
> > -Siwei
>
> I think MAC matching removes downtime when device is removed but not
> when it's re-added, yes. It has the advantage of an already present
> linux driver support, but if you are prepared to work on
> adding e.g. bridge based matching, that will go away.

The removal order and consequence will be the same between MAC
matching and group ID based matching. It's just the initial discovery
that's slightly different. Why do you think the downtime will be
different for the removal scenario? And why do you think it's needed
to alter the current PF driver behavior to support bridge based
matching? Sorry I'm really confused about your suggestion. Those PF
driver model changes are not needed actually. The fact is that the
bridge based matching is supposed to work quite well for any PF driver
implementation no matter when the MAC address filters get added or
enabled.

Thanks,
-Siwei

>
> > >
> > > --
> > > MST

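The "two MACs" alternative debated in this message (the guest may program only the original MAC or the virtio one, any other value is denied) amounts to a small allow-list check on the PF side. The sketch below is purely illustrative and makes no claim about how any real PF driver behaves: struct vf_mac_policy and vf_request_mac() are hypothetical names, and the request path is reduced to a single function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical per-VF policy kept on the PF side. */
struct vf_mac_policy {
    uint8_t original_mac[MAC_LEN];  /* MAC the VF starts out with */
    uint8_t virtio_mac[MAC_LEN];    /* MAC of the paired virtio-net device */
    uint8_t current_mac[MAC_LEN];   /* MAC currently programmed for the VF */
};

/*
 * Handle a guest request to change the VF MAC.
 * Only the two pre-approved values are accepted; returns true on success.
 */
static bool vf_request_mac(struct vf_mac_policy *p, const uint8_t mac[MAC_LEN])
{
    assert(p != NULL && mac != NULL);

    bool is_original = memcmp(mac, p->original_mac, MAC_LEN) == 0;
    bool is_virtio = memcmp(mac, p->virtio_mac, MAC_LEN) == 0;

    if (!is_original && !is_virtio) {
        return false;  /* any other value is denied */
    }
    memcpy(p->current_mac, mac, MAC_LEN);
    return true;
}
```

Both allowed values are fixed before the guest runs, which is what would keep the security-related configuration "done upfront" in the sense used earlier in the thread; whether the extra VF model is worth its complexity is exactly the point under dispute here.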
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-11  1:26 ` Siwei Liu
@ 2018-10-18 23:20 ` Siwei Liu
  2018-10-18 23:40 ` Michael S. Tsirkin
  2018-10-19  3:45 ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Siwei Liu @ 2018-10-18 23:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev, liran.alon

To be honest, I don't understand why there's resistance to using PV to
initiate datapath switching, or the point of relying on very specific
PF behavior for datapath switching. There's no point in even altering
the SR-IOV driver model to accommodate the (zero) downtime requirement:
how could Windows possibly work with that? The current MAC based scheme
is very fragile when dealing with errors, and past lessons lead me to
believe that driver errors in the hot plug path, even if not common,
are NOT negligible.

-Siwei

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-18 23:20 ` Siwei Liu
@ 2018-10-18 23:40 ` Michael S. Tsirkin
  0 siblings, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-10-18 23:40 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev, liran.alon

On Thu, Oct 18, 2018 at 04:20:13PM -0700, Siwei Liu wrote:
> To be honest, I don't understand why there's resistance of using PV to
> initiate datapath switching,

I see no resistance. I see lack of man-power. Any extension needs:

1- guest driver support
2- qemu support
3- spec documentation

Current scheme has (1) by now, a beginning of (3) and a rudimentary
prototype of (2). For your proposed extension, you can either do some
of this work, or do some testing to demonstrate problems and motivate
others.

So far emailing the list about theoretical issues did not seem to
motivate anyone.

--
MST

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-11  1:26 ` Siwei Liu
  2018-10-18 23:20 ` Siwei Liu
@ 2018-10-19  3:45 ` Michael S. Tsirkin
  2018-11-21 15:39 ` Sameeh Jubran
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-10-19 3:45 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Wed, Oct 10, 2018 at 06:26:50PM -0700, Siwei Liu wrote:
> On Fri, Oct 5, 2018 at 12:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> > > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > > start with a random MAC but use group ID to pair device instead. And
> > > > > only update MAC address to the real one when moving MAC filter around
> > > > > after PV says OK to switch datapath.
> > > > >
> > > > > Do you see any problem with this design?
> > > >
> > > > Isn't this what I proposed:
> > > >         Maybe we can
> > > >         start VF with a temporary MAC, then change it to a final one when guest
> > > >         tries to use it. It will work but we run into fact that MACs are
> > > >         currently programmed by mgmnt - in many setups qemu does not have the
> > > >         rights to do it.
> > > >
> > > > ?
> > > >
> > > > If yes I don't see a problem with the interface design, even though
> > > > implementation wise it's more work as it will have to include management
> > > > changes.
> > >
> > > I thought we discussed this design a while back:
> > > https://www.spinics.net/lists/netdev/msg512232.html
> > >
> > > ... plug in a VF with a random MAC filter programmed in prior, and
> > > initially use that random MAC within guest. This would require:
> > > a) not relying on permanent MAC address to do pairing during the
> > > initial discovery, e.g. use the failover group ID as in this
> > > discussion
> > > b) host to toggle the MAC address filter: which includes taking down
> > > the tap device to return the MAC back to PF, followed by assigning
> > > that MAC to VF using "ip link ... set vf ..."
> > > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > > d) until VF reloads the driver it won't be able to use the datapath,
> > > so very short period of network outage is (still) expected
> > >
> > > though I still don't think this design can eliminate downtime.
> >
> >
> > No, my idea is somewhat different. As you say there is a problem
> > of delay at point (c).
> That's true, I never said the downtime can be avoided, because of this
> delay on the guest side. But with this the downtime gets to the bare
> minimum and in most situations packets won't be lost on reception as
> long as the PF sets up the filter in a timely manner.

It's not really the bare minimum IMHO. E.g. fixing the PF to defer
filter update will give you less downtime.

> > Further, the need to poke at PF filters
> > with set vf does not match the current security model where
> > any security related configuration such as MAC filtering is done upfront.
>
> The security model belongs to the VM policy not the VF, right? I think
> the same MAC address will always be used on the VM as it starts with
> virtio. Why is it a security issue that the VF starts with an unused MAC
> before it's able to be used in the guest?

Basically if guest is able to trigger MAC changes, it might be able to
exploit some bug to escalate that to full network access. Completely
blocking configuration changes after setup feels safer.

Case in point, with QEMU a typical selinux policy will block attempts
to change MACs, that task will have to be delegated to a suitably
privileged tool.

> >
> >
> > So I have two suggestions:
> >
> > 1. Teach pf driver not to program the filter until vf driver actually goes up.
> >
> >    How do we know it went up? For example, it is highly likely
> >    that driver will send some kind of command on init.
> >    E.g. linux seems to always try to set the mac address during init.
> >    We can have any kind of command received by the PF enable
> >    the filter, until reset.
>
> I'm not sure it's a valid assumption for any guest, say Windows. The
> VF can start with the MAC address advertised from PF in the first
> reset, and the MAC filter generally will be activated at that point.
> Some other PF/VF variants enable the filter after that until the VF is
> brought up in guest, while some others enable the filter even before
> the VF gets assigned to guest. Trying to assume the behaviour on
> specific guest or specific NIC device is a slippery slope.

Is all this just theoretical or do you observe any problems in practice?

> The only
> thing that's reliable is the semantics of ndo_vf_xxx interface for the
> PF.

ndo_vf_xxx is an internal Linux interface. That's not guaranteed to be
stable at all. I think you mean the netlink interface that triggers
that. That should be stable but, if what you say above is true, it
isn't fully defined.

> You seem to assume too much about the specific PF behaviour,
> which is not defined in the interface itself.

So IMHO it's something that we should fix in Linux, making
all devices behave consistently.

> >
> >    In absence of an appropriate command, QEMU can detect bus master
> >    enable and do that.
> >
> > 2. Create a variant of trusted VF where it starts out without a valid
> >    MAC, guest can set a softmac MAC but only can set it to the specific
> >    value that matches virtio.
> >    Alternatively - if it's preferred for some reason - allow
> >    guest to program just two MACs, the original one and the virtio one.
> >    Any other value is denied.
>
> I am getting confused, I don't know why that's even needed. The
> management tool can set any predefined MAC that is deemed safe for VF
> to start with. Why does it need to be that complicated? What is the
> purpose of another model for trusted VF and softmac? It's the PF that
> changes the MAC not the VF.

This will give us a simple solution without guest driver changes for
when VF is trusted. In particular it will work e.g. for PFs as well.

> >
> >
> > > However,
> > > it looks like as of today the MAC matching still hasn't addressed the
> > > datapath switching and error handling in a clean way. As said, for
> > > SR-IOV live migration on iSCSI root disk there will be a lot of
> > > dancing parts going along the way, reliable network connectivity and
> > > dedicated handshakes are critical to this kind of setup.
> > >
> > > -Siwei
> >
> > I think MAC matching removes downtime when device is removed but not
> > when it's re-added, yes. It has the advantage of an already present
> > linux driver support, but if you are prepared to work on
> > adding e.g. bridge based matching, that will go away.
>
> The removal order and consequence will be the same between MAC
> matching and group ID based matching. It's just the initial discovery
> that's slightly different. Why do you think the downtime will be
> different for the removal scenario? And why do you think it's needed
> to alter the current PF driver behavior to support bridge based
> matching? Sorry I'm really confused about your suggestion. Those PF
> driver model changes are not needed actually. The fact is that the
> bridge based matching is supposed to work quite well for any PF driver
> implementation no matter when the MAC address filters get added or
> enabled.
>
> Thanks,
> -Siwei

It seems that it requires a bunch of changes for all VF drivers though.
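For illustration, the two suggestions in the message above (deferring the MAC filter until the VF driver issues its first command, and restricting the guest to the original and the virtio MAC) can be modelled roughly as below. This is a toy sketch in Python, not code from any PF driver: VfModel and its method names are hypothetical, and a real driver would tie these events to its own mailbox and reset handling.

```python
def normalize_mac(mac):
    """Lower-case a colon-separated MAC so comparisons ignore case."""
    return mac.strip().lower()


class VfModel:
    """Hypothetical per-VF state kept by a PF driver (names are illustrative)."""

    def __init__(self, original_mac, virtio_mac):
        self.original_mac = normalize_mac(original_mac)  # MAC the VF starts with
        self.virtio_mac = normalize_mac(virtio_mac)      # MAC of the virtio standby
        self.current_mac = self.original_mac
        self.filter_enabled = False                      # filter is off at start

    # Suggestion 1: keep the MAC filter off until the VF driver is alive.
    def on_vf_command(self):
        """Any command received from the VF driver enables the filter."""
        self.filter_enabled = True

    def on_vf_reset(self):
        """A reset disables the filter again until the next command."""
        self.filter_enabled = False

    # Suggestion 2 (alternative form): the guest may pick one of two MACs only.
    def request_mac(self, mac):
        """Accept the change only for the original or the virtio MAC."""
        mac = normalize_mac(mac)
        if mac not in (self.original_mac, self.virtio_mac):
            return False  # any other value is denied
        self.current_mac = mac
        return True
```

The point of the allowlist form is that the guest never gets to choose an arbitrary address, so the MAC policy is still fixed up front by whoever provisions the two values.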
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-10-19 3:45 ` Michael S. Tsirkin @ 2018-11-21 15:39 ` Sameeh Jubran 2018-11-21 18:41 ` Michael S. Tsirkin 0 siblings, 1 reply; 85+ messages in thread From: Sameeh Jubran @ 2018-11-21 15:39 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev

Hi all,

It's been a while since the last discussion here. I have been working on
implementing the standby feature in QEMU. I tried multiple approaches and in
the end decided to build on the hotplug/unplug infrastructure, for several
reasons which I'll go over when I send the patches. For now you can find the
implementation here:
https://github.com/sameehj/qemu/tree/failover_hidden_opts
(the full command line I used can be found at the end of the email)

I have tested my implementation in QEMU with a Fedora 29 guest. I can see the
failover interface and assign an IP address to it. The feature is acked and
the primary device is plugged in with no issues.

I have created a setup which has two hosts (host A and host B) with X710 10G
cards connected back to back. On host A I have configured a bridge with both
the PF interface and virtio-net's interface (standby) attached to it.

I ran the guest with the patched QEMU on host A and pinged the bridge
successfully. Pings between host A and host B also work. However, I can't ping
host B from the VM or vice versa; this only happens when the feature is
enabled, for a reason I have yet to figure out.

I haven't tested migration yet, but I am on my way to doing so.

Since I couldn't ping host B from the VM, I ran an iperf test between the VM
and host A with the feature enabled, and during the test I unplugged the
SR-IOV device. The device was unplugged successfully and no drops were
observed, as you can see in the results below:

[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.19.156.44 netmask 255.255.248.0 broadcast 10.19.159.255
        inet6 fe80::d306:561f:9f43:ff77 prefixlen 64 scopeid 0x20<link>
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb prefixlen 64 scopeid 0x0<global>
        ether 56:cc:c1:01:cc:21 txqueuelen 1000 (Ethernet)
        RX packets 12258 bytes 870822 (850.4 KiB)
        RX errors 11 dropped 0 overruns 0 frame 11
        TX packets 294 bytes 32432 (31.6 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.17 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e prefixlen 64 scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb txqueuelen 1000 (Ethernet)
        RX packets 41052 bytes 2775833 (2.6 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 47468 bytes 15629 (15.2 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether 8a:f7:20:29:3b:cb txqueuelen 1000 (Ethernet)
        RX packets 214 bytes 14966 (14.6 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 163 bytes 26498 (25.8 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether 8a:f7:20:29:3b:cb txqueuelen 1000 (Ethernet)
        RX packets 41052 bytes 2775833 (2.6 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 47468 bytes 2889827541 (2.6 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 176 bytes 19712 (19.2 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 176 bytes 19712 (19.2 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@dhcp156-44 ~]# iperf -c 192.168.1.117 -t 100 -i 1
------------------------------------------------------------
Client connecting to 192.168.1.117, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.17 port 40368 connected with 192.168.1.117 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 3.47 GBytes 29.8 Gbits/sec
[ 3] 1.0- 2.0 sec 4.35 GBytes 37.4 Gbits/sec
[ 3] 2.0- 3.0 sec 4.10 GBytes 35.2 Gbits/sec
[ 3] 3.0- 4.0 sec 4.20 GBytes 36.1 Gbits/sec
[ 3] 4.0- 5.0 sec 4.20 GBytes 36.1 Gbits/sec
[ 3] 5.0- 6.0 sec 4.07 GBytes 34.9 Gbits/sec
[ 3] 6.0- 7.0 sec 4.53 GBytes 38.9 Gbits/sec
[ 3] 7.0- 8.0 sec 4.38 GBytes 37.6 Gbits/sec
[ 3] 8.0- 9.0 sec 4.60 GBytes 39.5 Gbits/sec
[ 3] 9.0-10.0 sec 4.60 GBytes 39.5 Gbits/sec
[ 3] 10.0-11.0 sec 4.56 GBytes 39.2 Gbits/sec
[ 3] 11.0-12.0 sec 4.70 GBytes 40.4 Gbits/sec
[ 3] 12.0-13.0 sec 4.65 GBytes 39.9 Gbits/sec
[ 3] 13.0-14.0 sec 4.51 GBytes 38.7 Gbits/sec
[ 3] 14.0-15.0 sec 4.48 GBytes 38.5 Gbits/sec
[ 3] 15.0-16.0 sec 4.67 GBytes 40.2 Gbits/sec
[ 3] 16.0-17.0 sec 4.37 GBytes 37.5 Gbits/sec
[ 3] 17.0-18.0 sec 4.68 GBytes 40.2 Gbits/sec
[ 3] 18.0-19.0 sec 4.99 GBytes 42.9 Gbits/sec
[ 3] 19.0-20.0 sec 5.00 GBytes 42.9 Gbits/sec
[ 3] 20.0-21.0 sec 4.90 GBytes 42.1 Gbits/sec
[ 3] 21.0-22.0 sec 4.72 GBytes 40.5 Gbits/sec
[ 3] 22.0-23.0 sec 4.60 GBytes 39.5 Gbits/sec
[ 3] 23.0-24.0 sec 4.72 GBytes 40.6 Gbits/sec
[ 3] 24.0-25.0 sec 4.42 GBytes 38.0 Gbits/sec
[ 3] 25.0-26.0 sec 4.44 GBytes 38.2 Gbits/sec
[ 3] 26.0-27.0 sec 4.18 GBytes 35.9 Gbits/sec
[ 3] 27.0-28.0 sec 4.20 GBytes 36.1 Gbits/sec
[ 3] 28.0-29.0 sec 4.27 GBytes 36.7 Gbits/sec
[ 3] 29.0-30.0 sec 4.16 GBytes 35.7 Gbits/sec
[ 3] 30.0-31.0 sec 4.14 GBytes 35.6 Gbits/sec
[ 3] 31.0-32.0 sec 4.13 GBytes 35.4 Gbits/sec
[ 3] 32.0-33.0 sec 4.16 GBytes 35.7 Gbits/sec
[ 3] 33.0-34.0 sec 4.33 GBytes 37.2 Gbits/sec
[ 3] 34.0-35.0 sec 4.31 GBytes 37.0 Gbits/sec
[ 3] 35.0-36.0 sec 4.26 GBytes 36.6 Gbits/sec
[ 3] 36.0-37.0 sec 4.36 GBytes 37.5 Gbits/sec
[ 3] 37.0-38.0 sec 4.11 GBytes 35.3 Gbits/sec
[ 3] 38.0-39.0 sec 4.00 GBytes 34.4 Gbits/sec
[ 3] 39.0-40.0 sec 4.53 GBytes 38.9 Gbits/sec
[ 3] 40.0-41.0 sec 4.06 GBytes 34.9 Gbits/sec
[ 3] 41.0-42.0 sec 4.17 GBytes 35.8 Gbits/sec
[ 3] 42.0-43.0 sec 4.14 GBytes 35.6 Gbits/sec
[ 3] 43.0-44.0 sec 4.07 GBytes 34.9 Gbits/sec
^C[ 3] 0.0-44.5 sec 195 GBytes 37.5 Gbits/sec

[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.19.156.44 netmask 255.255.248.0 broadcast 10.19.159.255
        inet6 fe80::d306:561f:9f43:ff77 prefixlen 64 scopeid 0x20<link>
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb prefixlen 64 scopeid 0x0<global>
        ether 56:cc:c1:01:cc:21 txqueuelen 1000 (Ethernet)
        RX packets 12547 bytes 889713 (868.8 KiB)
        RX errors 11 dropped 0 overruns 0 frame 11
        TX packets 373 bytes 45723 (44.6 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.17 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e prefixlen 64 scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb txqueuelen 1000 (Ethernet)
        RX packets 2862498 bytes 192898865 (183.9 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 3414905 bytes 209192841687 (194.8 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether 8a:f7:20:29:3b:cb txqueuelen 1000 (Ethernet)
        RX packets 2862498 bytes 192898865 (183.9 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 3414905 bytes 212082653599 (197.5 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 176 bytes 19712 (19.2
KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 176 bytes 19712 (19.2 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

__________________________________________________________________________________________________________________

The command line I used:

/root/qemu/x86_64-softmmu/qemu-system-x86_64 \
-netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc17 \
-device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
-netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
-device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
-device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
-enable-kvm \
-name netkvm \
-m 3000M \
-drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
-smp 4 \
-vga qxl \
-spice port=6110,disable-ticketing \
-device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
-chardev spicevmc,name=vdagent,id=vdagent \
-device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=com.redhat.spice.0 \
-chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
-device virtio-serial \
-device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
-monitor stdio

--
Respectfully,
*Sameeh Jubran*
*Linkedin <https://il.linkedin.com/pub/sameeh-jubran/87/747/a8a>*
*Software Engineer @ Daynix <http://www.daynix.com>.*

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-21 15:39 ` Sameeh Jubran @ 2018-11-21 18:41 ` Michael S. Tsirkin 2018-11-21 20:04 ` Sameeh Jubran 0 siblings, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-21 18:41 UTC (permalink / raw) To: Sameeh Jubran Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev Great to see you making progress on this! Some comments below: On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote: > I have created a setup which has two hosts (host A and host B) with X710 10G > cards connected back to back. On one host (I'll refer to this host as host A) I > have configured a bridge with the PF interface as well as vitio-net's interface > (standby) both attached to it. ... > The command line I used: > > /root/qemu/x86_64-softmmu/qemu-system-x86_64 \ > -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname= > cc17 \ > -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \ What's e1000 doing here? Can this be reason you can not talk to host? > -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript= > no,ifname=cc1_72,queues=4 \ > -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id= > cc1_72,vectors=10,mq=on,primary=cc1_71 \ > -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \ > -enable-kvm \ > -name netkvm \ > -m 3000M \ > -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \ > -smp 4 \ > -vga qxl \ > -spice port=6110,disable-ticketing \ > -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \ > -chardev spicevmc,name=vdagent,id=vdagent \ > -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name= > com.redhat.spice.0 \ > -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \ > -device virtio-serial \ > -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \ > -monitor stdio ... 
> Since I couldn't ping from VM to host B, I did an iperf test between the VM and
> host A with the feature enabled and during the test I have unplugged the sriov
> device, the device was unplugged successfully and no drops where observed as
> you can see in the results below:
>
> [root@dhcp156-44 ~]# ifconfig

Well, I suspect this won't tell you anything; this shows packet drops at
the hardware level. When e.g. the link is down, Linux won't send any packets
out. The simplest test is to monitor latency and throughput and check that,
while performance is lower for the duration of migration, there are no huge
spikes around the switch.

--
MST

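One way to act on the suggestion above is to scan per-interval iperf output for throughput dips around the switch. The sketch below is only an assumption about how that could be scripted: parse_intervals and find_dips are hypothetical helper names, the regular expression assumes iperf2-style lines such as "[ 3] 12.0-13.0 sec 4.65 GBytes 39.9 Gbits/sec", and the 50% threshold is arbitrary. It does not measure latency; that would need a separate ping trace.

```python
import re
from statistics import median

# iperf2 interval line: "[  3] 12.0-13.0 sec  4.65 GBytes  39.9 Gbits/sec"
_INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-\s*(\d+\.\d+)\s+sec\s+"
    r"[\d.]+\s+\w+\s+([\d.]+)\s+([KMG]?)bits/sec"
)

# Scale factors that convert the reported unit to Gbit/s.
_TO_GBITS = {"": 1e-9, "K": 1e-6, "M": 1e-3, "G": 1.0}


def parse_intervals(text, interval=1.0, tolerance=0.01):
    """Return (start, end, gbits_per_sec) for each reporting interval.

    Lines whose duration differs from `interval` (for example the final
    summary line covering the whole run) are skipped.
    """
    result = []
    for start, end, rate, unit in _INTERVAL.findall(text):
        start, end = float(start), float(end)
        if abs((end - start) - interval) > tolerance:
            continue
        result.append((start, end, float(rate) * _TO_GBITS[unit]))
    return result


def find_dips(intervals, threshold=0.5):
    """Return the intervals whose rate is below threshold * median rate."""
    if not intervals:
        return []
    typical = median(rate for _, _, rate in intervals)
    return [iv for iv in intervals if iv[2] < threshold * typical]
```

Feeding it the output of "iperf -c <server> -t <secs> -i 1" captured across an unplug or migration gives the list of seconds where throughput collapsed; an empty list means no interval fell below the chosen fraction of the median.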
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-21 18:41 ` Michael S. Tsirkin @ 2018-11-21 20:04 ` Sameeh Jubran 2018-11-21 23:51 ` Samudrala, Sridhar 2018-11-22 18:27 ` Michael S. Tsirkin 0 siblings, 2 replies; 85+ messages in thread From: Sameeh Jubran @ 2018-11-21 20:04 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > Great to see you making progress on this! > Some comments below: > > On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote: > > I have created a setup which has two hosts (host A and host B) with X710 10G > > cards connected back to back. On one host (I'll refer to this host as host A) I > > have configured a bridge with the PF interface as well as vitio-net's interface > > (standby) both attached to it. > > ... > > > The command line I used: > > > > /root/qemu/x86_64-softmmu/qemu-system-x86_64 \ > > -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname= > > cc17 \ > > -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \ > > What's e1000 doing here? > Can this be reason you can not talk to host? I don't think so, the e1000 is for enabling WAN connection on the guest for downloading packages and ssh connection. It is connected to a separate bridge which is connected to the external interface of the host. 
> > > -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript= > > no,ifname=cc1_72,queues=4 \ > > -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id= > > cc1_72,vectors=10,mq=on,primary=cc1_71 \ > > -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \ > > -enable-kvm \ > > -name netkvm \ > > -m 3000M \ > > -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \ > > -smp 4 \ > > -vga qxl \ > > -spice port=6110,disable-ticketing \ > > -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \ > > -chardev spicevmc,name=vdagent,id=vdagent \ > > -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name= > > com.redhat.spice.0 \ > > -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \ > > -device virtio-serial \ > > -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \ > > -monitor stdio > > > ... > > > Since I couldn't ping from VM to host B, I did an iperf test between the VM and > > host A with the feature enabled and during the test I have unplugged the sriov > > device, the device was unplugged successfully and no drops where observed as > > you can see in the results below: > > > > [root@dhcp156-44 ~]# ifconfig > > Well I suspect this won't tell you anything, this shows packet drops at > the hardware level. When e.g. link is down linux won't send any packets > out. The simplest test is to monitor latency and throughput and see that > while it is lower for the duration of migration, there are no huge > spikes around the switch. Oh, okay will do that. I have noticed some nasty lag when I tried to ssh to the VM using the failover interface while I didn't experience that with the e1000. Sridhar Any idea what might be the cause? > > -- > MST -- Respectfully, Sameeh Jubran Linkedin Software Engineer @ Daynix. 
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-21 20:04 ` Sameeh Jubran @ 2018-11-21 23:51 ` Samudrala, Sridhar 2018-11-22 13:55 ` Sameeh Jubran 2018-11-22 18:27 ` Michael S. Tsirkin 1 sibling, 1 reply; 85+ messages in thread From: Samudrala, Sridhar @ 2018-11-21 23:51 UTC (permalink / raw) To: Sameeh Jubran, Michael S. Tsirkin Cc: Siwei Liu, venu.busireddy, cohuck, virtio-dev On 11/21/2018 12:04 PM, Sameeh Jubran wrote: > On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote: >> Great to see you making progress on this! >> Some comments below: >> >> On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote: >>> I have created a setup which has two hosts (host A and host B) with X710 10G >>> cards connected back to back. On one host (I'll refer to this host as host A) I >>> have configured a bridge with the PF interface as well as vitio-net's interface >>> (standby) both attached to it. >> ... >> >>> The command line I used: >>> >>> /root/qemu/x86_64-softmmu/qemu-system-x86_64 \ >>> -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname= >>> cc17 \ >>> -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \ >> What's e1000 doing here? >> Can this be reason you can not talk to host? > I don't think so, the e1000 is for enabling WAN connection on the > guest for downloading packages and ssh connection. It is connected to > a separate bridge which is connected to the external interface of the > host. 
>>> -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript= >>> no,ifname=cc1_72,queues=4 \ >>> -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id= >>> cc1_72,vectors=10,mq=on,primary=cc1_71 \ >>> -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \ >>> -enable-kvm \ >>> -name netkvm \ >>> -m 3000M \ >>> -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \ >>> -smp 4 \ >>> -vga qxl \ >>> -spice port=6110,disable-ticketing \ >>> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \ >>> -chardev spicevmc,name=vdagent,id=vdagent \ >>> -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name= >>> com.redhat.spice.0 \ >>> -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \ >>> -device virtio-serial \ >>> -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \ >>> -monitor stdio >> >> ... >> >>> Since I couldn't ping from VM to host B, I did an iperf test between the VM and >>> host A with the feature enabled and during the test I have unplugged the sriov >>> device, the device was unplugged successfully and no drops where observed as >>> you can see in the results below: >>> >>> [root@dhcp156-44 ~]# ifconfig >> Well I suspect this won't tell you anything, this shows packet drops at >> the hardware level. When e.g. link is down linux won't send any packets >> out. The simplest test is to monitor latency and throughput and see that >> while it is lower for the duration of migration, there are no huge >> spikes around the switch. > Oh, okay will do that. > > I have noticed some nasty lag when I tried to ssh to the VM using the > failover interface while I didn't experience that with the e1000. > Sridhar Any idea what might be the cause? > When using failover interface, i guess you have the VF interface plugged in and UP. So you should be using the primary interface. Do you see the VFs MAC configured correctly at the host PF? 
You can do
    ip link show dev <pf>
and it should show the MACs of all the VFs associated with that PF.

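The check described above can also be scripted. A minimal sketch, assuming the PF prints per-VF lines of the form "vf N MAC xx:xx:xx:xx:xx:xx, ..." (the exact wording varies between iproute2 versions); vf_macs, unset_vfs and vf_has_mac are hypothetical helper names:

```python
import re

# Per-VF line as printed under the PF by "ip link show dev <pf>", e.g.
# "vf 1 MAC 8a:f7:20:29:3b:cb, spoof checking on, link-state auto, trust off"
_VF_LINE = re.compile(r"\bvf\s+(\d+)\s+MAC\s+([0-9a-fA-F:]{17})")

ZERO_MAC = "00:00:00:00:00:00"


def vf_macs(ip_link_output):
    """Map VF index -> MAC address (lower case) found in the given text."""
    return {int(idx): mac.lower() for idx, mac in _VF_LINE.findall(ip_link_output)}


def unset_vfs(ip_link_output):
    """Return the sorted VF indexes whose MAC is still all zeroes."""
    macs = vf_macs(ip_link_output)
    return sorted(idx for idx, mac in macs.items() if mac == ZERO_MAC)


def vf_has_mac(ip_link_output, vf, expected_mac):
    """True if VF number `vf` is listed with `expected_mac`."""
    return vf_macs(ip_link_output).get(vf) == expected_mac.lower()
```

The text can come from running the command by hand and pasting the result, or from subprocess.run(["ip", "link", "show", "dev", pf], capture_output=True, text=True).stdout, where pf is the PF interface name.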
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-21 23:51 ` Samudrala, Sridhar @ 2018-11-22 13:55 ` Sameeh Jubran 0 siblings, 0 replies; 85+ messages in thread From: Sameeh Jubran @ 2018-11-22 13:55 UTC (permalink / raw) To: sridhar.samudrala Cc: Michael S. Tsirkin, Siwei Liu, venu.busireddy, cohuck, virtio-dev On Thu, Nov 22, 2018 at 1:51 AM Samudrala, Sridhar <sridhar.samudrala@intel.com> wrote: > > On 11/21/2018 12:04 PM, Sameeh Jubran wrote: > > On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >> Great to see you making progress on this! > >> Some comments below: > >> > >> On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote: > >>> I have created a setup which has two hosts (host A and host B) with X710 10G > >>> cards connected back to back. On one host (I'll refer to this host as host A) I > >>> have configured a bridge with the PF interface as well as vitio-net's interface > >>> (standby) both attached to it. > >> ... > >> > >>> The command line I used: > >>> > >>> /root/qemu/x86_64-softmmu/qemu-system-x86_64 \ > >>> -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname= > >>> cc17 \ > >>> -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \ > >> What's e1000 doing here? > >> Can this be reason you can not talk to host? > > I don't think so, the e1000 is for enabling WAN connection on the > > guest for downloading packages and ssh connection. It is connected to > > a separate bridge which is connected to the external interface of the > > host. 
> >>> -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript= > >>> no,ifname=cc1_72,queues=4 \ > >>> -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id= > >>> cc1_72,vectors=10,mq=on,primary=cc1_71 \ > >>> -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \ > >>> -enable-kvm \ > >>> -name netkvm \ > >>> -m 3000M \ > >>> -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \ > >>> -smp 4 \ > >>> -vga qxl \ > >>> -spice port=6110,disable-ticketing \ > >>> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \ > >>> -chardev spicevmc,name=vdagent,id=vdagent \ > >>> -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name= > >>> com.redhat.spice.0 \ > >>> -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \ > >>> -device virtio-serial \ > >>> -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \ > >>> -monitor stdio > >> > >> ... > >> > >>> Since I couldn't ping from VM to host B, I did an iperf test between the VM and > >>> host A with the feature enabled and during the test I have unplugged the sriov > >>> device, the device was unplugged successfully and no drops where observed as > >>> you can see in the results below: > >>> > >>> [root@dhcp156-44 ~]# ifconfig > >> Well I suspect this won't tell you anything, this shows packet drops at > >> the hardware level. When e.g. link is down linux won't send any packets > >> out. The simplest test is to monitor latency and throughput and see that > >> while it is lower for the duration of migration, there are no huge > >> spikes around the switch. > > Oh, okay will do that. > > > > I have noticed some nasty lag when I tried to ssh to the VM using the > > failover interface while I didn't experience that with the e1000. > > Sridhar Any idea what might be the cause? > > > When using failover interface, i guess you have the VF interface plugged in and UP. > So you should be using the primary interface. 
>
> Do you see the VFs MAC configured correctly at the host PF?
> You can do
> ip link show dev <pf>
> and it should show the MACs of all the VFs associated with that PF.

You are correct, the VF MAC was set to all zeroes. It can be set with, for
example, "ip link set ens2f0 vf 1 mac 8a:f7:20:29:3b:cb":

[root@virtlab517 netkvm_dev]# ip link show dev ens2f0
2: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master test_br0 state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:33:43:30 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 1 MAC 8a:f7:20:29:3b:cb, spoof checking on, link-state auto, trust off
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 7 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off

I have now repeated the test above, this time running iperf and ping from
the VM to host B, and during the test I ejected the primary device using the
device_del command. This is the same mechanism my implementation uses
during migration; however, the traffic is now lost, as you can see below.
Did you test the failover interface in such scenarios?
[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.19.156.44 netmask 255.255.248.0 broadcast 10.19.159.255
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb prefixlen 64 scopeid 0x0<global>
        inet6 fe80::d306:561f:9f43:ff77 prefixlen 64 scopeid 0x20<link>
        ether 56:cc:c1:01:cc:21 txqueuelen 1000 (Ethernet)
        RX packets 55201 bytes 3496532 (3.3 MiB)
        RX errors 72 dropped 0 overruns 0 frame 72
        TX packets 738 bytes 70323 (68.6 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.17 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e prefixlen 64 scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb txqueuelen 1000 (Ethernet)
        RX packets 41260 bytes 3067573 (2.9 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1156043 bytes 1749160131 (1.6 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether 8a:f7:20:29:3b:cb txqueuelen 1000 (Ethernet)
        RX packets 40183 bytes 2848249 (2.7 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1156043 bytes 1749160131 (1.6 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether 8a:f7:20:29:3b:cb txqueuelen 1000 (Ethernet)
        RX packets 1077 bytes 219324 (214.1 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 3 bytes 264 (264.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 3 bytes 264 (264.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

_______________________________________________________________________

[root@dhcp156-44 ~]# iperf -c 192.168.1.118 -t 100 -i 1
------------------------------------------------------------
Client connecting to 192.168.1.118, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.17 port 42210 connected with 192.168.1.118 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  1.0- 2.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  2.0- 3.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3]  3.0- 4.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  4.0- 5.0 sec  1.09 GBytes  9.40 Gbits/sec
[  3]  5.0- 6.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  6.0- 7.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3]  7.0- 8.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  8.0- 9.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3]  9.0-10.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 10.0-11.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 11.0-12.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 12.0-13.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 13.0-14.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 14.0-15.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 15.0-16.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 16.0-17.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 17.0-18.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 18.0-19.0 sec  1.09 GBytes  9.40 Gbits/sec
[  3] 19.0-20.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 20.0-21.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 21.0-22.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 22.0-23.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 23.0-24.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 24.0-25.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 25.0-26.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 26.0-27.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 27.0-28.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 28.0-29.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 29.0-30.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 30.0-31.0 sec   318 MBytes  2.66 Gbits/sec
[  3] 31.0-32.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 32.0-33.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 33.0-34.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 34.0-35.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 35.0-36.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 36.0-37.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 37.0-38.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 38.0-39.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 39.0-40.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 40.0-41.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 41.0-42.0 sec  0.00 Bytes  0.00 bits/sec
^C[  3] 42.0-43.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  0.0-43.4 sec  33.2 GBytes  6.57 Gbits/sec

_______________________________________________________________________

[root@dhcp156-44 ~]# ping 192.168.1.118
PING 192.168.1.118 (192.168.1.118) 56(84) bytes of data.
64 bytes from 192.168.1.118: icmp_seq=1 ttl=64 time=0.264 ms
64 bytes from 192.168.1.118: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 192.168.1.118: icmp_seq=3 ttl=64 time=0.168 ms
64 bytes from 192.168.1.118: icmp_seq=4 ttl=64 time=0.174 ms
64 bytes from 192.168.1.118: icmp_seq=5 ttl=64 time=0.168 ms
64 bytes from 192.168.1.118: icmp_seq=6 ttl=64 time=0.282 ms
64 bytes from 192.168.1.118: icmp_seq=7 ttl=64 time=0.179 ms
64 bytes from 192.168.1.118: icmp_seq=8 ttl=64 time=0.141 ms
64 bytes from 192.168.1.118: icmp_seq=9 ttl=64 time=0.165 ms
64 bytes from 192.168.1.118: icmp_seq=10 ttl=64 time=0.168 ms
64 bytes from 192.168.1.118: icmp_seq=11 ttl=64 time=0.153 ms
64 bytes from 192.168.1.118: icmp_seq=12 ttl=64 time=0.296 ms
64 bytes from 192.168.1.118: icmp_seq=13 ttl=64 time=0.258 ms
64 bytes from 192.168.1.118: icmp_seq=14 ttl=64 time=0.236 ms
64 bytes from 192.168.1.118: icmp_seq=15 ttl=64 time=0.190 ms
64 bytes from 192.168.1.118: icmp_seq=16 ttl=64 time=0.245 ms
64 bytes from 192.168.1.118: icmp_seq=17 ttl=64 time=0.187 ms
64 bytes from 192.168.1.118: icmp_seq=18 ttl=64 time=0.222 ms
64 bytes from 192.168.1.118: icmp_seq=19 ttl=64 time=0.231 ms
64 bytes from 192.168.1.118: icmp_seq=20 ttl=64 time=0.279 ms
64 bytes from 192.168.1.118: icmp_seq=21 ttl=64 time=0.271 ms
64 bytes from 192.168.1.118: icmp_seq=22 ttl=64 time=0.319 ms
64 bytes from 192.168.1.118: icmp_seq=23 ttl=64 time=0.350 ms
64 bytes from 192.168.1.118: icmp_seq=24 ttl=64 time=0.311 ms
64 bytes from 192.168.1.118: icmp_seq=25 ttl=64 time=0.249 ms
64 bytes from 192.168.1.118: icmp_seq=26 ttl=64 time=0.258 ms
64 bytes from 192.168.1.118: icmp_seq=27 ttl=64 time=0.220 ms
64 bytes from 192.168.1.118: icmp_seq=28 ttl=64 time=0.299 ms
64 bytes from 192.168.1.118: icmp_seq=29 ttl=64 time=0.281 ms
64 bytes from 192.168.1.118: icmp_seq=30 ttl=64 time=0.271 ms
64 bytes from 192.168.1.118: icmp_seq=31 ttl=64 time=0.241 ms
64 bytes from 192.168.1.118: icmp_seq=32 ttl=64 time=0.245 ms
64 bytes from 192.168.1.118: icmp_seq=33 ttl=64 time=0.245 ms
64 bytes from 192.168.1.118: icmp_seq=34 ttl=64 time=0.287 ms
64 bytes from 192.168.1.118: icmp_seq=35 ttl=64 time=0.322 ms
64 bytes from 192.168.1.118: icmp_seq=36 ttl=64 time=0.247 ms
64 bytes from 192.168.1.118: icmp_seq=37 ttl=64 time=0.316 ms
64 bytes from 192.168.1.118: icmp_seq=38 ttl=64 time=0.255 ms
64 bytes from 192.168.1.118: icmp_seq=39 ttl=64 time=0.308 ms
64 bytes from 192.168.1.118: icmp_seq=40 ttl=64 time=0.250 ms
64 bytes from 192.168.1.118: icmp_seq=41 ttl=64 time=0.260 ms
^C
--- 192.168.1.118 ping statistics ---
63 packets transmitted, 41 received, 34.9206% packet loss, time 568ms
rtt min/avg/max/mdev = 0.141/0.243/0.350/0.054 ms

--
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.
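To pin down when the traffic stops in logs like the ones above, the interval lines can be scanned mechanically. This is a minimal sketch under the assumption that the input is iperf (version 2) client output with `-i 1` interval reports in the format shown in this message; the function names are hypothetical and the sample is a four-line excerpt of that output.

```python
import re

# Matches iperf (v2) interval lines such as
#   [  3] 30.0-31.0 sec   318 MBytes  2.66 Gbits/sec
#   [  3] 31.0-32.0 sec  0.00 Bytes  0.00 bits/sec
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-\s*(\d+\.\d+)\s+sec\s+\S+\s+\S+\s+"
    r"(\d+(?:\.\d+)?)\s+([KMG]?)bits/sec"
)

SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def intervals(output):
    """Yield (start_s, end_s, bits_per_second) for every interval line."""
    for start, end, rate, prefix in INTERVAL.findall(output):
        yield float(start), float(end), float(rate) * SCALE[prefix]


def first_stall(output):
    """Return the start time of the first zero-throughput interval, or None."""
    for start, _end, bps in intervals(output):
        if bps == 0.0:
            return start
    return None


def loss_percent(transmitted, received):
    """Packet loss percentage as reported in the ping summary line."""
    return 100.0 * (transmitted - received) / transmitted


# Excerpt of the iperf output quoted in this message.
SAMPLE = """\
[  3] 29.0-30.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 30.0-31.0 sec   318 MBytes  2.66 Gbits/sec
[  3] 31.0-32.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 32.0-33.0 sec  0.00 Bytes  0.00 bits/sec
"""

if __name__ == "__main__":
    print("first stalled interval starts at", first_stall(SAMPLE), "s")
    print("ping loss: %.4f%%" % loss_percent(63, 41))
```

On the excerpt, throughput drops during the 30.0-31.0 s interval and is zero from 31.0 s on; the ping summary's 34.9206% follows from 22 lost out of 63 transmitted.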
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-21 20:04 ` Sameeh Jubran 2018-11-21 23:51 ` Samudrala, Sridhar @ 2018-11-22 18:27 ` Michael S. Tsirkin 2018-11-26 15:13 ` Sameeh Jubran 1 sibling, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-22 18:27 UTC (permalink / raw) To: Sameeh Jubran Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev

On Wed, Nov 21, 2018 at 10:04:53PM +0200, Sameeh Jubran wrote:
> On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > [...]
> > Well I suspect this won't tell you anything, this shows packet drops at
> > the hardware level. When e.g. link is down linux won't send any packets
> > out. The simplest test is to monitor latency and throughput and see that
> > while it is lower for the duration of migration, there are no huge
> > spikes around the switch.
> Oh, okay will do that.
>
> I have noticed some nasty lag when I tried to ssh to the VM using the
> failover interface while I didn't experience that with the e1000.
> Sridhar Any idea what might be the cause?

Try tcpdump?
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-22 18:27 ` Michael S. Tsirkin @ 2018-11-26 15:13 ` Sameeh Jubran 2018-11-26 15:43 ` Sameeh Jubran 0 siblings, 1 reply; 85+ messages in thread From: Sameeh Jubran @ 2018-11-26 15:13 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev, liran.alon, Yan Vugenfirer

On Thu, Nov 22, 2018 at 8:27 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Nov 21, 2018 at 10:04:53PM +0200, Sameeh Jubran wrote:
> > [...]
> > I have noticed some nasty lag when I tried to ssh to the VM using the
> > failover interface while I didn't experience that with the e1000.
> > Sridhar Any idea what might be the cause?
>
> Try tcpdump?

I have investigated this, and this is what I have so far. Maybe you can
help me with some insights to figure out what's going on.
The setup is as follows:

 |_VM_|
 __||___
|host A|----X710---------back-to-back--------X710---|host B|

_______________________________________________________________________
- On host A:

I have the following interfaces attached to the "test_br0" bridge:

virtio-net's netdev, cc1_72
X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
in the back-to-back setup)

The bridge has the MAC address of the PF ens2f0 and IP 192.168.1.117
_______________________________________________________________________
- On host B:

I have the following interfaces attached to the "test_br0" bridge:

X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
in the back-to-back setup)

The bridge has the MAC address of the PF ens2f0 and IP 192.168.1.118
_______________________________________________________________________
- On the VM:
The failover interface has the IP 192.168.1.17
_______________________________________________________________________

I can successfully ping 118 from 17 (host B from the VM); however, I
can't see the ICMP requests on host A anywhere!
I can see them inside host B on ens2f0, and I can see them in the VM on
the failover interface, but not on host A:
not on the bridge (test_br0) as I would expect, not on the ens2f0
interface, not on the cc1_72 (virtio-net) interface and of course not on
the world interface.
This leads me to think that the ICMP requests are sent on the "vf"
interface, which I can't see on the host. What further confirms my theory
is that when I use device_del to unplug the primary interface, the ping
gets disconnected. Using tcpdump I can see that the ping requests arrive
at host B and that there is a matching ping reply; however, the reply is
not present anywhere on host A or in the VM. Moreover, once the primary
is disconnected I start seeing the ping requests on host A on "test_br0"
and "ens2f0".

Liran, do you think this is related to the mac vtables and vfs issue
that you mentioned at the monthly meeting?

--
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-26 15:13 ` Sameeh Jubran @ 2018-11-26 15:43 ` Sameeh Jubran 2018-11-26 20:22 ` Samudrala, Sridhar 0 siblings, 1 reply; 85+ messages in thread From: Sameeh Jubran @ 2018-11-26 15:43 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev, liran.alon, Yan Vugenfirer

On Mon, Nov 26, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
> [...]
> This leads me to think that the icmp requests are send on the "vf"
> interface which I cant see on the host. The thing that further
> confirms my theory is when
> I use device_del to unplug the primary interface, the ping get
> disconnected. Using tcpdump I can see that the ping requests arrive to
> host B and there is a
> suitable ping reply, however the reply is not present on Host A or the
> VM anywhere, moreover, when the primary gets disconnected I start
> seeing the ping
> requests on Host A on the "test_br0" and "ens2f0".
>
> Liran do you think this is related to the mac vtables and vfs issue
> that you've mentioned on the monthly meeting?

Update:
I have just set the VF's MAC address to 0 (ip link set ens2f0 vf 1 mac
00:00:00:00:00:00) after unplugging it (the primary device), and the
pings started working again on the failover interface. So it seems
that the frames were arriving at the VF on the host.

--
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-26 15:43 ` Sameeh Jubran @ 2018-11-26 20:22 ` Samudrala, Sridhar 2018-11-27 11:24 ` Sameeh Jubran 2018-11-28 17:08 ` Michael S. Tsirkin 0 siblings, 2 replies; 85+ messages in thread From: Samudrala, Sridhar @ 2018-11-26 20:22 UTC (permalink / raw) To: Sameeh Jubran, Michael S. Tsirkin Cc: Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer

On 11/26/2018 7:43 AM, Sameeh Jubran wrote:
> On Mon, Nov 26, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
>> On Thu, Nov 22, 2018 at 8:27 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> [...]
>>> Try tcpdump?
>> I have investigated this and this is what I have so far, maybe you can
>> help me with some insights to figure what's going on.
>> [...]
>> I can successfully ping 118 from 17. (host B from the VM), however I
>> can't see the ICMP requests on host A anywhere!
>> I can see them inside host B on ens2f0, I can see them in the VM on
>> the failover interface but not on Host A.
>> Not on the brdige (test_br0) as I would expect, not on the ens2f0
>> interface, not co cc1_72 (virtio-net) interface and of-course not on
>> the world interface.

This is the expected behavior when the VF is directly attached to the VM
and is being used as the primary interface. You don't see any packets on
host A.

>> This leads me to think that the icmp requests are send on the "vf"
>> interface which I cant see on the host. The thing that further
>> confirms my theory is when
>> I use device_del to unplug the primary interface, the ping get
>> disconnected. Using tcpdump I can see that the ping requests arrive to
>> host B and there is a
>> suitable ping reply, however the reply is not present on Host A or the
>> VM anywhere, moreover, when the primary gets disconnected I start
>> seeing the ping
>> requests on Host A on the "test_br0" and "ens2f0".
>>
>> Liran do you think this is related to the mac vtables and vfs issue
>> that you've mentioned on the monthly meeting?
>>
>>
> Update:
> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> 00:00:00:00:00:00) after unplugging it (the primary device) and the
> pings started working again on the failover interface. So it seems
> like the frames were arriving to the vf on the host.
>
>

Yes. When the VF is unplugged, you need to reset the VF's MAC so that
packets with the VM's MAC start flowing via the PF, the bridge and the
virtio interface.

Have you looked at this documentation, which shows a sample script to
initiate live migration?
https://www.kernel.org/doc/html/latest/networking/net_failover.html

-Sridhar
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-26 20:22 ` Samudrala, Sridhar @ 2018-11-27 11:24 ` Sameeh Jubran 2018-11-28 17:08 ` Michael S. Tsirkin 1 sibling, 0 replies; 85+ messages in thread From: Sameeh Jubran @ 2018-11-27 11:24 UTC (permalink / raw) To: sridhar.samudrala Cc: Michael S. Tsirkin, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer On Mon, Nov 26, 2018 at 10:22 PM Samudrala, Sridhar <sridhar.samudrala@intel.com> wrote: > > On 11/26/2018 7:43 AM, Sameeh Jubran wrote: > > On Mon, Nov 26, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote: > >> On Thu, Nov 22, 2018 at 8:27 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >>> On Wed, Nov 21, 2018 at 10:04:53PM +0200, Sameeh Jubran wrote: > >>>> On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>>> Great to see you making progress on this! > >>>>> Some comments below: > >>>>> > >>>>> On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote: > >>>>>> I have created a setup which has two hosts (host A and host B) with X710 10G > >>>>>> cards connected back to back. On one host (I'll refer to this host as host A) I > >>>>>> have configured a bridge with the PF interface as well as vitio-net's interface > >>>>>> (standby) both attached to it. > >>>>> ... > >>>>> > >>>>>> The command line I used: > >>>>>> > >>>>>> /root/qemu/x86_64-softmmu/qemu-system-x86_64 \ > >>>>>> -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname= > >>>>>> cc17 \ > >>>>>> -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \ > >>>>> What's e1000 doing here? > >>>>> Can this be reason you can not talk to host? > >>>> I don't think so, the e1000 is for enabling WAN connection on the > >>>> guest for downloading packages and ssh connection. It is connected to > >>>> a separate bridge which is connected to the external interface of the > >>>> host. 
> >>>>>> -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript= > >>>>>> no,ifname=cc1_72,queues=4 \ > >>>>>> -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id= > >>>>>> cc1_72,vectors=10,mq=on,primary=cc1_71 \ > >>>>>> -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \ > >>>>>> -enable-kvm \ > >>>>>> -name netkvm \ > >>>>>> -m 3000M \ > >>>>>> -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \ > >>>>>> -smp 4 \ > >>>>>> -vga qxl \ > >>>>>> -spice port=6110,disable-ticketing \ > >>>>>> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \ > >>>>>> -chardev spicevmc,name=vdagent,id=vdagent \ > >>>>>> -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name= > >>>>>> com.redhat.spice.0 \ > >>>>>> -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \ > >>>>>> -device virtio-serial \ > >>>>>> -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \ > >>>>>> -monitor stdio > >>>>> > >>>>> ... > >>>>> > >>>>>> Since I couldn't ping from VM to host B, I did an iperf test between the VM and > >>>>>> host A with the feature enabled and during the test I have unplugged the sriov > >>>>>> device, the device was unplugged successfully and no drops where observed as > >>>>>> you can see in the results below: > >>>>>> > >>>>>> [root@dhcp156-44 ~]# ifconfig > >>>>> Well I suspect this won't tell you anything, this shows packet drops at > >>>>> the hardware level. When e.g. link is down linux won't send any packets > >>>>> out. The simplest test is to monitor latency and throughput and see that > >>>>> while it is lower for the duration of migration, there are no huge > >>>>> spikes around the switch. > >>>> Oh, okay will do that. > >>>> > >>>> I have noticed some nasty lag when I tried to ssh to the VM using the > >>>> failover interface while I didn't experience that with the e1000. > >>>> Sridhar Any idea what might be the cause? > >>> Try tcpdump? 
> >> I have investigated this and this is what I have so far, maybe you can > >> help me with some insights to figure what's going on. > >> The setup is as follows: > >> > >> > >> > >> |_VM_| > >> __||___ > >> |host A|----X710---------back-to-back--------X710---|host B| > >> > >> _______________________________________________________________________ > >> - On the host A: > >> > >> I have the following interfaces attached to the "test_br0" bridge: > >> > >> virtio-net's netdev, cc1_72 > >> X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected > >> in the back to back setup) > >> > >> The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.117 > >> _______________________________________________________________________ > >> - On the host B: > >> > >> I have the following interfaces attached to the "test_br0" bridge: > >> > >> X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected > >> in the back to back setup) > >> > >> The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.118 > >> _______________________________________________________________________ > >> - On the VM: > >> The failover interface has the ip: 192.168.1.17 > >> _______________________________________________________________________ > >> > >> I can successfully ping 118 from 17. (host B from the VM), however I > >> can't see the ICMP requests on host A anywhere! > >> I can see them inside host B on ens2f0, I can see them in the VM on > >> the failover interface but not on Host A. > >> Not on the brdige (test_br0) as I would expect, not on the ens2f0 > >> interface, not co cc1_72 (virtio-net) interface and of-course not on > >> the world interface. > > This is the expected behavior when VF is directly attached to the VM and is > being used as the primary interface. You don't see any packets on Host A. > > >> This leads me to think that the icmp requests are send on the "vf" > >> interface which I cant see on the host. 
The thing that further > >> confirms my theory is when > >> I use device_del to unplug the primary interface, the ping get > >> disconnected. Using tcpdump I can see that the ping requests arrive to > >> host B and there is a > >> suitable ping reply, however the reply is not present on Host A or the > >> VM anywhere, moreover, when the primary gets disconnected I start > >> seeing the ping > >> requests on Host A on the "test_br0" and "ens2f0". > >> > >> Liran do you think this is related to the mac vtables and vfs issue > >> that you've mentioned on the monthly meeting? > >> > >> > > Update: > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac > > 00:00:00:00:00:00) after unplugging it (the primary device) and the > > pings started working again on the failover interface. So it seems > > like the frames were arriving to the vf on the host. > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets > with VMs MAC start flowing via VF, bridge and the virtio interface. > > Have you looked at this documentation that shows a sample script to initiate live > migration? > https://www.kernel.org/doc/html/latest/networking/net_failover.html This means that we need the management layer's cooperation to do this, and an implementation in QEMU and the driver alone won't suffice. What do you suggest as an alternative? Please share your thoughts. > > -Sridhar > -- Respectfully, Sameeh Jubran Software Engineer @ Daynix.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-26 20:22 ` Samudrala, Sridhar 2018-11-27 11:24 ` Sameeh Jubran @ 2018-11-28 17:08 ` Michael S. Tsirkin 2018-11-28 17:31 ` Samudrala, Sridhar 1 sibling, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-28 17:08 UTC (permalink / raw) To: Samudrala, Sridhar Cc: Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: > > Update: > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac > > 00:00:00:00:00:00) after unplugging it (the primary device) and the > > pings started working again on the failover interface. So it seems > > like the frames were arriving to the vf on the host. > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets > with VMs MAC start flowing via VF, bridge and the virtio interface. > > Have you looked at this documentation that shows a sample script to initiate live > migration? > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > -Sridhar Interesting, I didn't notice it does this. So in fact just defining the VF MAC will immediately divert packets to the VF? Given that the guest driver has not initialized the VF yet, won't a bunch of packets be dropped? -- MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 17:08 ` Michael S. Tsirkin @ 2018-11-28 17:31 ` Samudrala, Sridhar 2018-11-28 17:35 ` Michael S. Tsirkin 0 siblings, 1 reply; 85+ messages in thread From: Samudrala, Sridhar @ 2018-11-28 17:31 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: >>> Update: >>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac >>> 00:00:00:00:00:00) after unplugging it (the primary device) and the >>> pings started working again on the failover interface. So it seems >>> like the frames were arriving to the vf on the host. >>> >>> >> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets >> with VMs MAC start flowing via VF, bridge and the virtio interface. >> >> Have you looked at this documentation that shows a sample script to initiate live >> migration? >> https://www.kernel.org/doc/html/latest/networking/net_failover.html >> >> -Sridhar > Interesting I didn't notice it does this. So in fact > just defining VF mac will immediately divert packets > to the VF? Given guest driver did not initialize VF > yet won't a bunch of packets be dropped? There is a typo in my statement above (VF -> PF). When the VF is unplugged, you need to reset the VF's MAC so that packets with the VM's MAC start flowing via the PF, the bridge and the virtio interface. When the VF is plugged in, ideally the MAC filter for the VF should be added to the HW once the guest driver comes up and can receive packets. Currently with Intel drivers, the filter gets added to the HW as soon as the host admin sets the VF's MAC via the ndo_set_vf_mac() API. So potentially there could be packet drops until the VF driver comes up in the VM.
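A minimal sketch of the unplug sequence discussed in this thread, assuming the names used in the test setup (PF ens2f0, VF index 1, QEMU device id cc1_71). The helper vf_unplug_steps is hypothetical and only builds the command strings in order; it does not run anything, and whether a given management stack issues the steps exactly this way is an assumption.

```python
from typing import List, Tuple

# All-zero MAC, as used in the thread to clear the VF's MAC filter.
ZERO_MAC = "00:00:00:00:00:00"


def vf_unplug_steps(pf: str, vf_index: int, qemu_dev_id: str) -> List[Tuple[str, str]]:
    """Return (where, command) pairs in the order they would be issued.

    Hypothetical helper for illustration: it only formats strings.
    """
    return [
        # 1. Hot-unplug the primary (VF) device from the guest via the QEMU monitor.
        ("qemu-monitor", f"device_del {qemu_dev_id}"),
        # 2. Reset the VF's MAC on the host so that frames carrying the VM's MAC
        #    flow via the PF, the bridge and the virtio-net standby interface.
        ("host", f"ip link set {pf} vf {vf_index} mac {ZERO_MAC}"),
    ]


if __name__ == "__main__":
    for where, cmd in vf_unplug_steps("ens2f0", 1, "cc1_71"):
        print(f"[{where}] {cmd}")
```

The second step is the one that needs cooperation from whatever manages the host, since it is a host-side command rather than something the guest driver or QEMU device model does on its own in this setup.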
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 17:31 ` Samudrala, Sridhar @ 2018-11-28 17:35 ` Michael S. Tsirkin 2018-11-28 18:39 ` Samudrala, Sridhar 0 siblings, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-28 17:35 UTC (permalink / raw) To: Samudrala, Sridhar Cc: Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: > > > > Update: > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the > > > > pings started working again on the failover interface. So it seems > > > > like the frames were arriving to the vf on the host. > > > > > > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > with VMs MAC start flowing via VF, bridge and the virtio interface. > > > > > > Have you looked at this documentation that shows a sample script to initiate live > > > migration? > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > -Sridhar > > Interesting I didn't notice it does this. So in fact > > just defining VF mac will immediately divert packets > > to the VF? Given guest driver did not initialize VF > > yet won't a bunch of packets be dropped? > > There is typo in my stmt above (VF->PF) > When the VF is unplugged, you need to reset the VFs MAC so that the packets > with VMs MAC start flowing via PF, bridge and the virtio interface. > > When the VF is plugged in, ideally the MAC filter for the VF should be added to > the HW once the guest driver comes up and can receive packets. Currently with intel > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via > ndo_set_vf_mac() api. 
So potentially there could be packet drops until the VF driver > comes up in the VM. > > Can this be fixed in the intel drivers? -- MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 17:35 ` Michael S. Tsirkin @ 2018-11-28 18:39 ` Samudrala, Sridhar 2018-11-28 18:51 ` Michael S. Tsirkin 2018-11-28 20:06 ` Michael S. Tsirkin 0 siblings, 2 replies; 85+ messages in thread From: Samudrala, Sridhar @ 2018-11-28 18:39 UTC (permalink / raw) To: Michael S. Tsirkin, Carolyn Cc: Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: >> >> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: >>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: >>>>> Update: >>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac >>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the >>>>> pings started working again on the failover interface. So it seems >>>>> like the frames were arriving to the vf on the host. >>>>> >>>>> >>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>> with VMs MAC start flowing via VF, bridge and the virtio interface. >>>> >>>> Have you looked at this documentation that shows a sample script to initiate live >>>> migration? >>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html >>>> >>>> -Sridhar >>> Interesting I didn't notice it does this. So in fact >>> just defining VF mac will immediately divert packets >>> to the VF? Given guest driver did not initialize VF >>> yet won't a bunch of packets be dropped? >> There is typo in my stmt above (VF->PF) >> When the VF is unplugged, you need to reset the VFs MAC so that the packets >> with VMs MAC start flowing via PF, bridge and the virtio interface. >> >> When the VF is plugged in, ideally the MAC filter for the VF should be added to >> the HW once the guest driver comes up and can receive packets. 
Currently with intel >> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via >> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver >> comes up in the VM. >> >> > Can this be fixed in the intel drivers? I just checked and it looks like this has been addressed in the ice 100Gb driver. I will bring this issue up internally to see if we can change this behavior in the i40e/ixgbe drivers.
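The drop window described in this exchange can be illustrated with a toy timing model. This is a sketch under stated assumptions, not driver code: the function dropped_packets and its parameters are hypothetical, "eager" stands for a filter that becomes active as soon as the host sets the VF MAC, and "deferred" stands for a filter that becomes active only once the guest VF driver is up.

```python
from typing import Iterable, List


def dropped_packets(arrivals: Iterable[float], mac_set_at: float,
                    driver_up_at: float, deferred_filter: bool) -> List[float]:
    """Return arrival times of packets lost while the VF cannot receive.

    Toy model: once the VF MAC filter is active, hardware steers the VM's
    frames to the VF, and they are lost if the guest VF driver is not up.
    Before the filter is active, frames take the PF/bridge/virtio path.
    """
    if deferred_filter:
        # Filter is programmed only when the guest driver is ready.
        filter_active_at = max(mac_set_at, driver_up_at)
    else:
        # Filter is programmed as soon as the host admin sets the VF MAC.
        filter_active_at = mac_set_at
    return [t for t in arrivals if filter_active_at <= t < driver_up_at]


if __name__ == "__main__":
    arrivals = [0.5, 1.0, 1.5, 2.0, 2.5]
    print("eager:   ", dropped_packets(arrivals, 1.0, 2.0, deferred_filter=False))
    print("deferred:", dropped_packets(arrivals, 1.0, 2.0, deferred_filter=True))
```

In this model the eager case loses every packet that arrives between the MAC being set and the guest driver coming up, while the deferred case loses none, which is the behavior asked for in the question above.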
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 18:39 ` Samudrala, Sridhar @ 2018-11-28 18:51 ` Michael S. Tsirkin 2018-11-29 6:29 ` Samudrala, Sridhar 2018-11-28 20:06 ` Michael S. Tsirkin 1 sibling, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-28 18:51 UTC (permalink / raw) To: Samudrala, Sridhar Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: > > > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: > > > > > > Update: > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the > > > > > > pings started working again on the failover interface. So it seems > > > > > > like the frames were arriving to the vf on the host. > > > > > > > > > > > > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface. > > > > > > > > > > Have you looked at this documentation that shows a sample script to initiate live > > > > > migration? > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > > -Sridhar > > > > Interesting I didn't notice it does this. So in fact > > > > just defining VF mac will immediately divert packets > > > > to the VF? Given guest driver did not initialize VF > > > > yet won't a bunch of packets be dropped? > > > There is typo in my stmt above (VF->PF) > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > with VMs MAC start flowing via PF, bridge and the virtio interface. 
> > > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to > > > the HW once the guest driver comes up and can receive packets. Currently with intel > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver > > > comes up in the VM. > > > > > > > > Can this be fixed in the intel drivers? > > I just checked and it looks like this seems to have been addressed in the > ice 100Gb driver. Thanks! Could you please point out the relevant code or commit ID? > Will bring this up issue internally to see if we can change this > behavior in i40e/ixgbe drivers.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 18:51 ` Michael S. Tsirkin @ 2018-11-29 6:29 ` Samudrala, Sridhar 0 siblings, 0 replies; 85+ messages in thread From: Samudrala, Sridhar @ 2018-11-29 6:29 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse On 11/28/2018 10:51 AM, Michael S. Tsirkin wrote: > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: >> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: >>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: >>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: >>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: >>>>>>> Update: >>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac >>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the >>>>>>> pings started working again on the failover interface. So it seems >>>>>>> like the frames were arriving to the vf on the host. >>>>>>> >>>>>>> >>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface. >>>>>> >>>>>> Have you looked at this documentation that shows a sample script to initiate live >>>>>> migration? >>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html >>>>>> >>>>>> -Sridhar >>>>> Interesting I didn't notice it does this. So in fact >>>>> just defining VF mac will immediately divert packets >>>>> to the VF? Given guest driver did not initialize VF >>>>> yet won't a bunch of packets be dropped? >>>> There is typo in my stmt above (VF->PF) >>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>> with VMs MAC start flowing via PF, bridge and the virtio interface. 
>>>> >>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to >>>> the HW once the guest driver comes up and can receive packets. Currently with intel >>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via >>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver >>>> comes up in the VM. >>>> >>>> >>> Can this be fixed in the intel drivers? >> I just checked and it looks like this seems to have been addressed in the >> ice 100Gb driver. > Thanks! Could you pls point out the relevant code/commit id? You can look into ice_set_vf_mac() and compare it with i40e_ndo_set_vf_mac().
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 18:39 ` Samudrala, Sridhar 2018-11-28 18:51 ` Michael S. Tsirkin @ 2018-11-28 20:06 ` Michael S. Tsirkin 2018-11-28 20:28 ` si-wei liu 1 sibling, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-28 20:06 UTC (permalink / raw) To: Samudrala, Sridhar Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: > > > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: > > > > > > Update: > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the > > > > > > pings started working again on the failover interface. So it seems > > > > > > like the frames were arriving to the vf on the host. > > > > > > > > > > > > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface. > > > > > > > > > > Have you looked at this documentation that shows a sample script to initiate live > > > > > migration? > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > > -Sridhar > > > > Interesting I didn't notice it does this. So in fact > > > > just defining VF mac will immediately divert packets > > > > to the VF? Given guest driver did not initialize VF > > > > yet won't a bunch of packets be dropped? > > > There is typo in my stmt above (VF->PF) > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > with VMs MAC start flowing via PF, bridge and the virtio interface. 
> > > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to > > > the HW once the guest driver comes up and can receive packets. Currently with intel > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver > > > comes up in the VM. > > > > > > > > Can this be fixed in the intel drivers? > > I just checked and it looks like this seems to have been addressed in the > ice 100Gb driver. Will bring this up issue internally to see if we can change this > behavior in i40e/ixgbe drivers. Also, what happens if the MAC is programmed both in the PF (e.g. with macvtap) and in the VF? Ideally the VF will take precedence. -- MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 20:06 ` Michael S. Tsirkin @ 2018-11-28 20:28 ` si-wei liu 2018-11-28 20:43 ` Michael S. Tsirkin 2018-11-29 1:15 ` Michael S. Tsirkin 0 siblings, 2 replies; 85+ messages in thread From: si-wei liu @ 2018-11-28 20:28 UTC (permalink / raw) To: Michael S. Tsirkin, Samudrala, Sridhar Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: >> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: >>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: >>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: >>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: >>>>>>> Update: >>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac >>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the >>>>>>> pings started working again on the failover interface. So it seems >>>>>>> like the frames were arriving to the vf on the host. >>>>>>> >>>>>>> >>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface. >>>>>> >>>>>> Have you looked at this documentation that shows a sample script to initiate live >>>>>> migration? >>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html >>>>>> >>>>>> -Sridhar >>>>> Interesting I didn't notice it does this. So in fact >>>>> just defining VF mac will immediately divert packets >>>>> to the VF? Given guest driver did not initialize VF >>>>> yet won't a bunch of packets be dropped? >>>> There is typo in my stmt above (VF->PF) >>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>> with VMs MAC start flowing via PF, bridge and the virtio interface. 
>>>> >>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to >>>> the HW once the guest driver comes up and can receive packets. Currently with intel >>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via >>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver >>>> comes up in the VM. >>>> >>>> >>> Can this be fixed in the intel drivers? >> I just checked and it looks like this seems to have been addressed in the >> ice 100Gb driver. Will bring this up issue internally to see if we can change this >> behavior in i40e/ixgbe drivers. > Also what happens if the mac is programmed both in PF (e.g. with > macvtap) and VF? Ideally VF will take precedence. I seriously doubt that legacy Intel NIC hardware can do that without resorting to a software workaround in the PF driver. Actually, the same applies to other NIC vendors when the hardware sees duplicate filters: there is no control over which one takes precedence. -Siwei
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 20:28 ` si-wei liu @ 2018-11-28 20:43 ` Michael S. Tsirkin 2018-11-28 20:47 ` si-wei liu 2018-11-29 1:15 ` Michael S. Tsirkin 1 sibling, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-28 20:43 UTC (permalink / raw) To: si-wei liu Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote: > > > On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: > > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: > > > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: > > > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: > > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: > > > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: > > > > > > > > Update: > > > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac > > > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the > > > > > > > > pings started working again on the failover interface. So it seems > > > > > > > > like the frames were arriving to the vf on the host. > > > > > > > > > > > > > > > > > > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface. > > > > > > > > > > > > > > Have you looked at this documentation that shows a sample script to initiate live > > > > > > > migration? > > > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > > > > > > -Sridhar > > > > > > Interesting I didn't notice it does this. So in fact > > > > > > just defining VF mac will immediately divert packets > > > > > > to the VF? Given guest driver did not initialize VF > > > > > > yet won't a bunch of packets be dropped? 
> > > > > There is typo in my stmt above (VF->PF)
> > > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > with VMs MAC start flowing via PF, bridge and the virtio interface.
> > > > >
> > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to
> > > > > the HW once the guest driver comes up and can receive packets. Currently with intel
> > > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
> > > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
> > > > > comes up in the VM.
> > > > >
> > > > >
> > > > Can this be fixed in the intel drivers?
> > > I just checked and it looks like this seems to have been addressed in the
> > > ice 100Gb driver. Will bring this up issue internally to see if we can change this
> > > behavior in i40e/ixgbe drivers.
> > Also what happens if the mac is programmed both in PF (e.g. with
> > macvtap) and VF? Ideally VF will take precedence.
> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
> mucking around with software workaround in the PF driver. Actually, the same
> applies to other NIC vendors when hardware sees duplicate filters. There's
> no such control of precedence on one over the other.
>
>
> -Siwei
>
>
>

OK I guess we will need another feature bit for a software
workaround then.

--
MST
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 20:43 ` Michael S. Tsirkin @ 2018-11-28 20:47 ` si-wei liu 0 siblings, 0 replies; 85+ messages in thread From: si-wei liu @ 2018-11-28 20:47 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse On 11/28/2018 12:43 PM, Michael S. Tsirkin wrote: > On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote: >> >> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: >>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: >>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: >>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: >>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: >>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: >>>>>>>>> Update: >>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac >>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the >>>>>>>>> pings started working again on the failover interface. So it seems >>>>>>>>> like the frames were arriving to the vf on the host. >>>>>>>>> >>>>>>>>> >>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface. >>>>>>>> >>>>>>>> Have you looked at this documentation that shows a sample script to initiate live >>>>>>>> migration? >>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html >>>>>>>> >>>>>>>> -Sridhar >>>>>>> Interesting I didn't notice it does this. So in fact >>>>>>> just defining VF mac will immediately divert packets >>>>>>> to the VF? Given guest driver did not initialize VF >>>>>>> yet won't a bunch of packets be dropped? 
>>>>>> There is typo in my stmt above (VF->PF)
>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>>>
>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>>>> comes up in the VM.
>>>>>>
>>>>>>
>>>>> Can this be fixed in the intel drivers?
>>>> I just checked and it looks like this seems to have been addressed in the
>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>>>> behavior in i40e/ixgbe drivers.
>>> Also what happens if the mac is programmed both in PF (e.g. with
>>> macvtap) and VF? Ideally VF will take precedence.
>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
>> mucking around with software workaround in the PF driver. Actually, the same
>> applies to other NIC vendors when hardware sees duplicate filters. There's
>> no such control of precedence on one over the other.
>>
>>
>> -Siwei
>>
>>
>>
> OK I guess we will need another feature bit for a software
> workaround then.
>

OK but then what if the NIC vendor is not willing to take a software
workaround for this feature?

-Siwei
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-28 20:28 ` si-wei liu 2018-11-28 20:43 ` Michael S. Tsirkin @ 2018-11-29 1:15 ` Michael S. Tsirkin 2018-11-29 6:37 ` Samudrala, Sridhar 2018-11-29 20:14 ` si-wei liu 1 sibling, 2 replies; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-29 1:15 UTC (permalink / raw) To: si-wei liu Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote: > > > On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: > > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: > > > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: > > > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: > > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: > > > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: > > > > > > > > Update: > > > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac > > > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the > > > > > > > > pings started working again on the failover interface. So it seems > > > > > > > > like the frames were arriving to the vf on the host. > > > > > > > > > > > > > > > > > > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface. > > > > > > > > > > > > > > Have you looked at this documentation that shows a sample script to initiate live > > > > > > > migration? > > > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > > > > > > -Sridhar > > > > > > Interesting I didn't notice it does this. So in fact > > > > > > just defining VF mac will immediately divert packets > > > > > > to the VF? 
Given guest driver did not initialize VF
> > > > > > yet won't a bunch of packets be dropped?
> > > > > There is typo in my stmt above (VF->PF)
> > > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > with VMs MAC start flowing via PF, bridge and the virtio interface.
> > > > >
> > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to
> > > > > the HW once the guest driver comes up and can receive packets. Currently with intel
> > > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
> > > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
> > > > > comes up in the VM.
> > > > >
> > > > >
> > > > Can this be fixed in the intel drivers?
> > > I just checked and it looks like this seems to have been addressed in the
> > > ice 100Gb driver. Will bring this up issue internally to see if we can change this
> > > behavior in i40e/ixgbe drivers.
> > Also what happens if the mac is programmed both in PF (e.g. with
> > macvtap) and VF? Ideally VF will take precedence.
> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
> mucking around with software workaround in the PF driver. Actually, the same
> applies to other NIC vendors when hardware sees duplicate filters. There's
> no such control of precedence on one over the other.
>
>
> -Siwei
>
>

Well removing a MAC from the PF filter when we are adding it to the VF
filter should always be possible. Need to keep it in a separate list and
re-add it when removing the MAC from VF filter. This can be handled in
the net core, no need for driver specific hacks.

Still, let's prioritize things correctly. IMHO it's fine if we
initially assume promisc mode on the PF. macvlan has this mode too
after all.

Question is how does userspace know driver isn't broken in this respect?
Let's add a "vf failover" flag somewhere so this can be probed?
--
MST
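
The bookkeeping proposed in the message above (pull a MAC out of the PF filter while a VF owns it, keep it on a separate list, and re-add it once the VF filter goes away) can be sketched as a toy model. This is plain Python for illustration only, not net core or driver code; the class `PfFilterBook` and its methods `pf_add`, `vf_set` and `vf_clear` are hypothetical names that do not exist in the kernel.

```python
class PfFilterBook:
    """Toy bookkeeping for PF unicast filters that a VF may temporarily claim."""

    def __init__(self):
        self.pf_uc = set()    # MACs currently programmed as PF unicast filters
        self.parked = set()   # MACs removed from the PF while a VF owns them
        self.vf_mac = {}      # VF index -> MAC assigned to that VF

    def pf_add(self, mac):
        # A MAC already owned by a VF is parked instead of being programmed.
        if mac in self.vf_mac.values():
            self.parked.add(mac)
        else:
            self.pf_uc.add(mac)

    def vf_set(self, vf, mac):
        # Release whatever this VF held before, then claim the new MAC.
        self.vf_clear(vf)
        self.vf_mac[vf] = mac
        if mac in self.pf_uc:
            self.pf_uc.remove(mac)
            self.parked.add(mac)

    def vf_clear(self, vf):
        # Dropping the VF filter restores a parked PF filter, if there was one.
        mac = self.vf_mac.pop(vf, None)
        if mac is not None and mac in self.parked \
                and mac not in self.vf_mac.values():
            self.parked.remove(mac)
            self.pf_uc.add(mac)


book = PfFilterBook()
vm_mac = "52:54:00:12:34:56"   # made-up example address
book.pf_add(vm_mac)            # e.g. a macvtap on the PF asked for this MAC
book.vf_set(0, vm_mac)         # VF 0 takes over the same MAC
print(sorted(book.pf_uc), sorted(book.parked))
book.vf_clear(0)               # VF unplugged, PF filter comes back
print(sorted(book.pf_uc), sorted(book.parked))
```

The model says nothing about where such logic should live (driver, net core or userspace), which is exactly the point debated in the replies.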
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-29 1:15 ` Michael S. Tsirkin @ 2018-11-29 6:37 ` Samudrala, Sridhar 2018-11-29 20:14 ` si-wei liu 1 sibling, 0 replies; 85+ messages in thread From: Samudrala, Sridhar @ 2018-11-29 6:37 UTC (permalink / raw) To: Michael S. Tsirkin, si-wei liu Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote: > On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote: >> >> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: >>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: >>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: >>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: >>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: >>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: >>>>>>>>> Update: >>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac >>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the >>>>>>>>> pings started working again on the failover interface. So it seems >>>>>>>>> like the frames were arriving to the vf on the host. >>>>>>>>> >>>>>>>>> >>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface. >>>>>>>> >>>>>>>> Have you looked at this documentation that shows a sample script to initiate live >>>>>>>> migration? >>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html >>>>>>>> >>>>>>>> -Sridhar >>>>>>> Interesting I didn't notice it does this. So in fact >>>>>>> just defining VF mac will immediately divert packets >>>>>>> to the VF? Given guest driver did not initialize VF >>>>>>> yet won't a bunch of packets be dropped? 
>>>>>> There is typo in my stmt above (VF->PF)
>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>>>
>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>>>> comes up in the VM.
>>>>>>
>>>>>>
>>>>> Can this be fixed in the intel drivers?
>>>> I just checked and it looks like this seems to have been addressed in the
>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>>>> behavior in i40e/ixgbe drivers.
>>> Also what happens if the mac is programmed both in PF (e.g. with
>>> macvtap) and VF? Ideally VF will take precedence.
>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
>> mucking around with software workaround in the PF driver. Actually, the same
>> applies to other NIC vendors when hardware sees duplicate filters. There's
>> no such control of precedence on one over the other.
>>
>>
>> -Siwei
>>
>>
> Well removing a MAC from the PF filter when we are adding it to the VF
> filter should always be possible. Need to keep it in a separate list and
> re-add it when removing the MAC from VF filter. This can be handled in
> the net core, no need for driver specific hacks.

We don't explicitly add MAC to the PF filter list. Just resetting VFs MAC
will cause the frames to reach PF as that is the default port. Also,
setting MACs is a privileged operation and i think managing them during
live migration should be handled by a management layer.

>
>
> Still, let's prioritize things correctly. IMHO it's fine if we
> initially assume promisc mode on the PF. macvlan has this mode too
> after all.

Yes. We don't need to explicitly set the MAC on the PF.

>
> Question is how does userspace know driver isn't broken in this respect?
> Let's add a "vf failover" flag somewhere so this can be probed?

I don't understand the need for this additional flag.
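
The steering behavior described in this exchange (frames for the VM's MAC reach the PF by default, go to the VF once its MAC is set, and can be lost while the guest VF driver is still down) can be illustrated with a toy model. It is a simplified Python sketch under the assumptions stated in the thread, not a description of any real NIC; `ToyNic`, `set_vf_mac` and `steer` are hypothetical names.

```python
ZERO_MAC = "00:00:00:00:00:00"


class ToyNic:
    """Toy SR-IOV receive steering: exact VF MAC match, else the PF default port."""

    def __init__(self):
        self.vf_mac = {}    # VF index -> MAC set by the host admin
        self.vf_up = set()  # VFs whose guest driver is up and can receive

    def set_vf_mac(self, vf, mac):
        # Setting the all-zero MAC stands for "reset the VF's MAC".
        if mac == ZERO_MAC:
            self.vf_mac.pop(vf, None)
        else:
            self.vf_mac[vf] = mac

    def steer(self, dst_mac):
        for vf, mac in self.vf_mac.items():
            if mac == dst_mac:
                # The filter is active as soon as the MAC is set, so frames
                # are lost until the guest driver comes up.
                return f"vf{vf}" if vf in self.vf_up else "dropped"
        return "pf"  # no VF filter matches: default port


nic = ToyNic()
vm_mac = "52:54:00:12:34:56"      # made-up example address
print(nic.steer(vm_mac))          # standby path via PF, bridge, virtio
nic.set_vf_mac(1, vm_mac)
print(nic.steer(vm_mac))          # VF filter set, guest driver not up yet
nic.vf_up.add(1)
print(nic.steer(vm_mac))          # VF datapath active
nic.set_vf_mac(1, ZERO_MAC)
print(nic.steer(vm_mac))          # VF unplugged and MAC reset: back to the PF
```

The second step is the drop window raised earlier in the thread; the last step corresponds to the `ip link set ens2f0 vf 1 mac 00:00:00:00:00:00` command quoted above.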
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-29 1:15 ` Michael S. Tsirkin 2018-11-29 6:37 ` Samudrala, Sridhar @ 2018-11-29 20:14 ` si-wei liu 2018-11-29 21:17 ` Michael S. Tsirkin 1 sibling, 1 reply; 85+ messages in thread From: si-wei liu @ 2018-11-29 20:14 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse, Boris Ostrovsky On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote: > On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote: >> >> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: >>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: >>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: >>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: >>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: >>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: >>>>>>>>> Update: >>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac >>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the >>>>>>>>> pings started working again on the failover interface. So it seems >>>>>>>>> like the frames were arriving to the vf on the host. >>>>>>>>> >>>>>>>>> >>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface. >>>>>>>> >>>>>>>> Have you looked at this documentation that shows a sample script to initiate live >>>>>>>> migration? >>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html >>>>>>>> >>>>>>>> -Sridhar >>>>>>> Interesting I didn't notice it does this. So in fact >>>>>>> just defining VF mac will immediately divert packets >>>>>>> to the VF? Given guest driver did not initialize VF >>>>>>> yet won't a bunch of packets be dropped? 
>>>>>> There is typo in my stmt above (VF->PF)
>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>>>
>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>>>> comes up in the VM.
>>>>>>
>>>>>>
>>>>> Can this be fixed in the intel drivers?
>>>> I just checked and it looks like this seems to have been addressed in the
>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>>>> behavior in i40e/ixgbe drivers.
>>> Also what happens if the mac is programmed both in PF (e.g. with
>>> macvtap) and VF? Ideally VF will take precedence.
>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
>> mucking around with software workaround in the PF driver. Actually, the same
>> applies to other NIC vendors when hardware sees duplicate filters. There's
>> no such control of precedence on one over the other.
>>
>>
>> -Siwei
>>
>>
> Well removing a MAC from the PF filter when we are adding it to the VF
> filter should always be possible. Need to keep it in a separate list and
> re-add it when removing the MAC from VF filter. This can be handled in
> the net core, no need for driver specific hacks.

So that is what I ever said - essentially what you need is a netdev API,
rather than to add dirty hacks on each driver. That is fine, but how would
you implement it? Note there's no equivalent driver level .ndo API to "move"
filters, and all existing .ndo APIs manipulate at the MAC address level as
opposed to filters. Are you going to convince netdev this is the right thing
to do and we should add such API to the net core and each individual driver?

> Still, let's prioritize things correctly. IMHO it's fine if we
> initially assume promisc mode on the PF. macvlan has this mode too
> after all.

I'm not sure what promisc mode you talked about. As far as I understand it
for macvlan/macvtap the NIC is only put into promisc mode when running out
of MAC filter entries. Before that all MAC addresses will be added to the
NIC as unicast filters. In addition, people prefer macvlan/macvtap for
adding isolation in a multi-tenant cloud as well as avoiding performance
penalty due to noisy neighbors. I'd rather to hear that claim to be that the
current MAC-based pairing scheme doesn't work well with macvtap and only
works with bridged setup which has promisc enabled. That would be more
helpful for people to understand the situation better.

Thanks,
-Siwei

>
> Question is how does userspace know driver isn't broken in this respect?
> Let's add a "vf failover" flag somewhere so this can be probed?
>
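
The macvlan/macvtap behavior described in this message (addresses are programmed as unicast filters, and the NIC falls back to promiscuous mode only once the filter table is exhausted) can be pictured with the toy model below. It follows the message's own description ("as far as I understand it") and is an assumption-laden Python sketch, not the logic of any particular driver; `UcFilterTable` and its `capacity` parameter are hypothetical.

```python
class UcFilterTable:
    """Toy unicast filter table that overflows into promiscuous mode."""

    def __init__(self, capacity):
        self.capacity = capacity  # number of hardware filter entries (assumed)
        self.macs = []            # unicast addresses requested by upper devices

    def add(self, mac):
        if mac not in self.macs:
            self.macs.append(mac)

    def remove(self, mac):
        if mac in self.macs:
            self.macs.remove(mac)

    @property
    def promisc(self):
        # Promiscuous only when the requested addresses no longer fit.
        return len(self.macs) > self.capacity

    def accepts(self, dst_mac):
        return self.promisc or dst_mac in self.macs


table = UcFilterTable(capacity=2)
table.add("52:54:00:00:00:01")
table.add("52:54:00:00:00:02")
print(table.promisc, table.accepts("52:54:00:00:00:99"))
table.add("52:54:00:00:00:03")   # one address too many for the table
print(table.promisc, table.accepts("52:54:00:00:00:99"))
```

In this picture a PF that has room in its table only accepts the VM's MAC if that MAC was explicitly added as a filter, which is why a macvtap setup behaves differently from a bridged, promiscuous one.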
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-29 20:14 ` si-wei liu @ 2018-11-29 21:17 ` Michael S. Tsirkin 2018-11-29 22:53 ` si-wei liu 0 siblings, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-29 21:17 UTC (permalink / raw) To: si-wei liu Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse, Boris Ostrovsky On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote: > > > On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote: > > On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote: > > > > > > On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: > > > > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: > > > > > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: > > > > > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: > > > > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: > > > > > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: > > > > > > > > > > Update: > > > > > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac > > > > > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the > > > > > > > > > > pings started working again on the failover interface. So it seems > > > > > > > > > > like the frames were arriving to the vf on the host. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > > > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface. > > > > > > > > > > > > > > > > > > Have you looked at this documentation that shows a sample script to initiate live > > > > > > > > > migration? > > > > > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > > > > > > > > > > -Sridhar > > > > > > > > Interesting I didn't notice it does this. 
So in fact > > > > > > > > just defining VF mac will immediately divert packets > > > > > > > > to the VF? Given guest driver did not initialize VF > > > > > > > > yet won't a bunch of packets be dropped? > > > > > > > There is typo in my stmt above (VF->PF) > > > > > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > > > > > with VMs MAC start flowing via PF, bridge and the virtio interface. > > > > > > > > > > > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to > > > > > > > the HW once the guest driver comes up and can receive packets. Currently with intel > > > > > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via > > > > > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver > > > > > > > comes up in the VM. > > > > > > > > > > > > > > > > > > > > Can this be fixed in the intel drivers? > > > > > I just checked and it looks like this seems to have been addressed in the > > > > > ice 100Gb driver. Will bring this up issue internally to see if we can change this > > > > > behavior in i40e/ixgbe drivers. > > > > Also what happens if the mac is programmed both in PF (e.g. with > > > > macvtap) and VF? Ideally VF will take precedence. > > > I'm seriously doubtful that legacy Intel NIC hardware can do that instead of > > > mucking around with software workaround in the PF driver. Actually, the same > > > applies to other NIC vendors when hardware sees duplicate filters. There's > > > no such control of precedence on one over the other. > > > > > > > > > -Siwei > > > > > > > > Well removing a MAC from the PF filter when we are adding it to the VF > > filter should always be possible. Need to keep it in a separate list and > > re-add it when removing the MAC from VF filter. This can be handled in > > the net core, no need for driver specific hacks. 
> So that is what I ever said - essentially what you need is a netdev API,
> rather than to add dirty hacks on each driver. That is fine, but how would
> you implement it? Note there's no equivalent driver level .ndo API to "move"
> filters, and all existing .ndo APIs manipulate at the MAC address level as
> opposed to filters. Are you going to convince netdev this is the right thing
> to do and we should add such API to the net core and each individual driver?

There's no need for a new API IMO.
You drop it from list of uc macs, then call .ndo_set_rx_mode.
This can be done without changing existing drivers.

>
> > Still, let's prioritize things correctly. IMHO it's fine if we
> > initially assume promisc mode on the PF. macvlan has this mode too
> > after all.
> I'm not sure what promisc mode you talked about. As far as I understand it
> for macvlan/macvtap the NIC is only put into promisc mode when running out
> of MAC filter entries. Before that all MAC addresses will be added to the
> NIC as unicast filters. In addition, people prefer macvlan/macvtap for
> adding isolation in a multi-tenant cloud as well as avoiding performance
> penalty due to noisy neighbors. I'd rather to hear that claim to be that the
> current MAC-based pairing scheme doesn't work well with macvtap and only
> works with bridged setup which has promisc enabled. That would be more
> helpful for people to understand the situation better.
>
> Thanks,
> -Siwei
>

As a first step that's fine. Still this assumes just creating a VF
doesn't yet program the on-card filter to cause packet drops. Let's
assume drivers are fixed to do that. How does userspace know
that's the case? We might need some kind of attribute so
userspace can detect it.

> >
> > Question is how does userspace know driver isn't broken in this respect?
> > Let's add a "vf failover" flag somewhere so this can be probed?
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-29 21:17 ` Michael S. Tsirkin @ 2018-11-29 22:53 ` si-wei liu 2018-11-29 23:53 ` Samudrala, Sridhar 2018-11-30 6:21 ` Michael S. Tsirkin 0 siblings, 2 replies; 85+ messages in thread From: si-wei liu @ 2018-11-29 22:53 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse, Boris Ostrovsky On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote: > On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote: >> >> On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote: >>> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote: >>>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: >>>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: >>>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: >>>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: >>>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: >>>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: >>>>>>>>>>> Update: >>>>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac >>>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the >>>>>>>>>>> pings started working again on the failover interface. So it seems >>>>>>>>>>> like the frames were arriving to the vf on the host. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface. >>>>>>>>>> >>>>>>>>>> Have you looked at this documentation that shows a sample script to initiate live >>>>>>>>>> migration? >>>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html >>>>>>>>>> >>>>>>>>>> -Sridhar >>>>>>>>> Interesting I didn't notice it does this. 
So in fact >>>>>>>>> just defining VF mac will immediately divert packets >>>>>>>>> to the VF? Given guest driver did not initialize VF >>>>>>>>> yet won't a bunch of packets be dropped? >>>>>>>> There is typo in my stmt above (VF->PF) >>>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface. >>>>>>>> >>>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to >>>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel >>>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via >>>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver >>>>>>>> comes up in the VM. >>>>>>>> >>>>>>>> >>>>>>> Can this be fixed in the intel drivers? >>>>>> I just checked and it looks like this seems to have been addressed in the >>>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this >>>>>> behavior in i40e/ixgbe drivers. >>>>> Also what happens if the mac is programmed both in PF (e.g. with >>>>> macvtap) and VF? Ideally VF will take precedence. >>>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of >>>> mucking around with software workaround in the PF driver. Actually, the same >>>> applies to other NIC vendors when hardware sees duplicate filters. There's >>>> no such control of precedence on one over the other. >>>> >>>> >>>> -Siwei >>>> >>>> >>> Well removing a MAC from the PF filter when we are adding it to the VF >>> filter should always be possible. Need to keep it in a separate list and >>> re-add it when removing the MAC from VF filter. This can be handled in >>> the net core, no need for driver specific hacks. >> So that is what I ever said - essentially what you need is a netdev API, >> rather than to add dirty hacks on each driver. That is fine, but how would >> you implement it? 
Note there's no equivalent driver level .ndo API to "move"
>> filters, and all existing .ndo APIs manipulate at the MAC address level as
>> opposed to filters. Are you going to convince netdev this is the right thing
>> to do and we should add such API to the net core and each individual driver?
> There's no need for a new API IMO.
> You drop it from list of uc macs, then call .ndo_set_rx_mode.

Then still you need a new netlink API - effectively it alters the running
state of macvtap as it steals certain filters out from the NIC that affects
the datapath of macvtap. I assume we talk about some kernel mechanism to do
automatic datapath switching without involving userspace management
stack/orchestration software. In the kernel's (net core's) view that also
needs some weak binding/coordination between the VF and the macvtap for
which MAC filter needs to be activated. Still this senses to me a new API
rather than tweaking the current and long-existing default behavior and
making it work transparently just for this case. Otherwise, without
introducing a new API, how does the userspace infer that the running kernel
supports this new behavior.

> This can be done without changing existing drivers.
>
>>> Still, let's prioritize things correctly. IMHO it's fine if we
>>> initially assume promisc mode on the PF. macvlan has this mode too
>>> after all.
>> I'm not sure what promisc mode you talked about. As far as I understand it
>> for macvlan/macvtap the NIC is only put into promisc mode when running out
>> of MAC filter entries. Before that all MAC addresses will be added to the
>> NIC as unicast filters. In addition, people prefer macvlan/macvtap for
>> adding isolation in a multi-tenant cloud as well as avoiding performance
>> penalty due to noisy neighbors. I'd rather to hear that claim to be that the
>> current MAC-based pairing scheme doesn't work well with macvtap and only
>> works with bridged setup which has promisc enabled. That would be more
>> helpful for people to understand the situation better.
>>
>> Thanks,
>> -Siwei
>>
> As a first step that's fine.

Well, I specifically called it out one year ago as this work was started
that macvtap is what we look into (we don't care about bridge with
promiscuous enabled) and the answer I got at the point was that the current
model would work well for macvtap too (which I've been very doubtful from
the very beginning). Eventually turns out this is not true and it looks
like this is slowly converging to what Hyper-V netvsc already supported
quite a few years if not a decade ago, sighs...

> Still this assumes just creating a VF
> doesn't yet program the on-card filter to cause packet drops.

Suppose this behavior is fixable in legacy Intel NIC, you would still need
to evacuate the filter programmed by macvtap previously when VF's filter
gets activated (typically when VF's netdev is netif_running() in a Linux
guest). That's what we and NetVSC call as "datapath switching", and where
this could be handled (driver, net core, or userspace) is the core for the
architectural design that I spent much time on.

Having said it, I don't expect or would desperately wait on one vendor to
fix a legacy driver which wasn't quite motivated, then no work would be
done on that. If you'd go the way, please make sure Intel could change
their driver first.

> Let's
> assume drivers are fixed to do that. How does userspace know
> that's the case? We might need some kind of attribute so
> userspace can detect it.

Where do you envision the new attribute could be at? Supposedly it'd be
exposed by the kernel, which constitutes a new API or API changes.

Thanks,
-Siwei

>
>>> Question is how does userspace know driver isn't broken in this respect?
>>> Let's add a "vf failover" flag somewhere so this can be probed?
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-29 22:53 ` si-wei liu
@ 2018-11-29 23:53 ` Samudrala, Sridhar
  2018-11-30  0:24 ` si-wei liu
  2018-11-30  6:21 ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-29 23:53 UTC (permalink / raw)
  To: si-wei liu, Michael S. Tsirkin
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse, Boris Ostrovsky

On 11/29/2018 2:53 PM, si-wei liu wrote:
> On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote:
> [...]
>> There's no need for a new API IMO.
>> You drop it from list of uc macs, then call .ndo_set_rx_mode.
> Then you still need a new netlink API: effectively it alters the
> running state of macvtap, because it steals certain filters from the
> NIC, and that affects the macvtap datapath. I assume we are talking
> about a kernel mechanism that switches the datapath automatically,
> without involving the userspace management stack or orchestration
> software.
> [...]

In the case of virtio backed by macvtap, you can change the MAC address of
the macvtap interface. When the VF is plugged in, change macvtap's MAC to
an unassigned MAC and bring the virtio link down. When the VF is unplugged,
set macvtap's MAC to the VM's MAC and bring the virtio link up.
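
The macvtap procedure described in the message above can be written down as an ordered list of host-side steps. The following is a minimal dry-run sketch in Python under stated assumptions: the function name, the interface name `macvtap0`, and both MAC addresses are placeholders chosen for illustration; only `ip link set dev <name> address <mac>` is an actual iproute2 command. How the virtio link is toggled (for example through the hypervisor's monitor) is deliberately left as an abstract step, since the thread does not specify it.

```python
def macvtap_switch_steps(macvtap, vm_mac, spare_mac, vf_plugged):
    """Return the ordered host-side steps for one switch, without running them.

    macvtap    -- name of the macvtap interface backing the virtio device
    vm_mac     -- MAC address shared by the VF and the virtio standby
    spare_mac  -- an otherwise unassigned MAC used to park macvtap
    vf_plugged -- True when the VF has just been plugged in, False on unplug
    """
    if vf_plugged:
        # The VF takes over the VM's MAC: move macvtap out of the way,
        # then take the virtio link down.
        return [
            f"ip link set dev {macvtap} address {spare_mac}",
            "virtio link: down",
        ]
    # The VF is gone: give the VM's MAC back to macvtap, then raise the link.
    return [
        f"ip link set dev {macvtap} address {vm_mac}",
        "virtio link: up",
    ]


if __name__ == "__main__":
    for step in macvtap_switch_steps("macvtap0", "52:54:00:12:34:56",
                                     "02:00:00:00:00:01", vf_plugged=True):
        print(step)
```

Nothing here removes the need for something outside the kernel to run these steps at the right moments.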
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-29 23:53 ` Samudrala, Sridhar
@ 2018-11-30  0:24 ` si-wei liu
  2018-11-30  3:08 ` Samudrala, Sridhar
  0 siblings, 1 reply; 85+ messages in thread
From: si-wei liu @ 2018-11-30 0:24 UTC (permalink / raw)
  To: Samudrala, Sridhar, Michael S. Tsirkin
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse, Boris Ostrovsky

On 11/29/2018 3:53 PM, Samudrala, Sridhar wrote:
> On 11/29/2018 2:53 PM, si-wei liu wrote:
> [...]
>
> In the case of virtio backed by macvtap, you can change the MAC address
> of the macvtap interface. When the VF is plugged in, change macvtap's
> MAC to an unassigned MAC and bring the virtio link down. When the VF is
> unplugged, set macvtap's MAC to the VM's MAC and bring the virtio link
> up.
>
This needs management software to orchestrate, right? What MST and I are
discussing is how to do this switching automatically, without involving
management software.

-Siwei
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-30  0:24 ` si-wei liu
@ 2018-11-30  3:08 ` Samudrala, Sridhar
  2018-11-30  4:46 ` si-wei liu
  0 siblings, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-30 3:08 UTC (permalink / raw)
  To: si-wei liu, Michael S. Tsirkin
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse, Boris Ostrovsky

On 11/29/2018 4:24 PM, si-wei liu wrote:
> On 11/29/2018 3:53 PM, Samudrala, Sridhar wrote:
> [...]
>> In the case of virtio backed by macvtap, you can change the MAC
>> address of the macvtap interface. When the VF is plugged in, change
>> macvtap's MAC to an unassigned MAC and bring the virtio link down.
>> When the VF is unplugged, set macvtap's MAC to the VM's MAC and bring
>> the virtio link up.
>>
> This needs management software to orchestrate, right?

Yes. Isn't that a good option, given that live migration is initiated and
orchestrated via management software?

> What MST and I are discussing is how to do this switching
> automatically, without involving management software.

OK. I agree that it would be nice if we could do all this automatically
via QEMU when the orchestration software initiates live migration, rather
than the management software having to do some pre- and post-migration
steps. It may be possible to do these pre- and post-migration steps in
QEMU via a netlink API to the kernel to update the MAC addresses, as we
are now associating the primary and standby interfaces.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-30  3:08 ` Samudrala, Sridhar
@ 2018-11-30  4:46 ` si-wei liu
  0 siblings, 0 replies; 85+ messages in thread
From: si-wei liu @ 2018-11-30 4:46 UTC (permalink / raw)
  To: Samudrala, Sridhar, Michael S. Tsirkin
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse, Boris Ostrovsky

On 11/29/2018 07:08 PM, Samudrala, Sridhar wrote:
> On 11/29/2018 4:24 PM, si-wei liu wrote:
> [...]
>> This needs management software to orchestrate, right?
>
> Yes. Isn't that a good option, given that live migration is initiated
> and orchestrated via management software?

The motivation is to reduce the downtime to zero, to get on par with
Hyper-V, or maybe even do better. But you won't be able to achieve that if
datapath switching is initiated from userspace via management software.

>> What MST and I are discussing is how to do this switching
>> automatically, without involving management software.
>
> OK. I agree that it would be nice if we could do all this
> automatically via QEMU when the orchestration software initiates live
> migration, rather than the management software having to do some pre-
> and post-migration steps. It may be possible to do these pre- and
> post-migration steps in QEMU via a netlink API to the kernel to update
> the MAC addresses, as we are now associating the primary and standby
> interfaces.

The number one blocker for that approach now is: can the Intel ixgbe and
i40e drivers be fixed to defer adding the MAC filter to the NIC until the
VF is up and running in the guest? In particular, we'd limit the fix to
the PF side only, leaving the VF driver intact, and use the existing
mailbox or adminq interface.

Thanks,
-Siwei
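
To make the packet-drop window behind this question concrete, here is a small timeline model in Python. It is a sketch under simplifying assumptions, not a description of any driver: time is counted in abstract ticks, one frame addressed to the VM's MAC arrives per tick, the host sets the VF MAC at `set_vf_mac_at`, and the guest's VF driver comes up at `vf_up_at` (assumed to be no earlier). The function name and parameters are hypothetical. With `deferred=False` the hardware filter is taken to be programmed as soon as the VF MAC is set, as described earlier in the thread for the current Intel drivers; with `deferred=True` it is programmed only once the VF is up.

```python
def receiver_per_tick(set_vf_mac_at, vf_up_at, ticks, deferred):
    """Return, for each tick, where a frame for the VM's MAC ends up.

    Assumes set_vf_mac_at <= vf_up_at. Possible values per tick are
    "standby" (virtio path), "vf", or "dropped".
    """
    outcome = []
    for t in range(ticks):
        # When does the NIC start steering the VM's MAC to the VF?
        filter_on_vf = t >= (vf_up_at if deferred else set_vf_mac_at)
        if not filter_on_vf:
            outcome.append("standby")  # frames still reach the virtio path
        elif t >= vf_up_at:
            outcome.append("vf")       # VF driver is up and receiving
        else:
            outcome.append("dropped")  # steered to a VF nobody is serving
    return outcome


if __name__ == "__main__":
    print(receiver_per_tick(1, 3, 5, deferred=False))
    print(receiver_per_tick(1, 3, 5, deferred=True))
```

In this model the immediate case loses every frame that arrives between the two events, while the deferred case loses none. Real hardware and drivers may of course behave differently.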
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-29 22:53 ` si-wei liu 2018-11-29 23:53 ` Samudrala, Sridhar @ 2018-11-30 6:21 ` Michael S. Tsirkin 2018-12-04 2:09 ` si-wei liu 1 sibling, 1 reply; 85+ messages in thread From: Michael S. Tsirkin @ 2018-11-30 6:21 UTC (permalink / raw) To: si-wei liu Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse, Boris Ostrovsky On Thu, Nov 29, 2018 at 02:53:08PM -0800, si-wei liu wrote: > > > On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote: > > On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote: > > > > > > On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote: > > > > On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote: > > > > > On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: > > > > > > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: > > > > > > > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: > > > > > > > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: > > > > > > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: > > > > > > > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: > > > > > > > > > > > > Update: > > > > > > > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac > > > > > > > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the > > > > > > > > > > > > pings started working again on the failover interface. So it seems > > > > > > > > > > > > like the frames were arriving to the vf on the host. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > > > > > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface. 
> > > > > > > > > > > > > > > > > > > > > > Have you looked at this documentation that shows a sample script to initiate live > > > > > > > > > > > migration? > > > > > > > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html > > > > > > > > > > > > > > > > > > > > > > -Sridhar > > > > > > > > > > Interesting I didn't notice it does this. So in fact > > > > > > > > > > just defining VF mac will immediately divert packets > > > > > > > > > > to the VF? Given guest driver did not initialize VF > > > > > > > > > > yet won't a bunch of packets be dropped? > > > > > > > > > There is typo in my stmt above (VF->PF) > > > > > > > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets > > > > > > > > > with VMs MAC start flowing via PF, bridge and the virtio interface. > > > > > > > > > > > > > > > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to > > > > > > > > > the HW once the guest driver comes up and can receive packets. Currently with intel > > > > > > > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via > > > > > > > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver > > > > > > > > > comes up in the VM. > > > > > > > > > > > > > > > > > > > > > > > > > > Can this be fixed in the intel drivers? > > > > > > > I just checked and it looks like this seems to have been addressed in the > > > > > > > ice 100Gb driver. Will bring this up issue internally to see if we can change this > > > > > > > behavior in i40e/ixgbe drivers. > > > > > > Also what happens if the mac is programmed both in PF (e.g. with > > > > > > macvtap) and VF? Ideally VF will take precedence. > > > > > I'm seriously doubtful that legacy Intel NIC hardware can do that instead of > > > > > mucking around with software workaround in the PF driver. Actually, the same > > > > > applies to other NIC vendors when hardware sees duplicate filters. 
There's > > > > > no such control of precedence on one over the other. > > > > > > > > > > > > > > > -Siwei > > > > > > > > > > > > > > Well removing a MAC from the PF filter when we are adding it to the VF > > > > filter should always be possible. Need to keep it in a separate list and > > > > re-add it when removing the MAC from VF filter. This can be handled in > > > > the net core, no need for driver specific hacks. > > > So that is what I ever said - essentially what you need is a netdev API, > > > rather than to add dirty hacks on each driver. That is fine, but how would > > > you implement it? Note there's no equivalent driver level .ndo API to "move" > > > filters, and all existing .ndo APIs manipulate at the MAC address level as > > > opposed to filters. Are you going to convince netdev this is the right thing > > > to do and we should add such API to the net core and each individual driver? > > There's no need for a new API IMO. > > You drop it from list of uc macs, then call .ndo_set_rx_mode. > Then still you need a new netlink API > - effectively it alters the running > state of macvtap as it steals certain filters out from the NIC that affects > the datapath of macvtap. I assume we talk about some kernel mechanism to do > automatic datapath switching without involving userspace management > stack/orchestration software. In the kernel's (net core's) view that also > needs some weak binding/coordination between the VF and the macvtap for > which MAC filter needs to be activated. Still this senses to me a new API > rather than tweaking the current and long-existing default behavior and > making it work transparently just for this case. Otherwise, without > introducing a new API, how does the userspace infer that the running kernel > supports this new behavior. I agree. But a single flag is not much of an extension. We don't even need it in netlink, can be anywhere in e.g. sysfs. > > This can be done without changing existing drivers. 
> >
> > > > Still, let's prioritize things correctly. IMHO it's fine if we
> > > > initially assume promisc mode on the PF. macvlan has this mode too
> > > > after all.
> > > I'm not sure what promisc mode you talked about. As far as I understand it
> > > for macvlan/macvtap the NIC is only put into promisc mode when running out
> > > of MAC filter entries. Before that all MAC addresses will be added to the
> > > NIC as unicast filters. In addition, people prefer macvlan/macvtap for
> > > adding isolation in a multi-tenant cloud as well as avoiding performance
> > > penalty due to noisy neighbors. I'd rather to hear that claim to be that the
> > > current MAC-based pairing scheme doesn't work well with macvtap and only
> > > works with bridged setup which has promisc enabled. That would be more
> > > helpful for people to understand the situation better.
> > >
> > > Thanks,
> > > -Siwei
> > >
> > As a first step that's fine.
> Well, I specifically called it out one year ago as this work was started
> that macvtap is what we look into (we don't care about bridge with
> promiscuous enabled) and the answer I got at the point was that the current
> model would work well for macvtap too (which I've been very doubtful from
> the very beginning).

At least I personally did not realize it's about macvtap. I wish there
were example command lines showing what's broken. Liran got hold of me
at the KVM Forum and explained it's about macvlan; that's the first I
heard about it, but that was offline, so others might be hearing it just
now for the first time.

The issue between macvlan and configuring a VF can be
tested with a couple of simple commands, maybe using e.g. netsniff,
with no need for a VM at all.
Pity these were never posted - interested in posting a test
tool that can be used to demonstrate/test the issue on various cards?

> Eventually turns out this is not true and it looks like
> this is slowly converging to what Hyper-V netvsc already supported quite a
> few years if not a decade ago, sighs...
Oh we'll see.

Meanwhile, what's missing, and was missing all along, for the change you
seem to be advocating to get off the ground is people who
are ready to actually send e.g. spec, guest driver, and test patches.

> > Still this assumes just creating a VF
> > doesn't yet program the on-card filter to cause packet drops.
> Suppose this behavior is fixable in legacy Intel NIC, you would still need
> to evacuate the filter programmed by macvtap previously when VF's filter
> gets activated (typically when VF's netdev is netif_running() in a Linux
> guest). That's what we and NetVSC call as "datapath switching", and where
> this could be handled (driver, net core, or userspace) is the core for the
> architectural design that I spent much time on.
>
> Having said it, I don't expect or would desperately wait on one vendor to
> fix a legacy driver which wasn't quite motivated, then no work would be done
> on that.

Then that device can't be used with the mechanism in question.
Or if there are lots of drivers like this maybe someone will be
motivated enough to post a better implementation with a new
feature bit. It's not that I'm arguing against that.

But given the choice between teaching management to play with the
netlink API in response to guest actions, with the VCPU stopped,
and doing it all in host kernel drivers, I know I'll prefer host kernel
changes.

> If you'd go the way, please make sure Intel could change their
> driver first.

We'll see what happens with that. It's Sridhar from Intel that implemented
the guest changes after all, so I expect he's motivated to make them
work well.

> > Let's
> > assume drivers are fixed to do that. How does userspace know
> > that's the case? We might need some kind of attribute so
> > userspace can detect it.
> Where do you envision the new attribute could be at? Supposedly it'd be
> exposed by the kernel, which constitutes a new API or API changes.
>
>
> Thanks,
> -Siwei

People add e.g. new attributes in sysfs left and right.
It's unlikely to be a matter of serious contention.

> >
> > > > Question is how does userspace know driver isn't broken in this respect?
> > > > Let's add a "vf failover" flag somewhere so this can be probed?
> > > >
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature 2018-11-30 6:21 ` Michael S. Tsirkin @ 2018-12-04 2:09 ` si-wei liu 2018-12-04 3:59 ` Michael S. Tsirkin 0 siblings, 1 reply; 85+ messages in thread From: si-wei liu @ 2018-12-04 2:09 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse, Boris Ostrovsky On 11/29/2018 10:21 PM, Michael S. Tsirkin wrote: > On Thu, Nov 29, 2018 at 02:53:08PM -0800, si-wei liu wrote: >> >> On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote: >>> On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote: >>>> On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote: >>>>> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote: >>>>>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote: >>>>>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote: >>>>>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote: >>>>>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote: >>>>>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote: >>>>>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote: >>>>>>>>>>>>> Update: >>>>>>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac >>>>>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the >>>>>>>>>>>>> pings started working again on the failover interface. So it seems >>>>>>>>>>>>> like the frames were arriving to the vf on the host. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>>>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface. >>>>>>>>>>>> >>>>>>>>>>>> Have you looked at this documentation that shows a sample script to initiate live >>>>>>>>>>>> migration? 
>>>>>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html >>>>>>>>>>>> >>>>>>>>>>>> -Sridhar >>>>>>>>>>> Interesting I didn't notice it does this. So in fact >>>>>>>>>>> just defining VF mac will immediately divert packets >>>>>>>>>>> to the VF? Given guest driver did not initialize VF >>>>>>>>>>> yet won't a bunch of packets be dropped? >>>>>>>>>> There is typo in my stmt above (VF->PF) >>>>>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets >>>>>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface. >>>>>>>>>> >>>>>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to >>>>>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel >>>>>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via >>>>>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver >>>>>>>>>> comes up in the VM. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Can this be fixed in the intel drivers? >>>>>>>> I just checked and it looks like this seems to have been addressed in the >>>>>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this >>>>>>>> behavior in i40e/ixgbe drivers. >>>>>>> Also what happens if the mac is programmed both in PF (e.g. with >>>>>>> macvtap) and VF? Ideally VF will take precedence. >>>>>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of >>>>>> mucking around with software workaround in the PF driver. Actually, the same >>>>>> applies to other NIC vendors when hardware sees duplicate filters. There's >>>>>> no such control of precedence on one over the other. >>>>>> >>>>>> >>>>>> -Siwei >>>>>> >>>>>> >>>>> Well removing a MAC from the PF filter when we are adding it to the VF >>>>> filter should always be possible. Need to keep it in a separate list and >>>>> re-add it when removing the MAC from VF filter. 
>>>>> This can be handled in the net core, no need for driver specific hacks.
>>>> So that is what I ever said - essentially what you need is a netdev API,
>>>> rather than to add dirty hacks on each driver. That is fine, but how would
>>>> you implement it? Note there's no equivalent driver level .ndo API to "move"
>>>> filters, and all existing .ndo APIs manipulate at the MAC address level as
>>>> opposed to filters. Are you going to convince netdev this is the right thing
>>>> to do and we should add such API to the net core and each individual driver?
>>> There's no need for a new API IMO.
>>> You drop it from list of uc macs, then call .ndo_set_rx_mode.
>> Then still you need a new netlink API - effectively it alters the running
>> state of macvtap as it steals certain filters out from the NIC that affects
>> the datapath of macvtap. I assume we talk about some kernel mechanism to do
>> automatic datapath switching without involving userspace management
>> stack/orchestration software. In the kernel's (net core's) view that also
>> needs some weak binding/coordination between the VF and the macvtap for
>> which MAC filter needs to be activated. Still this senses to me a new API
>> rather than tweaking the current and long-existing default behavior and
>> making it work transparently just for this case. Otherwise, without
>> introducing a new API, how does the userspace infer that the running kernel
>> supports this new behavior.
> I agree. But a single flag is not much of an extension. We don't even
> need it in netlink, can be anywhere in e.g. sysfs.

I think a sysfs attribute is for exposing the capability, while you still need
to set up macvtap in some special mode via netlink. That way it doesn't
break current behavior: when the VF's MAC filter is added, macvtap would need
to react by removing its filter from the NIC, and add it back when the VF's
MAC is removed.

>
>>> This can be done without changing existing drivers.
>>>
>>>>> Still, let's prioritize things correctly. IMHO it's fine if we
>>>>> initially assume promisc mode on the PF. macvlan has this mode too
>>>>> after all.
>>>> I'm not sure what promisc mode you talked about. As far as I understand it
>>>> for macvlan/macvtap the NIC is only put into promisc mode when running out
>>>> of MAC filter entries. Before that all MAC addresses will be added to the
>>>> NIC as unicast filters. In addition, people prefer macvlan/macvtap for
>>>> adding isolation in a multi-tenant cloud as well as avoiding performance
>>>> penalty due to noisy neighbors. I'd rather to hear that claim to be that the
>>>> current MAC-based pairing scheme doesn't work well with macvtap and only
>>>> works with bridged setup which has promisc enabled. That would be more
>>>> helpful for people to understand the situation better.
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>> As a first step that's fine.
>> Well, I specifically called it out one year ago as this work was started
>> that macvtap is what we look into (we don't care about bridge with
>> promiscuous enabled) and the answer I got at the point was that the current
>> model would work well for macvtap too (which I've been very doubtful from
>> the very beginning).
> At least I personally did not realize it's about macvtap.

Wouldn't macvtap be a very common backend that any virtio-net feature has to
support? I thought it has tighter integration with vhost-net than bridge
and tap.

> I wish there
> were example command lines showing what's broken. Liran got hold of me
> at the KVM forum and explained it's about macvlan that's the first I
> heard about it, but that was offline, others might hear just now first.
>
> The issue between macvlan and configuring a VF can be
> tested with a couple of simple commands maybe using e.g. netsniff
> with no need for a VM at all.
> Pity these were never posted - interested in posting a test
> tool that can be used to demonstrate/test the issue on various cards?
>
>> Eventually turns out this is not true and it looks like
>> this is slowly converging to what Hyper-V netvsc already supported quite a
>> few years if not a decade ago, sighs...
> Oh we'll see.
>
> Meanwhile what's missing and was missing all along for the change you
> seem to be advocating for to get off the ground is people who
> are ready to actually send e.g. spec, guest driver, test patches.

Partly because there has been no convergence on the best way to do it (even
though the group ID mechanism with a PCI bridge can address our need, you
don't seem to think it is valuable). The in-kernel approach looks fine on its
face, but I personally don't believe changing every legacy driver is the way
to go. It's a choice of implementation, and IMHO there is nothing wrong with
what has been implemented in those drivers today.

>
>>> Still this assumes just creating a VF
>>> doesn't yet program the on-card filter to cause packet drops.
>> Suppose this behavior is fixable in legacy Intel NIC, you would still need
>> to evacuate the filter programmed by macvtap previously when VF's filter
>> gets activated (typically when VF's netdev is netif_running() in a Linux
>> guest). That's what we and NetVSC call as "datapath switching", and where
>> this could be handled (driver, net core, or userspace) is the core for the
>> architectural design that I spent much time on.
>>
>> Having said it, I don't expect or would desperately wait on one vendor to
>> fix a legacy driver which wasn't quite motivated, then no work would be done
>> on that.
> Then that device can't be used with the mechanism in question.
> Or if there are lots of drivers like this maybe someone will be
> motivated enough to post a better implementation with a new
> feature bit. It's not that I'm arguing against that.
>
> But given the options of teaching management to play with
> netlink API in response to guest actions, and with VCPU stopped,
> and doing it all in host kernel drivers, I know I'll prefer host kernel
> changes.
We have some internal patches that leverage management to respond to various
guest actions. If you're interested we can post them. The thing is no one
would like to work on the libvirt changes, while internally we have our own
orchestration software which is not libvirt. But if you think it's fine we
can definitely share our QEMU patches while leaving out libvirt.

Thanks,
-Siwei
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-04 2:09 ` si-wei liu
@ 2018-12-04 3:59 ` Michael S. Tsirkin
  2018-12-05 16:18 ` Sameeh Jubran
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-12-04 3:59 UTC (permalink / raw)
  To: si-wei liu
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
      venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
      Brandeburg, Jesse, Boris Ostrovsky

On Mon, Dec 03, 2018 at 06:09:19PM -0800, si-wei liu wrote:
> > I agree. But a single flag is not much of an extension. We don't even
> > need it in netlink, can be anywhere in e.g. sysfs.
> I think sysfs attribute is for exposing the capability, while you still need
> to set up macvtap with some special mode via netlink. That way it doesn't
> break current behavior, and when VF's MAC filter is added macvtap would need
> to react to remove the filter from NIC. And add the one back when VF's MAC
> is removed.

All this will be up to the developers actually working on it. My
understanding is that Intel is going to just change the behaviour
unconditionally, and it's already the case for Mellanox.
That creates a critical mass large enough that maybe others
just need to confirm.

...

> > Meanwhile what's missing and was missing all along for the change you
> > seem to be advocating for to get off the ground is people who
> > are ready to actually send e.g. spec, guest driver, test patches.
> Partly because it hadn't been converged to the best way to do it (even the
> group ID mechanism with PCI bridge can address our need you don't seem to
> think it is valuable). The in-kernel approach is fine at its appearance, but
> I personally don't believe changing every legacy driver is the way to go.
> It's the choice of implementation and what has been implemented in those
> drivers today IMHO is nothing wrong.

It's not a question of being wrong as such.
A standard behaviour is clearly better than each driver doing its
own thing, which is the case now. As long as we are standardizing,
let's standardize on something that matches our needs?
But I really see no problem with also supporting other options,
as long as someone is prepared to actually put in the work.

> > > > Still this assumes just creating a VF
> > > > doesn't yet program the on-card filter to cause packet drops.
> > > Suppose this behavior is fixable in legacy Intel NIC, you would still need
> > > to evacuate the filter programmed by macvtap previously when VF's filter
> > > gets activated (typically when VF's netdev is netif_running() in a Linux
> > > guest). That's what we and NetVSC call as "datapath switching", and where
> > > this could be handled (driver, net core, or userspace) is the core for the
> > > architectural design that I spent much time on.
> > >
> > > Having said it, I don't expect or would desperately wait on one vendor to
> > > fix a legacy driver which wasn't quite motivated, then no work would be done
> > > on that.
> > Then that device can't be used with the mechanism in question.
> > Or if there are lots of drivers like this maybe someone will be
> > motivated enough to post a better implementation with a new
> > feature bit. It's not that I'm arguing against that.
> >
> > But given the options of teaching management to play with
> > netlink API in response to guest actions, and with VCPU stopped,
> > and doing it all in host kernel drivers, I know I'll prefer host kernel
> > changes.
> We have some internal patches that leverage management to respond to various
> guest actions. If you're interested we can post them. The thing is no one
> would like to work on the libvirt changes, while internally we have our own
> orchestration software which is not libvirt. But if you think it's fine we
> can definitely share our QEMU patches while leaving out libvirt.
>
> Thanks,
> -Siwei

Sure, why not.
The following is generally necessary for any virtio project to happen:
- guest patches
- qemu patches
- spec documentation

Some extras are sometimes a dependency, e.g. host kernel patches.

Typically at least two of these are enough for people to
be able to figure out how things work.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-04 3:59 ` Michael S. Tsirkin
@ 2018-12-05 16:18 ` Sameeh Jubran
  2018-12-05 17:18 ` Michael S. Tsirkin
  2018-12-08 1:54 ` si-wei liu
  0 siblings, 2 replies; 85+ messages in thread
From: Sameeh Jubran @ 2018-12-05 16:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: si-wei.liu, sridhar.samudrala, carolyn.wyborny, Siwei Liu,
      venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
      jesse.brandeburg, boris.ostrovsky

Hi all,

This is a followup on the discussion in the DPDK and Virtio monthly meeting.

Michael suggested that layer 2 tests should be created in order to
test the PF/VF behavior in different scenarios without using VMs at
all, which should speed up the testing process.

The "mausezahn" tool, which is part of the netsniff-ng package,
can be used to generate layer 2 packets as follows:

mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"

The packets can be sniffed using tcpdump or netsniff-ng.

I am not completely sure what the setup should look like on the host,
but here is a script which assigns a macvlan to the PF and sets its MAC
address to be the same as the VF MAC address. The script assumes that
SR-IOV is already configured and the VFs are present.

[root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
MACVLAN_NAME=macvlan0
PF_NAME=enp59s0
VF_NUMBER=1
MAC_ADDR=20:71:c6:2a:68:38

echo "$PF_NAME vf status before setting mac"
ip link show dev $PF_NAME
# Program the shared MAC address on the VF
ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR
# Create a macvlan on top of the PF with the same MAC address
ip li add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan
ip link set $PF_NAME up
echo "$PF_NAME vf status after setting mac"
ip link show dev $PF_NAME

Please share your thoughts on how the different test scenarios should
go. I can customize the scripts further and host them somewhere.

--
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-05 16:18 ` Sameeh Jubran
@ 2018-12-05 17:18 ` Michael S. Tsirkin
  2018-12-08 1:54 ` si-wei liu
  1 sibling, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-12-05 17:18 UTC (permalink / raw)
  To: Sameeh Jubran
  Cc: si-wei.liu, sridhar.samudrala, carolyn.wyborny, Siwei Liu,
      venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
      jesse.brandeburg, boris.ostrovsky

On Wed, Dec 05, 2018 at 06:18:05PM +0200, Sameeh Jubran wrote:
> Hi all,
>
> This is a followup on the discussion in the DPDK and Virtio monthly meeting.
>
> Michael suggested that layer 2 tests should be created in order to
> test the PF/VF behavior in different scenarios without using VMs at
> all which should speed up the testing process.
>
> The following "mausezahn" tool - which is part of netsniff-ng package
> - can be used in order to generate layer 2 packets as follows:
>
> mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"
>
> The packets can be sniffed using tcpdump or netsniff-ng.
>
> I am not completely sure how the setup should look like on the host,
> but here is a script which assigns macvlan to the PF and sets it's mac
> address to be the same as the VF mac address. The scripts assumes that
> the sriov is already configured and the vf are present.
>
> [root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
> MACVLAN_NAME=macvlan0
> PF_NAME=enp59s0
> VF_NUMBER=1
> MAC_ADDR=20:71:c6:2a:68:38
>
> echo "$PF_NAME vf status before setting mac"
> ip link show dev $PF_NAME
> ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR
> ip li add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan
> ip link set $PF_NAME up
> echo "$PF_NAME vf status after setting mac"
> ip link show dev $PF_NAME
>
> Please share your thoughts on how the different test scenarios should
> go, I can customize the scripts further more and host them somewhere.
OK, so for starters we need code to send the packets (maybe multiple ones
with a counter so drops can be detected?) and also to sniff and verify
their arrival on either of the two interfaces?

And then on top of that there would be all the different ways to switch
between the two interfaces back and forth while this is going on.

The tool would ideally do this several times with each method and
report the observed downtime.

--
MST
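A minimal sketch of the drop-detection half of such a tool, under stated assumptions: each test frame carries an increasing sequence number starting at 1, and the captures from both interfaces have already been reduced to one received sequence number per line. How the number is encoded in the frame and extracted from the capture is not shown and would need to be worked out; the helper name `count_drops` is hypothetical. The function only does the bookkeeping, reporting how many frames were lost and the longest run of consecutive losses.

```shell
#!/bin/sh
# Hypothetical helper, not part of any existing tool.
# usage: count_drops SENT < received_sequence_numbers
# Reads the received sequence numbers (one per line, any order, duplicates
# allowed since a frame may be seen on both interfaces) and compares them
# against the SENT frames numbered 1..SENT.
count_drops() {
    sent=$1
    # Sort numerically and drop duplicates so every gap is counted once.
    sort -n -u | awk -v sent="$sent" '
        BEGIN { expect = 1; lost = 0; longest = 0 }
        {
            gap = $1 - expect          # frames missing before this one
            if (gap > 0) {
                lost += gap
                if (gap > longest) longest = gap
            }
            expect = $1 + 1
        }
        END {
            gap = sent - expect + 1    # frames missing at the tail
            if (gap > 0) {
                lost += gap
                if (gap > longest) longest = gap
            }
            printf "sent=%d lost=%d longest_gap=%d\n", sent, lost, longest
        }'
}
```

With frames sent at a fixed interval, the longest gap multiplied by that interval gives a rough estimate of the downtime for one switch-over. For example, `printf '1\n2\n5\n6\n' | count_drops 8` reports 4 lost frames with a longest gap of 2.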
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-05 16:18 ` Sameeh Jubran
  2018-12-05 17:18 ` Michael S. Tsirkin
@ 2018-12-08 1:54 ` si-wei liu
  2018-12-10 15:13 ` Sameeh Jubran
  1 sibling, 1 reply; 85+ messages in thread
From: si-wei liu @ 2018-12-08 1:54 UTC (permalink / raw)
  To: Sameeh Jubran, Michael S. Tsirkin
  Cc: sridhar.samudrala, carolyn.wyborny, Siwei Liu, venu.busireddy,
      cohuck, virtio-dev, liran.alon, Yan Vugenfirer, jesse.brandeburg,
      boris.ostrovsky

On 12/05/2018 08:18 AM, Sameeh Jubran wrote:
> Hi all,
>
> This is a followup on the discussion in the DPDK and Virtio monthly meeting.
>
> Michael suggested that layer 2 tests should be created in order to
> test the PF/VF behavior in different scenarios without using VMs at
> all which should speed up the testing process.
>
> The following "mausezahn" tool - which is part of netsniff-ng package
> - can be used in order to generate layer 2 packets as follows:
>
> mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"
>
> The packets can be sniffed using tcpdump or netsniff-ng.

Does tcpdump or netsniff-ng enable the NIC's promiscuous mode by default? Try
disabling it when you monitor/capture the L2 packets.

>
> I am not completely sure how the setup should look like on the host,
> but here is a script which assigns macvlan to the PF and sets it's mac
> address to be the same as the VF mac address. The scripts assumes that
> the sriov is already configured and the vf are present.
> > [root@wsfd-advnetlab10 ~]# cat go_macvlan.sh > MACVLAN_NAME=macvlan0 > PF_NAME=enp59s0 > VF_NUMBER=1 > MAC_ADDR=20:71:c6:2a:68:38 > > echo "$PF_NAME vf status before setting mac" > ip link show dev $PF_NAME > ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR > ip li add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan > ip link set $PF_NAME up > echo "$PF_NAME vf status after setting mac" > ip link show dev $PF_NAME > > Please share your thoughts on how the different test scenarios should > go, I can customize the scripts further more and host them somewhere. You can do something like below: FAKE_VLAN=123 ip link set $MACVLAN_NAME up ip link set $PF_NAME vf $VF_NUMBER vlan $FAKE_VLAN Datapath now switched to macvlan0, which should get the L2 packets from over the wire. ip link set $PF_NAME vf $VF_NUMBER vlan 0 ip link set $MACVLAN_NAME down Datapath now switched back to VF. VF#1 should get packets. For a more accurate downtime test, replace 'ip link set vf .. vlan ...' to unbind VF from the original driver and bind it to vfio-pci. Regards, -Siwei > > On Tue, Dec 4, 2018 at 5:59 AM Michael S. Tsirkin <mst@redhat.com> wrote: >> On Mon, Dec 03, 2018 at 06:09:19PM -0800, si-wei liu wrote: >>>> I agree. But a single flag is not much of an extension. We don't even >>>> need it in netlink, can be anywhere in e.g. sysfs. >>> I think sysfs attribute is for exposing the capability, while you still need >>> to set up macvtap with some special mode via netlink. That way it doesn't >>> break current behavior, and when VF's MAC filter is added macvtap would need >>> to react to remove the filter from NIC. And add the one back when VF's MAC >>> is removed. >> All this will be up to the developers actually working on it. My >> understanding is that intel is going to just change the behaviour >> unconditionally, and it's already the case for Mellanox. >> That creates a critical mass large enough that maybe others >> just need to confirm. >> >> ... 
>> >> >>>> Meanwhile what's missing and was missing all along for the change you >>>> seem to be advocating for to get off the ground is people who >>>> are ready to actually send e.g. spec, guest driver, test patches. >>> Partly because it hadn't been converged to the best way to do it (even the >>> group ID mechanism with PCI bridge can address our need you don't seem to >>> think it is valuable). The in-kernel approach is fine at its appearance, but >>> I personally don't believe changing every legacy driver is the way to go. >>> It's the choice of implementation and what has been implemented in those >>> drivers today IMHO is nothing wrong. >> It's not a question of being wrong as such. >> A standard behaviour is clearly better than each driver doing its >> own thing which is the case now. As long as we ar standardizing, >> let's standardize on something that matches our needs? >> But I really see no problem with also supporting other options, >> as long as someone is prepared to actually put in the work. >> >> >>>>>> Still this assumes just creating a VF >>>>>> doesn't yet program the on-card filter to cause packet drops. >>>>> Suppose this behavior is fixable in legacy Intel NIC, you would still need >>>>> to evacuate the filter programmed by macvtap previously when VF's filter >>>>> gets activated (typically when VF's netdev is netif_running() in a Linux >>>>> guest). That's what we and NetVSC call as "datapath switching", and where >>>>> this could be handled (driver, net core, or userspace) is the core for the >>>>> architectural design that I spent much time on. >>>>> >>>>> Having said it, I don't expect or would desperately wait on one vendor to >>>>> fix a legacy driver which wasn't quite motivated, then no work would be done >>>>> on that. >>>> Then that device can't be used with the mechanism in question. 
>>>> Or if there are lots of drivers like this maybe someone will be >>>> motivated enough to post a better implementation with a new >>>> feature bit. It's not that I'm arguing against that. >>>> >>>> But given the options of teaching management to play with >>>> netlink API in response to guest actions, and with VCPU stopped, >>>> and doing it all in host kernel drivers, I know I'll prefer host kernel >>>> changes. >>> We have some internal patches that leverage management to respond to various >>> guest actions. If you're interested we can post them. The thing is no one >>> would like to work on the libvirt changes, while internally we have our own >>> orchestration software which is not libvirt. But if you think it's fine we >>> can definitely share our QEMU patches while leaving out libvirt. >>> >>> Thanks, >>> -Siwei >> Sure, why not. >> >> The following is generally necessary for any virtio project to happen: >> - guest patches >> - qemu patches >> - spec documentation >> >> Some extras are sometimes a dependency, e.g. host kernel patches. >> >> >> Typically at least two of these are enough for people to >> be able to figure out how things work. >> >> >> >> >>>>> If you'd go the way, please make sure Intel could change their >>>>> driver first. >>>> We'll see what happens with that. It's Sridhar from intel that implemented >>>> the guest changes after all, so I expect he's motivated to make them >>>> work well. >>>> >>>> >>>>>> Let's >>>>>> assume drivers are fixed to do that. How does userspace know >>>>>> that's the case? We might need some kind of attribute so >>>>>> userspace can detect it. >>>>> Where do you envision the new attribute could be at? Supposedly it'd be >>>>> exposed by the kernel, which constitutes a new API or API changes. >>>>> >>>>> >>>>> Thanks, >>>>> -Siwei >>>> People add e.g. new attributes in sysfs left and right. It's unlikely >>>> to be a matter of serious contention. 
>>>> >>>>>>>> Question is how does userspace know driver isn't broken in this respect? >>>>>>>> Let's add a "vf failover" flag somewhere so this can be probed? >>>>>>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org >>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org >>>> > > --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 85+ messages in thread
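The two switch directions suggested in this message can be wrapped into functions so that a test can flip the datapath repeatedly while traffic is running. This is only a sketch under the thread's assumptions (SR-IOV configured, macvlan0 already created by go_macvlan.sh). `run`, `DRY_RUN` and `flip_datapath` are names made up here, and the defaults are the values from the quoted script.

```shell
#!/bin/sh
# Defaults are the values used in go_macvlan.sh in this thread.
PF_NAME=${PF_NAME:-enp59s0}
MACVLAN_NAME=${MACVLAN_NAME:-macvlan0}
VF_NUMBER=${VF_NUMBER:-1}
FAKE_VLAN=${FAKE_VLAN:-123}

# run: execute a command, or only print it when DRY_RUN=1 (hypothetical knob).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Move the VF onto an unused VLAN so wire traffic lands on the macvlan.
switch_to_macvlan() {
    run ip link set "$MACVLAN_NAME" up
    run ip link set "$PF_NAME" vf "$VF_NUMBER" vlan "$FAKE_VLAN"
}

# Clear the VF VLAN and take the macvlan down so the VF receives again.
switch_to_vf() {
    run ip link set "$PF_NAME" vf "$VF_NUMBER" vlan 0
    run ip link set "$MACVLAN_NAME" down
}

# flip_datapath COUNT PAUSE: switch back and forth COUNT times,
# waiting PAUSE seconds after each switch.
flip_datapath() {
    i=0
    while [ "$i" -lt "$1" ]; do
        switch_to_macvlan; sleep "$2"
        switch_to_vf; sleep "$2"
        i=$((i + 1))
    done
}
```

Running `DRY_RUN=1 flip_datapath 3 1` first prints the exact ip commands without touching the NIC, which is a cheap way to review the sequence before running it as root.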
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-08 1:54 ` si-wei liu
@ 2018-12-10 15:13 ` Sameeh Jubran
  2018-12-10 15:34 ` Sameeh Jubran
  0 siblings, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-12-10 15:13 UTC (permalink / raw)
To: si-wei.liu
Cc: Michael S. Tsirkin, sridhar.samudrala, carolyn.wyborny, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, jesse.brandeburg, boris.ostrovsky

On Sat, Dec 8, 2018 at 3:54 AM si-wei liu <si-wei.liu@oracle.com> wrote:
> On 12/05/2018 08:18 AM, Sameeh Jubran wrote:
> > The packets can be sniffed using tcpdump or netsniff-ng.
> Does tcpdump or netsniff-ng enable NIC's promiscuous mode by default?
> Try disable it when you monitor/capture the L2 packets.
netsniff-ng enables promiscuous mode by default, however the -M flag
can disable this.

> For a more accurate downtime test, replace 'ip link set vf .. vlan ...'
> to unbind VF from the original driver and bind it to vfio-pci.
Yup.

--
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.
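On the promiscuous-mode question: tcpdump also puts the interface into promiscuous mode unless it is given `-p`, and netsniff-ng takes `-M` as noted in this message. A small sketch that only prints the capture command lines, so they can be reviewed before being run as root. `capture_cmd` is a hypothetical helper, and the interface and MAC in the usage note are the ones from go_macvlan.sh.

```shell
#!/bin/sh
# capture_cmd TOOL DEV MAC: print (do not run) a capture command that
# leaves promiscuous mode off. Hypothetical helper for illustration.
capture_cmd() {
    tool=$1 dev=$2 mac=$3
    case $tool in
        # -p: no promiscuous mode, -e: show link-level header, -n: no name lookups
        tcpdump)     echo "tcpdump -p -e -n -i $dev ether dst $mac" ;;
        # -M: no promiscuous mode (no MAC filter is added in this sketch)
        netsniff-ng) echo "netsniff-ng --dev $dev -M" ;;
        *)           echo "unknown tool: $tool" >&2; return 1 ;;
    esac
}
```

For example, `capture_cmd tcpdump macvlan0 20:71:c6:2a:68:38` prints a tcpdump line limited to frames addressed to the shared MAC. With promiscuous mode off, what shows up on an interface is what the NIC's filters actually delivered to it, which is the point of the test.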
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-10 15:13 ` Sameeh Jubran
@ 2018-12-10 15:34 ` Sameeh Jubran
  2018-12-10 17:46 ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-12-10 15:34 UTC (permalink / raw)
To: si-wei.liu
Cc: Michael S. Tsirkin, sridhar.samudrala, carolyn.wyborny, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, jesse.brandeburg, boris.ostrovsky

On Mon, Dec 10, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
> On Sat, Dec 8, 2018 at 3:54 AM si-wei liu <si-wei.liu@oracle.com> wrote:
> > For a more accurate downtime test, replace 'ip link set vf .. vlan ...'
> > to unbind VF from the original driver and bind it to vfio-pci.
> Yup.
The only issue that I'm not sure on how to deal with, is how to listen
to the packets on the vf. How can I make sure that they are arriving
there?

--
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.
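For the vfio-pci variant of the switch quoted in this message, one common way to rebind a VF on Linux is through the sysfs `driver_override` attribute. A sketch, assuming the vfio-pci module is already loaded and the VF's PCI address is known. `rebind_vf_to_vfio` and the `SYSFS` override are made up for this example, and the address in the comment is a placeholder, not taken from the thread.

```shell
#!/bin/sh
# rebind_vf_to_vfio BDF: detach a VF from its current driver and hand it
# to vfio-pci. SYSFS can be pointed at a fake tree for dry testing.
rebind_vf_to_vfio() {
    bdf=$1                                    # e.g. 0000:3b:02.1 (placeholder)
    dev="${SYSFS:-/sys}/bus/pci/devices/$bdf"
    [ -d "$dev" ] || { echo "no such PCI device: $bdf" >&2; return 1; }

    # Make vfio-pci the only driver allowed to claim this device.
    echo vfio-pci > "$dev/driver_override" || return 1

    # Detach from the current driver, if any; the VF datapath stops here.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind" || return 1
    fi

    # Ask the PCI core to reprobe; with the override set it picks vfio-pci.
    echo "$bdf" > "${SYSFS:-/sys}/bus/pci/drivers_probe"
}
```

Timing the interval between the unbind and the first packet seen on macvlan0 would give the downtime for this method. Note that a VF bound to vfio-pci has no netdev on the host, so a host-side sniffer has nothing to attach to on the VF side.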
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-10 15:34 ` Sameeh Jubran
@ 2018-12-10 17:46 ` Michael S. Tsirkin
  2018-12-11 15:50 ` Sameeh Jubran
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-12-10 17:46 UTC (permalink / raw)
To: Sameeh Jubran
Cc: si-wei.liu, sridhar.samudrala, carolyn.wyborny, Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer, jesse.brandeburg, boris.ostrovsky

On Mon, Dec 10, 2018 at 05:34:53PM +0200, Sameeh Jubran wrote:
> The only issue that I'm not sure on how to deal with, is how to listen
> to the packets on the vf. How can I make sure that they are arriving
> there?
Using --dev flag to bind to the vf device?
* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-10 17:46 ` Michael S. Tsirkin
@ 2018-12-11 15:50 ` Sameeh Jubran
  0 siblings, 0 replies; 85+ messages in thread
From: Sameeh Jubran @ 2018-12-11 15:50 UTC
To: Michael S. Tsirkin
Cc: si-wei.liu, sridhar.samudrala, carolyn.wyborny, Siwei Liu,
    venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
    jesse.brandeburg, boris.ostrovsky

On Mon, Dec 10, 2018 at 7:46 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 10, 2018 at 05:34:53PM +0200, Sameeh Jubran wrote:
> > On Mon, Dec 10, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
> > >
> > > On Sat, Dec 8, 2018 at 3:54 AM si-wei liu <si-wei.liu@oracle.com> wrote:
> > > >
> > > > On 12/05/2018 08:18 AM, Sameeh Jubran wrote:
> > > > > Hi all,
> > > > >
> > > > > This is a followup on the discussion in the DPDK and Virtio monthly meeting.
> > > > >
> > > > > Michael suggested that layer 2 tests should be created in order to
> > > > > test the PF/VF behavior in different scenarios without using VMs at
> > > > > all, which should speed up the testing process.
> > > > >
> > > > > The "mausezahn" tool, which is part of the netsniff-ng package,
> > > > > can be used to generate layer 2 packets as follows:
> > > > >
> > > > > mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"
> > > > >
> > > > > The packets can be sniffed using tcpdump or netsniff-ng.
> > > > Does tcpdump or netsniff-ng enable the NIC's promiscuous mode by default?
> > > > Try disabling it when you monitor/capture the L2 packets.
> > > netsniff-ng enables promiscuous mode by default, however the -M flag
> > > can disable this.
> > > >
> > > > >
> > > > > I am not completely sure what the setup should look like on the host,
> > > > > but here is a script which assigns a macvlan to the PF and sets its MAC
> > > > > address to be the same as the VF MAC address. The script assumes that
> > > > > SR-IOV is already configured and the VFs are present.
> > > > >
> > > > > [root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
> > > > > MACVLAN_NAME=macvlan0
> > > > > PF_NAME=enp59s0
> > > > > VF_NUMBER=1
> > > > > MAC_ADDR=20:71:c6:2a:68:38
> > > > >
> > > > > echo "$PF_NAME vf status before setting mac"
> > > > > ip link show dev $PF_NAME
> > > > > ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR
> > > > > ip link add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan
> > > > > ip link set $PF_NAME up
> > > > > echo "$PF_NAME vf status after setting mac"
> > > > > ip link show dev $PF_NAME
> > > > >
> > > > > Please share your thoughts on how the different test scenarios should
> > > > > go. I can customize the scripts further and host them somewhere.
> > > > You can do something like below:
> > > >
> > > > FAKE_VLAN=123
> > > > ip link set $MACVLAN_NAME up
> > > > ip link set $PF_NAME vf $VF_NUMBER vlan $FAKE_VLAN
> > > >
> > > > Datapath now switched to macvlan0, which should get the L2 packets from
> > > > over the wire.
> > > >
> > > > ip link set $PF_NAME vf $VF_NUMBER vlan 0
> > > > ip link set $MACVLAN_NAME down
> > > >
> > > > Datapath now switched back to VF. VF#1 should get packets.
> > > >
> > > > For a more accurate downtime test, replace 'ip link set vf .. vlan ...'
> > > > with unbinding the VF from the original driver and binding it to vfio-pci.
> > > Yup.
> >
> > The only issue that I'm not sure how to deal with is how to listen
> > for the packets on the VF. How can I make sure that they are arriving
> > there?
>
> Using --dev flag to bind to the vf device?

Nope, this doesn't work since there is no VF interface on the host. I have
also tried specifying the VF's device address, but that doesn't seem to
work either.

> > > >
> > > > Regards,
> > > > -Siwei
> > > >
> > > > >
> > > > > On Tue, Dec 4, 2018 at 5:59 AM Michael S.
Tsirkin <mst@redhat.com> wrote:
> > > > >> On Mon, Dec 03, 2018 at 06:09:19PM -0800, si-wei liu wrote:
> > > > >>>> I agree. But a single flag is not much of an extension. We don't even
> > > > >>>> need it in netlink, can be anywhere in e.g. sysfs.
> > > > >>> I think sysfs attribute is for exposing the capability, while you still need
> > > > >>> to set up macvtap with some special mode via netlink. That way it doesn't
> > > > >>> break current behavior, and when VF's MAC filter is added macvtap would need
> > > > >>> to react to remove the filter from NIC. And add the one back when VF's MAC
> > > > >>> is removed.
> > > > >> All this will be up to the developers actually working on it. My
> > > > >> understanding is that Intel is going to just change the behaviour
> > > > >> unconditionally, and it's already the case for Mellanox.
> > > > >> That creates a critical mass large enough that maybe others
> > > > >> just need to confirm.
> > > > >>
> > > > >> ...
> > > > >>
> > > > >>>> Meanwhile what's missing and was missing all along for the change you
> > > > >>>> seem to be advocating for to get off the ground is people who
> > > > >>>> are ready to actually send e.g. spec, guest driver, test patches.
> > > > >>> Partly because it hadn't been converged to the best way to do it (even the
> > > > >>> group ID mechanism with PCI bridge can address our need you don't seem to
> > > > >>> think it is valuable). The in-kernel approach is fine at its appearance, but
> > > > >>> I personally don't believe changing every legacy driver is the way to go.
> > > > >>> It's the choice of implementation and what has been implemented in those
> > > > >>> drivers today IMHO is nothing wrong.
> > > > >> It's not a question of being wrong as such.
> > > > >> A standard behaviour is clearly better than each driver doing its
> > > > >> own thing which is the case now. As long as we are standardizing,
> > > > >> let's standardize on something that matches our needs?
> > > > >> But I really see no problem with also supporting other options,
> > > > >> as long as someone is prepared to actually put in the work.
> > > > >>
> > > > >>>>>> Still this assumes just creating a VF
> > > > >>>>>> doesn't yet program the on-card filter to cause packet drops.
> > > > >>>>> Suppose this behavior is fixable in legacy Intel NIC, you would still need
> > > > >>>>> to evacuate the filter programmed by macvtap previously when VF's filter
> > > > >>>>> gets activated (typically when VF's netdev is netif_running() in a Linux
> > > > >>>>> guest). That's what we and NetVSC call as "datapath switching", and where
> > > > >>>>> this could be handled (driver, net core, or userspace) is the core for the
> > > > >>>>> architectural design that I spent much time on.
> > > > >>>>>
> > > > >>>>> Having said it, I don't expect or would desperately wait on one vendor to
> > > > >>>>> fix a legacy driver which wasn't quite motivated, then no work would be done
> > > > >>>>> on that.
> > > > >>>> Then that device can't be used with the mechanism in question.
> > > > >>>> Or if there are lots of drivers like this maybe someone will be
> > > > >>>> motivated enough to post a better implementation with a new
> > > > >>>> feature bit. It's not that I'm arguing against that.
> > > > >>>>
> > > > >>>> But given the options of teaching management to play with
> > > > >>>> netlink API in response to guest actions, and with VCPU stopped,
> > > > >>>> and doing it all in host kernel drivers, I know I'll prefer host kernel
> > > > >>>> changes.
> > > > >>> We have some internal patches that leverage management to respond to various
> > > > >>> guest actions. If you're interested we can post them. The thing is no one
> > > > >>> would like to work on the libvirt changes, while internally we have our own
> > > > >>> orchestration software which is not libvirt. But if you think it's fine we
> > > > >>> can definitely share our QEMU patches while leaving out libvirt.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> -Siwei
> > > > >> Sure, why not.
> > > > >>
> > > > >> The following is generally necessary for any virtio project to happen:
> > > > >> - guest patches
> > > > >> - qemu patches
> > > > >> - spec documentation
> > > > >>
> > > > >> Some extras are sometimes a dependency, e.g. host kernel patches.
> > > > >>
> > > > >> Typically at least two of these are enough for people to
> > > > >> be able to figure out how things work.
> > > > >>
> > > > >>>>> If you'd go the way, please make sure Intel could change their
> > > > >>>>> driver first.
> > > > >>>> We'll see what happens with that. It's Sridhar from Intel that implemented
> > > > >>>> the guest changes after all, so I expect he's motivated to make them
> > > > >>>> work well.
> > > > >>>>
> > > > >>>>>> Let's
> > > > >>>>>> assume drivers are fixed to do that. How does userspace know
> > > > >>>>>> that's the case? We might need some kind of attribute so
> > > > >>>>>> userspace can detect it.
> > > > >>>>> Where do you envision the new attribute could be at? Supposedly it'd be
> > > > >>>>> exposed by the kernel, which constitutes a new API or API changes.
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> -Siwei
> > > > >>>> People add e.g. new attributes in sysfs left and right. It's unlikely
> > > > >>>> to be a matter of serious contention.
> > > > >>>>
> > > > >>>>>>>> Question is how does userspace know driver isn't broken in this respect?
> > > > >>>>>>>> Let's add a "vf failover" flag somewhere so this can be probed?

-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.
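
The host-side steps discussed in this message (setting the VF MAC, creating a
macvlan with the same MAC on the PF, then toggling a placeholder VLAN on the VF
to move the datapath) could be collected into one script along the following
lines. This is only a sketch under assumptions, not a tested procedure: the
interface name, VF number, MAC address and VLAN ID are the example values from
the thread, while DRY_RUN and the helper names run, setup, switch_to_macvlan
and switch_to_vf are hypothetical and introduced here for illustration. With
DRY_RUN=1 (the default) the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch of the macvlan <-> VF datapath switch test described in the thread.
# Values below are the examples from the thread; override via environment.
PF_NAME=${PF_NAME:-enp59s0}
MACVLAN_NAME=${MACVLAN_NAME:-macvlan0}
VF_NUMBER=${VF_NUMBER:-1}
MAC_ADDR=${MAC_ADDR:-20:71:c6:2a:68:38}
FAKE_VLAN=${FAKE_VLAN:-123}
# Hypothetical switch: 1 prints the commands, 0 runs them (needs root).
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Give the VF and a macvlan on the PF the same MAC address.
# Assumes SR-IOV is already configured and the VF exists.
setup() {
    run ip link set "$PF_NAME" vf "$VF_NUMBER" mac "$MAC_ADDR"
    run ip link add link "$PF_NAME" name "$MACVLAN_NAME" address "$MAC_ADDR" type macvlan
    run ip link set "$PF_NAME" up
}

# Move the datapath to the macvlan: bring it up, then park the VF on a
# placeholder VLAN so untagged traffic no longer reaches it.
switch_to_macvlan() {
    run ip link set "$MACVLAN_NAME" up
    run ip link set "$PF_NAME" vf "$VF_NUMBER" vlan "$FAKE_VLAN"
}

# Move the datapath back to the VF: clear the VLAN, then take the macvlan down.
switch_to_vf() {
    run ip link set "$PF_NAME" vf "$VF_NUMBER" vlan 0
    run ip link set "$MACVLAN_NAME" down
}

setup
switch_to_macvlan
switch_to_vf
```

Keeping each switch in its own function makes it easy to insert a traffic
check (for example a capture on macvlan0 while mausezahn is sending) between
the two switches. How to observe packets arriving on the VF itself from the
host remains the open question raised above.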