[PATCH net-next 0/2] Enable virtio to act as a master for a passthru device

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
@ 2018-01-03  0:35 Sridhar Samudrala
  2018-01-03  0:35 ` [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit Sridhar Samudrala
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Sridhar Samudrala @ 2018-01-03  0:35 UTC (permalink / raw)
  To: mst, stephen, netdev, virtualization, virtio-dev,
	jesse.brandeburg, sridhar.samudrala

This patch series enables virtio to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration of a VM
with a direct attached VF without the need to setup a bond/team between a
VF and virtio net device in the guest.

The hypervisor needs to unplug the VF device from the guest on the source
host and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed, the
destination hypervisor sets the MAC filter on the VF and plugs it back to
the guest to switch over to VF datapath.

It is based on netvsc implementation and it may be possible to make this code 
generic and move it to a common location that can be shared by netvsc and virtio.

This patch series is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

Sridhar Samudrala (2):
  virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit
  virtio_net: Extend virtio to use VF datapath when available

 drivers/net/virtio_net.c        | 308 +++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/virtio_net.h |   1 +
 2 files changed, 306 insertions(+), 3 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit
  2018-01-03  0:35 [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device Sridhar Samudrala
@ 2018-01-03  0:35 ` Sridhar Samudrala
  2018-01-03 16:36   ` David Miller
  2018-01-03 17:43   ` Michael S. Tsirkin
  2018-01-03  0:35 ` [PATCH net-next 2/2] virtio_net: Extend virtio to use VF datapath when available Sridhar Samudrala
  2018-01-03  2:16 ` [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device Jakub Kicinski
  2 siblings, 2 replies; 12+ messages in thread
From: Sridhar Samudrala @ 2018-01-03  0:35 UTC (permalink / raw)
  To: mst, stephen, netdev, virtualization, virtio-dev,
	jesse.brandeburg, sridhar.samudrala

This feature bit can be used by hypervisor to indicate virtio_net device to
act as a master for a directly attached passthru device with the same MAC
address.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/virtio_net.c        | 3 ++-
 include/uapi/linux/virtio_net.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6fb7b658a6cc..46844a1d9a62 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2796,7 +2796,8 @@ static struct virtio_device_id id_table[] = {
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
-	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
+	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
+	VIRTIO_NET_F_MASTER
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index fc353b518288..a9b4e0836786 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -56,6 +56,7 @@
 #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
+#define VIRTIO_NET_F_MASTER	62	/* act as master for a VF device */
 
 #ifndef VIRTIO_NET_NO_LEGACY
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 2/2] virtio_net: Extend virtio to use VF datapath when available
  2018-01-03  0:35 [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device Sridhar Samudrala
  2018-01-03  0:35 ` [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit Sridhar Samudrala
@ 2018-01-03  0:35 ` Sridhar Samudrala
  2018-01-03  2:16 ` [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device Jakub Kicinski
  2 siblings, 0 replies; 12+ messages in thread
From: Sridhar Samudrala @ 2018-01-03  0:35 UTC (permalink / raw)
  To: mst, stephen, netdev, virtualization, virtio-dev,
	jesse.brandeburg, sridhar.samudrala

This patch enables virtio to switch over to a VF datapath when a VF netdev
is present with the same MAC address.  It allows live migration of a VM
with a direct attached VF without the need to setup a bond/team between a
VF and virtio net device in the guest.

The hypervisor needs to unplug the VF device from the guest on the source
host and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed, the
destination hypervisor sets the MAC filter on the VF and plugs it back to
the guest to switch over to VF datapath.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
 drivers/net/virtio_net.c | 305 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 303 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 46844a1d9a62..6516dcb55b7d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -31,6 +31,8 @@
 #include <linux/average.h>
 #include <linux/filter.h>
 #include <net/route.h>
+#include <linux/netdevice.h>
+#include <linux/netpoll.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -117,6 +119,15 @@ struct receive_queue {
 	char name[40];
 };
 
+struct virtnet_vf_pcpu_stats {
+	u64	rx_packets;
+	u64	rx_bytes;
+	u64	tx_packets;
+	u64	tx_bytes;
+	struct u64_stats_sync   syncp;
+	u32	tx_dropped;
+};
+
 struct virtnet_info {
 	struct virtio_device *vdev;
 	struct virtqueue *cvq;
@@ -179,6 +190,10 @@ struct virtnet_info {
 	u32 speed;
 
 	unsigned long guest_offloads;
+
+	/* State to manage the associated VF interface. */
+	struct net_device __rcu *vf_netdev;
+	struct virtnet_vf_pcpu_stats __percpu *vf_stats;
 };
 
 struct padded_vnet_hdr {
@@ -1303,16 +1318,51 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
 }
 
+/* Send skb on the slave VF device. */
+static int virtnet_vf_xmit(struct net_device *dev, struct net_device *vf_netdev,
+			   struct sk_buff *skb)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	unsigned int len = skb->len;
+	int rc;
+
+	skb->dev = vf_netdev;
+	skb->queue_mapping = qdisc_skb_cb(skb)->slave_dev_queue_mapping;
+
+	rc = dev_queue_xmit(skb);
+	if (likely(rc == NET_XMIT_SUCCESS || rc == NET_XMIT_CN)) {
+		struct virtnet_vf_pcpu_stats *pcpu_stats
+			= this_cpu_ptr(vi->vf_stats);
+
+		u64_stats_update_begin(&pcpu_stats->syncp);
+		pcpu_stats->tx_packets++;
+		pcpu_stats->tx_bytes += len;
+		u64_stats_update_end(&pcpu_stats->syncp);
+	} else {
+		this_cpu_inc(vi->vf_stats->tx_dropped);
+	}
+
+	return rc;
+}
+
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
+	struct net_device *vf_netdev;
 	int err;
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
 	bool use_napi = sq->napi.weight;
 
+	/* If VF is present and up then redirect packets
+	 * called with rcu_read_lock_bh
+	 */
+	vf_netdev = rcu_dereference_bh(vi->vf_netdev);
+	if (vf_netdev && netif_running(vf_netdev) && !netpoll_tx_running(dev))
+		return virtnet_vf_xmit(dev, vf_netdev, skb);
+
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(sq);
 
@@ -1459,10 +1509,41 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
 	return ret;
 }
 
+static void virtnet_get_vf_stats(struct net_device *dev,
+				 struct virtnet_vf_pcpu_stats *tot)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	int i;
+
+	memset(tot, 0, sizeof(*tot));
+
+	for_each_possible_cpu(i) {
+		const struct virtnet_vf_pcpu_stats *stats
+				= per_cpu_ptr(vi->vf_stats, i);
+		u64 rx_packets, rx_bytes, tx_packets, tx_bytes;
+		unsigned int start;
+
+		do {
+			start = u64_stats_fetch_begin_irq(&stats->syncp);
+			rx_packets = stats->rx_packets;
+			tx_packets = stats->tx_packets;
+			rx_bytes = stats->rx_bytes;
+			tx_bytes = stats->tx_bytes;
+		} while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+
+		tot->rx_packets += rx_packets;
+		tot->tx_packets += tx_packets;
+		tot->rx_bytes   += rx_bytes;
+		tot->tx_bytes   += tx_bytes;
+		tot->tx_dropped += stats->tx_dropped;
+	}
+}
+
 static void virtnet_stats(struct net_device *dev,
 			  struct rtnl_link_stats64 *tot)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtnet_vf_pcpu_stats vf_stats;
 	int cpu;
 	unsigned int start;
 
@@ -1493,6 +1574,13 @@ static void virtnet_stats(struct net_device *dev,
 	tot->rx_dropped = dev->stats.rx_dropped;
 	tot->rx_length_errors = dev->stats.rx_length_errors;
 	tot->rx_frame_errors = dev->stats.rx_frame_errors;
+
+	virtnet_get_vf_stats(dev, &vf_stats);
+	tot->rx_packets += vf_stats.rx_packets;
+	tot->tx_packets += vf_stats.tx_packets;
+	tot->rx_bytes += vf_stats.rx_bytes;
+	tot->tx_bytes += vf_stats.tx_bytes;
+	tot->tx_dropped += vf_stats.tx_dropped;
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -2604,6 +2692,13 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
 
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_MASTER)) {
+		vi->vf_stats =
+			netdev_alloc_pcpu_stats(struct virtnet_vf_pcpu_stats);
+		if (!vi->vf_stats)
+			goto free_stats;
+	}
+
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
@@ -2637,7 +2732,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 			 */
 			dev_err(&vdev->dev, "device MTU appears to have changed "
 				"it is now %d < %d", mtu, dev->min_mtu);
-			goto free_stats;
+			goto free_vf_stats;
 		}
 
 		dev->mtu = mtu;
@@ -2661,7 +2756,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
 	err = init_vqs(vi);
 	if (err)
-		goto free_stats;
+		goto free_vf_stats;
 
 #ifdef CONFIG_SYSFS
 	if (vi->mergeable_rx_bufs)
@@ -2715,6 +2810,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	cancel_delayed_work_sync(&vi->refill);
 	free_receive_page_frags(vi);
 	virtnet_del_vqs(vi);
+free_vf_stats:
+	free_percpu(vi->vf_stats);
 free_stats:
 	free_percpu(vi->stats);
 free:
@@ -2736,19 +2833,184 @@ static void remove_vq_common(struct virtnet_info *vi)
 	virtnet_del_vqs(vi);
 }
 
+static struct net_device *get_virtio_bymac(const u8 *mac)
+{
+	struct net_device *dev;
+
+	ASSERT_RTNL();
+
+	for_each_netdev(&init_net, dev) {
+		if (dev->netdev_ops != &virtnet_netdev)
+			continue;       /* not a virtio_net device */
+
+		if (ether_addr_equal(mac, dev->perm_addr))
+			return dev;
+	}
+
+	return NULL;
+}
+
+static struct net_device *get_virtio_byref(struct net_device *vf_netdev)
+{
+	struct net_device *dev;
+
+	ASSERT_RTNL();
+
+	for_each_netdev(&init_net, dev) {
+		struct virtnet_info *vi;
+
+		if (dev->netdev_ops != &virtnet_netdev)
+			continue;	/* not a virtio_net device */
+
+		vi = netdev_priv(dev);
+		if (rtnl_dereference(vi->vf_netdev) == vf_netdev)
+			return dev;	/* a match */
+	}
+
+	return NULL;
+}
+
+/* Called when VF is injecting data into network stack.
+ * Change the associated network device from VF to virtio.
+ * note: already called with rcu_read_lock
+ */
+static rx_handler_result_t virtnet_vf_handle_frame(struct sk_buff **pskb)
+{
+	struct sk_buff *skb = *pskb;
+	struct net_device *ndev = rcu_dereference(skb->dev->rx_handler_data);
+	struct virtnet_info *vi = netdev_priv(ndev);
+	struct virtnet_vf_pcpu_stats *pcpu_stats =
+				this_cpu_ptr(vi->vf_stats);
+
+	skb->dev = ndev;
+
+	u64_stats_update_begin(&pcpu_stats->syncp);
+	pcpu_stats->rx_packets++;
+	pcpu_stats->rx_bytes += skb->len;
+	u64_stats_update_end(&pcpu_stats->syncp);
+
+	return RX_HANDLER_ANOTHER;
+}
+
+static int virtnet_vf_join(struct net_device *vf_netdev,
+			   struct net_device *ndev)
+{
+	int ret;
+
+	ret = netdev_rx_handler_register(vf_netdev,
+					 virtnet_vf_handle_frame, ndev);
+	if (ret != 0) {
+		netdev_err(vf_netdev,
+			   "can not register virtio VF receive handler (err = %d)\n",
+			   ret);
+		goto rx_handler_failed;
+	}
+
+	ret = netdev_upper_dev_link(vf_netdev, ndev, NULL);
+	if (ret != 0) {
+		netdev_err(vf_netdev,
+			   "can not set master device %s (err = %d)\n",
+			   ndev->name, ret);
+		goto upper_link_failed;
+	}
+
+	vf_netdev->flags |= IFF_SLAVE;
+
+	/* Align MTU of VF with master */
+	ret = dev_set_mtu(vf_netdev, ndev->mtu);
+	if (ret)
+		netdev_warn(vf_netdev,
+			    "unable to change mtu to %u\n", ndev->mtu);
+
+	call_netdevice_notifiers(NETDEV_JOIN, vf_netdev);
+
+	netdev_info(vf_netdev, "joined to %s\n", ndev->name);
+	return 0;
+
+upper_link_failed:
+	netdev_rx_handler_unregister(vf_netdev);
+rx_handler_failed:
+	return ret;
+}
+
+static int virtnet_register_vf(struct net_device *vf_netdev)
+{
+	struct net_device *ndev;
+	struct virtnet_info *vi;
+
+	if (vf_netdev->addr_len != ETH_ALEN)
+		return NOTIFY_DONE;
+
+	/* We will use the MAC address to locate the virtio_net interface to
+	 * associate with the VF interface. If we don't find a matching
+	 * virtio interface, move on.
+	 */
+	ndev = get_virtio_bymac(vf_netdev->perm_addr);
+	if (!ndev)
+		return NOTIFY_DONE;
+
+	vi = netdev_priv(ndev);
+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_MASTER))
+		return NOTIFY_DONE;
+
+	if (rtnl_dereference(vi->vf_netdev))
+		return NOTIFY_DONE;
+
+	if (virtnet_vf_join(vf_netdev, ndev) != 0)
+		return NOTIFY_DONE;
+
+	netdev_info(ndev, "VF registering %s\n", vf_netdev->name);
+
+	dev_hold(vf_netdev);
+	rcu_assign_pointer(vi->vf_netdev, vf_netdev);
+
+	return NOTIFY_OK;
+}
+
+static int virtnet_unregister_vf(struct net_device *vf_netdev)
+{
+	struct net_device *ndev;
+	struct virtnet_info *vi;
+
+	ndev = get_virtio_byref(vf_netdev);
+	if (!ndev)
+		return NOTIFY_DONE;
+
+	vi = netdev_priv(ndev);
+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_MASTER))
+		return NOTIFY_DONE;
+
+	netdev_info(ndev, "VF unregistering %s\n", vf_netdev->name);
+
+	netdev_rx_handler_unregister(vf_netdev);
+	netdev_upper_dev_unlink(vf_netdev, ndev);
+	RCU_INIT_POINTER(vi->vf_netdev, NULL);
+	dev_put(vf_netdev);
+
+	return NOTIFY_OK;
+}
+
 static void virtnet_remove(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
+	struct net_device *vf_netdev;
 
 	virtnet_cpu_notif_remove(vi);
 
 	/* Make sure no work handler is accessing the device. */
 	flush_work(&vi->config_work);
 
+	rtnl_lock();
+	vf_netdev = rtnl_dereference(vi->vf_netdev);
+	if (vf_netdev)
+		virtnet_unregister_vf(vf_netdev);
+	rtnl_unlock();
+
 	unregister_netdev(vi->dev);
 
 	remove_vq_common(vi);
 
+	free_percpu(vi->vf_stats);
 	free_percpu(vi->stats);
 	free_netdev(vi->dev);
 }
@@ -2827,6 +3089,42 @@ static struct virtio_driver virtio_net_driver = {
 #endif
 };
 
+static int virtio_netdev_event(struct notifier_block *this,
+			       unsigned long event, void *ptr)
+{
+	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
+
+	/* Skip our own events */
+	if (event_dev->netdev_ops == &virtnet_netdev)
+		return NOTIFY_DONE;
+
+	/* Avoid non-Ethernet type devices */
+	if (event_dev->type != ARPHRD_ETHER)
+		return NOTIFY_DONE;
+
+	/* Avoid Vlan dev with same MAC registering as VF */
+	if (is_vlan_dev(event_dev))
+		return NOTIFY_DONE;
+
+	/* Avoid Bonding master dev with same MAC registering as VF */
+	if ((event_dev->priv_flags & IFF_BONDING) &&
+	    (event_dev->flags & IFF_MASTER))
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_REGISTER:
+		return virtnet_register_vf(event_dev);
+	case NETDEV_UNREGISTER:
+		return virtnet_unregister_vf(event_dev);
+	default:
+		return NOTIFY_DONE;
+	}
+}
+
+static struct notifier_block virtio_netdev_notifier = {
+	.notifier_call = virtio_netdev_event,
+};
+
 static __init int virtio_net_driver_init(void)
 {
 	int ret;
@@ -2845,6 +3143,8 @@ static __init int virtio_net_driver_init(void)
         ret = register_virtio_driver(&virtio_net_driver);
 	if (ret)
 		goto err_virtio;
+
+	register_netdevice_notifier(&virtio_netdev_notifier);
 	return 0;
 err_virtio:
 	cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD);
@@ -2857,6 +3157,7 @@ module_init(virtio_net_driver_init);
 
 static __exit void virtio_net_driver_exit(void)
 {
+	unregister_netdevice_notifier(&virtio_netdev_notifier);
 	unregister_virtio_driver(&virtio_net_driver);
 	cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD);
 	cpuhp_remove_multi_state(virtionet_online);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
  2018-01-03  0:35 [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device Sridhar Samudrala
  2018-01-03  0:35 ` [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit Sridhar Samudrala
  2018-01-03  0:35 ` [PATCH net-next 2/2] virtio_net: Extend virtio to use VF datapath when available Sridhar Samudrala
@ 2018-01-03  2:16 ` Jakub Kicinski
  2018-01-03 16:59   ` Alexander Duyck
  2 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2018-01-03  2:16 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: Sridhar Samudrala, mst, stephen, netdev, virtualization,
	virtio-dev, Alexander Duyck

On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
> This patch series enables virtio to switch over to a VF datapath when a VF
> netdev is present with the same MAC address. It allows live migration of a VM
> with a direct attached VF without the need to setup a bond/team between a
> VF and virtio net device in the guest.
> 
> The hypervisor needs to unplug the VF device from the guest on the source
> host and reset the MAC filter of the VF to initiate failover of datapath to
> virtio before starting the migration. After the migration is completed, the
> destination hypervisor sets the MAC filter on the VF and plugs it back to
> the guest to switch over to VF datapath.
> 
> It is based on netvsc implementation and it may be possible to make this code 
> generic and move it to a common location that can be shared by netvsc and virtio.
> 
> This patch series is based on the discussion initiated by Jesse on this thread.
> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

How does the notion of a device which is both a bond and a leg of a
bond fit with Alex's recent discussions about feature propagation?
Which propagation rules will apply to VirtIO master?  Meaning of the
flags on a software upper device may be different.  Why muddy the
architecture like this and not introduce a synthetic bond device?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit
  2018-01-03  0:35 ` [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit Sridhar Samudrala
@ 2018-01-03 16:36   ` David Miller
  2018-01-03 17:43   ` Michael S. Tsirkin
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2018-01-03 16:36 UTC (permalink / raw)
  To: sridhar.samudrala
  Cc: mst, stephen, netdev, virtualization, virtio-dev,
	jesse.brandeburg

From: Sridhar Samudrala <sridhar.samudrala@intel.com>
Date: Tue,  2 Jan 2018 16:35:37 -0800

> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index fc353b518288..a9b4e0836786 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -56,6 +56,7 @@
>  #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
>  					 * Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
> +#define VIRTIO_NET_F_MASTER	62	/* act as master for a VF device */

Please respin this series with something in this commit message which
explains why we are using "62" rather than something like "24" here.

Thank you.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
  2018-01-03  2:16 ` [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device Jakub Kicinski
@ 2018-01-03 16:59   ` Alexander Duyck
  2018-01-03 18:14     ` Samudrala, Sridhar
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Duyck @ 2018-01-03 16:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Brandeburg, Jesse, Sridhar Samudrala, Michael S. Tsirkin,
	Stephen Hemminger, Netdev, virtualization, virtio-dev,
	Alexander Duyck

On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>> This patch series enables virtio to switch over to a VF datapath when a VF
>> netdev is present with the same MAC address. It allows live migration of a VM
>> with a direct attached VF without the need to setup a bond/team between a
>> VF and virtio net device in the guest.
>>
>> The hypervisor needs to unplug the VF device from the guest on the source
>> host and reset the MAC filter of the VF to initiate failover of datapath to
>> virtio before starting the migration. After the migration is completed, the
>> destination hypervisor sets the MAC filter on the VF and plugs it back to
>> the guest to switch over to VF datapath.
>>
>> It is based on netvsc implementation and it may be possible to make this code
>> generic and move it to a common location that can be shared by netvsc and virtio.
>>
>> This patch series is based on the discussion initiated by Jesse on this thread.
>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>
> How does the notion of a device which is both a bond and a leg of a
> bond fit with Alex's recent discussions about feature propagation?
> Which propagation rules will apply to VirtIO master?  Meaning of the
> flags on a software upper device may be different.  Why muddy the
> architecture like this and not introduce a synthetic bond device?

It doesn't really fit with the notion I had. I think there may have
been a bit of a disconnect as I have been out for the last week or so
for the holidays.

My thought on this was that the feature bit should be spawning a new
para-virtual bond device and that bond should have the virto and the
VF as slaves. Also I thought there was some discussion about trying to
reuse as much of the netvsc code as possible for this so that we could
avoid duplication of effort and have the two drivers use the same
approach. It seems like it should be pretty straight forward since you
would have the feature bit in the case of virto, and netvsc just does
this sort of thing by default if I am not mistaken.

- Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit
  2018-01-03  0:35 ` [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit Sridhar Samudrala
  2018-01-03 16:36   ` David Miller
@ 2018-01-03 17:43   ` Michael S. Tsirkin
  2018-01-03 18:17     ` Samudrala, Sridhar
  1 sibling, 1 reply; 12+ messages in thread
From: Michael S. Tsirkin @ 2018-01-03 17:43 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: stephen, netdev, virtualization, virtio-dev, jesse.brandeburg

On Tue, Jan 02, 2018 at 04:35:37PM -0800, Sridhar Samudrala wrote:
> This feature bit can be used by hypervisor to indicate virtio_net device to
> act as a master for a directly attached passthru device with the same MAC
> address.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
>  drivers/net/virtio_net.c        | 3 ++-
>  include/uapi/linux/virtio_net.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6fb7b658a6cc..46844a1d9a62 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2796,7 +2796,8 @@ static struct virtio_device_id id_table[] = {
>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> -	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> +	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> +	VIRTIO_NET_F_MASTER
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index fc353b518288..a9b4e0836786 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -56,6 +56,7 @@
>  #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
>  					 * Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
> +#define VIRTIO_NET_F_MASTER	62	/* act as master for a VF device */

Well the virtio hardware doesn't really know whether it's a master or
the slave of a bond. In fact quite possibly down the road we'll add a
virtual device on top as others seem to want to do - that other one will
be the master then.

And we do nothing do check it's a VF.

So maybe a better name would be

#define VIRTIO_NET_F_BACKUP	62	/* Act as backup for another device with the same MAC */

>  
>  #ifndef VIRTIO_NET_NO_LEGACY
>  #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
> -- 
> 2.14.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
  2018-01-03 16:59   ` Alexander Duyck
@ 2018-01-03 18:14     ` Samudrala, Sridhar
  2018-01-03 18:28       ` Alexander Duyck
  0 siblings, 1 reply; 12+ messages in thread
From: Samudrala, Sridhar @ 2018-01-03 18:14 UTC (permalink / raw)
  To: Alexander Duyck, Jakub Kicinski
  Cc: Brandeburg, Jesse, Michael S. Tsirkin, Stephen Hemminger, Netdev,
	virtualization, virtio-dev, Alexander Duyck



On 1/3/2018 8:59 AM, Alexander Duyck wrote:
> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>> This patch series enables virtio to switch over to a VF datapath when a VF
>>> netdev is present with the same MAC address. It allows live migration of a VM
>>> with a direct attached VF without the need to setup a bond/team between a
>>> VF and virtio net device in the guest.
>>>
>>> The hypervisor needs to unplug the VF device from the guest on the source
>>> host and reset the MAC filter of the VF to initiate failover of datapath to
>>> virtio before starting the migration. After the migration is completed, the
>>> destination hypervisor sets the MAC filter on the VF and plugs it back to
>>> the guest to switch over to VF datapath.
>>>
>>> It is based on netvsc implementation and it may be possible to make this code
>>> generic and move it to a common location that can be shared by netvsc and virtio.
>>>
>>> This patch series is based on the discussion initiated by Jesse on this thread.
>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>> How does the notion of a device which is both a bond and a leg of a
>> bond fit with Alex's recent discussions about feature propagation?
>> Which propagation rules will apply to VirtIO master?  Meaning of the
>> flags on a software upper device may be different.  Why muddy the
>> architecture like this and not introduce a synthetic bond device?
> It doesn't really fit with the notion I had. I think there may have
> been a bit of a disconnect as I have been out for the last week or so
> for the holidays.
>
> My thought on this was that the feature bit should be spawning a new
> para-virtual bond device and that bond should have the virto and the
> VF as slaves. Also I thought there was some discussion about trying to
> reuse as much of the netvsc code as possible for this so that we could
> avoid duplication of effort and have the two drivers use the same
> approach. It seems like it should be pretty straight forward since you
> would have the feature bit in the case of virto, and netvsc just does
> this sort of thing by default if I am not mistaken.
This patch is mostly based on netvsc implementation. The only change is 
avoiding the
explicit dev_open() call of the VF netdev after a delay. I am assuming 
that the guest userspace
will bring up the VF netdev and the hypervisor will update the MAC 
filters to switch to
the right data path.
We could commonize the code and make it shared between netvsc and 
virtio. Do we want
to do this right away or later? If so, what would be a good location for 
these shared functions?
Is it net/core/dev.c?

Also, if we want to go with a solution that creates a bond device, do we 
want virtio_net/netvsc
drivers to create a upper device?  Such a solution is already possible 
via config scripts that can
create a bond with virtio and a VF net device as slaves.  netvsc and 
this patch series is trying to
make it as simple as possible for the VM to use directly attached 
devices and support live migration
by switching to virtio datapath as a backup during the migration process 
when the VF device
is unplugged.

Thanks
Sridhar

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit
  2018-01-03 17:43   ` Michael S. Tsirkin
@ 2018-01-03 18:17     ` Samudrala, Sridhar
  0 siblings, 0 replies; 12+ messages in thread
From: Samudrala, Sridhar @ 2018-01-03 18:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: stephen, netdev, virtualization, virtio-dev, jesse.brandeburg



On 1/3/2018 9:43 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 02, 2018 at 04:35:37PM -0800, Sridhar Samudrala wrote:
>> This feature bit can be used by hypervisor to indicate virtio_net device to
>> act as a master for a directly attached passthru device with the same MAC
>> address.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> ---
>>   drivers/net/virtio_net.c        | 3 ++-
>>   include/uapi/linux/virtio_net.h | 1 +
>>   2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 6fb7b658a6cc..46844a1d9a62 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -2796,7 +2796,8 @@ static struct virtio_device_id id_table[] = {
>>   	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>>   	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>>   	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>> -	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>> +	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>> +	VIRTIO_NET_F_MASTER
>>   
>>   static unsigned int features[] = {
>>   	VIRTNET_FEATURES,
>> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
>> index fc353b518288..a9b4e0836786 100644
>> --- a/include/uapi/linux/virtio_net.h
>> +++ b/include/uapi/linux/virtio_net.h
>> @@ -56,6 +56,7 @@
>>   #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
>>   					 * Steering */
>>   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>> +#define VIRTIO_NET_F_MASTER	62	/* act as master for a VF device */
> Well the virtio hardware doesn't really know whether it's a master or
> the slave of a bond. In fact quite possibly down the road we'll add a
> virtual device on top as others seem to want to do - that other one will
> be the master then.
>
> And we do nothing do check it's a VF.
>
> So maybe a better name would be
>
> #define VIRTIO_NET_F_BACKUP	62	/* Act as backup for another device with the same MAC */
Yes. BACKUP  does make more sense as virtio is only used as a BACKUP 
datapath when the other
device is unplugged.


>
>>   
>>   #ifndef VIRTIO_NET_NO_LEGACY
>>   #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
>> -- 
>> 2.14.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
  2018-01-03 18:14     ` Samudrala, Sridhar
@ 2018-01-03 18:28       ` Alexander Duyck
  2018-01-04  0:22         ` Samudrala, Sridhar
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Duyck @ 2018-01-03 18:28 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Jakub Kicinski, Brandeburg, Jesse, Michael S. Tsirkin,
	Stephen Hemminger, Netdev, virtualization, virtio-dev,
	Alexander Duyck

On Wed, Jan 3, 2018 at 10:14 AM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
>
> On 1/3/2018 8:59 AM, Alexander Duyck wrote:
>>
>> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>>
>>> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>>>
>>>> This patch series enables virtio to switch over to a VF datapath when a
>>>> VF
>>>> netdev is present with the same MAC address. It allows live migration of
>>>> a VM
>>>> with a direct attached VF without the need to setup a bond/team between
>>>> a
>>>> VF and virtio net device in the guest.
>>>>
>>>> The hypervisor needs to unplug the VF device from the guest on the
>>>> source
>>>> host and reset the MAC filter of the VF to initiate failover of datapath
>>>> to
>>>> virtio before starting the migration. After the migration is completed,
>>>> the
>>>> destination hypervisor sets the MAC filter on the VF and plugs it back
>>>> to
>>>> the guest to switch over to VF datapath.
>>>>
>>>> It is based on netvsc implementation and it may be possible to make this
>>>> code
>>>> generic and move it to a common location that can be shared by netvsc
>>>> and virtio.
>>>>
>>>> This patch series is based on the discussion initiated by Jesse on this
>>>> thread.
>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>
>>> How does the notion of a device which is both a bond and a leg of a
>>> bond fit with Alex's recent discussions about feature propagation?
>>> Which propagation rules will apply to VirtIO master?  Meaning of the
>>> flags on a software upper device may be different.  Why muddy the
>>> architecture like this and not introduce a synthetic bond device?
>>
>> It doesn't really fit with the notion I had. I think there may have
>> been a bit of a disconnect as I have been out for the last week or so
>> for the holidays.
>>
>> My thought on this was that the feature bit should be spawning a new
>> para-virtual bond device and that bond should have the virto and the
>> VF as slaves. Also I thought there was some discussion about trying to
>> reuse as much of the netvsc code as possible for this so that we could
>> avoid duplication of effort and have the two drivers use the same
>> approach. It seems like it should be pretty straight forward since you
>> would have the feature bit in the case of virto, and netvsc just does
>> this sort of thing by default if I am not mistaken.
>
> This patch is mostly based on netvsc implementation. The only change is
> avoiding the
> explicit dev_open() call of the VF netdev after a delay. I am assuming that
> the guest userspace
> will bring up the VF netdev and the hypervisor will update the MAC filters
> to switch to
> the right data path.
> We could commonize the code and make it shared between netvsc and virtio. Do
> we want
> to do this right away or later? If so, what would be a good location for
> these shared functions?
> Is it net/core/dev.c?

No, I would think about starting a new driver file in "/drivers/net/".
The idea is this driver would be utilized to create a bond
automatically and set the appropriate registration hooks. If nothing
else you could probably just call it something generic like virt-bond
or vbond or whatever.

> Also, if we want to go with a solution that creates a bond device, do we
> want virtio_net/netvsc
> drivers to create a upper device?  Such a solution is already possible via
> config scripts that can
> create a bond with virtio and a VF net device as slaves.  netvsc and this
> patch series is trying to
> make it as simple as possible for the VM to use directly attached devices
> and support live migration
> by switching to virtio datapath as a backup during the migration process
> when the VF device
> is unplugged.

We all understand that. But you are making the solution very virtio
specific. We want to see this be usable for other interfaces such as
netsc and whatever other virtual interfaces are floating around out
there.

Also I haven't seen us address what happens as far as how we will
handle this on the host. My thought was we should have a paired
interface. Something like veth, but made up of a bond on each end. So
in the host we should have one bond that has a tap/vhost interface and
a VF port representor, and on the other we would be looking at the
virtio interface and the VF. Attaching the tap/vhost to the bond could
be a way of triggering the feature bit to be set in the virtio. That
way communication between the guest and the host won't get too
confusing as you will see all traffic from the bonded MAC address
always show up on the host side bond instead of potentially showing up
on two unrelated interfaces. It would also make for a good way to
resolve the east/west traffic problem on hosts since you could just
send the broadcast/multicast traffic via the tap/vhost/virtio channel
instead of having to send it back through the port representor and eat
up all that PCIe bus traffic.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
  2018-01-03 18:28       ` Alexander Duyck
@ 2018-01-04  0:22         ` Samudrala, Sridhar
       [not found]           ` <CADGSJ22TXcyJ1iAAtKho7S=jGfUzUsqvAx-HV2XuwV2sLwfR_Q@mail.gmail.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Samudrala, Sridhar @ 2018-01-04  0:22 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, Brandeburg, Jesse, Michael S. Tsirkin,
	Stephen Hemminger, Netdev, virtualization, virtio-dev,
	Alexander Duyck

On 1/3/2018 10:28 AM, Alexander Duyck wrote:
> On Wed, Jan 3, 2018 at 10:14 AM, Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
>>
>> On 1/3/2018 8:59 AM, Alexander Duyck wrote:
>>> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>>> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>>>> This patch series enables virtio to switch over to a VF datapath when a
>>>>> VF
>>>>> netdev is present with the same MAC address. It allows live migration of
>>>>> a VM
>>>>> with a direct attached VF without the need to setup a bond/team between
>>>>> a
>>>>> VF and virtio net device in the guest.
>>>>>
>>>>> The hypervisor needs to unplug the VF device from the guest on the
>>>>> source
>>>>> host and reset the MAC filter of the VF to initiate failover of datapath
>>>>> to
>>>>> virtio before starting the migration. After the migration is completed,
>>>>> the
>>>>> destination hypervisor sets the MAC filter on the VF and plugs it back
>>>>> to
>>>>> the guest to switch over to VF datapath.
>>>>>
>>>>> It is based on netvsc implementation and it may be possible to make this
>>>>> code
>>>>> generic and move it to a common location that can be shared by netvsc
>>>>> and virtio.
>>>>>
>>>>> This patch series is based on the discussion initiated by Jesse on this
>>>>> thread.
>>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>> How does the notion of a device which is both a bond and a leg of a
>>>> bond fit with Alex's recent discussions about feature propagation?
>>>> Which propagation rules will apply to VirtIO master?  Meaning of the
>>>> flags on a software upper device may be different.  Why muddy the
>>>> architecture like this and not introduce a synthetic bond device?
>>> It doesn't really fit with the notion I had. I think there may have
>>> been a bit of a disconnect as I have been out for the last week or so
>>> for the holidays.
>>>
>>> My thought on this was that the feature bit should be spawning a new
>>> para-virtual bond device and that bond should have the virto and the
>>> VF as slaves. Also I thought there was some discussion about trying to
>>> reuse as much of the netvsc code as possible for this so that we could
>>> avoid duplication of effort and have the two drivers use the same
>>> approach. It seems like it should be pretty straight forward since you
>>> would have the feature bit in the case of virto, and netvsc just does
>>> this sort of thing by default if I am not mistaken.
>> This patch is mostly based on netvsc implementation. The only change is
>> avoiding the
>> explicit dev_open() call of the VF netdev after a delay. I am assuming that
>> the guest userspace
>> will bring up the VF netdev and the hypervisor will update the MAC filters
>> to switch to
>> the right data path.
>> We could commonize the code and make it shared between netvsc and virtio. Do
>> we want
>> to do this right away or later? If so, what would be a good location for
>> these shared functions?
>> Is it net/core/dev.c?
> No, I would think about starting a new driver file in "/drivers/net/".
> The idea is this driver would be utilized to create a bond
> automatically and set the appropriate registration hooks. If nothing
> else you could probably just call it something generic like virt-bond
> or vbond or whatever.

We are trying to avoid creating another driver or a device.  Can we look 
into
consolidation of the 2 implementations(virtio & netvsc) as a later patch?
>
>> Also, if we want to go with a solution that creates a bond device, do we
>> want virtio_net/netvsc
>> drivers to create a upper device?  Such a solution is already possible via
>> config scripts that can
>> create a bond with virtio and a VF net device as slaves.  netvsc and this
>> patch series is trying to
>> make it as simple as possible for the VM to use directly attached devices
>> and support live migration
>> by switching to virtio datapath as a backup during the migration process
>> when the VF device
>> is unplugged.
> We all understand that. But you are making the solution very virtio
> specific. We want to see this be usable for other interfaces such as
> netsc and whatever other virtual interfaces are floating around out
> there.
>
> Also I haven't seen us address what happens as far as how we will
> handle this on the host. My thought was we should have a paired
> interface. Something like veth, but made up of a bond on each end. So
> in the host we should have one bond that has a tap/vhost interface and
> a VF port representor, and on the other we would be looking at the
> virtio interface and the VF. Attaching the tap/vhost to the bond could
> be a way of triggering the feature bit to be set in the virtio. That
> way communication between the guest and the host won't get too
> confusing as you will see all traffic from the bonded MAC address
> always show up on the host side bond instead of potentially showing up
> on two unrelated interfaces. It would also make for a good way to
> resolve the east/west traffic problem on hosts since you could just
> send the broadcast/multicast traffic via the tap/vhost/virtio channel
> instead of having to send it back through the port representor and eat
> up all that PCIe bus traffic.
 From the host point of view, here is a simple script that needs to be 
run to do the
live migration. We don't need any bond configuration on the host.

virsh detach-interface $DOMAIN hostdev --mac $MAC
ip link set $PF vf $VF_NUM mac $ZERO_MAC

virsh migrate --live $DOMAIN qemu+ssh://$REMOTE_HOST/system

ssh $REMOTE_HOST ip link set $PF vf $VF_NUM mac $MAC
ssh $REMOTE_HOST virsh attach-interface $DOMAIN hostdev $REMOTE_HOSTDEV 
--mac $MAC

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
       [not found]                   ` <CADGSJ21rDfqzsnd+m5f6w=LqoTY3Bb0p1Pe+_tppdEm_8HysKA@mail.gmail.com>
@ 2018-01-22 19:00                     ` Siwei Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Siwei Liu @ 2018-01-22 19:00 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Jakub Kicinski, Brandeburg, Jesse, Michael S. Tsirkin,
	Stephen Hemminger, Netdev, virtualization, virtio-dev,
	Alexander Duyck

Apologies I didn't notice that the discussion was mistakenly taken
offline. Post it back.

-Siwei

On Sat, Jan 13, 2018 at 7:25 AM, Siwei Liu <loseweigh@gmail.com> wrote:
> On Thu, Jan 11, 2018 at 12:32 PM, Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
>> On 1/8/2018 9:22 AM, Siwei Liu wrote:
>>>
>>> On Sat, Jan 6, 2018 at 2:33 AM, Samudrala, Sridhar
>>> <sridhar.samudrala@intel.com> wrote:
>>>>
>>>> On 1/5/2018 9:07 AM, Siwei Liu wrote:
>>>>>
>>>>> On Thu, Jan 4, 2018 at 8:22 AM, Samudrala, Sridhar
>>>>> <sridhar.samudrala@intel.com> wrote:
>>>>>>
>>>>>> On 1/3/2018 10:28 AM, Alexander Duyck wrote:
>>>>>>>
>>>>>>> On Wed, Jan 3, 2018 at 10:14 AM, Samudrala, Sridhar
>>>>>>> <sridhar.samudrala@intel.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/3/2018 8:59 AM, Alexander Duyck wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>>>>>>>>>>
>>>>>>>>>>> This patch series enables virtio to switch over to a VF datapath
>>>>>>>>>>> when
>>>>>>>>>>> a
>>>>>>>>>>> VF
>>>>>>>>>>> netdev is present with the same MAC address. It allows live
>>>>>>>>>>> migration
>>>>>>>>>>> of
>>>>>>>>>>> a VM
>>>>>>>>>>> with a direct attached VF without the need to setup a bond/team
>>>>>>>>>>> between
>>>>>>>>>>> a
>>>>>>>>>>> VF and virtio net device in the guest.
>>>>>>>>>>>
>>>>>>>>>>> The hypervisor needs to unplug the VF device from the guest on the
>>>>>>>>>>> source
>>>>>>>>>>> host and reset the MAC filter of the VF to initiate failover of
>>>>>>>>>>> datapath
>>>>>>>>>>> to
>>>>>>>>>>> virtio before starting the migration. After the migration is
>>>>>>>>>>> completed,
>>>>>>>>>>> the
>>>>>>>>>>> destination hypervisor sets the MAC filter on the VF and plugs it
>>>>>>>>>>> back
>>>>>>>>>>> to
>>>>>>>>>>> the guest to switch over to VF datapath.
>>>>>>>>>>>
>>>>>>>>>>> It is based on netvsc implementation and it may be possible to
>>>>>>>>>>> make
>>>>>>>>>>> this
>>>>>>>>>>> code
>>>>>>>>>>> generic and move it to a common location that can be shared by
>>>>>>>>>>> netvsc
>>>>>>>>>>> and virtio.
>>>>>>>>>>>
>>>>>>>>>>> This patch series is based on the discussion initiated by Jesse on
>>>>>>>>>>> this
>>>>>>>>>>> thread.
>>>>>>>>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>>>>>>>>
>>>>>>>>>> How does the notion of a device which is both a bond and a leg of a
>>>>>>>>>> bond fit with Alex's recent discussions about feature propagation?
>>>>>>>>>> Which propagation rules will apply to VirtIO master?  Meaning of
>>>>>>>>>> the
>>>>>>>>>> flags on a software upper device may be different.  Why muddy the
>>>>>>>>>> architecture like this and not introduce a synthetic bond device?
>>>>>>>>>
>>>>>>>>> It doesn't really fit with the notion I had. I think there may have
>>>>>>>>> been a bit of a disconnect as I have been out for the last week or
>>>>>>>>> so
>>>>>>>>> for the holidays.
>>>>>>>>>
>>>>>>>>> My thought on this was that the feature bit should be spawning a new
>>>>>>>>> para-virtual bond device and that bond should have the virto and the
>>>>>>>>> VF as slaves. Also I thought there was some discussion about trying
>>>>>>>>> to
>>>>>>>>> reuse as much of the netvsc code as possible for this so that we
>>>>>>>>> could
>>>>>>>>> avoid duplication of effort and have the two drivers use the same
>>>>>>>>> approach. It seems like it should be pretty straight forward since
>>>>>>>>> you
>>>>>>>>> would have the feature bit in the case of virto, and netvsc just
>>>>>>>>> does
>>>>>>>>> this sort of thing by default if I am not mistaken.
>>>>>>>>
>>>>>>>> This patch is mostly based on netvsc implementation. The only change
>>>>>>>> is
>>>>>>>> avoiding the
>>>>>>>> explicit dev_open() call of the VF netdev after a delay. I am
>>>>>>>> assuming
>>>>>>>> that
>>>>>>>> the guest userspace
>>>>>>>> will bring up the VF netdev and the hypervisor will update the MAC
>>>>>>>> filters
>>>>>>>> to switch to
>>>>>>>> the right data path.
>>>>>>>> We could commonize the code and make it shared between netvsc and
>>>>>>>> virtio.
>>>>>>>> Do
>>>>>>>> we want
>>>>>>>> to do this right away or later? If so, what would be a good location
>>>>>>>> for
>>>>>>>> these shared functions?
>>>>>>>> Is it net/core/dev.c?
>>>>>>>
>>>>>>> No, I would think about starting a new driver file in "/drivers/net/".
>>>>>>> The idea is this driver would be utilized to create a bond
>>>>>>> automatically and set the appropriate registration hooks. If nothing
>>>>>>> else you could probably just call it something generic like virt-bond
>>>>>>> or vbond or whatever.
>>>>>>
>>>>>>
>>>>>> We are trying to avoid creating another driver or a device.  Can we
>>>>>> look
>>>>>> into
>>>>>> consolidation of the 2 implementations(virtio & netvsc) as a later
>>>>>> patch?
>>>>>>
>>>>>>>> Also, if we want to go with a solution that creates a bond device, do
>>>>>>>> we
>>>>>>>> want virtio_net/netvsc
>>>>>>>> drivers to create a upper device?  Such a solution is already
>>>>>>>> possible
>>>>>>>> via
>>>>>>>> config scripts that can
>>>>>>>> create a bond with virtio and a VF net device as slaves.  netvsc and
>>>>>>>> this
>>>>>>>> patch series is trying to
>>>>>>>> make it as simple as possible for the VM to use directly attached
>>>>>>>> devices
>>>>>>>> and support live migration
>>>>>>>> by switching to virtio datapath as a backup during the migration
>>>>>>>> process
>>>>>>>> when the VF device
>>>>>>>> is unplugged.
>>>>>>>
>>>>>>> We all understand that. But you are making the solution very virtio
>>>>>>> specific. We want to see this be usable for other interfaces such as
>>>>>>> netsc and whatever other virtual interfaces are floating around out
>>>>>>> there.
>>>>>>>
>>>>>>> Also I haven't seen us address what happens as far as how we will
>>>>>>> handle this on the host. My thought was we should have a paired
>>>>>>> interface. Something like veth, but made up of a bond on each end. So
>>>>>>> in the host we should have one bond that has a tap/vhost interface and
>>>>>>> a VF port representor, and on the other we would be looking at the
>>>>>>> virtio interface and the VF. Attaching the tap/vhost to the bond could
>>>>>>> be a way of triggering the feature bit to be set in the virtio. That
>>>>>>> way communication between the guest and the host won't get too
>>>>>>> confusing as you will see all traffic from the bonded MAC address
>>>>>>> always show up on the host side bond instead of potentially showing up
>>>>>>> on two unrelated interfaces. It would also make for a good way to
>>>>>>> resolve the east/west traffic problem on hosts since you could just
>>>>>>> send the broadcast/multicast traffic via the tap/vhost/virtio channel
>>>>>>> instead of having to send it back through the port representor and eat
>>>>>>> up all that PCIe bus traffic.
>>>>>>
>>>>>>   From the host point of view, here is a simple script that needs to be
>>>>>> run to
>>>>>> do the
>>>>>> live migration. We don't need any bond configuration on the host.
>>>>>>
>>>>>> virsh detach-interface $DOMAIN hostdev --mac $MAC
>>>>>> ip link set $PF vf $VF_NUM mac $ZERO_MAC
>>>>>
>>>>> I'm not sure I understand how this script may work with regard to
>>>>> "live" migration.
>>>>>
>>>>> I'm confused, this script seems to require virtio-net to be configured
>>>>> on top of a different PF than where the migrating VF is seated. Or
>>>>> else, how does identical MAC address filter get programmed to one PF
>>>>> with two (or more) child virtual interfaces (e.g. one macvtap for
>>>>> virtio-net plus one VF)? The coincidence of it being able to work on
>>>>> the NIC of one/some vendor(s) does not apply to the others AFAIK.
>>>>>
>>>>> If you're planning to use a different PF, I don't see how gratuitous
>>>>> ARP announcements are generated to make this a "live" migration.
>>>>
>>>>
>>>> I am not using a different PF.  virtio is backed by a tap/bridge with PF
>>>> attached
>>>> to that bridge.  When we reset VF MAC after it is unplugged, all the
>>>> packets
>>>> for
>>>> the guest MAC will go to PF and reach virtio via the bridge.
>>>>
>>> That is the limitation of this scheme: it only works for virtio backed
>>> by tap/bridge, rather than backed by macvtap on top of the
>>> corresponding *PF*. Nowadays more datacenter users prefer macvtap as
>>> opposed to bridge, simply because of better isolation and performance
>>> (e.g. host stack consumption on NIC promiscuity processing are not
>>> scalable for bridges). Additionally, the ongoing virtio receive
>>> zero-copy work will be tightly integrated with macvtap, the
>>> performance optimization of which is apparently difficult (if
>>> technically possible at all) to be done on bridge. Why do we limit the
>>> host backend support to only bridge at this point?
>>
>>
>> No. This should work with virtio backed by macvtap over PF too.
>>
>>>
>>>> If we want to use virtio backed by macvtap on top of another VF as the
>>>> backup
>>>> channel, and we could set the guest MAC to that VF after unplugging the
>>>> directly
>>>> attached VF.
>>>
>>> I meant macvtap on the regarding PF instead of another VF. You know,
>>> users shouldn't have to change guest MAC back and forth. Live
>>> migration shouldn't involve any form of user intervention IMHO.
>>
>> Yes. macvtap on top of PF should work too. Hypervisor doesn't need to change
>> the guest MAC.  The PF driver needs to program the HW MAC filters so that
>> the
>> frames reach PF when VF is unplugged.
>
> So the HW MAC filter is deferred to get programmed for virtio only
> until VF is unplugged, correct? This is not the regular plumbing order
> for macvtap. Unless I miss something obvious, how does this get
> reflected in the script below?
>
>  virsh detach-interface $DOMAIN hostdev --mac $MAC
>  ip link set $PF vf $VF_NUM mac $ZERO_MAC
>
> i.e. commands above won't automatically trigger the programming of MAC
> filters for virtio.
>
> If you program two identical MAC address filters for both VF and
> virito at the same point, I'm sure if won't work at all. It does not
> sound clear to me how you propose to make it work if you don't plan
> to change the plumbing order?
>
>>
>>
>>>
>>>>>> virsh migrate --live $DOMAIN qemu+ssh://$REMOTE_HOST/system
>>>>>>
>>>>>> ssh $REMOTE_HOST ip link set $PF vf $VF_NUM mac $MAC
>>>>>> ssh $REMOTE_HOST virsh attach-interface $DOMAIN hostdev $REMOTE_HOSTDEV
>>>>>> --mac $MAC
>>>>>
>>>>> How do you keep guest side VF configurations e.g. MTU and VLAN filters
>>>>> around across the migration? More broadly, how do you make sure the
>>>>> new VF still as performant as previously done such that all hardware
>>>>> ring tunings and offload settings can be kept as much as it can be?
>>>>> I'm afraid this simple script won't work for those real-world
>>>>> scenarios.
>>>>
>>>>
>>>>> I would agree with Alex that we'll soon need a host-side stub/entity
>>>>> with cached guest configurations that may make VF switching
>>>>> straightforward and transparent.
>>>>
>>>> The script is only adding MAC filter to the VF on the destination. If the
>>>> source host has
>>>> done any additional tunings on the VF they need to be done on the
>>>> destination host too.
>>>
>>> I was mainly saying the VF's run-time configuration in the guest more
>>> than those to be configured from the host side. Let's say guest admin
>>> had changed the VF's MTU value, the default of which is 1500, to 9000
>>> before the migration. How do you save and restore the old running
>>> config for the VF across the migration?
>>
>> Such optimizations should be possible on top of this patch.  We need to sync
>> up
>> any changes/updates to VF configuration/features with virtio.
>
> This is possible but not the ideal way to build it. Virtio perhaps
> would not be the best  place to stack this (VF specifics for live
> migration) up further. We need a new driver and do it right from the
> very beginning.
>
> Thanks,
> -Siwei
>
>>
>>>
>>>> It is also possible that the VF on the destination is based on a totally
>>>> different NIC which
>>>> may be more or less performant. Or the destination may not even support a
>>>> VF
>>>> datapath too.
>>>
>>> This argument is rather weak. In almost all real-world live migration
>>> scenarios, the hardware configurations on both source and destination
>>> are (required to be) identical. Being able to support heterogenous
>>> live migration doesn't mean we can do nothing but throw all running
>>> configs or driver tunings away when it's done. Specifically, I don't
>>> find a reason not to apply the guest network configs including NIC
>>> offload settings if those are commonly supported on both ends, even on
>>> virtio-net. While for some of the configs it might be noticeable for
>>> user to respond to the loss or change, complaints would still arise
>>> when issues are painful to troubleshoot and/or difficult to get them
>>> detected and restored. This is why I say real-world scenarios are more
>>> complex than just switch and go.
>>>
>>
>> Sure. These patches by themselves don't enable live migration automatically.
>> Hypervisor
>> needs to do some additional setup before and after the migration.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-01-22 19:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-03  0:35 [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device Sridhar Samudrala
2018-01-03  0:35 ` [PATCH net-next 1/2] virtio_net: Introduce VIRTIO_NET_F_MASTER feature bit Sridhar Samudrala
2018-01-03 16:36   ` David Miller
2018-01-03 17:43   ` Michael S. Tsirkin
2018-01-03 18:17     ` Samudrala, Sridhar
2018-01-03  0:35 ` [PATCH net-next 2/2] virtio_net: Extend virtio to use VF datapath when available Sridhar Samudrala
2018-01-03  2:16 ` [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device Jakub Kicinski
2018-01-03 16:59   ` Alexander Duyck
2018-01-03 18:14     ` Samudrala, Sridhar
2018-01-03 18:28       ` Alexander Duyck
2018-01-04  0:22         ` Samudrala, Sridhar
     [not found]           ` <CADGSJ22TXcyJ1iAAtKho7S=jGfUzUsqvAx-HV2XuwV2sLwfR_Q@mail.gmail.com>
     [not found]             ` <b623709f-69ab-b491-3931-7df74745b723@intel.com>
     [not found]               ` <CADGSJ223aJjt6X-dA=pKVP3_5+BOQ==Rd017LqBrBsKWqXiNSw@mail.gmail.com>
     [not found]                 ` <432a4b0a-fadc-3025-b9f6-09fd3e1b384a@intel.com>
     [not found]                   ` <CADGSJ21rDfqzsnd+m5f6w=LqoTY3Bb0p1Pe+_tppdEm_8HysKA@mail.gmail.com>
2018-01-22 19:00                     ` Siwei Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).