* [PATCH net-next 0/5] refine virtio-net XDP
@ 2017-07-17 12:43 Jason Wang
  2017-07-17 12:43 ` [PATCH net-next 1/5] virtio_ring: allow to store zero as the ctx Jason Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Jason Wang @ 2017-07-17 12:43 UTC
  To: mst, jasowang, virtualization, linux-kernel, netdev
Hi:
This series brings two optimizations for virtio-net XDP:
- avoid reset during XDP set
- turn off offloads on demand
Please review.
Thanks
Jason Wang (5):
  virtio_ring: allow to store zero as the ctx
  virtio-net: pack headroom into ctx for mergeable buffer
  virtio-net: switch to use new ctx API for small buffer
  virtio-net: do not reset during XDP set
  virtio-net: switch off offloads on demand if possible on XDP set
 drivers/net/virtio_net.c     | 325 +++++++++++++++++++++++++------------------
 drivers/virtio/virtio_ring.c |   2 +-
 2 files changed, 194 insertions(+), 133 deletions(-)
-- 
2.7.4
* [PATCH net-next 1/5] virtio_ring: allow to store zero as the ctx
  2017-07-17 12:43 [PATCH net-next 0/5] refine virtio-net XDP Jason Wang
@ 2017-07-17 12:43 ` Jason Wang
  2017-07-17 12:43 ` [PATCH net-next 2/5] virtio-net: pack headroom into ctx for mergeable buffer Jason Wang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2017-07-17 12:43 UTC
  To: mst, jasowang, virtualization, linux-kernel, netdev
Allow zero to be stored as a ctx. With this we can store a value of
zero, which is meaningful when the ctx is used to store the headroom.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5e1b548..9aaa177 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -391,7 +391,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	vq->desc_state[head].data = data;
 	if (indirect)
 		vq->desc_state[head].indir_desc = desc;
-	if (ctx)
+	else
 		vq->desc_state[head].indir_desc = ctx;
 
 	/* Put entry in available array (but don't update avail->idx until they
-- 
2.7.4
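For readers following the series, here is a minimal userspace sketch of
why the one-line change above matters. It is not the kernel code: the
struct and both helper names are hypothetical stand-ins for a single
descriptor state slot, and the stale value 0x1234 in the usage note is
made up for illustration. With the old rule a zero ctx is never written,
so whatever was in the slot before stays there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one descriptor state slot. */
struct slot_state {
	void *data;       /* driver token for the buffer */
	void *indir_desc; /* indirect table pointer, or the per-buffer ctx */
};

/* Old rule: ctx is only written when it is non-NULL. */
static void store_old(struct slot_state *s, void *data, bool indirect,
		      void *desc, void *ctx)
{
	assert(s != NULL);
	s->data = data;
	if (indirect)
		s->indir_desc = desc;
	if (ctx)
		s->indir_desc = ctx;
}

/* New rule: without an indirect table, ctx is always written. */
static void store_new(struct slot_state *s, void *data, bool indirect,
		      void *desc, void *ctx)
{
	assert(s != NULL);
	s->data = data;
	if (indirect)
		s->indir_desc = desc;
	else
		s->indir_desc = ctx;
}
```

In this model, calling store_old() with a NULL ctx on a slot that still
holds 0x1234 leaves 0x1234 in place, while store_new() overwrites it with
NULL, so a headroom of zero can be read back reliably.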
* [PATCH net-next 2/5] virtio-net: pack headroom into ctx for mergeable buffer
  2017-07-17 12:43 [PATCH net-next 0/5] refine virtio-net XDP Jason Wang
  2017-07-17 12:43 ` [PATCH net-next 1/5] virtio_ring: allow to store zero as the ctx Jason Wang
@ 2017-07-17 12:43 ` Jason Wang
  2017-07-18 18:59   ` Michael S. Tsirkin
  2017-07-17 12:43 ` [PATCH net-next 3/5] virtio-net: switch to use new ctx API for small buffer Jason Wang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2017-07-17 12:43 UTC
  To: mst, jasowang, virtualization, linux-kernel, netdev
Pack headroom into ctx, then during XDP set, we could know the size of
headroom and copy if needed. This is required for avoiding reset on
XDP.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1f8c15c..8fae9a8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -270,6 +270,23 @@ static void skb_xmit_done(struct virtqueue *vq)
 		netif_wake_subqueue(vi->dev, vq2txq(vq));
 }
 
+#define MRG_CTX_HEADER_SHIFT 22
+static void *mergeable_len_to_ctx(unsigned int truesize,
+				  unsigned int headroom)
+{
+	return (void *)(unsigned long)((headroom << MRG_CTX_HEADER_SHIFT) | truesize);
+}
+
+static unsigned int mergeable_ctx_to_headroom(void *mrg_ctx)
+{
+	return (unsigned long)mrg_ctx >> MRG_CTX_HEADER_SHIFT;
+}
+
+static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
+{
+	return (unsigned long)mrg_ctx & ((1 << MRG_CTX_HEADER_SHIFT) - 1);
+}
+
 /* Called from bottom half context */
 static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 				   struct receive_queue *rq,
@@ -639,13 +656,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
-	if (unlikely(len > (unsigned long)ctx)) {
+	truesize = mergeable_ctx_to_truesize(ctx);
+	if (unlikely(len > truesize)) {
 		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
 			 dev->name, len, (unsigned long)ctx);
 		dev->stats.rx_length_errors++;
 		goto err_skb;
 	}
-	truesize = (unsigned long)ctx;
+
 	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
 	curr_skb = head_skb;
 
@@ -665,13 +683,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		}
 
 		page = virt_to_head_page(buf);
-		if (unlikely(len > (unsigned long)ctx)) {
+
+		truesize = mergeable_ctx_to_truesize(ctx);
+		if (unlikely(len > truesize)) {
 			pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
 				 dev->name, len, (unsigned long)ctx);
 			dev->stats.rx_length_errors++;
 			goto err_skb;
 		}
-		truesize = (unsigned long)ctx;
 
 		num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
 		if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
@@ -889,7 +908,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 
 	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
 	buf += headroom; /* advance address leaving hole at front of pkt */
-	ctx = (void *)(unsigned long)len;
+	ctx = mergeable_len_to_ctx(len, headroom);
 	get_page(alloc_frag->page);
 	alloc_frag->offset += len + headroom;
 	hole = alloc_frag->size - alloc_frag->offset;
-- 
2.7.4
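The packing scheme in this patch can be tried outside the kernel. The
sketch below copies the three helpers from the diff and adds two
assert() range checks that are not in the patch; they are an assumption
made here to state the limits of the encoding (truesize must fit in the
low 22 bits, headroom in the remaining bits of a 32-bit value). The mask
uses 1UL rather than 1, a cosmetic difference from the patch.

```c
#include <assert.h>

#define MRG_CTX_HEADER_SHIFT 22

/* Pack headroom into the high bits and truesize into the low 22 bits. */
static void *mergeable_len_to_ctx(unsigned int truesize,
				  unsigned int headroom)
{
	/* Range checks added for this sketch only, not in the patch. */
	assert(truesize < (1u << MRG_CTX_HEADER_SHIFT));
	assert(headroom < (1u << (32 - MRG_CTX_HEADER_SHIFT)));
	return (void *)(unsigned long)((headroom << MRG_CTX_HEADER_SHIFT) |
				       truesize);
}

/* Recover the headroom from the high bits. */
static unsigned int mergeable_ctx_to_headroom(void *mrg_ctx)
{
	return (unsigned long)mrg_ctx >> MRG_CTX_HEADER_SHIFT;
}

/* Recover the truesize from the low 22 bits. */
static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
{
	return (unsigned long)mrg_ctx & ((1UL << MRG_CTX_HEADER_SHIFT) - 1);
}
```

For example, packing a truesize of 1536 with a headroom of 256 and then
unpacking returns the same two values, and a headroom of 0 round-trips as
well, which is the case patch 1/5 makes storable.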
* [PATCH net-next 3/5] virtio-net: switch to use new ctx API for small buffer
  2017-07-17 12:43 [PATCH net-next 0/5] refine virtio-net XDP Jason Wang
  2017-07-17 12:43 ` [PATCH net-next 1/5] virtio_ring: allow to store zero as the ctx Jason Wang
  2017-07-17 12:43 ` [PATCH net-next 2/5] virtio-net: pack headroom into ctx for mergeable buffer Jason Wang
@ 2017-07-17 12:43 ` Jason Wang
  2017-07-18 19:20   ` Michael S. Tsirkin
  2017-07-17 12:44 ` [PATCH net-next 4/5] virtio-net: do not reset during XDP set Jason Wang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2017-07-17 12:43 UTC
  To: mst, jasowang, virtualization, linux-kernel, netdev
Switch to use ctx API for small buffer, this is needed for avoiding
reset on XDP.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8fae9a8..e31b5b2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -410,7 +410,8 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 static struct sk_buff *receive_small(struct net_device *dev,
 				     struct virtnet_info *vi,
 				     struct receive_queue *rq,
-				     void *buf, unsigned int len)
+				     void *buf, void *ctx,
+				     unsigned int len)
 {
 	struct sk_buff *skb;
 	struct bpf_prog *xdp_prog;
@@ -773,7 +774,7 @@ static int receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 	else if (vi->big_packets)
 		skb = receive_big(dev, vi, rq, buf, len);
 	else
-		skb = receive_small(dev, vi, rq, buf, len);
+		skb = receive_small(dev, vi, rq, buf, ctx, len);
 
 	if (unlikely(!skb))
 		return 0;
@@ -812,6 +813,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	struct page_frag *alloc_frag = &rq->alloc_frag;
 	char *buf;
 	unsigned int xdp_headroom = virtnet_get_headroom(vi);
+	void *ctx = (void *)(unsigned long)xdp_headroom;
 	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
 	int err;
 
@@ -825,7 +827,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	alloc_frag->offset += len;
 	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
 		    vi->hdr_len + GOOD_PACKET_LEN);
-	err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, buf, gfp);
+	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0)
 		put_page(virt_to_head_page(buf));
 
@@ -1034,7 +1036,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget)
 	void *buf;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 
-	if (vi->mergeable_rx_bufs) {
+	if (!vi->big_packets || vi->mergeable_rx_bufs) {
 		void *ctx;
 
 		while (received < budget &&
@@ -2198,7 +2200,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 	names = kmalloc(total_vqs * sizeof(*names), GFP_KERNEL);
 	if (!names)
 		goto err_names;
-	if (vi->mergeable_rx_bufs) {
+	if (!vi->big_packets || vi->mergeable_rx_bufs) {
 		ctx = kzalloc(total_vqs * sizeof(*ctx), GFP_KERNEL);
 		if (!ctx)
 			goto err_ctx;
-- 
2.7.4
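A small userspace sketch of the two ideas in this patch: which receive
modes carry a ctx after the change, and how the small-buffer path stores
the headroom directly as the ctx value. The enum and all function names
are hypothetical; the mode ordering (mergeable first, then big, then
small) is an assumption based on the receive_buf() context lines in the
diff.

```c
#include <assert.h>
#include <stdbool.h>

enum rx_mode { RX_SMALL, RX_BIG, RX_MERGEABLE };

/* Assumed mode selection: mergeable wins, then big, otherwise small. */
static enum rx_mode rx_mode(bool mergeable_rx_bufs, bool big_packets)
{
	if (mergeable_rx_bufs)
		return RX_MERGEABLE;
	if (big_packets)
		return RX_BIG;
	return RX_SMALL;
}

/* Mirrors the patched condition: !big_packets || mergeable_rx_bufs. */
static bool rx_uses_ctx(bool mergeable_rx_bufs, bool big_packets)
{
	bool uses = !big_packets || mergeable_rx_bufs;

	/* Equivalent statement: every mode except "big" carries a ctx. */
	assert(uses == (rx_mode(mergeable_rx_bufs, big_packets) != RX_BIG));
	return uses;
}

/* Small buffers store the headroom itself as the ctx value. */
static void *small_headroom_to_ctx(unsigned int headroom)
{
	return (void *)(unsigned long)headroom;
}

static unsigned int small_ctx_to_headroom(void *ctx)
{
	return (unsigned long)ctx;
}
```

Because a headroom of zero becomes a NULL ctx here, this path depends on
patch 1/5 allowing zero to be stored.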
* [PATCH net-next 4/5] virtio-net: do not reset during XDP set
  2017-07-17 12:43 [PATCH net-next 0/5] refine virtio-net XDP Jason Wang
                   ` (2 preceding siblings ...)
  2017-07-17 12:43 ` [PATCH net-next 3/5] virtio-net: switch to use new ctx API for small buffer Jason Wang
@ 2017-07-17 12:44 ` Jason Wang
  2017-07-18 19:49   ` Michael S. Tsirkin
  2017-07-17 12:44 ` [PATCH net-next 5/5] virtio-net: switch off offloads on demand if possible on " Jason Wang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2017-07-17 12:44 UTC
  To: mst, jasowang, virtualization, linux-kernel, netdev
We used to reset the device during XDP set. The main reason is that we
need to allocate extra headroom for header adjustment, but there is no
way to know the headroom of an existing receive buffer. This works, but
it may be complex and may bring the network down for a while, which is
bad for user experience. So this patch tries to avoid the reset by:
- packing headroom into the receive buffer ctx
- checking the headroom during XDP, and if it is not sufficient, copying
  the packet into a location which has large enough headroom
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 230 ++++++++++++++++++++++-------------------------
 1 file changed, 105 insertions(+), 125 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e31b5b2..e732bd6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -407,6 +407,67 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
 }
 
+/* We copy and linearize packet in the following cases:
+ *
+ * 1) The packet spans multiple buffers. This normally happens when the
+ *    rx buffer size is underestimated. Rarely, since the spec does not
+ *    forbid using more than one buffer even if a single buffer is
+ *    sufficient for the packet, we should also deal with this case.
+ * 2) The headroom is smaller than what XDP requires. In this case
+ *    we should copy the packet and reserve enough headroom for it.
+ *    This is slow, but we copy at most queue-size times, which is
+ *    acceptable. More importantly, this helps to avoid resetting
+ *    the device.
+ */
+static struct page *xdp_linearize_page(struct receive_queue *rq,
+				       u16 *num_buf,
+				       struct page *p,
+				       int offset,
+				       int page_off,
+				       unsigned int *len)
+{
+	struct page *page = alloc_page(GFP_ATOMIC);
+
+	if (!page)
+		return NULL;
+
+	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
+	page_off += *len;
+
+	while (--*num_buf) {
+		unsigned int buflen;
+		void *buf;
+		int off;
+
+		buf = virtqueue_get_buf(rq->vq, &buflen);
+		if (unlikely(!buf))
+			goto err_buf;
+
+		p = virt_to_head_page(buf);
+		off = buf - page_address(p);
+
+		/* guard against a misconfigured or uncooperative backend that
+		 * is sending packet larger than the MTU.
+		 */
+		if ((page_off + buflen) > PAGE_SIZE) {
+			put_page(p);
+			goto err_buf;
+		}
+
+		memcpy(page_address(page) + page_off,
+		       page_address(p) + off, buflen);
+		page_off += buflen;
+		put_page(p);
+	}
+
+	/* Headroom does not contribute to packet length */
+	*len = page_off - VIRTIO_XDP_HEADROOM;
+	return page;
+err_buf:
+	__free_pages(page, 0);
+	return NULL;
+}
+
 static struct sk_buff *receive_small(struct net_device *dev,
 				     struct virtnet_info *vi,
 				     struct receive_queue *rq,
@@ -415,12 +476,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
 {
 	struct sk_buff *skb;
 	struct bpf_prog *xdp_prog;
-	unsigned int xdp_headroom = virtnet_get_headroom(vi);
+	unsigned int xdp_headroom = (unsigned long)ctx;
 	unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
 	unsigned int headroom = vi->hdr_len + header_offset;
 	unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	struct page *page = virt_to_head_page(buf);
 	unsigned int delta = 0;
+	struct page *xdp_page;
 	len -= vi->hdr_len;
 
 	rcu_read_lock();
@@ -434,6 +497,27 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
 			goto err_xdp;
 
+		if (unlikely(xdp_headroom != virtnet_get_headroom(vi))) {
+			int offset = buf - page_address(page) + header_offset;
+			unsigned int tlen = len + vi->hdr_len;
+			u16 num_buf = 1;
+
+			xdp_headroom = virtnet_get_headroom(vi);
+			header_offset = VIRTNET_RX_PAD + xdp_headroom;
+			headroom = vi->hdr_len + header_offset;
+			buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
+				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+			xdp_page = xdp_linearize_page(rq, &num_buf, page,
+						      offset, header_offset,
+						      &tlen);
+			if (!xdp_page)
+				goto err_xdp;
+
+			buf = page_address(xdp_page);
+			put_page(page);
+			page = xdp_page;
+		}
+
 		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
 		xdp.data = xdp.data_hard_start + xdp_headroom;
 		xdp.data_end = xdp.data + len;
@@ -462,7 +546,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 
 	skb = build_skb(buf, buflen);
 	if (!skb) {
-		put_page(virt_to_head_page(buf));
+		put_page(page);
 		goto err;
 	}
 	skb_reserve(skb, headroom - delta);
@@ -478,7 +562,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 err_xdp:
 	rcu_read_unlock();
 	dev->stats.rx_dropped++;
-	put_page(virt_to_head_page(buf));
+	put_page(page);
 xdp_xmit:
 	return NULL;
 }
@@ -503,66 +587,6 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
-/* The conditions to enable XDP should preclude the underlying device from
- * sending packets across multiple buffers (num_buf > 1). However per spec
- * it does not appear to be illegal to do so but rather just against convention.
- * So in order to avoid making a system unresponsive the packets are pushed
- * into a page and the XDP program is run. This will be extremely slow and we
- * push a warning to the user to fix this as soon as possible. Fixing this may
- * require resolving the underlying hardware to determine why multiple buffers
- * are being received or simply loading the XDP program in the ingress stack
- * after the skb is built because there is no advantage to running it here
- * anymore.
- */
-static struct page *xdp_linearize_page(struct receive_queue *rq,
-				       u16 *num_buf,
-				       struct page *p,
-				       int offset,
-				       unsigned int *len)
-{
-	struct page *page = alloc_page(GFP_ATOMIC);
-	unsigned int page_off = VIRTIO_XDP_HEADROOM;
-
-	if (!page)
-		return NULL;
-
-	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
-	page_off += *len;
-
-	while (--*num_buf) {
-		unsigned int buflen;
-		void *buf;
-		int off;
-
-		buf = virtqueue_get_buf(rq->vq, &buflen);
-		if (unlikely(!buf))
-			goto err_buf;
-
-		p = virt_to_head_page(buf);
-		off = buf - page_address(p);
-
-		/* guard against a misconfigured or uncooperative backend that
-		 * is sending packet larger than the MTU.
-		 */
-		if ((page_off + buflen) > PAGE_SIZE) {
-			put_page(p);
-			goto err_buf;
-		}
-
-		memcpy(page_address(page) + page_off,
-		       page_address(p) + off, buflen);
-		page_off += buflen;
-		put_page(p);
-	}
-
-	/* Headroom does not contribute to packet length */
-	*len = page_off - VIRTIO_XDP_HEADROOM;
-	return page;
-err_buf:
-	__free_pages(page, 0);
-	return NULL;
-}
-
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
@@ -577,6 +601,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	struct sk_buff *head_skb, *curr_skb;
 	struct bpf_prog *xdp_prog;
 	unsigned int truesize;
+	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
 
 	head_skb = NULL;
 
@@ -589,10 +614,13 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		u32 act;
 
 		/* This happens when rx buffer size is underestimated */
-		if (unlikely(num_buf > 1)) {
+		if (unlikely(num_buf > 1 ||
+			     headroom < virtnet_get_headroom(vi))) {
 			/* linearize data for XDP */
 			xdp_page = xdp_linearize_page(rq, &num_buf,
-						      page, offset, &len);
+						      page, offset,
+						      VIRTIO_XDP_HEADROOM,
+						      &len);
 			if (!xdp_page)
 				goto err_xdp;
 			offset = VIRTIO_XDP_HEADROOM;
@@ -830,7 +858,6 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0)
 		put_page(virt_to_head_page(buf));
-
 	return err;
 }
 
@@ -1834,7 +1861,6 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 }
 
 static int init_vqs(struct virtnet_info *vi);
-static void _remove_vq_common(struct virtnet_info *vi);
 
 static int virtnet_restore_up(struct virtio_device *vdev)
 {
@@ -1863,39 +1889,6 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 	return err;
 }
 
-static int virtnet_reset(struct virtnet_info *vi, int curr_qp, int xdp_qp)
-{
-	struct virtio_device *dev = vi->vdev;
-	int ret;
-
-	virtio_config_disable(dev);
-	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
-	virtnet_freeze_down(dev);
-	_remove_vq_common(vi);
-
-	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
-	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
-
-	ret = virtio_finalize_features(dev);
-	if (ret)
-		goto err;
-
-	vi->xdp_queue_pairs = xdp_qp;
-	ret = virtnet_restore_up(dev);
-	if (ret)
-		goto err;
-	ret = _virtnet_set_queues(vi, curr_qp);
-	if (ret)
-		goto err;
-
-	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
-	virtio_config_enable(dev);
-	return 0;
-err:
-	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
-	return ret;
-}
-
 static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 			   struct netlink_ext_ack *extack)
 {
@@ -1942,35 +1935,31 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 			return PTR_ERR(prog);
 	}
 
-	/* Changing the headroom in buffers is a disruptive operation because
-	 * existing buffers must be flushed and reallocated. This will happen
-	 * when a xdp program is initially added or xdp is disabled by removing
-	 * the xdp program resulting in number of XDP queues changing.
+	/* synchronize with NAPI which may do XDP_TX based on queue
+	 * pair numbers.
 	 */
-	if (vi->xdp_queue_pairs != xdp_qp) {
-		err = virtnet_reset(vi, curr_qp + xdp_qp, xdp_qp);
-		if (err) {
-			dev_warn(&dev->dev, "XDP reset failure.\n");
-			goto virtio_reset_err;
-		}
-	}
+	for (i = 0; i < vi->max_queue_pairs; i++)
+		napi_disable(&vi->rq[i].napi);
 
 	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
+	err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
+	if (err)
+		goto err;
+	vi->xdp_queue_pairs = xdp_qp;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
 		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
 		if (old_prog)
 			bpf_prog_put(old_prog);
+		napi_enable(&vi->rq[i].napi);
 	}
 
 	return 0;
 
-virtio_reset_err:
-	/* On reset error do our best to unwind XDP changes inflight and return
-	 * error up to user space for resolution. The underlying reset hung on
-	 * us so not much we can do here.
-	 */
+err:
+	for (i = 0; i < vi->max_queue_pairs; i++)
+		napi_enable(&vi->rq[i].napi);
 	if (prog)
 		bpf_prog_sub(prog, vi->max_queue_pairs - 1);
 	return err;
@@ -2614,15 +2603,6 @@ static int virtnet_probe(struct virtio_device *vdev)
 	return err;
 }
 
-static void _remove_vq_common(struct virtnet_info *vi)
-{
-	vi->vdev->config->reset(vi->vdev);
-	free_unused_bufs(vi);
-	_free_receive_bufs(vi);
-	free_receive_page_frags(vi);
-	virtnet_del_vqs(vi);
-}
-
 static void remove_vq_common(struct virtnet_info *vi)
 {
 	vi->vdev->config->reset(vi->vdev);
-- 
2.7.4
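The copy-on-insufficient-headroom idea can be modelled in userspace. The
sketch below is an illustration under stated assumptions, not the driver
code: ensure_headroom() and BUF_SIZE are hypothetical names, malloc()
stands in for alloc_page(), and the comparison follows the mergeable
path (recorded headroom smaller than required), whereas receive_small()
in the patch copies on any mismatch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096u /* stand-in for PAGE_SIZE */

/*
 * buf holds `len` bytes of packet data starting at offset `recorded`
 * (the headroom stored in ctx when the buffer was posted).
 * Returns buf itself when the recorded headroom is sufficient,
 * otherwise a freshly allocated buffer with the packet copied to
 * offset `required`. Returns NULL if the packet cannot fit or the
 * allocation fails. *copied tells the caller which case occurred;
 * the caller releases the old buffer after a copy.
 */
static unsigned char *ensure_headroom(unsigned char *buf,
				      unsigned int recorded,
				      unsigned int required,
				      unsigned int len, int *copied)
{
	unsigned char *fresh;

	assert(buf != NULL && copied != NULL);
	*copied = 0;

	if (recorded >= required)
		return buf;

	/* Headroom plus packet must fit in one buffer. */
	if (required > BUF_SIZE || len > BUF_SIZE - required)
		return NULL;

	fresh = malloc(BUF_SIZE);
	if (!fresh)
		return NULL;

	memcpy(fresh + required, buf + recorded, len);
	*copied = 1;
	return fresh;
}
```

Buffers posted before XDP was attached carry a recorded headroom of 0, so
each of them takes the copy path exactly once; buffers posted afterwards
already have enough headroom and are used in place. That bounds the
number of copies by the queue size, which is the argument the patch
makes for dropping the reset.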
* [PATCH net-next 5/5] virtio-net: switch off offloads on demand if possible on XDP set
  2017-07-17 12:43 [PATCH net-next 0/5] refine virtio-net XDP Jason Wang
                   ` (3 preceding siblings ...)
  2017-07-17 12:44 ` [PATCH net-next 4/5] virtio-net: do not reset during XDP set Jason Wang
@ 2017-07-17 12:44 ` Jason Wang
  2017-07-18 20:07   ` Michael S. Tsirkin
  2017-07-18 18:24 ` [PATCH net-next 0/5] refine virtio-net XDP David Miller
  2017-07-18 20:13 ` Michael S. Tsirkin
  6 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2017-07-17 12:44 UTC
  To: mst, jasowang, virtualization, linux-kernel, netdev
The current XDP implementation requires the guest offload features to be
disabled on the qemu command line. This is inconvenient and means the
guest can't benefit from offloads when XDP is not used. This patch tries
to address this limitation by disabling the offloads on demand through
the guest offloads control command. Guest offloads will be disabled and
re-enabled on demand on XDP set.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 70 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 65 insertions(+), 5 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e732bd6..d970c2d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -57,6 +57,11 @@ DECLARE_EWMA(pkt_len, 0, 64)
 
 #define VIRTNET_DRIVER_VERSION "1.0.0"
 
+const unsigned long guest_offloads[] = { VIRTIO_NET_F_GUEST_TSO4,
+					 VIRTIO_NET_F_GUEST_TSO6,
+					 VIRTIO_NET_F_GUEST_ECN,
+					 VIRTIO_NET_F_GUEST_UFO };
+
 struct virtnet_stats {
 	struct u64_stats_sync tx_syncp;
 	struct u64_stats_sync rx_syncp;
@@ -164,10 +169,13 @@ struct virtnet_info {
 	u8 ctrl_promisc;
 	u8 ctrl_allmulti;
 	u16 ctrl_vid;
+	u64 ctrl_offloads;
 
 	/* Ethtool settings */
 	u8 duplex;
 	u32 speed;
+
+	unsigned long guest_offloads;
 };
 
 struct padded_vnet_hdr {
@@ -1889,6 +1897,47 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 	return err;
 }
 
+static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
+{
+	struct scatterlist sg;
+	vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
+
+	sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));
+
+	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
+				  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
+		dev_warn(&vi->dev->dev, "Fail to set guest offload. \n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int virtnet_clear_guest_offloads(struct virtnet_info *vi)
+{
+	u64 offloads = 0;
+
+	if (!vi->guest_offloads)
+		return 0;
+
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
+		offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM;
+
+	return virtnet_set_guest_offloads(vi, offloads);
+}
+
+static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
+{
+	u64 offloads = vi->guest_offloads;
+
+	if (!vi->guest_offloads)
+		return 0;
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
+		offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
+
+	return virtnet_set_guest_offloads(vi, offloads);
+}
+
 static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 			   struct netlink_ext_ack *extack)
 {
@@ -1898,10 +1947,11 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	u16 xdp_qp = 0, curr_qp;
 	int i, err;
 
-	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
-	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
-	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
-	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
+	    && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
+	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
+	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
+		virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO))) {
 		NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing LRO, disable LRO first");
 		return -EOPNOTSUPP;
 	}
@@ -1950,6 +2000,12 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
 		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
+		if (i == 0) {
+			if (!old_prog)
+				virtnet_clear_guest_offloads(vi);
+			if (!prog)
+				virtnet_restore_guest_offloads(vi);
+		}
 		if (old_prog)
 			bpf_prog_put(old_prog);
 		napi_enable(&vi->rq[i].napi);
@@ -2583,6 +2639,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 		netif_carrier_on(dev);
 	}
 
+	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
+		if (virtio_has_feature(vi->vdev, guest_offloads[i]))
+			set_bit(guest_offloads[i], &vi->guest_offloads);
+
 	pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
 		 dev->name, max_queue_pairs);
 
@@ -2679,7 +2739,7 @@ static struct virtio_device_id id_table[] = {
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
-	VIRTIO_NET_F_MTU
+	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,
-- 
2.7.4
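The value written to the control virtqueue in the two helpers above can
be sketched as a pure function. This is an illustration only:
guest_offloads_value() and GUEST_LRO_MASK are hypothetical names, and
`negotiated` is modelled as a plain bitmask of negotiated feature bits.
The feature bit numbers are the ones from the virtio-net specification.
The patch additionally skips the command entirely when none of the four
LRO-type offloads was negotiated, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers from the virtio-net specification. */
#define VIRTIO_NET_F_GUEST_CSUM 1
#define VIRTIO_NET_F_GUEST_TSO4 7
#define VIRTIO_NET_F_GUEST_TSO6 8
#define VIRTIO_NET_F_GUEST_ECN  9
#define VIRTIO_NET_F_GUEST_UFO  10

/* The four offloads that must be off while an XDP program runs. */
#define GUEST_LRO_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
			(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
			(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
			(1ULL << VIRTIO_NET_F_GUEST_UFO))

/* Compute the offloads bitmap for the given XDP state. */
static uint64_t guest_offloads_value(uint64_t negotiated, int xdp_attached)
{
	uint64_t value = 0;

	/* Checksum offload is kept in both states if it was negotiated. */
	if (negotiated & (1ULL << VIRTIO_NET_F_GUEST_CSUM))
		value |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;

	/* LRO-type offloads are only restored when XDP is detached. */
	if (!xdp_attached)
		value |= negotiated & GUEST_LRO_MASK;

	assert(!xdp_attached || !(value & GUEST_LRO_MASK));
	return value;
}
```

With GUEST_CSUM, GUEST_TSO4 and GUEST_TSO6 negotiated, attaching XDP
yields a bitmap with only the checksum bit set, and detaching restores
all three bits.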
* Re: [PATCH net-next 0/5] refine virtio-net XDP
  2017-07-17 12:43 [PATCH net-next 0/5] refine virtio-net XDP Jason Wang
                   ` (4 preceding siblings ...)
  2017-07-17 12:44 ` [PATCH net-next 5/5] virtio-net: switch off offloads on demand if possible on " Jason Wang
@ 2017-07-18 18:24 ` David Miller
  2017-07-18 18:47   ` Michael S. Tsirkin
  2017-07-18 20:13 ` Michael S. Tsirkin
  6 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2017-07-18 18:24 UTC
  To: jasowang; +Cc: netdev, virtualization, linux-kernel, mst
From: Jason Wang <jasowang@redhat.com>
Date: Mon, 17 Jul 2017 20:43:56 +0800
> This series brings two optimizations for virtio-net XDP:
> 
> - avoid reset during XDP set
> - turn off offloads on demand
> 
> Please review.
Michael, please review Jason's changes.
Thanks.
* Re: [PATCH net-next 0/5] refine virtio-net XDP
  2017-07-18 18:24 ` [PATCH net-next 0/5] refine virtio-net XDP David Miller
@ 2017-07-18 18:47   ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-18 18:47 UTC
  To: David Miller; +Cc: netdev, linux-kernel, virtualization
On Tue, Jul 18, 2017 at 11:24:42AM -0700, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Mon, 17 Jul 2017 20:43:56 +0800
> 
> > This series brings two optimizations for virtio-net XDP:
> > 
> > - avoid reset during XDP set
> > - turn off offloads on demand
> > 
> > Please review.
> 
> Michael, please review Jason's changes.
> 
> Thanks.
Doing that, thanks for the reminder.
-- 
MST
* Re: [PATCH net-next 2/5] virtio-net: pack headroom into ctx for mergeable buffer
  2017-07-17 12:43 ` [PATCH net-next 2/5] virtio-net: pack headroom into ctx for mergeable buffer Jason Wang
@ 2017-07-18 18:59   ` Michael S. Tsirkin
  2017-07-19  2:29     ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-18 18:59 UTC
  To: Jason Wang; +Cc: virtualization, linux-kernel, netdev
On Mon, Jul 17, 2017 at 08:43:58PM +0800, Jason Wang wrote:
> Pack headroom into ctx, then during XDP set, we could know the size of
> headroom and copy if needed. This is required for avoiding reset on
> XDP.
Not really when XDP is set - it's when buffers are used.
virtio-net: pack headroom into ctx for mergeable buffers
Pack headroom into ctx - this way when we get a buffer we can figure out
the actual headroom that was allocated for the buffer. Will be helpful
to optimize switching between XDP and non-XDP modes which have different
headroom requirements.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 1f8c15c..8fae9a8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -270,6 +270,23 @@ static void skb_xmit_done(struct virtqueue *vq)
>  		netif_wake_subqueue(vi->dev, vq2txq(vq));
>  }
>  
> +#define MRG_CTX_HEADER_SHIFT 22
> +static void *mergeable_len_to_ctx(unsigned int truesize,
> +				  unsigned int headroom)
> +{
> +	return (void *)(unsigned long)((headroom << MRG_CTX_HEADER_SHIFT) | truesize);
> +}
> +
> +static unsigned int mergeable_ctx_to_headroom(void *mrg_ctx)
> +{
> +	return (unsigned long)mrg_ctx >> MRG_CTX_HEADER_SHIFT;
> +}
> +
> +static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
> +{
> +	return (unsigned long)mrg_ctx & ((1 << MRG_CTX_HEADER_SHIFT) - 1);
> +}
> +
>  /* Called from bottom half context */
>  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  				   struct receive_queue *rq,
> @@ -639,13 +656,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	}
>  	rcu_read_unlock();
>  
> -	if (unlikely(len > (unsigned long)ctx)) {
> +	truesize = mergeable_ctx_to_truesize(ctx);
> +	if (unlikely(len > truesize)) {
>  		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>  			 dev->name, len, (unsigned long)ctx);
>  		dev->stats.rx_length_errors++;
>  		goto err_skb;
>  	}
> -	truesize = (unsigned long)ctx;
> +
>  	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
>  	curr_skb = head_skb;
>  
> @@ -665,13 +683,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		}
>  
>  		page = virt_to_head_page(buf);
> -		if (unlikely(len > (unsigned long)ctx)) {
> +
> +		truesize = mergeable_ctx_to_truesize(ctx);
> +		if (unlikely(len > truesize)) {
>  			pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>  				 dev->name, len, (unsigned long)ctx);
>  			dev->stats.rx_length_errors++;
>  			goto err_skb;
>  		}
> -		truesize = (unsigned long)ctx;
>  
>  		num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
>  		if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
> @@ -889,7 +908,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  
>  	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>  	buf += headroom; /* advance address leaving hole at front of pkt */
> -	ctx = (void *)(unsigned long)len;
> +	ctx = mergeable_len_to_ctx(len, headroom);
>  	get_page(alloc_frag->page);
>  	alloc_frag->offset += len + headroom;
>  	hole = alloc_frag->size - alloc_frag->offset;
> -- 
> 2.7.4
* Re: [PATCH net-next 3/5] virtio-net: switch to use new ctx API for small buffer
  2017-07-17 12:43 ` [PATCH net-next 3/5] virtio-net: switch to use new ctx API for small buffer Jason Wang
@ 2017-07-18 19:20   ` Michael S. Tsirkin
  2017-07-19  2:30     ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-18 19:20 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, linux-kernel, netdev
What's needed is the ability to store the headroom there.
virtio-net: switch to use ctx API for small buffers
Use ctx API to store headroom for small buffers.
Following patches will retrieve this info and use it for XDP.
On Mon, Jul 17, 2017 at 08:43:59PM +0800, Jason Wang wrote:
> Switch to use ctx API for small buffer, this is need for avoiding
> reset on XDP.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8fae9a8..e31b5b2 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -410,7 +410,8 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
>  static struct sk_buff *receive_small(struct net_device *dev,
>  				     struct virtnet_info *vi,
>  				     struct receive_queue *rq,
> -				     void *buf, unsigned int len)
> +				     void *buf, void *ctx,
> +				     unsigned int len)
>  {
>  	struct sk_buff *skb;
>  	struct bpf_prog *xdp_prog;
> @@ -773,7 +774,7 @@ static int receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>  	else if (vi->big_packets)
>  		skb = receive_big(dev, vi, rq, buf, len);
>  	else
> -		skb = receive_small(dev, vi, rq, buf, len);
> +		skb = receive_small(dev, vi, rq, buf, ctx, len);
>  
>  	if (unlikely(!skb))
>  		return 0;
> @@ -812,6 +813,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
Let's document that the ctx API is used a bit differently here:
/* Unlike mergeable buffers, all buffers are allocated to the same size,
 * except for the headroom. For this reason we do not need to use
 * mergeable_len_to_ctx here - it is enough to store the headroom as the
 * context ignoring the truesize.
 */
As an alternative, reuse the same format as mergeable buffers.
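For reference, here is a minimal userspace sketch of the two context formats being compared. The shift value and the mergeable helpers mirror patch 2/5; the names small_headroom_to_ctx and small_ctx_to_headroom are hypothetical and only illustrate the "headroom only" format, and uintptr_t stands in for the kernel's unsigned long casts.

```c
#include <assert.h>
#include <stdint.h>

/* Same shift as MRG_CTX_HEADER_SHIFT in patch 2/5: low 22 bits hold truesize. */
#define MRG_CTX_HEADER_SHIFT 22

/* Mergeable format: headroom in the high bits, truesize in the low bits. */
static void *mergeable_len_to_ctx(unsigned int truesize, unsigned int headroom)
{
	/* truesize must fit below the shift or it would corrupt the headroom. */
	assert(truesize < (1u << MRG_CTX_HEADER_SHIFT));
	return (void *)(((uintptr_t)headroom << MRG_CTX_HEADER_SHIFT) | truesize);
}

static unsigned int mergeable_ctx_to_headroom(void *mrg_ctx)
{
	return (unsigned int)((uintptr_t)mrg_ctx >> MRG_CTX_HEADER_SHIFT);
}

static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
{
	uintptr_t mask = ((uintptr_t)1 << MRG_CTX_HEADER_SHIFT) - 1;

	return (unsigned int)((uintptr_t)mrg_ctx & mask);
}

/* Small-buffer format (hypothetical helper names): ctx is just the headroom. */
static void *small_headroom_to_ctx(unsigned int headroom)
{
	return (void *)(uintptr_t)headroom;
}

static unsigned int small_ctx_to_headroom(void *ctx)
{
	return (unsigned int)(uintptr_t)ctx;
}
```

Note that a headroom of zero encodes to a NULL ctx in the small-buffer format, which is presumably why patch 1/5 has to allow zero to be stored as the ctx.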
>  	struct page_frag *alloc_frag = &rq->alloc_frag;
>  	char *buf;
>  	unsigned int xdp_headroom = virtnet_get_headroom(vi);
> +	void *ctx = (void *)(unsigned long)xdp_headroom;
>  	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
>  	int err;
>  
> @@ -825,7 +827,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	alloc_frag->offset += len;
>  	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
>  		    vi->hdr_len + GOOD_PACKET_LEN);
> -	err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, buf, gfp);
> +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
>  	if (err < 0)
>  		put_page(virt_to_head_page(buf));
>  
> @@ -1034,7 +1036,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget)
>  	void *buf;
>  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  
> -	if (vi->mergeable_rx_bufs) {
> +	if (!vi->big_packets || vi->mergeable_rx_bufs) {
>  		void *ctx;
>  
>  		while (received < budget &&
> @@ -2198,7 +2200,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
>  	names = kmalloc(total_vqs * sizeof(*names), GFP_KERNEL);
>  	if (!names)
>  		goto err_names;
> -	if (vi->mergeable_rx_bufs) {
> +	if (!vi->big_packets || vi->mergeable_rx_bufs) {
>  		ctx = kzalloc(total_vqs * sizeof(*ctx), GFP_KERNEL);
>  		if (!ctx)
>  			goto err_ctx;
> -- 
> 2.7.4
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net-next 4/5] virtio-net: do not reset during XDP set
  2017-07-17 12:44 ` [PATCH net-next 4/5] virtio-net: do not reset during XDP set Jason Wang
@ 2017-07-18 19:49   ` Michael S. Tsirkin
  2017-07-19  2:35     ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-18 19:49 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, linux-kernel, netdev
On Mon, Jul 17, 2017 at 08:44:00PM +0800, Jason Wang wrote:
> We used to reset during XDP set, the main reason is we need allocate
> extra headroom for header adjustment but there's no way to know the
> headroom of exist receive buffer. This works buy maybe complex and may
> cause the network down for a while which is bad for user
> experience. So this patch tries to avoid this by:
> 
> - packing headroom into receive buffer ctx
> - check the headroom during XDP, and if it was not sufficient, copy
>   the packet into a location which has a large enough headroom
The packing is actually done by previous patches. Here is a
corrected version:
We currently reset the device during XDP set. The main reason is
that we allocate more headroom with XDP (for header adjustment).
This works but causes network downtime for users.
Previous patches encoded the headroom in the buffer context.
This makes it possible to detect the case where a buffer
with headroom insufficient for XDP is added to the queue and
XDP is enabled afterwards.
Upon detection, we handle this case by copying the packet
(slow, but it's a temporary condition).
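To illustrate the detect-and-copy idea, here is a rough userspace sketch. It is not the driver code: struct rx_buf and ensure_headroom are hypothetical names, malloc/free stand in for page allocation, and XDP_HEADROOM is only an illustrative value for VIRTIO_XDP_HEADROOM.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative value; the driver uses VIRTIO_XDP_HEADROOM. */
#define XDP_HEADROOM 256u

/* Hypothetical stand-in for a received buffer plus the headroom
 * that was decoded from its ctx.
 */
struct rx_buf {
	unsigned char *base;	/* start of the allocation */
	unsigned int headroom;	/* bytes reserved in front of the packet */
	unsigned int len;	/* packet length */
};

/* Returns 0 if the buffer already has enough headroom, 1 if the packet
 * was copied into a new buffer with enough headroom, -1 on allocation
 * failure (the caller would drop the packet).
 */
static int ensure_headroom(struct rx_buf *b, unsigned int required)
{
	unsigned char *copy;

	assert(b && b->base);
	if (b->headroom >= required)
		return 0;

	copy = malloc((size_t)required + b->len);
	if (!copy)
		return -1;

	/* Slow path: only hit for buffers posted before XDP was enabled. */
	memcpy(copy + required, b->base + b->headroom, b->len);
	free(b->base);
	b->base = copy;
	b->headroom = required;
	return 1;
}
```

Once the queue has been refilled with buffers that were allocated with XDP headroom, the first branch is always taken and no copy happens.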
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 230 ++++++++++++++++++++++-------------------------
>  1 file changed, 105 insertions(+), 125 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e31b5b2..e732bd6 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -407,6 +407,67 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
>  	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
>  }
>  
> +/* We copy and linearize packet in the following cases:
> + *
> + * 1) Packet across multiple buffers, this happens normally when rx
> + *    buffer size is underestimated. Rarely, since spec does not
> + *    forbid using more than one buffer even if a single buffer is
> + *    sufficient for the packet, we should also deal with this case.
Latest SVN of the spec actually forbids this. See:
    net: clarify device rules for mergeable buffers
> + * 2) The header room is smaller than what XDP required. In this case
> + *    we should copy the packet and reserve enough headroom for this.
> + *    This would be slow but we at most we can copy times of queue
> + *    size, this is acceptable. What's more important, this help to
> + *    avoid resetting.
Last part of the comment applies to both cases. So
+/* We copy the packet for XDP in the following cases:
+ *
+ * 1) Packet is scattered across multiple rx buffers.
+ * 2) Headroom space is insufficient.
+ *
+ * This is inefficient but it's a temporary condition that
+ * we hit right after XDP is enabled and until queue is refilled
+ * with large buffers with sufficient headroom - so it should affect
+ * at most queue size packets.
+ * Afterwards, the conditions to enable
+ * XDP should preclude the underlying device from sending packets
+ * across multiple buffers (num_buf > 1), and we make sure buffers
+ * have enough headroom.
+ */
> + * 2) The header room is smaller than what XDP required. In this case
> + *    we should copy the packet and reserve enough headroom for this.
> + *    This would be slow but we at most we can copy times of queue
> + *    size, this is acceptable. What's more important, this help to
> + *    avoid resetting.
> + */
> +static struct page *xdp_linearize_page(struct receive_queue *rq,
> +				       u16 *num_buf,
> +				       struct page *p,
> +				       int offset,
> +				       int page_off,
> +				       unsigned int *len)
> +{
> +	struct page *page = alloc_page(GFP_ATOMIC);
> +
> +	if (!page)
> +		return NULL;
> +
> +	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
> +	page_off += *len;
> +
> +	while (--*num_buf) {
> +		unsigned int buflen;
> +		void *buf;
> +		int off;
> +
> +		buf = virtqueue_get_buf(rq->vq, &buflen);
> +		if (unlikely(!buf))
> +			goto err_buf;
> +
> +		p = virt_to_head_page(buf);
> +		off = buf - page_address(p);
> +
> +		/* guard against a misconfigured or uncooperative backend that
> +		 * is sending packet larger than the MTU.
> +		 */
> +		if ((page_off + buflen) > PAGE_SIZE) {
> +			put_page(p);
> +			goto err_buf;
> +		}
> +
> +		memcpy(page_address(page) + page_off,
> +		       page_address(p) + off, buflen);
> +		page_off += buflen;
> +		put_page(p);
> +	}
> +
> +	/* Headroom does not contribute to packet length */
> +	*len = page_off - VIRTIO_XDP_HEADROOM;
> +	return page;
> +err_buf:
> +	__free_pages(page, 0);
> +	return NULL;
> +}
> +
>  static struct sk_buff *receive_small(struct net_device *dev,
>  				     struct virtnet_info *vi,
>  				     struct receive_queue *rq,
> @@ -415,12 +476,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  {
>  	struct sk_buff *skb;
>  	struct bpf_prog *xdp_prog;
> -	unsigned int xdp_headroom = virtnet_get_headroom(vi);
> +	unsigned int xdp_headroom = (unsigned long)ctx;
>  	unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
>  	unsigned int headroom = vi->hdr_len + header_offset;
>  	unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
>  			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +	struct page *page = virt_to_head_page(buf);
>  	unsigned int delta = 0;
> +	struct page *xdp_page;
>  	len -= vi->hdr_len;
>  
>  	rcu_read_lock();
> @@ -434,6 +497,27 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>  			goto err_xdp;
>  
> +		if (unlikely(xdp_headroom != virtnet_get_headroom(vi))) {
Should this be xdp_headroom < virtnet_get_headroom(vi)?
Just in case we add more modes down the road.
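A small sketch of the difference between the two checks, assuming headroom values of 0, 256 and 512 purely for illustration (the function names are hypothetical). As far as I can tell the two only disagree when a buffer carries more headroom than is currently required, which should not happen today while XDP is enabled but could once more headroom modes exist.

```c
#include <assert.h>
#include <stdbool.h>

/* Check as written in the patch: copy whenever the headroom differs. */
static bool copy_needed_ne(unsigned int buf_headroom, unsigned int required)
{
	return buf_headroom != required;
}

/* Suggested check: copy only when the headroom is too small. */
static bool copy_needed_lt(unsigned int buf_headroom, unsigned int required)
{
	return buf_headroom < required;
}

/* Usage example covering the three possible orderings. */
static void compare_checks(void)
{
	/* Too little headroom: both checks ask for a copy. */
	assert(copy_needed_ne(0, 256) && copy_needed_lt(0, 256));
	/* Exactly enough: neither asks for a copy. */
	assert(!copy_needed_ne(256, 256) && !copy_needed_lt(256, 256));
	/* More than enough: only the != check copies, needlessly. */
	assert(copy_needed_ne(512, 256) && !copy_needed_lt(512, 256));
}
```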
> +			int offset = buf - page_address(page) + header_offset;
> +			unsigned int tlen = len + vi->hdr_len;
> +			u16 num_buf = 1;
> +
> +			xdp_headroom = virtnet_get_headroom(vi);
> +			header_offset = VIRTNET_RX_PAD + xdp_headroom;
> +			headroom = vi->hdr_len + header_offset;
> +			buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> +				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +			xdp_page = xdp_linearize_page(rq, &num_buf, page,
> +						      offset, header_offset,
> +						      &tlen);
> +			if (!xdp_page)
> +				goto err_xdp;
> +
> +			buf = page_address(xdp_page);
> +			put_page(page);
> +			page = xdp_page;
> +		}
> +
>  		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
>  		xdp.data = xdp.data_hard_start + xdp_headroom;
>  		xdp.data_end = xdp.data + len;
> @@ -462,7 +546,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  
>  	skb = build_skb(buf, buflen);
>  	if (!skb) {
> -		put_page(virt_to_head_page(buf));
> +		put_page(page);
>  		goto err;
>  	}
>  	skb_reserve(skb, headroom - delta);
> @@ -478,7 +562,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  err_xdp:
>  	rcu_read_unlock();
>  	dev->stats.rx_dropped++;
> -	put_page(virt_to_head_page(buf));
> +	put_page(page);
>  xdp_xmit:
>  	return NULL;
>  }
> @@ -503,66 +587,6 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  	return NULL;
>  }
>  
> -/* The conditions to enable XDP should preclude the underlying device from
> - * sending packets across multiple buffers (num_buf > 1). However per spec
> - * it does not appear to be illegal to do so but rather just against convention.
> - * So in order to avoid making a system unresponsive the packets are pushed
> - * into a page and the XDP program is run. This will be extremely slow and we
> - * push a warning to the user to fix this as soon as possible. Fixing this may
> - * require resolving the underlying hardware to determine why multiple buffers
> - * are being received or simply loading the XDP program in the ingress stack
> - * after the skb is built because there is no advantage to running it here
> - * anymore.
> - */
> -static struct page *xdp_linearize_page(struct receive_queue *rq,
> -				       u16 *num_buf,
> -				       struct page *p,
> -				       int offset,
> -				       unsigned int *len)
> -{
> -	struct page *page = alloc_page(GFP_ATOMIC);
> -	unsigned int page_off = VIRTIO_XDP_HEADROOM;
> -
> -	if (!page)
> -		return NULL;
> -
> -	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
> -	page_off += *len;
> -
> -	while (--*num_buf) {
> -		unsigned int buflen;
> -		void *buf;
> -		int off;
> -
> -		buf = virtqueue_get_buf(rq->vq, &buflen);
> -		if (unlikely(!buf))
> -			goto err_buf;
> -
> -		p = virt_to_head_page(buf);
> -		off = buf - page_address(p);
> -
> -		/* guard against a misconfigured or uncooperative backend that
> -		 * is sending packet larger than the MTU.
> -		 */
> -		if ((page_off + buflen) > PAGE_SIZE) {
> -			put_page(p);
> -			goto err_buf;
> -		}
> -
> -		memcpy(page_address(page) + page_off,
> -		       page_address(p) + off, buflen);
> -		page_off += buflen;
> -		put_page(p);
> -	}
> -
> -	/* Headroom does not contribute to packet length */
> -	*len = page_off - VIRTIO_XDP_HEADROOM;
> -	return page;
> -err_buf:
> -	__free_pages(page, 0);
> -	return NULL;
> -}
> -
>  static struct sk_buff *receive_mergeable(struct net_device *dev,
>  					 struct virtnet_info *vi,
>  					 struct receive_queue *rq,
> @@ -577,6 +601,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	struct sk_buff *head_skb, *curr_skb;
>  	struct bpf_prog *xdp_prog;
>  	unsigned int truesize;
> +	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>  
>  	head_skb = NULL;
>  
> @@ -589,10 +614,13 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		u32 act;
>  
>  		/* This happens when rx buffer size is underestimated */
> -		if (unlikely(num_buf > 1)) {
> +		if (unlikely(num_buf > 1 ||
> +			     headroom < virtnet_get_headroom(vi))) {
>  			/* linearize data for XDP */
>  			xdp_page = xdp_linearize_page(rq, &num_buf,
> -						      page, offset, &len);
> +						      page, offset,
> +						      VIRTIO_XDP_HEADROOM,
> +						      &len);
>  			if (!xdp_page)
>  				goto err_xdp;
>  			offset = VIRTIO_XDP_HEADROOM;
> @@ -830,7 +858,6 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
>  	if (err < 0)
>  		put_page(virt_to_head_page(buf));
> -
>  	return err;
>  }
>  
> @@ -1834,7 +1861,6 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  }
>  
>  static int init_vqs(struct virtnet_info *vi);
> -static void _remove_vq_common(struct virtnet_info *vi);
>  
>  static int virtnet_restore_up(struct virtio_device *vdev)
>  {
> @@ -1863,39 +1889,6 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>  	return err;
>  }
>  
> -static int virtnet_reset(struct virtnet_info *vi, int curr_qp, int xdp_qp)
> -{
> -	struct virtio_device *dev = vi->vdev;
> -	int ret;
> -
> -	virtio_config_disable(dev);
> -	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> -	virtnet_freeze_down(dev);
> -	_remove_vq_common(vi);
> -
> -	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> -	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> -
> -	ret = virtio_finalize_features(dev);
> -	if (ret)
> -		goto err;
> -
> -	vi->xdp_queue_pairs = xdp_qp;
> -	ret = virtnet_restore_up(dev);
> -	if (ret)
> -		goto err;
> -	ret = _virtnet_set_queues(vi, curr_qp);
> -	if (ret)
> -		goto err;
> -
> -	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> -	virtio_config_enable(dev);
> -	return 0;
> -err:
> -	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> -	return ret;
> -}
> -
>  static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  			   struct netlink_ext_ack *extack)
>  {
> @@ -1942,35 +1935,31 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  			return PTR_ERR(prog);
>  	}
>  
> -	/* Changing the headroom in buffers is a disruptive operation because
> -	 * existing buffers must be flushed and reallocated. This will happen
> -	 * when a xdp program is initially added or xdp is disabled by removing
> -	 * the xdp program resulting in number of XDP queues changing.
> +	/* synchronize with NAPI which may do XDP_TX based on queue
> +	 * pair numbers.
I think you mean
 	/* Make sure NAPI is not using any XDP TX queues for RX. */
is that it?
> -	if (vi->xdp_queue_pairs != xdp_qp) {
> -		err = virtnet_reset(vi, curr_qp + xdp_qp, xdp_qp);
> -		if (err) {
> -			dev_warn(&dev->dev, "XDP reset failure.\n");
> -			goto virtio_reset_err;
> -		}
> -	}
> +	for (i = 0; i < vi->max_queue_pairs; i++)
> +		napi_disable(&vi->rq[i].napi);
>  
This is pretty slow if queues are busy.  Should we avoid this for queues
which aren't affected?
>  	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
> +	err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
> +	if (err)
> +		goto err;
> +	vi->xdp_queue_pairs = xdp_qp;
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>  		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
>  		if (old_prog)
>  			bpf_prog_put(old_prog);
> +		napi_enable(&vi->rq[i].napi);
This seems racy. See comment around virtnet_napi_enable.
>  	}
>  
>  	return 0;
>  
> -virtio_reset_err:
> -	/* On reset error do our best to unwind XDP changes inflight and return
> -	 * error up to user space for resolution. The underlying reset hung on
> -	 * us so not much we can do here.
> -	 */
> +err:
> +	for (i = 0; i < vi->max_queue_pairs; i++)
> +		napi_enable(&vi->rq[i].napi);
>  	if (prog)
>  		bpf_prog_sub(prog, vi->max_queue_pairs - 1);
>  	return err;
> @@ -2614,15 +2603,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	return err;
>  }
>  
> -static void _remove_vq_common(struct virtnet_info *vi)
> -{
> -	vi->vdev->config->reset(vi->vdev);
> -	free_unused_bufs(vi);
> -	_free_receive_bufs(vi);
> -	free_receive_page_frags(vi);
> -	virtnet_del_vqs(vi);
> -}
> -
>  static void remove_vq_common(struct virtnet_info *vi)
>  {
>  	vi->vdev->config->reset(vi->vdev);
> -- 
> 2.7.4
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net-next 5/5] virtio-net: switch off offloads on demand if possible on XDP set
  2017-07-17 12:44 ` [PATCH net-next 5/5] virtio-net: switch off offloads on demand if possible on " Jason Wang
@ 2017-07-18 20:07   ` Michael S. Tsirkin
  2017-07-19  2:39     ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-18 20:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
On Mon, Jul 17, 2017 at 08:44:01PM +0800, Jason Wang wrote:
> Current XDP implementation want guest offloads feature to be disabled
s/want/wants/
> on qemu cli.
on the device.
> This is inconvenient and means guest can't benefit from
> offloads if XDP is not used. This patch tries to address this
> limitation by disable
disabling
> the offloads on demand through control guest
> offloads. Guest offloads will be disabled and enabled on demand on XDP
> set.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
In fact, since we no longer reset when XDP is set,
the device might have offloads enabled and buffers that are
used but not yet consumed at the point where XDP is set.
This can result in
- packet scattered across multiple buffers
  (handled correctly but need to update the comment)
- packet may have VIRTIO_NET_HDR_F_NEEDS_CSUM, in that case
  the spec says "The checksum on the packet is incomplete".
  (probably needs to be handled by calculating the checksum).
Ideas for follow-up patches:
- skip looking at packet data completely
  won't work if you play with checksums dynamically
  but can be done if disabled on device
- allow ethtool to tweak offloads from userspace as well
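For readers following along, here is a simplified sketch of the offload masks that the patch sends through the control virtqueue on XDP set. The feature bit numbers are the ones from the virtio-net spec; the function names offloads_with_xdp and offloads_without_xdp are hypothetical, and unlike the patch this sketch does not skip the command when no LRO-style offload was negotiated.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio-net spec. */
#define VIRTIO_NET_F_GUEST_CSUM	1
#define VIRTIO_NET_F_GUEST_TSO4	7
#define VIRTIO_NET_F_GUEST_TSO6	8
#define VIRTIO_NET_F_GUEST_ECN	9
#define VIRTIO_NET_F_GUEST_UFO	10

#define GUEST_CSUM_BIT	(1ULL << VIRTIO_NET_F_GUEST_CSUM)
#define GUEST_LRO_MASK	((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
			 (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
			 (1ULL << VIRTIO_NET_F_GUEST_ECN) | \
			 (1ULL << VIRTIO_NET_F_GUEST_UFO))

/* Mask to send when an XDP program is attached: keep only GUEST_CSUM
 * (if negotiated) and drop every LRO-style offload.
 */
static uint64_t offloads_with_xdp(uint64_t negotiated)
{
	return negotiated & GUEST_CSUM_BIT;
}

/* Mask to send when the XDP program is removed: restore whatever
 * was negotiated at probe time.
 */
static uint64_t offloads_without_xdp(uint64_t negotiated)
{
	return negotiated & (GUEST_LRO_MASK | GUEST_CSUM_BIT);
}

/* Usage example: CSUM + TSO4 + TSO6 negotiated. */
static void offloads_example(void)
{
	uint64_t negotiated = GUEST_CSUM_BIT |
			      (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
			      (1ULL << VIRTIO_NET_F_GUEST_TSO6);

	assert(offloads_with_xdp(negotiated) == GUEST_CSUM_BIT);
	assert(offloads_without_xdp(negotiated) == negotiated);
}
```

Since GUEST_CSUM stays on under XDP, packets with VIRTIO_NET_HDR_F_NEEDS_CSUM can still show up, which is the case raised above.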
> ---
>  drivers/net/virtio_net.c | 70 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 65 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e732bd6..d970c2d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -57,6 +57,11 @@ DECLARE_EWMA(pkt_len, 0, 64)
>  
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
> +const unsigned long guest_offloads[] = { VIRTIO_NET_F_GUEST_TSO4,
> +					 VIRTIO_NET_F_GUEST_TSO6,
> +					 VIRTIO_NET_F_GUEST_ECN,
> +					 VIRTIO_NET_F_GUEST_UFO };
> +
>  struct virtnet_stats {
>  	struct u64_stats_sync tx_syncp;
>  	struct u64_stats_sync rx_syncp;
> @@ -164,10 +169,13 @@ struct virtnet_info {
>  	u8 ctrl_promisc;
>  	u8 ctrl_allmulti;
>  	u16 ctrl_vid;
> +	u64 ctrl_offloads;
>  
>  	/* Ethtool settings */
>  	u8 duplex;
>  	u32 speed;
> +
> +	unsigned long guest_offloads;
>  };
>  
>  struct padded_vnet_hdr {
> @@ -1889,6 +1897,47 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>  	return err;
>  }
>  
> +static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
> +{
> +	struct scatterlist sg;
> +	vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
> +
> +	sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));
> +
> +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> +				  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
> +		dev_warn(&vi->dev->dev, "Fail to set guest offload. \n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int virtnet_clear_guest_offloads(struct virtnet_info *vi)
> +{
> +	u64 offloads = 0;
> +
> +	if (!vi->guest_offloads)
> +		return 0;
> +
> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> +		offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> +
> +	return virtnet_set_guest_offloads(vi, offloads);
> +}
> +
> +static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> +{
> +	u64 offloads = vi->guest_offloads;
> +
> +	if (!vi->guest_offloads)
> +		return 0;
> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> +		offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> +
> +	return virtnet_set_guest_offloads(vi, offloads);
> +}
> +
>  static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  			   struct netlink_ext_ack *extack)
>  {
> @@ -1898,10 +1947,11 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  	u16 xdp_qp = 0, curr_qp;
>  	int i, err;
>  
> -	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
> +	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
> +	    && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> +	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> +	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> +		virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO))) {
>  		NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing LRO, disable LRO first");
>  		return -EOPNOTSUPP;
>  	}
> @@ -1950,6 +2000,12 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>  		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
> +		if (i == 0) {
> +			if (!old_prog)
> +				virtnet_clear_guest_offloads(vi);
> +			if (!prog)
> +				virtnet_restore_guest_offloads(vi);
> +		}
>  		if (old_prog)
>  			bpf_prog_put(old_prog);
>  		napi_enable(&vi->rq[i].napi);
> @@ -2583,6 +2639,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		netif_carrier_on(dev);
>  	}
>  
> +	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
> +		if (virtio_has_feature(vi->vdev, guest_offloads[i]))
> +			set_bit(guest_offloads[i], &vi->guest_offloads);
> +
>  	pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
>  		 dev->name, max_queue_pairs);
>  
> @@ -2679,7 +2739,7 @@ static struct virtio_device_id id_table[] = {
>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> -	VIRTIO_NET_F_MTU
> +	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,
> -- 
> 2.7.4
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net-next 0/5] refine virtio-net XDP
  2017-07-17 12:43 [PATCH net-next 0/5] refine virtio-net XDP Jason Wang
                   ` (5 preceding siblings ...)
  2017-07-18 18:24 ` [PATCH net-next 0/5] refine virtio-net XDP David Miller
@ 2017-07-18 20:13 ` Michael S. Tsirkin
  2017-07-19  2:40   ` Jason Wang
  6 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-18 20:13 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, linux-kernel, netdev
On Mon, Jul 17, 2017 at 08:43:56PM +0800, Jason Wang wrote:
> Hi:
> 
> This series brings two optimizations for virtio-net XDP:
> 
> - avoid reset during XDP set
> - turn off offloads on demand
I'm glad to see this take shape - this can be
extended to optimize virtnet_get_headroom so we don't
waste room if adjust_head is enabled.
I see a couple of issues; I've responded to the individual patches.
> Please review.
> 
> Thanks
> 
> Jason Wang (5):
>   virtio_ring: allow to store zero as the ctx
>   virtio-net: pack headroom into ctx for mergeable buffer
>   virtio-net: switch to use new ctx API for small buffer
>   virtio-net: do not reset during XDP set
>   virtio-net: switch off offloads on demand if possible on XDP set
> 
>  drivers/net/virtio_net.c     | 325 +++++++++++++++++++++++++------------------
>  drivers/virtio/virtio_ring.c |   2 +-
>  2 files changed, 194 insertions(+), 133 deletions(-)
> 
> -- 
> 2.7.4
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net-next 2/5] virtio-net: pack headroom into ctx for mergeable buffer
  2017-07-18 18:59   ` Michael S. Tsirkin
@ 2017-07-19  2:29     ` Jason Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2017-07-19  2:29 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization, linux-kernel, netdev
On 2017-07-19 02:59, Michael S. Tsirkin wrote:
> On Mon, Jul 17, 2017 at 08:43:58PM +0800, Jason Wang wrote:
>> Pack headroom into ctx, then during XDP set, we could know the size of
>> headroom and copy if needed. This is required for avoiding reset on
>> XDP.
> Not really when XDP is set - it's when buffers are used.
Of course :)
>
> virtio-net: pack headroom into ctx for mergeable buffers
>
> Pack headroom into ctx - this way when we get a buffer we can figure out
> the actual headroom that was allocated for the buffer. Will be helpful
> to optimize switching between XDP and non-XDP modes which have different
> headroom requirements.
Thanks, let me use this as the commit log.
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/net/virtio_net.c | 29 ++++++++++++++++++++++++-----
>>   1 file changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 1f8c15c..8fae9a8 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -270,6 +270,23 @@ static void skb_xmit_done(struct virtqueue *vq)
>>   		netif_wake_subqueue(vi->dev, vq2txq(vq));
>>   }
>>   
>> +#define MRG_CTX_HEADER_SHIFT 22
>> +static void *mergeable_len_to_ctx(unsigned int truesize,
>> +				  unsigned int headroom)
>> +{
>> +	return (void *)(unsigned long)((headroom << MRG_CTX_HEADER_SHIFT) | truesize);
>> +}
>> +
>> +static unsigned int mergeable_ctx_to_headroom(void *mrg_ctx)
>> +{
>> +	return (unsigned long)mrg_ctx >> MRG_CTX_HEADER_SHIFT;
>> +}
>> +
>> +static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
>> +{
>> +	return (unsigned long)mrg_ctx & ((1 << MRG_CTX_HEADER_SHIFT) - 1);
>> +}
>> +
>>   /* Called from bottom half context */
>>   static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>   				   struct receive_queue *rq,
>> @@ -639,13 +656,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>   	}
>>   	rcu_read_unlock();
>>   
>> -	if (unlikely(len > (unsigned long)ctx)) {
>> +	truesize = mergeable_ctx_to_truesize(ctx);
>> +	if (unlikely(len > truesize)) {
>>   		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>>   			 dev->name, len, (unsigned long)ctx);
>>   		dev->stats.rx_length_errors++;
>>   		goto err_skb;
>>   	}
>> -	truesize = (unsigned long)ctx;
>> +
>>   	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
>>   	curr_skb = head_skb;
>>   
>> @@ -665,13 +683,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>   		}
>>   
>>   		page = virt_to_head_page(buf);
>> -		if (unlikely(len > (unsigned long)ctx)) {
>> +
>> +		truesize = mergeable_ctx_to_truesize(ctx);
>> +		if (unlikely(len > truesize)) {
>>   			pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>>   				 dev->name, len, (unsigned long)ctx);
>>   			dev->stats.rx_length_errors++;
>>   			goto err_skb;
>>   		}
>> -		truesize = (unsigned long)ctx;
>>   
>>   		num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
>>   		if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
>> @@ -889,7 +908,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>   
>>   	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>>   	buf += headroom; /* advance address leaving hole at front of pkt */
>> -	ctx = (void *)(unsigned long)len;
>> +	ctx = mergeable_len_to_ctx(len, headroom);
>>   	get_page(alloc_frag->page);
>>   	alloc_frag->offset += len + headroom;
>>   	hole = alloc_frag->size - alloc_frag->offset;
>> -- 
>> 2.7.4
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net-next 3/5] virtio-net: switch to use new ctx API for small buffer
  2017-07-18 19:20   ` Michael S. Tsirkin
@ 2017-07-19  2:30     ` Jason Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2017-07-19  2:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
On 2017-07-19 03:20, Michael S. Tsirkin wrote:
> what's needed is ability to store the headroom there.
>
> virtio-net: switch to use ctx API for small buffers
>
> Use ctx API to store headroom for small buffers.
> Following patches will retrieve this info and use it for XDP.
>
> On Mon, Jul 17, 2017 at 08:43:59PM +0800, Jason Wang wrote:
>> Switch to use ctx API for small buffer, this is need for avoiding
>> reset on XDP.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/net/virtio_net.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 8fae9a8..e31b5b2 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -410,7 +410,8 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
>>   static struct sk_buff *receive_small(struct net_device *dev,
>>   				     struct virtnet_info *vi,
>>   				     struct receive_queue *rq,
>> -				     void *buf, unsigned int len)
>> +				     void *buf, void *ctx,
>> +				     unsigned int len)
>>   {
>>   	struct sk_buff *skb;
>>   	struct bpf_prog *xdp_prog;
>> @@ -773,7 +774,7 @@ static int receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>>   	else if (vi->big_packets)
>>   		skb = receive_big(dev, vi, rq, buf, len);
>>   	else
>> -		skb = receive_small(dev, vi, rq, buf, len);
>> +		skb = receive_small(dev, vi, rq, buf, ctx, len);
>>   
>>   	if (unlikely(!skb))
>>   		return 0;
>> @@ -812,6 +813,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> Let's document that ctx API is used a bit differently here:
>
> /* Unlike mergeable buffers, all buffers are allocated to the same size,
>   * except for the headroom. For this reason we do not need to use
>   * mergeable_len_to_ctx here - it is enough to store the headroom as the
>   * context ignoring the truesize.
>   */
Ok.
Thanks
> as an alternative, reuse the same format as mergeable buffers.
>
>>   	struct page_frag *alloc_frag = &rq->alloc_frag;
>>   	char *buf;
>>   	unsigned int xdp_headroom = virtnet_get_headroom(vi);
>> +	void *ctx = (void *)(unsigned long)xdp_headroom;
>>   	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
>>   	int err;
>>   
>> @@ -825,7 +827,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>   	alloc_frag->offset += len;
>>   	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
>>   		    vi->hdr_len + GOOD_PACKET_LEN);
>> -	err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, buf, gfp);
>> +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
>>   	if (err < 0)
>>   		put_page(virt_to_head_page(buf));
>>   
>> @@ -1034,7 +1036,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget)
>>   	void *buf;
>>   	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>   
>> -	if (vi->mergeable_rx_bufs) {
>> +	if (!vi->big_packets || vi->mergeable_rx_bufs) {
>>   		void *ctx;
>>   
>>   		while (received < budget &&
>> @@ -2198,7 +2200,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
>>   	names = kmalloc(total_vqs * sizeof(*names), GFP_KERNEL);
>>   	if (!names)
>>   		goto err_names;
>> -	if (vi->mergeable_rx_bufs) {
>> +	if (!vi->big_packets || vi->mergeable_rx_bufs) {
>>   		ctx = kzalloc(total_vqs * sizeof(*ctx), GFP_KERNEL);
>>   		if (!ctx)
>>   			goto err_ctx;
>> -- 
>> 2.7.4
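For reference, a minimal user-space sketch of the small-buffer ctx encoding discussed above: the headroom is stored directly in the opaque per-buffer context pointer, with no truesize packed alongside it. The helper names headroom_to_ctx() and ctx_to_headroom() are hypothetical and do not appear in the patch, which open-codes the casts.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of the patch): store the headroom
 * directly in the opaque per-buffer context pointer and read it back.
 * Small buffers all share one size, so only the headroom is recorded.
 */
static inline void *headroom_to_ctx(unsigned int headroom)
{
	return (void *)(uintptr_t)headroom;
}

static inline unsigned int ctx_to_headroom(void *ctx)
{
	return (unsigned int)(uintptr_t)ctx;
}
```

Note that a headroom of 0 encodes as a NULL ctx, which is presumably why patch 1/5 has to allow zero to be stored as the ctx.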
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net-next 4/5] virtio-net: do not reset during XDP set
  2017-07-18 19:49   ` Michael S. Tsirkin
@ 2017-07-19  2:35     ` Jason Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2017-07-19  2:35 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
On 2017-07-19 03:49, Michael S. Tsirkin wrote:
> On Mon, Jul 17, 2017 at 08:44:00PM +0800, Jason Wang wrote:
>> We used to reset during XDP set, the main reason is we need to allocate
>> extra headroom for header adjustment but there's no way to know the
>> headroom of existing receive buffers. This works but may be complex and may
>> cause the network to go down for a while, which is bad for user
>> experience. So this patch tries to avoid this by:
>>
>> - packing headroom into receive buffer ctx
>> - check the headroom during XDP, and if it was not sufficient, copy
>>    the packet into a location which has a large enough headroom
> The packing is actually done by previous patches. Here is a
> corrected version:
>
> We currently reset the device during XDP set, the main reason is
> that we allocate more headroom with XDP (for header adjustment).
>
> This works but causes network downtime for users.
>
> Previous patches encoded the headroom in the buffer context,
> this makes it possible to detect the case where a buffer
> with headroom insufficient for XDP is added to the queue and
> XDP is enabled afterwards.
>
> Upon detection, we handle this case by copying the packet
> (slow, but it's a temporary condition).
Ok.
>
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/net/virtio_net.c | 230 ++++++++++++++++++++++-------------------------
>>   1 file changed, 105 insertions(+), 125 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index e31b5b2..e732bd6 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -407,6 +407,67 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
>>   	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
>>   }
>>   
>> +/* We copy and linearize packet in the following cases:
>> + *
>> + * 1) Packet across multiple buffers, this happens normally when rx
>> + *    buffer size is underestimated. Rarely, since spec does not
>> + *    forbid using more than one buffer even if a single buffer is
>> + *    sufficient for the packet, we should also deal with this case.
> Latest SVN of the spec actually forbids this. See:
>      net: clarify device rules for mergeable buffers
Good to know this.
>
>
>> + * 2) The header room is smaller than what XDP required. In this case
>> + *    we should copy the packet and reserve enough headroom for this.
>> + *    This would be slow but we at most we can copy times of queue
>> + *    size, this is acceptable. What's more important, this help to
>> + *    avoid resetting.
> Last part of the comment applies to both cases. So
>
> +/* We copy the packet for XDP in the following cases:
> + *
> + * 1) Packet is scattered across multiple rx buffers.
> + * 2) Headroom space is insufficient.
> + *
> + * This is inefficient but it's a temporary condition that
> + * we hit right after XDP is enabled and until queue is refilled
> + * with large buffers with sufficient headroom - so it should affect
> + * at most queue size packets.
>
> + * Afterwards, the conditions to enable
> + * XDP should preclude the underlying device from sending packets
> + * across multiple buffers (num_buf > 1), and we make sure buffers
> + * have enough headroom.
> + */
>
Ok.
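The copy condition and the linearization described in this comment can be sketched in plain user-space C as follows. This is only an illustration under stated assumptions: struct frag, needs_copy(), linearize() and SKETCH_PAGE_SIZE are hypothetical names, and a flat byte array stands in for the freshly allocated page. The real code pulls the fragments from the virtqueue instead.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* Hypothetical stand-in for one receive buffer returned by the device. */
struct frag {
	const void *data;
	size_t len;
};

/* Copy only when the packet spans several buffers or when the buffer
 * was posted with less headroom than XDP needs now.
 */
static int needs_copy(unsigned int num_buf, unsigned int have,
		      unsigned int need)
{
	return num_buf > 1 || have < need;
}

/* Copy all fragments back to back into one page, leaving 'headroom'
 * bytes free in front. Returns the packet length (headroom excluded),
 * or -1 if the data does not fit into a single page.
 */
static long linearize(unsigned char *page, size_t headroom,
		      const struct frag *frags, size_t nfrags)
{
	size_t off = headroom;
	size_t i;

	if (headroom > SKETCH_PAGE_SIZE)
		return -1;

	for (i = 0; i < nfrags; i++) {
		/* guard against a packet larger than one page */
		if (frags[i].len > SKETCH_PAGE_SIZE - off)
			return -1;
		memcpy(page + off, frags[i].data, frags[i].len);
		off += frags[i].len;
	}

	return (long)(off - headroom);
}
```

Using "have < need" rather than "have != need" means a buffer with more headroom than required is never copied.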
>
>
>
>> + */
>> +static struct page *xdp_linearize_page(struct receive_queue *rq,
>> +				       u16 *num_buf,
>> +				       struct page *p,
>> +				       int offset,
>> +				       int page_off,
>> +				       unsigned int *len)
>> +{
>> +	struct page *page = alloc_page(GFP_ATOMIC);
>> +
>> +	if (!page)
>> +		return NULL;
>> +
>> +	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
>> +	page_off += *len;
>> +
>> +	while (--*num_buf) {
>> +		unsigned int buflen;
>> +		void *buf;
>> +		int off;
>> +
>> +		buf = virtqueue_get_buf(rq->vq, &buflen);
>> +		if (unlikely(!buf))
>> +			goto err_buf;
>> +
>> +		p = virt_to_head_page(buf);
>> +		off = buf - page_address(p);
>> +
>> +		/* guard against a misconfigured or uncooperative backend that
>> +		 * is sending packet larger than the MTU.
>> +		 */
>> +		if ((page_off + buflen) > PAGE_SIZE) {
>> +			put_page(p);
>> +			goto err_buf;
>> +		}
>> +
>> +		memcpy(page_address(page) + page_off,
>> +		       page_address(p) + off, buflen);
>> +		page_off += buflen;
>> +		put_page(p);
>> +	}
>> +
>> +	/* Headroom does not contribute to packet length */
>> +	*len = page_off - VIRTIO_XDP_HEADROOM;
>> +	return page;
>> +err_buf:
>> +	__free_pages(page, 0);
>> +	return NULL;
>> +}
>> +
>>   static struct sk_buff *receive_small(struct net_device *dev,
>>   				     struct virtnet_info *vi,
>>   				     struct receive_queue *rq,
>> @@ -415,12 +476,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>   {
>>   	struct sk_buff *skb;
>>   	struct bpf_prog *xdp_prog;
>> -	unsigned int xdp_headroom = virtnet_get_headroom(vi);
>> +	unsigned int xdp_headroom = (unsigned long)ctx;
>>   	unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
>>   	unsigned int headroom = vi->hdr_len + header_offset;
>>   	unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
>>   			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +	struct page *page = virt_to_head_page(buf);
>>   	unsigned int delta = 0;
>> +	struct page *xdp_page;
>>   	len -= vi->hdr_len;
>>   
>>   	rcu_read_lock();
>> @@ -434,6 +497,27 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>   		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>>   			goto err_xdp;
>>   
>> +		if (unlikely(xdp_headroom != virtnet_get_headroom(vi))) {
> Should this be xdp_headroom < virtnet_get_headroom(vi)?
> Just in case we add more modes down the road.
Yes, this looks better.
>
>
>> +			int offset = buf - page_address(page) + header_offset;
>> +			unsigned int tlen = len + vi->hdr_len;
>> +			u16 num_buf = 1;
>> +
>> +			xdp_headroom = virtnet_get_headroom(vi);
>> +			header_offset = VIRTNET_RX_PAD + xdp_headroom;
>> +			headroom = vi->hdr_len + header_offset;
>> +			buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
>> +				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +			xdp_page = xdp_linearize_page(rq, &num_buf, page,
>> +						      offset, header_offset,
>> +						      &tlen);
>> +			if (!xdp_page)
>> +				goto err_xdp;
>> +
>> +			buf = page_address(xdp_page);
>> +			put_page(page);
>> +			page = xdp_page;
>> +		}
>> +
>>   		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
>>   		xdp.data = xdp.data_hard_start + xdp_headroom;
>>   		xdp.data_end = xdp.data + len;
>> @@ -462,7 +546,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>   
>>   	skb = build_skb(buf, buflen);
>>   	if (!skb) {
>> -		put_page(virt_to_head_page(buf));
>> +		put_page(page);
>>   		goto err;
>>   	}
>>   	skb_reserve(skb, headroom - delta);
>> @@ -478,7 +562,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>   err_xdp:
>>   	rcu_read_unlock();
>>   	dev->stats.rx_dropped++;
>> -	put_page(virt_to_head_page(buf));
>> +	put_page(page);
>>   xdp_xmit:
>>   	return NULL;
>>   }
>> @@ -503,66 +587,6 @@ static struct sk_buff *receive_big(struct net_device *dev,
>>   	return NULL;
>>   }
>>   
>> -/* The conditions to enable XDP should preclude the underlying device from
>> - * sending packets across multiple buffers (num_buf > 1). However per spec
>> - * it does not appear to be illegal to do so but rather just against convention.
>> - * So in order to avoid making a system unresponsive the packets are pushed
>> - * into a page and the XDP program is run. This will be extremely slow and we
>> - * push a warning to the user to fix this as soon as possible. Fixing this may
>> - * require resolving the underlying hardware to determine why multiple buffers
>> - * are being received or simply loading the XDP program in the ingress stack
>> - * after the skb is built because there is no advantage to running it here
>> - * anymore.
>> - */
>> -static struct page *xdp_linearize_page(struct receive_queue *rq,
>> -				       u16 *num_buf,
>> -				       struct page *p,
>> -				       int offset,
>> -				       unsigned int *len)
>> -{
>> -	struct page *page = alloc_page(GFP_ATOMIC);
>> -	unsigned int page_off = VIRTIO_XDP_HEADROOM;
>> -
>> -	if (!page)
>> -		return NULL;
>> -
>> -	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
>> -	page_off += *len;
>> -
>> -	while (--*num_buf) {
>> -		unsigned int buflen;
>> -		void *buf;
>> -		int off;
>> -
>> -		buf = virtqueue_get_buf(rq->vq, &buflen);
>> -		if (unlikely(!buf))
>> -			goto err_buf;
>> -
>> -		p = virt_to_head_page(buf);
>> -		off = buf - page_address(p);
>> -
>> -		/* guard against a misconfigured or uncooperative backend that
>> -		 * is sending packet larger than the MTU.
>> -		 */
>> -		if ((page_off + buflen) > PAGE_SIZE) {
>> -			put_page(p);
>> -			goto err_buf;
>> -		}
>> -
>> -		memcpy(page_address(page) + page_off,
>> -		       page_address(p) + off, buflen);
>> -		page_off += buflen;
>> -		put_page(p);
>> -	}
>> -
>> -	/* Headroom does not contribute to packet length */
>> -	*len = page_off - VIRTIO_XDP_HEADROOM;
>> -	return page;
>> -err_buf:
>> -	__free_pages(page, 0);
>> -	return NULL;
>> -}
>> -
>>   static struct sk_buff *receive_mergeable(struct net_device *dev,
>>   					 struct virtnet_info *vi,
>>   					 struct receive_queue *rq,
>> @@ -577,6 +601,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>   	struct sk_buff *head_skb, *curr_skb;
>>   	struct bpf_prog *xdp_prog;
>>   	unsigned int truesize;
>> +	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>>   
>>   	head_skb = NULL;
>>   
>> @@ -589,10 +614,13 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>   		u32 act;
>>   
>>   		/* This happens when rx buffer size is underestimated */
>> -		if (unlikely(num_buf > 1)) {
>> +		if (unlikely(num_buf > 1 ||
>> +			     headroom < virtnet_get_headroom(vi))) {
>>   			/* linearize data for XDP */
>>   			xdp_page = xdp_linearize_page(rq, &num_buf,
>> -						      page, offset, &len);
>> +						      page, offset,
>> +						      VIRTIO_XDP_HEADROOM,
>> +						      &len);
>>   			if (!xdp_page)
>>   				goto err_xdp;
>>   			offset = VIRTIO_XDP_HEADROOM;
>> @@ -830,7 +858,6 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>   	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
>>   	if (err < 0)
>>   		put_page(virt_to_head_page(buf));
>> -
>>   	return err;
>>   }
>>   
>> @@ -1834,7 +1861,6 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>>   }
>>   
>>   static int init_vqs(struct virtnet_info *vi);
>> -static void _remove_vq_common(struct virtnet_info *vi);
>>   
>>   static int virtnet_restore_up(struct virtio_device *vdev)
>>   {
>> @@ -1863,39 +1889,6 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>>   	return err;
>>   }
>>   
>> -static int virtnet_reset(struct virtnet_info *vi, int curr_qp, int xdp_qp)
>> -{
>> -	struct virtio_device *dev = vi->vdev;
>> -	int ret;
>> -
>> -	virtio_config_disable(dev);
>> -	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
>> -	virtnet_freeze_down(dev);
>> -	_remove_vq_common(vi);
>> -
>> -	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>> -	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>> -
>> -	ret = virtio_finalize_features(dev);
>> -	if (ret)
>> -		goto err;
>> -
>> -	vi->xdp_queue_pairs = xdp_qp;
>> -	ret = virtnet_restore_up(dev);
>> -	if (ret)
>> -		goto err;
>> -	ret = _virtnet_set_queues(vi, curr_qp);
>> -	if (ret)
>> -		goto err;
>> -
>> -	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>> -	virtio_config_enable(dev);
>> -	return 0;
>> -err:
>> -	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>> -	return ret;
>> -}
>> -
>>   static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>   			   struct netlink_ext_ack *extack)
>>   {
>> @@ -1942,35 +1935,31 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>   			return PTR_ERR(prog);
>>   	}
>>   
>> -	/* Changing the headroom in buffers is a disruptive operation because
>> -	 * existing buffers must be flushed and reallocated. This will happen
>> -	 * when a xdp program is initially added or xdp is disabled by removing
>> -	 * the xdp program resulting in number of XDP queues changing.
>> +	/* synchronize with NAPI which may do XDP_TX based on queue
>> +	 * pair numbers.
> I think you mean
>
>   	/* Make sure NAPI is not using any XDP TX queues for RX. */
>
> is that it?
Yes.
>
>> -	if (vi->xdp_queue_pairs != xdp_qp) {
>> -		err = virtnet_reset(vi, curr_qp + xdp_qp, xdp_qp);
>> -		if (err) {
>> -			dev_warn(&dev->dev, "XDP reset failure.\n");
>> -			goto virtio_reset_err;
>> -		}
>> -	}
>> +	for (i = 0; i < vi->max_queue_pairs; i++)
>> +		napi_disable(&vi->rq[i].napi);
>>   
> This is pretty slow if queues are busy.  Should we avoid this for queues
> which aren't affected?
The problem is we attach xdp prog to all RX queues.
>
>>   	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
>> +	err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
>> +	if (err)
>> +		goto err;
>> +	vi->xdp_queue_pairs = xdp_qp;
>>   
>>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>>   		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>>   		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
>>   		if (old_prog)
>>   			bpf_prog_put(old_prog);
>> +		napi_enable(&vi->rq[i].napi);
> This seems racy. See comment around virtnet_napi_enable.
Right, will use virtnet_napi_enable() instead.
Thanks
>
>>   	}
>>   
>>   	return 0;
>>   
>> -virtio_reset_err:
>> -	/* On reset error do our best to unwind XDP changes inflight and return
>> -	 * error up to user space for resolution. The underlying reset hung on
>> -	 * us so not much we can do here.
>> -	 */
>> +err:
>> +	for (i = 0; i < vi->max_queue_pairs; i++)
>> +		napi_enable(&vi->rq[i].napi);
>>   	if (prog)
>>   		bpf_prog_sub(prog, vi->max_queue_pairs - 1);
>>   	return err;
>> @@ -2614,15 +2603,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>>   	return err;
>>   }
>>   
>> -static void _remove_vq_common(struct virtnet_info *vi)
>> -{
>> -	vi->vdev->config->reset(vi->vdev);
>> -	free_unused_bufs(vi);
>> -	_free_receive_bufs(vi);
>> -	free_receive_page_frags(vi);
>> -	virtnet_del_vqs(vi);
>> -}
>> -
>>   static void remove_vq_common(struct virtnet_info *vi)
>>   {
>>   	vi->vdev->config->reset(vi->vdev);
>> -- 
>> 2.7.4
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net-next 5/5] virtio-net: switch off offloads on demand if possible on XDP set
  2017-07-18 20:07   ` Michael S. Tsirkin
@ 2017-07-19  2:39     ` Jason Wang
  2017-07-24 21:36       ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2017-07-19  2:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization, linux-kernel, netdev
On 2017-07-19 04:07, Michael S. Tsirkin wrote:
> On Mon, Jul 17, 2017 at 08:44:01PM +0800, Jason Wang wrote:
>> Current XDP implementation want guest offloads feature to be disabled
> s/want/wants/
>
>> on qemu cli.
> on the device.
>
>> This is inconvenient and means guest can't benefit from
>> offloads if XDP is not used. This patch tries to address this
>> limitation by disable
> disabling
>
>> the offloads on demand through control guest
>> offloads. Guest offloads will be disabled and enabled on demand on XDP
>> set.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> In fact, since we no longer reset when XDP is set,
> here device might have offloads enabled, buffers are
> used but not consumed, then XDP is set.
>
> This can result in
> - packet scattered across multiple buffers
>    (handled correctly but need to update the comment)
Ok.
> - packet may have VIRTIO_NET_HDR_F_NEEDS_CSUM, in that case
>    the spec says "The checksum on the packet is incomplete".
>    (probably needs to be handled by calculating the checksum).
That's an option. Maybe it's tricky but I was thinking whether or not we 
can just keep the CHECKSUM_PARTIAL here.
>
>
> Ideas for follow-up patches:
>
> - skip looking at packet data completely
>    won't work if you play with checksums dynamically
>    but can be done if disabled on device
> - allow ethtool to tweak offloads from userspace as well
Right.
Thanks
>
>> ---
>>   drivers/net/virtio_net.c | 70 ++++++++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 65 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index e732bd6..d970c2d 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -57,6 +57,11 @@ DECLARE_EWMA(pkt_len, 0, 64)
>>   
>>   #define VIRTNET_DRIVER_VERSION "1.0.0"
>>   
>> +const unsigned long guest_offloads[] = { VIRTIO_NET_F_GUEST_TSO4,
>> +					 VIRTIO_NET_F_GUEST_TSO6,
>> +					 VIRTIO_NET_F_GUEST_ECN,
>> +					 VIRTIO_NET_F_GUEST_UFO };
>> +
>>   struct virtnet_stats {
>>   	struct u64_stats_sync tx_syncp;
>>   	struct u64_stats_sync rx_syncp;
>> @@ -164,10 +169,13 @@ struct virtnet_info {
>>   	u8 ctrl_promisc;
>>   	u8 ctrl_allmulti;
>>   	u16 ctrl_vid;
>> +	u64 ctrl_offloads;
>>   
>>   	/* Ethtool settings */
>>   	u8 duplex;
>>   	u32 speed;
>> +
>> +	unsigned long guest_offloads;
>>   };
>>   
>>   struct padded_vnet_hdr {
>> @@ -1889,6 +1897,47 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>>   	return err;
>>   }
>>   
>> +static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
>> +{
>> +	struct scatterlist sg;
>> +	vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
>> +
>> +	sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));
>> +
>> +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
>> +				  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
>> +		dev_warn(&vi->dev->dev, "Fail to set guest offload. \n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int virtnet_clear_guest_offloads(struct virtnet_info *vi)
>> +{
>> +	u64 offloads = 0;
>> +
>> +	if (!vi->guest_offloads)
>> +		return 0;
>> +
>> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
>> +		offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM;
>> +
>> +	return virtnet_set_guest_offloads(vi, offloads);
>> +}
>> +
>> +static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
>> +{
>> +	u64 offloads = vi->guest_offloads;
>> +
>> +	if (!vi->guest_offloads)
>> +		return 0;
>> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
>> +		offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
>> +
>> +	return virtnet_set_guest_offloads(vi, offloads);
>> +}
>> +
>>   static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>   			   struct netlink_ext_ack *extack)
>>   {
>> @@ -1898,10 +1947,11 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>   	u16 xdp_qp = 0, curr_qp;
>>   	int i, err;
>>   
>> -	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>> -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>> -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
>> -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
>> +	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
>> +	    && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>> +	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>> +	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
>> +		virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO))) {
>>   		NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing LRO, disable LRO first");
>>   		return -EOPNOTSUPP;
>>   	}
>> @@ -1950,6 +2000,12 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>>   		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>>   		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
>> +		if (i == 0) {
>> +			if (!old_prog)
>> +				virtnet_clear_guest_offloads(vi);
>> +			if (!prog)
>> +				virtnet_restore_guest_offloads(vi);
>> +		}
>>   		if (old_prog)
>>   			bpf_prog_put(old_prog);
>>   		napi_enable(&vi->rq[i].napi);
>> @@ -2583,6 +2639,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>>   		netif_carrier_on(dev);
>>   	}
>>   
>> +	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
>> +		if (virtio_has_feature(vi->vdev, guest_offloads[i]))
>> +			set_bit(guest_offloads[i], &vi->guest_offloads);
>> +
>>   	pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
>>   		 dev->name, max_queue_pairs);
>>   
>> @@ -2679,7 +2739,7 @@ static struct virtio_device_id id_table[] = {
>>   	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>>   	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>>   	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>> -	VIRTIO_NET_F_MTU
>> +	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>>   
>>   static unsigned int features[] = {
>>   	VIRTNET_FEATURES,
>> -- 
>> 2.7.4
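As a rough illustration of the on-demand switching in this patch, the sketch below computes the offload bitmask that would be sent with VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET when XDP is enabled and when it is removed. The feature bit numbers are the ones defined by the virtio-net specification. The function names and the plain uint64_t "negotiated" argument are assumptions made for the example, which leaves out the control-queue command itself and the patch's early return when no offload was negotiated.

```c
#include <stddef.h>
#include <stdint.h>

/* Feature bit numbers from the virtio-net specification. */
#define VIRTIO_NET_F_GUEST_CSUM	1
#define VIRTIO_NET_F_GUEST_TSO4	7
#define VIRTIO_NET_F_GUEST_TSO6	8
#define VIRTIO_NET_F_GUEST_ECN	9
#define VIRTIO_NET_F_GUEST_UFO	10

static const unsigned int guest_offloads[] = {
	VIRTIO_NET_F_GUEST_TSO4,
	VIRTIO_NET_F_GUEST_TSO6,
	VIRTIO_NET_F_GUEST_ECN,
	VIRTIO_NET_F_GUEST_UFO,
};

/* Offloads that XDP cannot cope with, limited to what was negotiated. */
static uint64_t negotiated_offloads(uint64_t negotiated)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < sizeof(guest_offloads) / sizeof(guest_offloads[0]); i++)
		mask |= negotiated & (1ULL << guest_offloads[i]);

	return mask;
}

/* Value to set while an XDP program is attached: keep only GUEST_CSUM. */
static uint64_t offloads_with_xdp(uint64_t negotiated)
{
	return negotiated & (1ULL << VIRTIO_NET_F_GUEST_CSUM);
}

/* Value to set once XDP is removed: everything that was negotiated. */
static uint64_t offloads_without_xdp(uint64_t negotiated)
{
	return negotiated_offloads(negotiated) | offloads_with_xdp(negotiated);
}
```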
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net-next 0/5] refine virtio-net XDP
  2017-07-18 20:13 ` Michael S. Tsirkin
@ 2017-07-19  2:40   ` Jason Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2017-07-19  2:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
On 2017-07-19 04:13, Michael S. Tsirkin wrote:
> On Mon, Jul 17, 2017 at 08:43:56PM +0800, Jason Wang wrote:
>> Hi:
>>
>> This series brings two optimizations for virtio-net XDP:
>>
>> - avoid reset during XDP set
>> - turn off offloads on demand
> I'm glad to see this take shape - this can be
> extended to optimize virtnet_get_headroom so we don't
> waste room if adjust_head is enabled.
Right, we can do it on top.
> I see a couple of issues, responded to individual patches.
>
>
Thanks for the review.
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH net-next 5/5] virtio-net: switch off offloads on demand if possible on XDP set
  2017-07-19  2:39     ` Jason Wang
@ 2017-07-24 21:36       ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-24 21:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
On Wed, Jul 19, 2017 at 10:39:53AM +0800, Jason Wang wrote:
> 
> 
> On 2017-07-19 04:07, Michael S. Tsirkin wrote:
> > On Mon, Jul 17, 2017 at 08:44:01PM +0800, Jason Wang wrote:
> > > Current XDP implementation want guest offloads feature to be disabled
> > s/want/wants/
> > 
> > > on qemu cli.
> > on the device.
> > 
> > > This is inconvenient and means guest can't benefit from
> > > offloads if XDP is not used. This patch tries to address this
> > > limitation by disable
> > disabling
> > 
> > > the offloads on demand through control guest
> > > offloads. Guest offloads will be disabled and enabled on demand on XDP
> > > set.
> > > 
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > In fact, since we no longer reset when XDP is set,
> > here device might have offloads enabled, buffers are
> > used but not consumed, then XDP is set.
> > 
> > This can result in
> > - packet scattered across multiple buffers
> >    (handled correctly but need to update the comment)
> 
> Ok.
> 
> > - packet may have VIRTIO_NET_HDR_F_NEEDS_CSUM, in that case
> >    the spec says "The checksum on the packet is incomplete".
> >    (probably needs to be handled by calculating the checksum).
> 
> That's an option. Maybe it's tricky but I was thinking whether or not we can
> just keep the CHECKSUM_PARTIAL here.
XDP programs do not expect this currently. As it's a temporary
condition, let's just fix it up.
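A minimal sketch of what such a fix-up could look like on a linear packet buffer, assuming (as with CHECKSUM_PARTIAL) that the checksum field already holds the pseudo-header sum and that csum_start and csum_offset come from the virtio-net header. The function names are hypothetical, and corner cases such as UDP's all-zero checksum are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over 'len' bytes, treating the data as
 * big-endian 16-bit words. Fine for page-sized packets; very large
 * inputs would need folding inside the loop.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* fold the carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Complete a partial checksum: sum from csum_start to the end of the
 * packet and store the result at csum_start + csum_offset.
 * Returns 0 on success, -1 if the offsets fall outside the packet.
 */
static int fixup_partial_csum(uint8_t *pkt, size_t len,
			      size_t csum_start, size_t csum_offset)
{
	size_t field = csum_start + csum_offset;
	uint16_t csum;

	if (csum_start > len || field + 2 > len)
		return -1;

	csum = inet_checksum(pkt + csum_start, len - csum_start);
	pkt[field] = (uint8_t)(csum >> 8);
	pkt[field + 1] = (uint8_t)(csum & 0xff);
	return 0;
}
```

After the fix-up, running inet_checksum() over the same range yields 0, which is the usual way to verify the result.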
> > 
> > 
> > Ideas for follow-up patches:
> > 
> > - skip looking at packet data completely
> >    won't work if you play with checksums dynamically
> >    but can be done if disabled on device
> > - allow ethtool to tweak offloads from userspace as well
> 
> Right.
> 
> Thanks
> 
> > 
> > > ---
> > >   drivers/net/virtio_net.c | 70 ++++++++++++++++++++++++++++++++++++++++++++----
> > >   1 file changed, 65 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index e732bd6..d970c2d 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -57,6 +57,11 @@ DECLARE_EWMA(pkt_len, 0, 64)
> > >   #define VIRTNET_DRIVER_VERSION "1.0.0"
> > > +const unsigned long guest_offloads[] = { VIRTIO_NET_F_GUEST_TSO4,
> > > +					 VIRTIO_NET_F_GUEST_TSO6,
> > > +					 VIRTIO_NET_F_GUEST_ECN,
> > > +					 VIRTIO_NET_F_GUEST_UFO };
> > > +
> > >   struct virtnet_stats {
> > >   	struct u64_stats_sync tx_syncp;
> > >   	struct u64_stats_sync rx_syncp;
> > > @@ -164,10 +169,13 @@ struct virtnet_info {
> > >   	u8 ctrl_promisc;
> > >   	u8 ctrl_allmulti;
> > >   	u16 ctrl_vid;
> > > +	u64 ctrl_offloads;
> > >   	/* Ethtool settings */
> > >   	u8 duplex;
> > >   	u32 speed;
> > > +
> > > +	unsigned long guest_offloads;
> > >   };
> > >   struct padded_vnet_hdr {
> > > @@ -1889,6 +1897,47 @@ static int virtnet_restore_up(struct virtio_device *vdev)
> > >   	return err;
> > >   }
> > > +static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
> > > +{
> > > +	struct scatterlist sg;
> > > +	vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
> > > +
> > > +	sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));
> > > +
> > > +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> > > +				  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg)) {
> > > +		dev_warn(&vi->dev->dev, "Fail to set guest offload. \n");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int virtnet_clear_guest_offloads(struct virtnet_info *vi)
> > > +{
> > > +	u64 offloads = 0;
> > > +
> > > +	if (!vi->guest_offloads)
> > > +		return 0;
> > > +
> > > +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > +		offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> > > +
> > > +	return virtnet_set_guest_offloads(vi, offloads);
> > > +}
> > > +
> > > +static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> > > +{
> > > +	u64 offloads = vi->guest_offloads;
> > > +
> > > +	if (!vi->guest_offloads)
> > > +		return 0;
> > > +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > +		offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> > > +
> > > +	return virtnet_set_guest_offloads(vi, offloads);
> > > +}
> > > +
> > >   static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> > >   			   struct netlink_ext_ack *extack)
> > >   {
> > > @@ -1898,10 +1947,11 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> > >   	u16 xdp_qp = 0, curr_qp;
> > >   	int i, err;
> > > -	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > > -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > > +	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
> > > +	    && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > +	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > > +	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > +		virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO))) {
> > >   		NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing LRO, disable LRO first");
> > >   		return -EOPNOTSUPP;
> > >   	}
> > > @@ -1950,6 +2000,12 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> > >   	for (i = 0; i < vi->max_queue_pairs; i++) {
> > >   		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
> > >   		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
> > > +		if (i == 0) {
> > > +			if (!old_prog)
> > > +				virtnet_clear_guest_offloads(vi);
> > > +			if (!prog)
> > > +				virtnet_restore_guest_offloads(vi);
> > > +		}
> > >   		if (old_prog)
> > >   			bpf_prog_put(old_prog);
> > >   		napi_enable(&vi->rq[i].napi);
> > > @@ -2583,6 +2639,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >   		netif_carrier_on(dev);
> > >   	}
> > > +	for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
> > > +		if (virtio_has_feature(vi->vdev, guest_offloads[i]))
> > > +			set_bit(guest_offloads[i], &vi->guest_offloads);
> > > +
> > >   	pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
> > >   		 dev->name, max_queue_pairs);
> > > @@ -2679,7 +2739,7 @@ static struct virtio_device_id id_table[] = {
> > >   	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
> > >   	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> > >   	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> > > -	VIRTIO_NET_F_MTU
> > > +	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > >   static unsigned int features[] = {
> > >   	VIRTNET_FEATURES,
> > > -- 
> > > 2.7.4
^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2017-07-24 21:36 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-17 12:43 [PATCH net-next 0/5] refine virtio-net XDP Jason Wang
2017-07-17 12:43 ` [PATCH net-next 1/5] virtio_ring: allow to store zero as the ctx Jason Wang
2017-07-17 12:43 ` [PATCH net-next 2/5] virtio-net: pack headroom into ctx for mergeable buffer Jason Wang
2017-07-18 18:59   ` Michael S. Tsirkin
2017-07-19  2:29     ` Jason Wang
2017-07-17 12:43 ` [PATCH net-next 3/5] virtio-net: switch to use new ctx API for small buffer Jason Wang
2017-07-18 19:20   ` Michael S. Tsirkin
2017-07-19  2:30     ` Jason Wang
2017-07-17 12:44 ` [PATCH net-next 4/5] virtio-net: do not reset during XDP set Jason Wang
2017-07-18 19:49   ` Michael S. Tsirkin
2017-07-19  2:35     ` Jason Wang
2017-07-17 12:44 ` [PATCH net-next 5/5] virtio-net: switch off offloads on demand if possible on " Jason Wang
2017-07-18 20:07   ` Michael S. Tsirkin
2017-07-19  2:39     ` Jason Wang
2017-07-24 21:36       ` Michael S. Tsirkin
2017-07-18 18:24 ` [PATCH net-next 0/5] refine virtio-net XDP David Miller
2017-07-18 18:47   ` Michael S. Tsirkin
2017-07-18 20:13 ` Michael S. Tsirkin
2017-07-19  2:40   ` Jason Wang