* [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy
@ 2024-06-18 7:56 Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 01/10] virtio_net: separate virtnet_rx_resize() Xuan Zhuo
` (9 more replies)
0 siblings, 10 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
v6:
1. start from supporting the rx zerocopy
v5:
1. fix the comments of last version
http://lore.kernel.org/all/20240611114147.31320-1-xuanzhuo@linux.alibaba.com
v4:
1. remove the commits that introduce the independent directory
2. remove the supporting for the rx merge mode (for limit 15
commits of net-next). Let's start with the small mode.
3. merge some commits and remove some not important commits
## AF_XDP
XDP socket(AF_XDP) is an excellent bypass kernel network framework. The zero
copy feature of xsk (XDP socket) needs to be supported by the driver. The
performance of zero copy is very good. mlx5 and intel ixgbe already support
this feature, This patch set allows virtio-net to support xsk's zerocopy xmit
feature.
At present, we have completed some preparation:
1. vq-reset (virtio spec and kernel code)
2. virtio-core premapped dma
3. virtio-net xdp refactor
So it is time for Virtio-Net to complete the support for the XDP Socket
Zerocopy.
Virtio-net can not increase the queue num at will, so xsk shares the queue with
kernel.
On the other hand, Virtio-Net does not support generate interrupt from driver
manually, so when we wakeup tx xmit, we used some tips. If the CPU run by TX
NAPI last time is other CPUs, use IPI to wake up NAPI on the remote CPU. If it
is also the local CPU, then we wake up napi directly.
This patch set includes some refactor to the virtio-net to let that to support
AF_XDP.
## performance
ENV: Qemu with vhost-user(polling mode).
Host CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
### virtio PMD in guest with testpmd
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 19531092064 RX-missed: 0 RX-bytes: 1093741155584
RX-errors: 0
RX-nombuf: 0
TX-packets: 5959955552 TX-errors: 0 TX-bytes: 371030645664
Throughput (since last show)
Rx-pps: 8861574 Rx-bps: 3969985208
Tx-pps: 8861493 Tx-bps: 3969962736
############################################################################
### AF_XDP PMD in guest with testpmd
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 68152727 RX-missed: 0 RX-bytes: 3816552712
RX-errors: 0
RX-nombuf: 0
TX-packets: 68114967 TX-errors: 33216 TX-bytes: 3814438152
Throughput (since last show)
Rx-pps: 6333196 Rx-bps: 2837272088
Tx-pps: 6333227 Tx-bps: 2837285936
############################################################################
But AF_XDP consumes more CPU for tx and rx napi(100% and 86%).
## maintain
I am currently a reviewer for virtio-net. I commit to maintain AF_XDP support in
virtio-net.
Please review.
Thanks.
v3
1. virtio introduces helpers for virtio-net sq using premapped dma
2. xsk has more complete support for merge mode
3. fix some problems
v2
1. wakeup uses the way of GVE. No send ipi to wakeup napi on remote cpu.
2. remove rcu. Because we synchronize all operat, so the rcu is not needed.
3. split the commit "move to virtio_net.h" in last patch set. Just move the
struct/api to header when we use them.
4. add comments for some code
v1:
1. remove two virtio commits. Push this patchset to net-next
2. squash "virtio_net: virtnet_poll_tx support rescheduled" to xsk: support tx
3. fix some warnings
Xuan Zhuo (10):
virtio_net: separate virtnet_rx_resize()
virtio_net: separate virtnet_tx_resize()
virtio_net: separate receive_buf
virtio_net: separate receive_mergeable
virtio_net: xsk: bind/unbind xsk for rx
virtio_net: xsk: support wakeup
virtio_net: xsk: rx: support fill with xsk buffer
virtio_net: xsk: rx: support recv small mode
virtio_net: xsk: rx: support recv merge mode
virtio_net: xsk: rx: free the unused xsk buffer
drivers/net/virtio_net.c | 699 +++++++++++++++++++++++++++++++++++----
1 file changed, 628 insertions(+), 71 deletions(-)
--
2.32.0.3.g01195cf9f
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH net-next v6 01/10] virtio_net: separate virtnet_rx_resize()
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 02/10] virtio_net: separate virtnet_tx_resize() Xuan Zhuo
` (8 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This patch separates two sub-functions from virtnet_rx_resize():
* virtnet_rx_pause
* virtnet_rx_resume
Then the subsequent reset rx for xsk can share these two functions.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 61a57d134544..8ad158bbf188 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2609,28 +2609,41 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
-static int virtnet_rx_resize(struct virtnet_info *vi,
- struct receive_queue *rq, u32 ring_num)
+static void virtnet_rx_pause(struct virtnet_info *vi, struct receive_queue *rq)
{
bool running = netif_running(vi->dev);
- int err, qindex;
-
- qindex = rq - vi->rq;
if (running) {
napi_disable(&rq->napi);
cancel_work_sync(&rq->dim.work);
}
+}
- err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
- if (err)
- netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
+static void virtnet_rx_resume(struct virtnet_info *vi, struct receive_queue *rq)
+{
+ bool running = netif_running(vi->dev);
if (!try_fill_recv(vi, rq, GFP_KERNEL))
schedule_delayed_work(&vi->refill, 0);
if (running)
virtnet_napi_enable(rq->vq, &rq->napi);
+}
+
+static int virtnet_rx_resize(struct virtnet_info *vi,
+ struct receive_queue *rq, u32 ring_num)
+{
+ int err, qindex;
+
+ qindex = rq - vi->rq;
+
+ virtnet_rx_pause(vi, rq);
+
+ err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
+ if (err)
+ netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
+
+ virtnet_rx_resume(vi, rq);
return err;
}
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next v6 02/10] virtio_net: separate virtnet_tx_resize()
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 01/10] virtio_net: separate virtnet_rx_resize() Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 03/10] virtio_net: separate receive_buf Xuan Zhuo
` (7 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This patch separates two sub-functions from virtnet_tx_resize():
* virtnet_tx_pause
* virtnet_tx_resume
Then the subsequent virtnet_tx_reset() can share these two functions.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8ad158bbf188..1ac5f472d4ef 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2647,12 +2647,11 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
return err;
}
-static int virtnet_tx_resize(struct virtnet_info *vi,
- struct send_queue *sq, u32 ring_num)
+static void virtnet_tx_pause(struct virtnet_info *vi, struct send_queue *sq)
{
bool running = netif_running(vi->dev);
struct netdev_queue *txq;
- int err, qindex;
+ int qindex;
qindex = sq - vi->sq;
@@ -2673,10 +2672,17 @@ static int virtnet_tx_resize(struct virtnet_info *vi,
netif_stop_subqueue(vi->dev, qindex);
__netif_tx_unlock_bh(txq);
+}
- err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
- if (err)
- netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
+static void virtnet_tx_resume(struct virtnet_info *vi, struct send_queue *sq)
+{
+ bool running = netif_running(vi->dev);
+ struct netdev_queue *txq;
+ int qindex;
+
+ qindex = sq - vi->sq;
+
+ txq = netdev_get_tx_queue(vi->dev, qindex);
__netif_tx_lock_bh(txq);
sq->reset = false;
@@ -2685,6 +2691,23 @@ static int virtnet_tx_resize(struct virtnet_info *vi,
if (running)
virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
+}
+
+static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
+ u32 ring_num)
+{
+ int qindex, err;
+
+ qindex = sq - vi->sq;
+
+ virtnet_tx_pause(vi, sq);
+
+ err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
+ if (err)
+ netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
+
+ virtnet_tx_resume(vi, sq);
+
return err;
}
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next v6 03/10] virtio_net: separate receive_buf
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 01/10] virtio_net: separate virtnet_rx_resize() Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 02/10] virtio_net: separate virtnet_tx_resize() Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 04/10] virtio_net: separate receive_mergeable Xuan Zhuo
` (6 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This commit separates the function receive_buf(), then we wrap the logic
of handling the skb to an independent function virtnet_receive_done().
The subsequent commit will reuse it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 56 +++++++++++++++++++++++-----------------
1 file changed, 32 insertions(+), 24 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1ac5f472d4ef..59fcaeb6ea81 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1935,32 +1935,11 @@ static void virtio_skb_set_hash(const struct virtio_net_hdr_v1_hash *hdr_hash,
skb_set_hash(skb, __le32_to_cpu(hdr_hash->hash_value), rss_hash_type);
}
-static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
- void *buf, unsigned int len, void **ctx,
- unsigned int *xdp_xmit,
- struct virtnet_rq_stats *stats)
+static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
+ struct sk_buff *skb)
{
- struct net_device *dev = vi->dev;
- struct sk_buff *skb;
struct virtio_net_common_hdr *hdr;
-
- if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
- pr_debug("%s: short packet %i\n", dev->name, len);
- DEV_STATS_INC(dev, rx_length_errors);
- virtnet_rq_free_buf(vi, rq, buf);
- return;
- }
-
- if (vi->mergeable_rx_bufs)
- skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
- stats);
- else if (vi->big_packets)
- skb = receive_big(dev, vi, rq, buf, len, stats);
- else
- skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
-
- if (unlikely(!skb))
- return;
+ struct net_device *dev = vi->dev;
hdr = skb_vnet_common_hdr(skb);
if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
@@ -1990,6 +1969,35 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
dev_kfree_skb(skb);
}
+static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
+ void *buf, unsigned int len, void **ctx,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats)
+{
+ struct net_device *dev = vi->dev;
+ struct sk_buff *skb;
+
+ if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
+ pr_debug("%s: short packet %i\n", dev->name, len);
+ DEV_STATS_INC(dev, rx_length_errors);
+ virtnet_rq_free_buf(vi, rq, buf);
+ return;
+ }
+
+ if (vi->mergeable_rx_bufs)
+ skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
+ stats);
+ else if (vi->big_packets)
+ skb = receive_big(dev, vi, rq, buf, len, stats);
+ else
+ skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
+
+ if (unlikely(!skb))
+ return;
+
+ virtnet_receive_done(vi, rq, skb);
+}
+
/* Unlike mergeable buffers, all buffers are allocated to the
* same size, except for the headroom. For this reason we do
* not need to use mergeable_len_to_ctx here - it is enough
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next v6 04/10] virtio_net: separate receive_mergeable
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (2 preceding siblings ...)
2024-06-18 7:56 ` [PATCH net-next v6 03/10] virtio_net: separate receive_buf Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 05/10] virtio_net: xsk: bind/unbind xsk for rx Xuan Zhuo
` (5 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This commit separates the function receive_mergeable(),
put the logic of appending frag to the skb as an independent function.
The subsequent commit will reuse it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 77 ++++++++++++++++++++++++----------------
1 file changed, 47 insertions(+), 30 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 59fcaeb6ea81..df885cdbe658 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1788,6 +1788,49 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
return NULL;
}
+static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
+ struct sk_buff *curr_skb,
+ struct page *page, void *buf,
+ int len, int truesize)
+{
+ int num_skb_frags;
+ int offset;
+
+ num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
+ if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
+ struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
+
+ if (unlikely(!nskb))
+ return NULL;
+
+ if (curr_skb == head_skb)
+ skb_shinfo(curr_skb)->frag_list = nskb;
+ else
+ curr_skb->next = nskb;
+ curr_skb = nskb;
+ head_skb->truesize += nskb->truesize;
+ num_skb_frags = 0;
+ }
+
+ if (curr_skb != head_skb) {
+ head_skb->data_len += len;
+ head_skb->len += len;
+ head_skb->truesize += truesize;
+ }
+
+ offset = buf - page_address(page);
+ if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
+ put_page(page);
+ skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
+ len, truesize);
+ } else {
+ skb_add_rx_frag(curr_skb, num_skb_frags, page,
+ offset, len, truesize);
+ }
+
+ return curr_skb;
+}
+
static struct sk_buff *receive_mergeable(struct net_device *dev,
struct virtnet_info *vi,
struct receive_queue *rq,
@@ -1837,8 +1880,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
if (unlikely(!curr_skb))
goto err_skb;
while (--num_buf) {
- int num_skb_frags;
-
buf = virtnet_rq_get_buf(rq, &len, &ctx);
if (unlikely(!buf)) {
pr_debug("%s: rx error: %d buffers out of %d missing\n",
@@ -1863,34 +1904,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
goto err_skb;
}
- num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
- if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
- struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
-
- if (unlikely(!nskb))
- goto err_skb;
- if (curr_skb == head_skb)
- skb_shinfo(curr_skb)->frag_list = nskb;
- else
- curr_skb->next = nskb;
- curr_skb = nskb;
- head_skb->truesize += nskb->truesize;
- num_skb_frags = 0;
- }
- if (curr_skb != head_skb) {
- head_skb->data_len += len;
- head_skb->len += len;
- head_skb->truesize += truesize;
- }
- offset = buf - page_address(page);
- if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
- put_page(page);
- skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
- len, truesize);
- } else {
- skb_add_rx_frag(curr_skb, num_skb_frags, page,
- offset, len, truesize);
- }
+ curr_skb = virtnet_skb_append_frag(head_skb, curr_skb, page,
+ buf, len, truesize);
+ if (!curr_skb)
+ goto err_skb;
}
ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next v6 05/10] virtio_net: xsk: bind/unbind xsk for rx
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (3 preceding siblings ...)
2024-06-18 7:56 ` [PATCH net-next v6 04/10] virtio_net: separate receive_mergeable Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
2024-06-18 7:56 ` [PATCH net-next v6 06/10] virtio_net: xsk: support wakeup Xuan Zhuo
` (4 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This patch implement the logic of bind/unbind xsk pool to rq.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 133 +++++++++++++++++++++++++++++++++++++++
1 file changed, 133 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index df885cdbe658..d8cce143be26 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -25,6 +25,7 @@
#include <net/net_failover.h>
#include <net/netdev_rx_queue.h>
#include <net/netdev_queues.h>
+#include <net/xdp_sock_drv.h>
static int napi_weight = NAPI_POLL_WEIGHT;
module_param(napi_weight, int, 0444);
@@ -348,6 +349,13 @@ struct receive_queue {
/* Record the last dma info to free after new pages is allocated. */
struct virtnet_rq_dma *last_dma;
+
+ struct {
+ struct xsk_buff_pool *pool;
+
+ /* xdp rxq used by xsk */
+ struct xdp_rxq_info xdp_rxq;
+ } xsk;
};
/* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -4970,6 +4978,129 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
return virtnet_set_guest_offloads(vi, offloads);
}
+static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queue *rq,
+ struct xsk_buff_pool *pool)
+{
+ int err, qindex;
+
+ qindex = rq - vi->rq;
+
+ if (pool) {
+ err = xdp_rxq_info_reg(&rq->xsk.xdp_rxq, vi->dev, qindex, rq->napi.napi_id);
+ if (err < 0)
+ return err;
+
+ err = xdp_rxq_info_reg_mem_model(&rq->xsk.xdp_rxq,
+ MEM_TYPE_XSK_BUFF_POOL, NULL);
+ if (err < 0)
+ goto unreg;
+
+ xsk_pool_set_rxq_info(pool, &rq->xsk.xdp_rxq);
+ }
+
+ virtnet_rx_pause(vi, rq);
+
+ err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
+ if (err) {
+ netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
+
+ pool = NULL;
+ }
+
+ rq->xsk.pool = pool;
+
+ virtnet_rx_resume(vi, rq);
+
+ if (pool)
+ return 0;
+
+unreg:
+ xdp_rxq_info_unreg(&rq->xsk.xdp_rxq);
+ return err;
+}
+
+static int virtnet_xsk_pool_enable(struct net_device *dev,
+ struct xsk_buff_pool *pool,
+ u16 qid)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ struct receive_queue *rq;
+ struct device *dma_dev;
+ struct send_queue *sq;
+ int err;
+
+ /* In big_packets mode, xdp cannot work, so there is no need to
+ * initialize xsk of rq.
+ */
+ if (vi->big_packets && !vi->mergeable_rx_bufs)
+ return -ENOENT;
+
+ if (qid >= vi->curr_queue_pairs)
+ return -EINVAL;
+
+ sq = &vi->sq[qid];
+ rq = &vi->rq[qid];
+
+ /* For the xsk, the tx and rx should have the same device. The af-xdp
+ * may use one buffer to receive from the rx and reuse this buffer to
+ * send by the tx. So the dma dev of sq and rq should be the same one.
+ *
+ * But vq->dma_dev allows every vq has the respective dma dev. So I
+ * check the dma dev of vq and sq is the same dev.
+ */
+ if (virtqueue_dma_dev(rq->vq) != virtqueue_dma_dev(sq->vq))
+ return -EPERM;
+
+ dma_dev = virtqueue_dma_dev(rq->vq);
+ if (!dma_dev)
+ return -EPERM;
+
+ err = xsk_pool_dma_map(pool, dma_dev, 0);
+ if (err)
+ goto err_xsk_map;
+
+ err = virtnet_rq_bind_xsk_pool(vi, rq, pool);
+ if (err)
+ goto err_rq;
+
+ return 0;
+
+err_rq:
+ xsk_pool_dma_unmap(pool, 0);
+err_xsk_map:
+ return err;
+}
+
+static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ struct xsk_buff_pool *pool;
+ struct receive_queue *rq;
+ int err;
+
+ if (qid >= vi->curr_queue_pairs)
+ return -EINVAL;
+
+ rq = &vi->rq[qid];
+
+ pool = rq->xsk.pool;
+
+ err = virtnet_rq_bind_xsk_pool(vi, rq, NULL);
+
+ xsk_pool_dma_unmap(pool, 0);
+
+ return err;
+}
+
+static int virtnet_xsk_pool_setup(struct net_device *dev, struct netdev_bpf *xdp)
+{
+ if (xdp->xsk.pool)
+ return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
+ xdp->xsk.queue_id);
+ else
+ return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
+}
+
static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
struct netlink_ext_ack *extack)
{
@@ -5095,6 +5226,8 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
switch (xdp->command) {
case XDP_SETUP_PROG:
return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
+ case XDP_SETUP_XSK_POOL:
+ return virtnet_xsk_pool_setup(dev, xdp);
default:
return -EINVAL;
}
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next v6 06/10] virtio_net: xsk: support wakeup
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (4 preceding siblings ...)
2024-06-18 7:56 ` [PATCH net-next v6 05/10] virtio_net: xsk: bind/unbind xsk for rx Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer Xuan Zhuo
` (3 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
xsk wakeup is used to trigger the logic for xsk xmit by xsk framework or
user.
Virtio-net does not support to actively generate an interruption, so it
tries to trigger tx NAPI on the local cpu.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d8cce143be26..2bbc715f22c6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1032,6 +1032,29 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
}
}
+static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ struct send_queue *sq;
+
+ if (!netif_running(dev))
+ return -ENETDOWN;
+
+ if (qid >= vi->curr_queue_pairs)
+ return -EINVAL;
+
+ sq = &vi->sq[qid];
+
+ if (napi_if_scheduled_mark_missed(&sq->napi))
+ return 0;
+
+ local_bh_disable();
+ virtqueue_napi_schedule(&sq->napi, sq->vq);
+ local_bh_enable();
+
+ return 0;
+}
+
static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
struct send_queue *sq,
struct xdp_frame *xdpf)
@@ -5312,6 +5335,7 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
.ndo_bpf = virtnet_xdp,
.ndo_xdp_xmit = virtnet_xdp_xmit,
+ .ndo_xsk_wakeup = virtnet_xsk_wakeup,
.ndo_features_check = passthru_features_check,
.ndo_get_phys_port_name = virtnet_get_phys_port_name,
.ndo_set_features = virtnet_set_features,
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (5 preceding siblings ...)
2024-06-18 7:56 ` [PATCH net-next v6 06/10] virtio_net: xsk: support wakeup Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-20 10:20 ` Paolo Abeni
2024-06-28 2:19 ` Jason Wang
2024-06-18 7:56 ` [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode Xuan Zhuo
` (2 subsequent siblings)
9 siblings, 2 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
Implement the logic of filling rq with XSK buffers.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 68 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 66 insertions(+), 2 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2bbc715f22c6..2ac5668a94ce 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -355,6 +355,8 @@ struct receive_queue {
/* xdp rxq used by xsk */
struct xdp_rxq_info xdp_rxq;
+
+ struct xdp_buff **xsk_buffs;
} xsk;
};
@@ -1032,6 +1034,53 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
}
}
+static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
+{
+ sg->dma_address = addr;
+ sg->length = len;
+}
+
+static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
+ struct xsk_buff_pool *pool, gfp_t gfp)
+{
+ struct xdp_buff **xsk_buffs;
+ dma_addr_t addr;
+ u32 len, i;
+ int err = 0;
+ int num;
+
+ xsk_buffs = rq->xsk.xsk_buffs;
+
+ num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
+ if (!num)
+ return -ENOMEM;
+
+ len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
+
+ for (i = 0; i < num; ++i) {
+ /* use the part of XDP_PACKET_HEADROOM as the virtnet hdr space */
+ addr = xsk_buff_xdp_get_dma(xsk_buffs[i]) - vi->hdr_len;
+
+ sg_init_table(rq->sg, 1);
+ sg_fill_dma(rq->sg, addr, len);
+
+ err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, xsk_buffs[i], gfp);
+ if (err)
+ goto err;
+ }
+
+ return num;
+
+err:
+ if (i)
+ err = i;
+
+ for (; i < num; ++i)
+ xsk_buff_free(xsk_buffs[i]);
+
+ return err;
+}
+
static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
{
struct virtnet_info *vi = netdev_priv(dev);
@@ -2206,6 +2255,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
int err;
bool oom;
+ if (rq->xsk.pool) {
+ err = virtnet_add_recvbuf_xsk(vi, rq, rq->xsk.pool, gfp);
+ goto kick;
+ }
+
do {
if (vi->mergeable_rx_bufs)
err = add_recvbuf_mergeable(vi, rq, gfp);
@@ -2214,10 +2268,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
else
err = add_recvbuf_small(vi, rq, gfp);
- oom = err == -ENOMEM;
if (err)
break;
} while (rq->vq->num_free);
+
+kick:
if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
unsigned long flags;
@@ -2226,6 +2281,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
u64_stats_update_end_irqrestore(&rq->stats.syncp, flags);
}
+ oom = err == -ENOMEM;
return !oom;
}
@@ -5050,7 +5106,7 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
struct receive_queue *rq;
struct device *dma_dev;
struct send_queue *sq;
- int err;
+ int err, size;
/* In big_packets mode, xdp cannot work, so there is no need to
* initialize xsk of rq.
@@ -5078,6 +5134,12 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
if (!dma_dev)
return -EPERM;
+ size = virtqueue_get_vring_size(rq->vq);
+
+ rq->xsk.xsk_buffs = kvcalloc(size, sizeof(*rq->xsk.xsk_buffs), GFP_KERNEL);
+ if (!rq->xsk.xsk_buffs)
+ return -ENOMEM;
+
err = xsk_pool_dma_map(pool, dma_dev, 0);
if (err)
goto err_xsk_map;
@@ -5112,6 +5174,8 @@ static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
xsk_pool_dma_unmap(pool, 0);
+ kvfree(rq->xsk.xsk_buffs);
+
return err;
}
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (6 preceding siblings ...)
2024-06-18 7:56 ` [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
2024-06-18 7:56 ` [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer Xuan Zhuo
9 siblings, 1 reply; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
In the process:
1. We may need to copy data to create skb for XDP_PASS.
2. We may need to call xsk_buff_free() to release the buffer.
3. The handle for xdp_buff is difference from the buffer.
If we pushed this logic into existing receive handle(merge and small),
we would have to maintain code scattered inside merge and small (and big).
So I think it is a good choice for us to put the xsk code into an
independent function.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 135 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 133 insertions(+), 2 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2ac5668a94ce..06608d696e2e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -500,6 +500,10 @@ struct virtio_net_common_hdr {
};
static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
+static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
+ struct net_device *dev,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats);
static bool is_xdp_frame(void *ptr)
{
@@ -1040,6 +1044,120 @@ static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
sg->length = len;
}
+static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
+ struct receive_queue *rq, void *buf, u32 len)
+{
+ struct xdp_buff *xdp;
+ u32 bufsize;
+
+ xdp = (struct xdp_buff *)buf;
+
+ bufsize = xsk_pool_get_rx_frame_size(rq->xsk.pool) + vi->hdr_len;
+
+ if (unlikely(len > bufsize)) {
+ pr_debug("%s: rx error: len %u exceeds truesize %u\n",
+ vi->dev->name, len, bufsize);
+ DEV_STATS_INC(vi->dev, rx_length_errors);
+ xsk_buff_free(xdp);
+ return NULL;
+ }
+
+ xsk_buff_set_size(xdp, len);
+ xsk_buff_dma_sync_for_cpu(xdp);
+
+ return xdp;
+}
+
+static struct sk_buff *xdp_construct_skb(struct receive_queue *rq,
+ struct xdp_buff *xdp)
+{
+ unsigned int metasize = xdp->data - xdp->data_meta;
+ struct sk_buff *skb;
+ unsigned int size;
+
+ size = xdp->data_end - xdp->data_hard_start;
+ skb = napi_alloc_skb(&rq->napi, size);
+ if (unlikely(!skb)) {
+ xsk_buff_free(xdp);
+ return NULL;
+ }
+
+ skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
+
+ size = xdp->data_end - xdp->data_meta;
+ memcpy(__skb_put(skb, size), xdp->data_meta, size);
+
+ if (metasize) {
+ __skb_pull(skb, metasize);
+ skb_metadata_set(skb, metasize);
+ }
+
+ xsk_buff_free(xdp);
+
+ return skb;
+}
+
+static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct virtnet_info *vi,
+ struct receive_queue *rq, struct xdp_buff *xdp,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats)
+{
+ struct bpf_prog *prog;
+ u32 ret;
+
+ ret = XDP_PASS;
+ rcu_read_lock();
+ prog = rcu_dereference(rq->xdp_prog);
+ if (prog)
+ ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
+ rcu_read_unlock();
+
+ switch (ret) {
+ case XDP_PASS:
+ return xdp_construct_skb(rq, xdp);
+
+ case XDP_TX:
+ case XDP_REDIRECT:
+ return NULL;
+
+ default:
+ /* drop packet */
+ xsk_buff_free(xdp);
+ u64_stats_inc(&stats->drops);
+ return NULL;
+ }
+}
+
+static struct sk_buff *virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
+ void *buf, u32 len,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats)
+{
+ struct net_device *dev = vi->dev;
+ struct sk_buff *skb = NULL;
+ struct xdp_buff *xdp;
+
+ len -= vi->hdr_len;
+
+ u64_stats_add(&stats->bytes, len);
+
+ xdp = buf_to_xdp(vi, rq, buf, len);
+ if (!xdp)
+ return NULL;
+
+ if (unlikely(len < ETH_HLEN)) {
+ pr_debug("%s: short packet %i\n", dev->name, len);
+ DEV_STATS_INC(dev, rx_length_errors);
+ xsk_buff_free(xdp);
+ return NULL;
+ }
+
+ if (!vi->mergeable_rx_bufs)
+ skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
+
+ return skb;
+}
+
static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
struct xsk_buff_pool *pool, gfp_t gfp)
{
@@ -2363,9 +2481,22 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
void *buf;
int i;
- if (!vi->big_packets || vi->mergeable_rx_bufs) {
- void *ctx;
+ if (rq->xsk.pool) {
+ struct sk_buff *skb;
+
+ while (packets < budget) {
+ buf = virtqueue_get_buf(rq->vq, &len);
+ if (!buf)
+ break;
+ skb = virtnet_receive_xsk_buf(vi, rq, buf, len, xdp_xmit, &stats);
+ if (skb)
+ virtnet_receive_done(vi, rq, skb);
+
+ packets++;
+ }
+ } else if (!vi->big_packets || vi->mergeable_rx_bufs) {
+ void *ctx;
while (packets < budget &&
(buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (7 preceding siblings ...)
2024-06-18 7:56 ` [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-20 10:37 ` Paolo Abeni
2024-06-28 2:19 ` Jason Wang
2024-06-18 7:56 ` [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer Xuan Zhuo
9 siblings, 2 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
Support AF-XDP for merge mode.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 139 +++++++++++++++++++++++++++++++++++++++
1 file changed, 139 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 06608d696e2e..cfa106aa8039 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -504,6 +504,10 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
struct net_device *dev,
unsigned int *xdp_xmit,
struct virtnet_rq_stats *stats);
+static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
+ struct sk_buff *curr_skb,
+ struct page *page, void *buf,
+ int len, int truesize);
static bool is_xdp_frame(void *ptr)
{
@@ -1128,6 +1132,139 @@ static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct
}
}
+static void xsk_drop_follow_bufs(struct net_device *dev,
+ struct receive_queue *rq,
+ u32 num_buf,
+ struct virtnet_rq_stats *stats)
+{
+ struct xdp_buff *xdp;
+ u32 len;
+
+ while (num_buf-- > 1) {
+ xdp = virtqueue_get_buf(rq->vq, &len);
+ if (unlikely(!xdp)) {
+ pr_debug("%s: rx error: %d buffers missing\n",
+ dev->name, num_buf);
+ DEV_STATS_INC(dev, rx_length_errors);
+ break;
+ }
+ u64_stats_add(&stats->bytes, len);
+ xsk_buff_free(xdp);
+ }
+}
+
+static int xsk_append_merge_buffer(struct virtnet_info *vi,
+ struct receive_queue *rq,
+ struct sk_buff *head_skb,
+ u32 num_buf,
+ struct virtio_net_hdr_mrg_rxbuf *hdr,
+ struct virtnet_rq_stats *stats)
+{
+ struct sk_buff *curr_skb;
+ struct xdp_buff *xdp;
+ u32 len, truesize;
+ struct page *page;
+ void *buf;
+
+ curr_skb = head_skb;
+
+ while (--num_buf) {
+ buf = virtqueue_get_buf(rq->vq, &len);
+ if (unlikely(!buf)) {
+ pr_debug("%s: rx error: %d buffers out of %d missing\n",
+ vi->dev->name, num_buf,
+ virtio16_to_cpu(vi->vdev,
+ hdr->num_buffers));
+ DEV_STATS_INC(vi->dev, rx_length_errors);
+ return -EINVAL;
+ }
+
+ u64_stats_add(&stats->bytes, len);
+
+ xdp = buf_to_xdp(vi, rq, buf, len);
+ if (!xdp)
+ goto err;
+
+ buf = napi_alloc_frag(len);
+ if (!buf) {
+ xsk_buff_free(xdp);
+ goto err;
+ }
+
+ memcpy(buf, xdp->data - vi->hdr_len, len);
+
+ xsk_buff_free(xdp);
+
+ page = virt_to_page(buf);
+
+ truesize = len;
+
+ curr_skb = virtnet_skb_append_frag(head_skb, curr_skb, page,
+ buf, len, truesize);
+ if (!curr_skb) {
+ put_page(page);
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ xsk_drop_follow_bufs(vi->dev, rq, num_buf, stats);
+ return -EINVAL;
+}
+
+static struct sk_buff *virtnet_receive_xsk_merge(struct net_device *dev, struct virtnet_info *vi,
+ struct receive_queue *rq, struct xdp_buff *xdp,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats)
+{
+ struct virtio_net_hdr_mrg_rxbuf *hdr;
+ struct bpf_prog *prog;
+ struct sk_buff *skb;
+ u32 ret, num_buf;
+
+ hdr = xdp->data - vi->hdr_len;
+ num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
+
+ ret = XDP_PASS;
+ rcu_read_lock();
+ prog = rcu_dereference(rq->xdp_prog);
+ /* TODO: support multi buffer. */
+ if (prog && num_buf == 1)
+ ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
+ rcu_read_unlock();
+
+ switch (ret) {
+ case XDP_PASS:
+ skb = xdp_construct_skb(rq, xdp);
+ if (!skb)
+ goto drop_bufs;
+
+ if (xsk_append_merge_buffer(vi, rq, skb, num_buf, hdr, stats)) {
+ dev_kfree_skb(skb);
+ goto drop;
+ }
+
+ return skb;
+
+ case XDP_TX:
+ case XDP_REDIRECT:
+ return NULL;
+
+ default:
+ /* drop packet */
+ xsk_buff_free(xdp);
+ }
+
+drop_bufs:
+ xsk_drop_follow_bufs(dev, rq, num_buf, stats);
+
+drop:
+ u64_stats_inc(&stats->drops);
+ return NULL;
+}
+
static struct sk_buff *virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
void *buf, u32 len,
unsigned int *xdp_xmit,
@@ -1154,6 +1291,8 @@ static struct sk_buff *virtnet_receive_xsk_buf(struct virtnet_info *vi, struct r
if (!vi->mergeable_rx_bufs)
skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
+ else
+ skb = virtnet_receive_xsk_merge(dev, vi, rq, xdp, xdp_xmit, stats);
return skb;
}
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (8 preceding siblings ...)
2024-06-18 7:56 ` [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode Xuan Zhuo
@ 2024-06-18 7:56 ` Xuan Zhuo
2024-06-20 10:46 ` Paolo Abeni
2024-06-28 2:19 ` Jason Wang
9 siblings, 2 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-18 7:56 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
Release the xsk buffer, when the queue is releasing or the queue is
resizing.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cfa106aa8039..33695b86bd99 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -967,6 +967,11 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
rq = &vi->rq[i];
+ if (rq->xsk.pool) {
+ xsk_buff_free((struct xdp_buff *)buf);
+ return;
+ }
+
if (!vi->big_packets || vi->mergeable_rx_bufs)
virtnet_rq_unmap(rq, buf, 0);
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-06-18 7:56 ` [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer Xuan Zhuo
@ 2024-06-20 10:20 ` Paolo Abeni
2024-06-20 10:37 ` Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
1 sibling, 1 reply; 32+ messages in thread
From: Paolo Abeni @ 2024-06-20 10:20 UTC (permalink / raw)
To: Xuan Zhuo, netdev
Cc: Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
Hi,
On Tue, 2024-06-18 at 15:56 +0800, Xuan Zhuo wrote:
> @@ -1032,6 +1034,53 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
> }
> }
>
> +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> +{
> + sg->dma_address = addr;
> + sg->length = len;
> +}
> +
> +static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> + struct xsk_buff_pool *pool, gfp_t gfp)
> +{
> + struct xdp_buff **xsk_buffs;
> + dma_addr_t addr;
> + u32 len, i;
> + int err = 0;
Minor nit: the reverse xmas tree order is based on the full line len,
should be:
int err = 0;
u32 len, i;
[...]
> @@ -2226,6 +2281,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> u64_stats_update_end_irqrestore(&rq->stats.syncp, flags);
> }
>
> + oom = err == -ENOMEM;
> return !oom;
Minor nit: 'oom' is used only in the above to lines. You could drop
such variable and just:
return err != -ENOMEM;
Please _do not_ repost just for the above, but please include such
changes if you should repost for other reasons.
Also try to include a detailed changelog in each patch after the tag
area and a '---' separator, it will simplify the review process.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-06-20 10:20 ` Paolo Abeni
@ 2024-06-20 10:37 ` Xuan Zhuo
0 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-20 10:37 UTC (permalink / raw)
To: Paolo Abeni
Cc: Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf, netdev
On Thu, 20 Jun 2024 12:20:44 +0200, Paolo Abeni <pabeni@redhat.com> wrote:
> Hi,
>
> On Tue, 2024-06-18 at 15:56 +0800, Xuan Zhuo wrote:
> > @@ -1032,6 +1034,53 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
> > }
> > }
> >
> > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > +{
> > + sg->dma_address = addr;
> > + sg->length = len;
> > +}
> > +
> > +static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> > + struct xsk_buff_pool *pool, gfp_t gfp)
> > +{
> > + struct xdp_buff **xsk_buffs;
> > + dma_addr_t addr;
> > + u32 len, i;
> > + int err = 0;
>
> Minor nit: the reverse xmas tree order is based on the full line len,
> should be:
> int err = 0;
> u32 len, i;
Will fix.
>
> [...]
> > @@ -2226,6 +2281,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> > u64_stats_update_end_irqrestore(&rq->stats.syncp, flags);
> > }
> >
> > + oom = err == -ENOMEM;
> > return !oom;
>
> Minor nit: 'oom' is used only in the above to lines. You could drop
> such variable and just:
> return err != -ENOMEM;
Will fix.
>
> Please _do not_ repost just for the above, but please include such
> changes if you should repost for other reasons.
OK.
>
> Also try to include a detailed changelog in each patch after the tag
> area and a '---' separator, it will simplify the review process.
Will do.
Thanks.
>
> Thanks,
>
> Paolo
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode
2024-06-18 7:56 ` [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode Xuan Zhuo
@ 2024-06-20 10:37 ` Paolo Abeni
2024-06-20 10:57 ` Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
1 sibling, 1 reply; 32+ messages in thread
From: Paolo Abeni @ 2024-06-20 10:37 UTC (permalink / raw)
To: Xuan Zhuo, netdev
Cc: Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Tue, 2024-06-18 at 15:56 +0800, Xuan Zhuo wrote:
> Support AF-XDP for merge mode.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/net/virtio_net.c | 139 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 139 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 06608d696e2e..cfa106aa8039 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -504,6 +504,10 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> struct net_device *dev,
> unsigned int *xdp_xmit,
> struct virtnet_rq_stats *stats);
> +static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
> + struct sk_buff *curr_skb,
> + struct page *page, void *buf,
> + int len, int truesize);
>
> static bool is_xdp_frame(void *ptr)
> {
> @@ -1128,6 +1132,139 @@ static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct
> }
> }
>
> +static void xsk_drop_follow_bufs(struct net_device *dev,
> + struct receive_queue *rq,
> + u32 num_buf,
> + struct virtnet_rq_stats *stats)
> +{
> + struct xdp_buff *xdp;
> + u32 len;
> +
> + while (num_buf-- > 1) {
Why do you skip the last buffer? I thought it should be dropped, too?!?
Thanks!
Paolo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer
2024-06-18 7:56 ` [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer Xuan Zhuo
@ 2024-06-20 10:46 ` Paolo Abeni
2024-06-20 10:48 ` Michael S. Tsirkin
2024-06-20 10:54 ` Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
1 sibling, 2 replies; 32+ messages in thread
From: Paolo Abeni @ 2024-06-20 10:46 UTC (permalink / raw)
To: Xuan Zhuo, netdev
Cc: Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Tue, 2024-06-18 at 15:56 +0800, Xuan Zhuo wrote:
> Release the xsk buffer, when the queue is releasing or the queue is
> resizing.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/net/virtio_net.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index cfa106aa8039..33695b86bd99 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -967,6 +967,11 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>
> rq = &vi->rq[i];
>
> + if (rq->xsk.pool) {
> + xsk_buff_free((struct xdp_buff *)buf);
> + return;
> + }
> +
> if (!vi->big_packets || vi->mergeable_rx_bufs)
> virtnet_rq_unmap(rq, buf, 0);
I'm under the impression this should be squashed in a previous patch,
likely "virtio_net: xsk: bind/unbind xsk for rx"
Thanks,
Paolo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer
2024-06-20 10:46 ` Paolo Abeni
@ 2024-06-20 10:48 ` Michael S. Tsirkin
2024-06-20 10:54 ` Xuan Zhuo
1 sibling, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2024-06-20 10:48 UTC (permalink / raw)
To: Paolo Abeni
Cc: Xuan Zhuo, netdev, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Thu, Jun 20, 2024 at 12:46:24PM +0200, Paolo Abeni wrote:
> On Tue, 2024-06-18 at 15:56 +0800, Xuan Zhuo wrote:
> > Release the xsk buffer, when the queue is releasing or the queue is
> > resizing.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index cfa106aa8039..33695b86bd99 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -967,6 +967,11 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> >
> > rq = &vi->rq[i];
> >
> > + if (rq->xsk.pool) {
> > + xsk_buff_free((struct xdp_buff *)buf);
> > + return;
> > + }
> > +
> > if (!vi->big_packets || vi->mergeable_rx_bufs)
> > virtnet_rq_unmap(rq, buf, 0);
>
>
> I'm under the impression this should be squashed in a previous patch,
> likely "virtio_net: xsk: bind/unbind xsk for rx"
>
> Thanks,
>
> Paolo
agreed, looks weird.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer
2024-06-20 10:46 ` Paolo Abeni
2024-06-20 10:48 ` Michael S. Tsirkin
@ 2024-06-20 10:54 ` Xuan Zhuo
1 sibling, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-20 10:54 UTC (permalink / raw)
To: Paolo Abeni
Cc: Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf, netdev
On Thu, 20 Jun 2024 12:46:24 +0200, Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2024-06-18 at 15:56 +0800, Xuan Zhuo wrote:
> > Release the xsk buffer, when the queue is releasing or the queue is
> > resizing.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index cfa106aa8039..33695b86bd99 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -967,6 +967,11 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
> >
> > rq = &vi->rq[i];
> >
> > + if (rq->xsk.pool) {
> > + xsk_buff_free((struct xdp_buff *)buf);
> > + return;
> > + }
> > +
> > if (!vi->big_packets || vi->mergeable_rx_bufs)
> > virtnet_rq_unmap(rq, buf, 0);
>
>
> I'm under the impression this should be squashed in a previous patch,
> likely "virtio_net: xsk: bind/unbind xsk for rx"
OK.
Thanks.
>
> Thanks,
>
> Paolo
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode
2024-06-20 10:37 ` Paolo Abeni
@ 2024-06-20 10:57 ` Xuan Zhuo
0 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-20 10:57 UTC (permalink / raw)
To: Paolo Abeni
Cc: Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf, netdev
On Thu, 20 Jun 2024 12:37:43 +0200, Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2024-06-18 at 15:56 +0800, Xuan Zhuo wrote:
> > Support AF-XDP for merge mode.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 139 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 139 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 06608d696e2e..cfa106aa8039 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -504,6 +504,10 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> > struct net_device *dev,
> > unsigned int *xdp_xmit,
> > struct virtnet_rq_stats *stats);
> > +static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
> > + struct sk_buff *curr_skb,
> > + struct page *page, void *buf,
> > + int len, int truesize);
> >
> > static bool is_xdp_frame(void *ptr)
> > {
> > @@ -1128,6 +1132,139 @@ static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct
> > }
> > }
> >
> > +static void xsk_drop_follow_bufs(struct net_device *dev,
> > + struct receive_queue *rq,
> > + u32 num_buf,
> > + struct virtnet_rq_stats *stats)
> > +{
> > + struct xdp_buff *xdp;
> > + u32 len;
> > +
> > + while (num_buf-- > 1) {
>
> Why do you skip the last buffer? I thought it should be dropped, too?!?
Here, we just need to drop the follow bufs (num_buf - 1). The first one have be
dropped.
Thanks.
>
> Thanks!
>
> Paolo
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 05/10] virtio_net: xsk: bind/unbind xsk for rx
2024-06-18 7:56 ` [PATCH net-next v6 05/10] virtio_net: xsk: bind/unbind xsk for rx Xuan Zhuo
@ 2024-06-28 2:19 ` Jason Wang
2024-06-28 5:42 ` Xuan Zhuo
0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-06-28 2:19 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> This patch implement the logic of bind/unbind xsk pool to rq.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/net/virtio_net.c | 133 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 133 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index df885cdbe658..d8cce143be26 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -25,6 +25,7 @@
> #include <net/net_failover.h>
> #include <net/netdev_rx_queue.h>
> #include <net/netdev_queues.h>
> +#include <net/xdp_sock_drv.h>
>
> static int napi_weight = NAPI_POLL_WEIGHT;
> module_param(napi_weight, int, 0444);
> @@ -348,6 +349,13 @@ struct receive_queue {
>
> /* Record the last dma info to free after new pages is allocated. */
> struct virtnet_rq_dma *last_dma;
> +
> + struct {
> + struct xsk_buff_pool *pool;
> +
> + /* xdp rxq used by xsk */
> + struct xdp_rxq_info xdp_rxq;
> + } xsk;
I don't see a special reason for having a container struct here.
> };
>
> /* This structure can contain rss message with maximum settings for indirection table and keysize
> @@ -4970,6 +4978,129 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> return virtnet_set_guest_offloads(vi, offloads);
> }
>
> +static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queue *rq,
> + struct xsk_buff_pool *pool)
> +{
> + int err, qindex;
> +
> + qindex = rq - vi->rq;
> +
> + if (pool) {
> + err = xdp_rxq_info_reg(&rq->xsk.xdp_rxq, vi->dev, qindex, rq->napi.napi_id);
> + if (err < 0)
> + return err;
> +
> + err = xdp_rxq_info_reg_mem_model(&rq->xsk.xdp_rxq,
> + MEM_TYPE_XSK_BUFF_POOL, NULL);
> + if (err < 0)
> + goto unreg;
> +
> + xsk_pool_set_rxq_info(pool, &rq->xsk.xdp_rxq);
> + }
> +
> + virtnet_rx_pause(vi, rq);
> +
> + err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
> + if (err) {
> + netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
> +
> + pool = NULL;
> + }
> +
> + rq->xsk.pool = pool;
> +
> + virtnet_rx_resume(vi, rq);
> +
> + if (pool)
> + return 0;
> +
> +unreg:
> + xdp_rxq_info_unreg(&rq->xsk.xdp_rxq);
> + return err;
> +}
> +
> +static int virtnet_xsk_pool_enable(struct net_device *dev,
> + struct xsk_buff_pool *pool,
> + u16 qid)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> + struct receive_queue *rq;
> + struct device *dma_dev;
> + struct send_queue *sq;
> + int err;
> +
> + /* In big_packets mode, xdp cannot work, so there is no need to
> + * initialize xsk of rq.
> + */
> + if (vi->big_packets && !vi->mergeable_rx_bufs)
> + return -ENOENT;
> +
> + if (qid >= vi->curr_queue_pairs)
> + return -EINVAL;
> +
> + sq = &vi->sq[qid];
> + rq = &vi->rq[qid];
> +
> + /* For the xsk, the tx and rx should have the same device. The af-xdp
> + * may use one buffer to receive from the rx and reuse this buffer to
> + * send by the tx. So the dma dev of sq and rq should be the same one.
> + *
> + * But vq->dma_dev allows every vq has the respective dma dev. So I
> + * check the dma dev of vq and sq is the same dev.
Not a native speaker, but it might be better to say "xsk assumes ....
to be the same device". And it might be better to replace "should"
with "must".
Others look good.
Thanks
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-06-18 7:56 ` [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer Xuan Zhuo
2024-06-20 10:20 ` Paolo Abeni
@ 2024-06-28 2:19 ` Jason Wang
2024-06-28 5:42 ` Xuan Zhuo
1 sibling, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-06-28 2:19 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Implement the logic of filling rq with XSK buffers.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/net/virtio_net.c | 68 ++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2bbc715f22c6..2ac5668a94ce 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -355,6 +355,8 @@ struct receive_queue {
>
> /* xdp rxq used by xsk */
> struct xdp_rxq_info xdp_rxq;
> +
> + struct xdp_buff **xsk_buffs;
> } xsk;
> };
>
> @@ -1032,6 +1034,53 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
> }
> }
>
> +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> +{
> + sg->dma_address = addr;
> + sg->length = len;
> +}
> +
> +static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> + struct xsk_buff_pool *pool, gfp_t gfp)
> +{
> + struct xdp_buff **xsk_buffs;
> + dma_addr_t addr;
> + u32 len, i;
> + int err = 0;
> + int num;
> +
> + xsk_buffs = rq->xsk.xsk_buffs;
> +
> + num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
> + if (!num)
> + return -ENOMEM;
> +
> + len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
> +
> + for (i = 0; i < num; ++i) {
> + /* use the part of XDP_PACKET_HEADROOM as the virtnet hdr space */
> + addr = xsk_buff_xdp_get_dma(xsk_buffs[i]) - vi->hdr_len;
We had VIRTIO_XDP_HEADROOM, can we reuse it? Or if it's redundant
let's send a patch to switch to XDP_PACKET_HEADROOM.
Btw, the code assumes vi->hdr_len < xsk_pool_get_headroom(). It's
better to fail if it's not true when enabling xsk.
Thanks
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode
2024-06-18 7:56 ` [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode Xuan Zhuo
@ 2024-06-28 2:19 ` Jason Wang
2024-06-28 5:44 ` Xuan Zhuo
0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-06-28 2:19 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> In the process:
> 1. We may need to copy data to create skb for XDP_PASS.
> 2. We may need to call xsk_buff_free() to release the buffer.
> 3. The handle for xdp_buff is difference from the buffer.
>
> If we pushed this logic into existing receive handle(merge and small),
> we would have to maintain code scattered inside merge and small (and big).
> So I think it is a good choice for us to put the xsk code into an
> independent function.
I think it's better to try to reuse the existing functions.
More below:
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/net/virtio_net.c | 135 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 133 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2ac5668a94ce..06608d696e2e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -500,6 +500,10 @@ struct virtio_net_common_hdr {
> };
>
> static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> + struct net_device *dev,
> + unsigned int *xdp_xmit,
> + struct virtnet_rq_stats *stats);
>
> static bool is_xdp_frame(void *ptr)
> {
> @@ -1040,6 +1044,120 @@ static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> sg->length = len;
> }
>
> +static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
> + struct receive_queue *rq, void *buf, u32 len)
> +{
> + struct xdp_buff *xdp;
> + u32 bufsize;
> +
> + xdp = (struct xdp_buff *)buf;
> +
> + bufsize = xsk_pool_get_rx_frame_size(rq->xsk.pool) + vi->hdr_len;
> +
> + if (unlikely(len > bufsize)) {
> + pr_debug("%s: rx error: len %u exceeds truesize %u\n",
> + vi->dev->name, len, bufsize);
> + DEV_STATS_INC(vi->dev, rx_length_errors);
> + xsk_buff_free(xdp);
> + return NULL;
> + }
> +
> + xsk_buff_set_size(xdp, len);
> + xsk_buff_dma_sync_for_cpu(xdp);
> +
> + return xdp;
> +}
> +
> +static struct sk_buff *xdp_construct_skb(struct receive_queue *rq,
> + struct xdp_buff *xdp)
> +{
So we have a similar caller which is receive_small_build_skb(). Any
chance to reuse that?
> + unsigned int metasize = xdp->data - xdp->data_meta;
> + struct sk_buff *skb;
> + unsigned int size;
> +
> + size = xdp->data_end - xdp->data_hard_start;
> + skb = napi_alloc_skb(&rq->napi, size);
> + if (unlikely(!skb)) {
> + xsk_buff_free(xdp);
> + return NULL;
> + }
> +
> + skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
> +
> + size = xdp->data_end - xdp->data_meta;
> + memcpy(__skb_put(skb, size), xdp->data_meta, size);
> +
> + if (metasize) {
> + __skb_pull(skb, metasize);
> + skb_metadata_set(skb, metasize);
> + }
> +
> + xsk_buff_free(xdp);
> +
> + return skb;
> +}
> +
> +static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct virtnet_info *vi,
> + struct receive_queue *rq, struct xdp_buff *xdp,
> + unsigned int *xdp_xmit,
> + struct virtnet_rq_stats *stats)
> +{
> + struct bpf_prog *prog;
> + u32 ret;
> +
> + ret = XDP_PASS;
> + rcu_read_lock();
> + prog = rcu_dereference(rq->xdp_prog);
> + if (prog)
> + ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
> + rcu_read_unlock();
> +
> + switch (ret) {
> + case XDP_PASS:
> + return xdp_construct_skb(rq, xdp);
> +
> + case XDP_TX:
> + case XDP_REDIRECT:
> + return NULL;
> +
> + default:
> + /* drop packet */
> + xsk_buff_free(xdp);
> + u64_stats_inc(&stats->drops);
> + return NULL;
> + }
> +}
> +
> +static struct sk_buff *virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
> + void *buf, u32 len,
> + unsigned int *xdp_xmit,
> + struct virtnet_rq_stats *stats)
> +{
> + struct net_device *dev = vi->dev;
> + struct sk_buff *skb = NULL;
> + struct xdp_buff *xdp;
> +
> + len -= vi->hdr_len;
> +
> + u64_stats_add(&stats->bytes, len);
> +
> + xdp = buf_to_xdp(vi, rq, buf, len);
> + if (!xdp)
> + return NULL;
> +
> + if (unlikely(len < ETH_HLEN)) {
> + pr_debug("%s: short packet %i\n", dev->name, len);
> + DEV_STATS_INC(dev, rx_length_errors);
> + xsk_buff_free(xdp);
> + return NULL;
> + }
> +
> + if (!vi->mergeable_rx_bufs)
> + skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
> +
> + return skb;
> +}
> +
> static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> struct xsk_buff_pool *pool, gfp_t gfp)
> {
> @@ -2363,9 +2481,22 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> void *buf;
> int i;
>
> - if (!vi->big_packets || vi->mergeable_rx_bufs) {
> - void *ctx;
> + if (rq->xsk.pool) {
> + struct sk_buff *skb;
> +
> + while (packets < budget) {
> + buf = virtqueue_get_buf(rq->vq, &len);
> + if (!buf)
> + break;
>
> + skb = virtnet_receive_xsk_buf(vi, rq, buf, len, xdp_xmit, &stats);
> + if (skb)
> + virtnet_receive_done(vi, rq, skb);
> +
> + packets++;
> + }
If reusing turns out to be hard, I'd rather add new paths in receive_small().
> + } else if (!vi->big_packets || vi->mergeable_rx_bufs) {
> + void *ctx;
> while (packets < budget &&
> (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> --
> 2.32.0.3.g01195cf9f
>
Thanks
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode
2024-06-18 7:56 ` [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode Xuan Zhuo
2024-06-20 10:37 ` Paolo Abeni
@ 2024-06-28 2:19 ` Jason Wang
2024-06-28 5:50 ` Xuan Zhuo
1 sibling, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-06-28 2:19 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Support AF-XDP for merge mode.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/net/virtio_net.c | 139 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 139 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 06608d696e2e..cfa106aa8039 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -504,6 +504,10 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> struct net_device *dev,
> unsigned int *xdp_xmit,
> struct virtnet_rq_stats *stats);
> +static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
> + struct sk_buff *curr_skb,
> + struct page *page, void *buf,
> + int len, int truesize);
>
> static bool is_xdp_frame(void *ptr)
> {
> @@ -1128,6 +1132,139 @@ static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct
> }
> }
>
> +static void xsk_drop_follow_bufs(struct net_device *dev,
> + struct receive_queue *rq,
> + u32 num_buf,
> + struct virtnet_rq_stats *stats)
> +{
> + struct xdp_buff *xdp;
> + u32 len;
> +
> + while (num_buf-- > 1) {
> + xdp = virtqueue_get_buf(rq->vq, &len);
> + if (unlikely(!xdp)) {
> + pr_debug("%s: rx error: %d buffers missing\n",
> + dev->name, num_buf);
> + DEV_STATS_INC(dev, rx_length_errors);
> + break;
> + }
> + u64_stats_add(&stats->bytes, len);
> + xsk_buff_free(xdp);
> + }
> +}
> +
> +static int xsk_append_merge_buffer(struct virtnet_info *vi,
> + struct receive_queue *rq,
> + struct sk_buff *head_skb,
> + u32 num_buf,
> + struct virtio_net_hdr_mrg_rxbuf *hdr,
> + struct virtnet_rq_stats *stats)
> +{
> + struct sk_buff *curr_skb;
> + struct xdp_buff *xdp;
> + u32 len, truesize;
> + struct page *page;
> + void *buf;
> +
> + curr_skb = head_skb;
> +
> + while (--num_buf) {
> + buf = virtqueue_get_buf(rq->vq, &len);
> + if (unlikely(!buf)) {
> + pr_debug("%s: rx error: %d buffers out of %d missing\n",
> + vi->dev->name, num_buf,
> + virtio16_to_cpu(vi->vdev,
> + hdr->num_buffers));
> + DEV_STATS_INC(vi->dev, rx_length_errors);
> + return -EINVAL;
> + }
> +
> + u64_stats_add(&stats->bytes, len);
> +
> + xdp = buf_to_xdp(vi, rq, buf, len);
> + if (!xdp)
> + goto err;
> +
> + buf = napi_alloc_frag(len);
So we don't do this for non xsk paths. Any reason we can't reuse the
existing codes?
Thanks
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer
2024-06-18 7:56 ` [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer Xuan Zhuo
2024-06-20 10:46 ` Paolo Abeni
@ 2024-06-28 2:19 ` Jason Wang
1 sibling, 0 replies; 32+ messages in thread
From: Jason Wang @ 2024-06-28 2:19 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Release the xsk buffer, when the queue is releasing or the queue is
> resizing.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> drivers/net/virtio_net.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index cfa106aa8039..33695b86bd99 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -967,6 +967,11 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
>
> rq = &vi->rq[i];
>
> + if (rq->xsk.pool) {
> + xsk_buff_free((struct xdp_buff *)buf);
> + return;
> + }
It seems it needs to be squashed into previous patches?
Thanks
> +
> if (!vi->big_packets || vi->mergeable_rx_bufs)
> virtnet_rq_unmap(rq, buf, 0);
>
> --
> 2.32.0.3.g01195cf9f
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 05/10] virtio_net: xsk: bind/unbind xsk for rx
2024-06-28 2:19 ` Jason Wang
@ 2024-06-28 5:42 ` Xuan Zhuo
0 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-28 5:42 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, 28 Jun 2024 10:19:34 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > This patch implement the logic of bind/unbind xsk pool to rq.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 133 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 133 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index df885cdbe658..d8cce143be26 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -25,6 +25,7 @@
> > #include <net/net_failover.h>
> > #include <net/netdev_rx_queue.h>
> > #include <net/netdev_queues.h>
> > +#include <net/xdp_sock_drv.h>
> >
> > static int napi_weight = NAPI_POLL_WEIGHT;
> > module_param(napi_weight, int, 0444);
> > @@ -348,6 +349,13 @@ struct receive_queue {
> >
> > /* Record the last dma info to free after new pages is allocated. */
> > struct virtnet_rq_dma *last_dma;
> > +
> > + struct {
> > + struct xsk_buff_pool *pool;
> > +
> > + /* xdp rxq used by xsk */
> > + struct xdp_rxq_info xdp_rxq;
> > + } xsk;
>
> I don't see a special reason for having a container struct here.
Will fix.
>
>
> > };
> >
> > /* This structure can contain rss message with maximum settings for indirection table and keysize
> > @@ -4970,6 +4978,129 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> > return virtnet_set_guest_offloads(vi, offloads);
> > }
> >
> > +static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queue *rq,
> > + struct xsk_buff_pool *pool)
> > +{
> > + int err, qindex;
> > +
> > + qindex = rq - vi->rq;
> > +
> > + if (pool) {
> > + err = xdp_rxq_info_reg(&rq->xsk.xdp_rxq, vi->dev, qindex, rq->napi.napi_id);
> > + if (err < 0)
> > + return err;
> > +
> > + err = xdp_rxq_info_reg_mem_model(&rq->xsk.xdp_rxq,
> > + MEM_TYPE_XSK_BUFF_POOL, NULL);
> > + if (err < 0)
> > + goto unreg;
> > +
> > + xsk_pool_set_rxq_info(pool, &rq->xsk.xdp_rxq);
> > + }
> > +
> > + virtnet_rx_pause(vi, rq);
> > +
> > + err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
> > + if (err) {
> > + netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
> > +
> > + pool = NULL;
> > + }
> > +
> > + rq->xsk.pool = pool;
> > +
> > + virtnet_rx_resume(vi, rq);
> > +
> > + if (pool)
> > + return 0;
> > +
> > +unreg:
> > + xdp_rxq_info_unreg(&rq->xsk.xdp_rxq);
> > + return err;
> > +}
> > +
> > +static int virtnet_xsk_pool_enable(struct net_device *dev,
> > + struct xsk_buff_pool *pool,
> > + u16 qid)
> > +{
> > + struct virtnet_info *vi = netdev_priv(dev);
> > + struct receive_queue *rq;
> > + struct device *dma_dev;
> > + struct send_queue *sq;
> > + int err;
> > +
> > + /* In big_packets mode, xdp cannot work, so there is no need to
> > + * initialize xsk of rq.
> > + */
> > + if (vi->big_packets && !vi->mergeable_rx_bufs)
> > + return -ENOENT;
> > +
> > + if (qid >= vi->curr_queue_pairs)
> > + return -EINVAL;
> > +
> > + sq = &vi->sq[qid];
> > + rq = &vi->rq[qid];
> > +
> > + /* For the xsk, the tx and rx should have the same device. The af-xdp
> > + * may use one buffer to receive from the rx and reuse this buffer to
> > + * send by the tx. So the dma dev of sq and rq should be the same one.
> > + *
> > + * But vq->dma_dev allows every vq has the respective dma dev. So I
> > + * check the dma dev of vq and sq is the same dev.
>
> Not a native speaker, but it might be better to say "xsk assumes ....
> to be the same device". And it might be better to replace "should"
> with "must".
Will fix.
Thanks.
>
> Others look good.
>
> Thanks
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-06-28 2:19 ` Jason Wang
@ 2024-06-28 5:42 ` Xuan Zhuo
2024-07-01 3:05 ` Jason Wang
0 siblings, 1 reply; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-28 5:42 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, 28 Jun 2024 10:19:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Implement the logic of filling rq with XSK buffers.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 68 ++++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 66 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2bbc715f22c6..2ac5668a94ce 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -355,6 +355,8 @@ struct receive_queue {
> >
> > /* xdp rxq used by xsk */
> > struct xdp_rxq_info xdp_rxq;
> > +
> > + struct xdp_buff **xsk_buffs;
> > } xsk;
> > };
> >
> > @@ -1032,6 +1034,53 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
> > }
> > }
> >
> > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > +{
> > + sg->dma_address = addr;
> > + sg->length = len;
> > +}
> > +
> > +static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> > + struct xsk_buff_pool *pool, gfp_t gfp)
> > +{
> > + struct xdp_buff **xsk_buffs;
> > + dma_addr_t addr;
> > + u32 len, i;
> > + int err = 0;
> > + int num;
> > +
> > + xsk_buffs = rq->xsk.xsk_buffs;
> > +
> > + num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
> > + if (!num)
> > + return -ENOMEM;
> > +
> > + len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
> > +
> > + for (i = 0; i < num; ++i) {
> > + /* use the part of XDP_PACKET_HEADROOM as the virtnet hdr space */
> > + addr = xsk_buff_xdp_get_dma(xsk_buffs[i]) - vi->hdr_len;
>
> We had VIRTIO_XDP_HEADROOM, can we reuse it? Or if it's redundant
> let's send a patch to switch to XDP_PACKET_HEADROOM.
Do you mean replace it inside the comment?
I want to describe use the headroom of xsk, the size of the headroom is
XDP_PACKET_HEADROOM.
>
> Btw, the code assumes vi->hdr_len < xsk_pool_get_headroom(). It's
> better to fail if it's not true when enabling xsk.
It is ok.
Thanks.
>
> Thanks
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode
2024-06-28 2:19 ` Jason Wang
@ 2024-06-28 5:44 ` Xuan Zhuo
2024-07-01 3:20 ` Jason Wang
0 siblings, 1 reply; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-28 5:44 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, 28 Jun 2024 10:19:41 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > In the process:
> > 1. We may need to copy data to create skb for XDP_PASS.
> > 2. We may need to call xsk_buff_free() to release the buffer.
> > 3. The handle for xdp_buff is difference from the buffer.
> >
> > If we pushed this logic into existing receive handle(merge and small),
> > we would have to maintain code scattered inside merge and small (and big).
> > So I think it is a good choice for us to put the xsk code into an
> > independent function.
>
> I think it's better to try to reuse the existing functions.
>
> More below:
>
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 135 ++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 133 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2ac5668a94ce..06608d696e2e 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -500,6 +500,10 @@ struct virtio_net_common_hdr {
> > };
> >
> > static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> > + struct net_device *dev,
> > + unsigned int *xdp_xmit,
> > + struct virtnet_rq_stats *stats);
> >
> > static bool is_xdp_frame(void *ptr)
> > {
> > @@ -1040,6 +1044,120 @@ static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > sg->length = len;
> > }
> >
> > +static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
> > + struct receive_queue *rq, void *buf, u32 len)
> > +{
> > + struct xdp_buff *xdp;
> > + u32 bufsize;
> > +
> > + xdp = (struct xdp_buff *)buf;
> > +
> > + bufsize = xsk_pool_get_rx_frame_size(rq->xsk.pool) + vi->hdr_len;
> > +
> > + if (unlikely(len > bufsize)) {
> > + pr_debug("%s: rx error: len %u exceeds truesize %u\n",
> > + vi->dev->name, len, bufsize);
> > + DEV_STATS_INC(vi->dev, rx_length_errors);
> > + xsk_buff_free(xdp);
> > + return NULL;
> > + }
> > +
> > + xsk_buff_set_size(xdp, len);
> > + xsk_buff_dma_sync_for_cpu(xdp);
> > +
> > + return xdp;
> > +}
> > +
> > +static struct sk_buff *xdp_construct_skb(struct receive_queue *rq,
> > + struct xdp_buff *xdp)
> > +{
>
> So we have a similar caller which is receive_small_build_skb(). Any
> chance to reuse that?
receive_small_build_skb works with build_skb.
Here we need to copy the packet from the xsk buffer to the skb buffer.
So I do not think we can reuse it.
>
> > + unsigned int metasize = xdp->data - xdp->data_meta;
> > + struct sk_buff *skb;
> > + unsigned int size;
> > +
> > + size = xdp->data_end - xdp->data_hard_start;
> > + skb = napi_alloc_skb(&rq->napi, size);
> > + if (unlikely(!skb)) {
> > + xsk_buff_free(xdp);
> > + return NULL;
> > + }
> > +
> > + skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
> > +
> > + size = xdp->data_end - xdp->data_meta;
> > + memcpy(__skb_put(skb, size), xdp->data_meta, size);
> > +
> > + if (metasize) {
> > + __skb_pull(skb, metasize);
> > + skb_metadata_set(skb, metasize);
> > + }
> > +
> > + xsk_buff_free(xdp);
> > +
> > + return skb;
> > +}
> > +
> > +static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct virtnet_info *vi,
> > + struct receive_queue *rq, struct xdp_buff *xdp,
> > + unsigned int *xdp_xmit,
> > + struct virtnet_rq_stats *stats)
> > +{
> > + struct bpf_prog *prog;
> > + u32 ret;
> > +
> > + ret = XDP_PASS;
> > + rcu_read_lock();
> > + prog = rcu_dereference(rq->xdp_prog);
> > + if (prog)
> > + ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
> > + rcu_read_unlock();
> > +
> > + switch (ret) {
> > + case XDP_PASS:
> > + return xdp_construct_skb(rq, xdp);
> > +
> > + case XDP_TX:
> > + case XDP_REDIRECT:
> > + return NULL;
> > +
> > + default:
> > + /* drop packet */
> > + xsk_buff_free(xdp);
> > + u64_stats_inc(&stats->drops);
> > + return NULL;
> > + }
> > +}
> > +
> > +static struct sk_buff *virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
> > + void *buf, u32 len,
> > + unsigned int *xdp_xmit,
> > + struct virtnet_rq_stats *stats)
> > +{
> > + struct net_device *dev = vi->dev;
> > + struct sk_buff *skb = NULL;
> > + struct xdp_buff *xdp;
> > +
> > + len -= vi->hdr_len;
> > +
> > + u64_stats_add(&stats->bytes, len);
> > +
> > + xdp = buf_to_xdp(vi, rq, buf, len);
> > + if (!xdp)
> > + return NULL;
> > +
> > + if (unlikely(len < ETH_HLEN)) {
> > + pr_debug("%s: short packet %i\n", dev->name, len);
> > + DEV_STATS_INC(dev, rx_length_errors);
> > + xsk_buff_free(xdp);
> > + return NULL;
> > + }
> > +
> > + if (!vi->mergeable_rx_bufs)
> > + skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
> > +
> > + return skb;
> > +}
> > +
> > static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> > struct xsk_buff_pool *pool, gfp_t gfp)
> > {
> > @@ -2363,9 +2481,22 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > void *buf;
> > int i;
> >
> > - if (!vi->big_packets || vi->mergeable_rx_bufs) {
> > - void *ctx;
> > + if (rq->xsk.pool) {
> > + struct sk_buff *skb;
> > +
> > + while (packets < budget) {
> > + buf = virtqueue_get_buf(rq->vq, &len);
> > + if (!buf)
> > + break;
> >
> > + skb = virtnet_receive_xsk_buf(vi, rq, buf, len, xdp_xmit, &stats);
> > + if (skb)
> > + virtnet_receive_done(vi, rq, skb);
> > +
> > + packets++;
> > + }
>
> If reusing turns out to be hard, I'd rather add new paths in receive_small().
The exist function is called after virtnet_rq_get_buf(), that will do dma unmap.
But for xsk, the dma unmap is not need. So xsk receive handle should use
virtqueue_get_buf directly.
Thanks.
>
> > + } else if (!vi->big_packets || vi->mergeable_rx_bufs) {
> > + void *ctx;
> > while (packets < budget &&
> > (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > --
> > 2.32.0.3.g01195cf9f
> >
>
> Thanks
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode
2024-06-28 2:19 ` Jason Wang
@ 2024-06-28 5:50 ` Xuan Zhuo
0 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-06-28 5:50 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, 28 Jun 2024 10:19:44 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Support AF-XDP for merge mode.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > drivers/net/virtio_net.c | 139 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 139 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 06608d696e2e..cfa106aa8039 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -504,6 +504,10 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> > struct net_device *dev,
> > unsigned int *xdp_xmit,
> > struct virtnet_rq_stats *stats);
> > +static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
> > + struct sk_buff *curr_skb,
> > + struct page *page, void *buf,
> > + int len, int truesize);
> >
> > static bool is_xdp_frame(void *ptr)
> > {
> > @@ -1128,6 +1132,139 @@ static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct
> > }
> > }
> >
> > +static void xsk_drop_follow_bufs(struct net_device *dev,
> > + struct receive_queue *rq,
> > + u32 num_buf,
> > + struct virtnet_rq_stats *stats)
> > +{
> > + struct xdp_buff *xdp;
> > + u32 len;
> > +
> > + while (num_buf-- > 1) {
> > + xdp = virtqueue_get_buf(rq->vq, &len);
> > + if (unlikely(!xdp)) {
> > + pr_debug("%s: rx error: %d buffers missing\n",
> > + dev->name, num_buf);
> > + DEV_STATS_INC(dev, rx_length_errors);
> > + break;
> > + }
> > + u64_stats_add(&stats->bytes, len);
> > + xsk_buff_free(xdp);
> > + }
> > +}
> > +
> > +static int xsk_append_merge_buffer(struct virtnet_info *vi,
> > + struct receive_queue *rq,
> > + struct sk_buff *head_skb,
> > + u32 num_buf,
> > + struct virtio_net_hdr_mrg_rxbuf *hdr,
> > + struct virtnet_rq_stats *stats)
> > +{
> > + struct sk_buff *curr_skb;
> > + struct xdp_buff *xdp;
> > + u32 len, truesize;
> > + struct page *page;
> > + void *buf;
> > +
> > + curr_skb = head_skb;
> > +
> > + while (--num_buf) {
> > + buf = virtqueue_get_buf(rq->vq, &len);
> > + if (unlikely(!buf)) {
> > + pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > + vi->dev->name, num_buf,
> > + virtio16_to_cpu(vi->vdev,
> > + hdr->num_buffers));
> > + DEV_STATS_INC(vi->dev, rx_length_errors);
> > + return -EINVAL;
> > + }
> > +
> > + u64_stats_add(&stats->bytes, len);
> > +
> > + xdp = buf_to_xdp(vi, rq, buf, len);
> > + if (!xdp)
> > + goto err;
> > +
> > + buf = napi_alloc_frag(len);
>
> So we don't do this for non xsk paths. Any reason we can't reuse the
> existing codes?
Do you mean this code:
while (--num_buf) {
int num_skb_frags;
-> buf = virtnet_rq_get_buf(rq, &len, &ctx);
if (unlikely(!buf)) {
pr_debug("%s: rx error: %d buffers out of %d missing\n",
dev->name, num_buf,
virtio16_to_cpu(vi->vdev,
hdr->num_buffers));
DEV_STATS_INC(dev, rx_length_errors);
goto err_buf;
}
u64_stats_add(&stats->bytes, len);
page = virt_to_head_page(buf);
-> truesize = mergeable_ctx_to_truesize(ctx);
-> headroom = mergeable_ctx_to_headroom(ctx);
-> tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
-> room = SKB_DATA_ALIGN(headroom + tailroom);
-> if (unlikely(len > truesize - room)) {
-> pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
-> dev->name, len, (unsigned long)(truesize - room));
-> DEV_STATS_INC(dev, rx_length_errors);
-> goto err_skb;
-> }
curr_skb = virtnet_skb_append_frag(head_skb, curr_skb, page,
buf, len, truesize);
if (!curr_skb)
goto err_skb;
}
The code lines that are marked are differ.
The same logic is separated to function virtnet_skb_append_frag().
So the code is similitude, but we can not merge them.
Thanks.
>
> Thanks
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-06-28 5:42 ` Xuan Zhuo
@ 2024-07-01 3:05 ` Jason Wang
2024-07-01 8:34 ` Xuan Zhuo
0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-07-01 3:05 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, Jun 28, 2024 at 1:44 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 28 Jun 2024 10:19:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > Implement the logic of filling rq with XSK buffers.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > > drivers/net/virtio_net.c | 68 ++++++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 66 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 2bbc715f22c6..2ac5668a94ce 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -355,6 +355,8 @@ struct receive_queue {
> > >
> > > /* xdp rxq used by xsk */
> > > struct xdp_rxq_info xdp_rxq;
> > > +
> > > + struct xdp_buff **xsk_buffs;
> > > } xsk;
> > > };
> > >
> > > @@ -1032,6 +1034,53 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
> > > }
> > > }
> > >
> > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > +{
> > > + sg->dma_address = addr;
> > > + sg->length = len;
> > > +}
> > > +
> > > +static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> > > + struct xsk_buff_pool *pool, gfp_t gfp)
> > > +{
> > > + struct xdp_buff **xsk_buffs;
> > > + dma_addr_t addr;
> > > + u32 len, i;
> > > + int err = 0;
> > > + int num;
> > > +
> > > + xsk_buffs = rq->xsk.xsk_buffs;
> > > +
> > > + num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
> > > + if (!num)
> > > + return -ENOMEM;
> > > +
> > > + len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
> > > +
> > > + for (i = 0; i < num; ++i) {
> > > + /* use the part of XDP_PACKET_HEADROOM as the virtnet hdr space */
> > > + addr = xsk_buff_xdp_get_dma(xsk_buffs[i]) - vi->hdr_len;
> >
> > We had VIRTIO_XDP_HEADROOM, can we reuse it? Or if it's redundant
> > let's send a patch to switch to XDP_PACKET_HEADROOM.
>
> Do you mean replace it inside the comment?
I meant a patch to s/VIRTIO_XDP_HEADROOM/XDP_PACKET_HEADROOM/g.
>
> I want to describe use the headroom of xsk, the size of the headroom is
> XDP_PACKET_HEADROOM.
>
> >
> > Btw, the code assumes vi->hdr_len < xsk_pool_get_headroom(). It's
> > better to fail if it's not true when enabling xsk.
>
> It is ok.
I mean do we need a check to fail xsk binding if vi->hdr_len >
xsk_pool_get_headroom() or it has been guaranteed by the code already.
Thanks
>
> Thanks.
>
>
> >
> > Thanks
> >
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode
2024-06-28 5:44 ` Xuan Zhuo
@ 2024-07-01 3:20 ` Jason Wang
2024-07-01 3:25 ` Jason Wang
0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-07-01 3:20 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, Jun 28, 2024 at 1:48 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 28 Jun 2024 10:19:41 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > In the process:
> > > 1. We may need to copy data to create skb for XDP_PASS.
> > > 2. We may need to call xsk_buff_free() to release the buffer.
> > > 3. The handle for xdp_buff is difference from the buffer.
> > >
> > > If we pushed this logic into existing receive handle(merge and small),
> > > we would have to maintain code scattered inside merge and small (and big).
> > > So I think it is a good choice for us to put the xsk code into an
> > > independent function.
> >
> > I think it's better to try to reuse the existing functions.
> >
> > More below:
> >
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > > drivers/net/virtio_net.c | 135 ++++++++++++++++++++++++++++++++++++++-
> > > 1 file changed, 133 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 2ac5668a94ce..06608d696e2e 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -500,6 +500,10 @@ struct virtio_net_common_hdr {
> > > };
> > >
> > > static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> > > + struct net_device *dev,
> > > + unsigned int *xdp_xmit,
> > > + struct virtnet_rq_stats *stats);
> > >
> > > static bool is_xdp_frame(void *ptr)
> > > {
> > > @@ -1040,6 +1044,120 @@ static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > sg->length = len;
> > > }
> > >
> > > +static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
> > > + struct receive_queue *rq, void *buf, u32 len)
> > > +{
> > > + struct xdp_buff *xdp;
> > > + u32 bufsize;
> > > +
> > > + xdp = (struct xdp_buff *)buf;
> > > +
> > > + bufsize = xsk_pool_get_rx_frame_size(rq->xsk.pool) + vi->hdr_len;
> > > +
> > > + if (unlikely(len > bufsize)) {
> > > + pr_debug("%s: rx error: len %u exceeds truesize %u\n",
> > > + vi->dev->name, len, bufsize);
> > > + DEV_STATS_INC(vi->dev, rx_length_errors);
> > > + xsk_buff_free(xdp);
> > > + return NULL;
> > > + }
> > > +
> > > + xsk_buff_set_size(xdp, len);
> > > + xsk_buff_dma_sync_for_cpu(xdp);
> > > +
> > > + return xdp;
> > > +}
> > > +
> > > +static struct sk_buff *xdp_construct_skb(struct receive_queue *rq,
> > > + struct xdp_buff *xdp)
> > > +{
> >
> > So we have a similar caller which is receive_small_build_skb(). Any
> > chance to reuse that?
>
> receive_small_build_skb works with build_skb.
RIght.
>
> Here we need to copy the packet from the xsk buffer to the skb buffer.
> So I do not think we can reuse it.
>
Let's rename this to xsk_construct_skb() ?
>
> >
> > > + unsigned int metasize = xdp->data - xdp->data_meta;
> > > + struct sk_buff *skb;
> > > + unsigned int size;
> > > +
> > > + size = xdp->data_end - xdp->data_hard_start;
> > > + skb = napi_alloc_skb(&rq->napi, size);
> > > + if (unlikely(!skb)) {
> > > + xsk_buff_free(xdp);
> > > + return NULL;
> > > + }
> > > +
> > > + skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
> > > +
> > > + size = xdp->data_end - xdp->data_meta;
> > > + memcpy(__skb_put(skb, size), xdp->data_meta, size);
> > > +
> > > + if (metasize) {
> > > + __skb_pull(skb, metasize);
> > > + skb_metadata_set(skb, metasize);
> > > + }
> > > +
> > > + xsk_buff_free(xdp);
> > > +
> > > + return skb;
> > > +}
> > > +
> > > +static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct virtnet_info *vi,
> > > + struct receive_queue *rq, struct xdp_buff *xdp,
> > > + unsigned int *xdp_xmit,
> > > + struct virtnet_rq_stats *stats)
> > > +{
> > > + struct bpf_prog *prog;
> > > + u32 ret;
> > > +
> > > + ret = XDP_PASS;
> > > + rcu_read_lock();
> > > + prog = rcu_dereference(rq->xdp_prog);
> > > + if (prog)
> > > + ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
> > > + rcu_read_unlock();
> > > +
> > > + switch (ret) {
> > > + case XDP_PASS:
> > > + return xdp_construct_skb(rq, xdp);
> > > +
> > > + case XDP_TX:
> > > + case XDP_REDIRECT:
> > > + return NULL;
> > > +
> > > + default:
> > > + /* drop packet */
> > > + xsk_buff_free(xdp);
> > > + u64_stats_inc(&stats->drops);
> > > + return NULL;
> > > + }
> > > +}
> > > +
> > > +static struct sk_buff *virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
> > > + void *buf, u32 len,
> > > + unsigned int *xdp_xmit,
> > > + struct virtnet_rq_stats *stats)
> > > +{
> > > + struct net_device *dev = vi->dev;
> > > + struct sk_buff *skb = NULL;
> > > + struct xdp_buff *xdp;
> > > +
> > > + len -= vi->hdr_len;
> > > +
> > > + u64_stats_add(&stats->bytes, len);
> > > +
> > > + xdp = buf_to_xdp(vi, rq, buf, len);
> > > + if (!xdp)
> > > + return NULL;
> > > +
> > > + if (unlikely(len < ETH_HLEN)) {
> > > + pr_debug("%s: short packet %i\n", dev->name, len);
> > > + DEV_STATS_INC(dev, rx_length_errors);
> > > + xsk_buff_free(xdp);
> > > + return NULL;
> > > + }
> > > +
> > > + if (!vi->mergeable_rx_bufs)
> > > + skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
> > > +
> > > + return skb;
> > > +}
> > > +
> > > static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> > > struct xsk_buff_pool *pool, gfp_t gfp)
> > > {
> > > @@ -2363,9 +2481,22 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > void *buf;
> > > int i;
> > >
> > > - if (!vi->big_packets || vi->mergeable_rx_bufs) {
> > > - void *ctx;
> > > + if (rq->xsk.pool) {
> > > + struct sk_buff *skb;
> > > +
> > > + while (packets < budget) {
> > > + buf = virtqueue_get_buf(rq->vq, &len);
> > > + if (!buf)
> > > + break;
> > >
> > > + skb = virtnet_receive_xsk_buf(vi, rq, buf, len, xdp_xmit, &stats);
> > > + if (skb)
> > > + virtnet_receive_done(vi, rq, skb);
> > > +
> > > + packets++;
> > > + }
> >
> > If reusing turns out to be hard, I'd rather add new paths in receive_small().
>
> The exist function is called after virtnet_rq_get_buf(), that will do dma unmap.
> But for xsk, the dma unmap is not need. So xsk receive handle should use
> virtqueue_get_buf directly.
Probably but if it's just virtnet_rq_get_buf() we can simply did:
static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
{
void *buf;
buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
if (buf && rq->xsk.pool)
virtnet_rq_unmap(rq, buf, *len);
return buf;
}
Thanks
>
> Thanks.
>
> >
> > > + } else if (!vi->big_packets || vi->mergeable_rx_bufs) {
> > > + void *ctx;
> > > while (packets < budget &&
> > > (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
> > Thanks
> >
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode
2024-07-01 3:20 ` Jason Wang
@ 2024-07-01 3:25 ` Jason Wang
2024-07-01 8:32 ` Xuan Zhuo
0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2024-07-01 3:25 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Mon, Jul 1, 2024 at 11:20 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Jun 28, 2024 at 1:48 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 28 Jun 2024 10:19:41 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > In the process:
> > > > 1. We may need to copy data to create skb for XDP_PASS.
> > > > 2. We may need to call xsk_buff_free() to release the buffer.
> > > > 3. The handle for xdp_buff is difference from the buffer.
> > > >
> > > > If we pushed this logic into existing receive handle(merge and small),
> > > > we would have to maintain code scattered inside merge and small (and big).
> > > > So I think it is a good choice for us to put the xsk code into an
> > > > independent function.
> > >
> > > I think it's better to try to reuse the existing functions.
> > >
> > > More below:
> > >
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > > drivers/net/virtio_net.c | 135 ++++++++++++++++++++++++++++++++++++++-
> > > > 1 file changed, 133 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 2ac5668a94ce..06608d696e2e 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -500,6 +500,10 @@ struct virtio_net_common_hdr {
> > > > };
> > > >
> > > > static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > > +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> > > > + struct net_device *dev,
> > > > + unsigned int *xdp_xmit,
> > > > + struct virtnet_rq_stats *stats);
> > > >
> > > > static bool is_xdp_frame(void *ptr)
> > > > {
> > > > @@ -1040,6 +1044,120 @@ static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > sg->length = len;
> > > > }
> > > >
> > > > +static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
> > > > + struct receive_queue *rq, void *buf, u32 len)
> > > > +{
> > > > + struct xdp_buff *xdp;
> > > > + u32 bufsize;
> > > > +
> > > > + xdp = (struct xdp_buff *)buf;
> > > > +
> > > > + bufsize = xsk_pool_get_rx_frame_size(rq->xsk.pool) + vi->hdr_len;
> > > > +
> > > > + if (unlikely(len > bufsize)) {
> > > > + pr_debug("%s: rx error: len %u exceeds truesize %u\n",
> > > > + vi->dev->name, len, bufsize);
> > > > + DEV_STATS_INC(vi->dev, rx_length_errors);
> > > > + xsk_buff_free(xdp);
> > > > + return NULL;
> > > > + }
> > > > +
> > > > + xsk_buff_set_size(xdp, len);
> > > > + xsk_buff_dma_sync_for_cpu(xdp);
> > > > +
> > > > + return xdp;
> > > > +}
> > > > +
> > > > +static struct sk_buff *xdp_construct_skb(struct receive_queue *rq,
> > > > + struct xdp_buff *xdp)
> > > > +{
> > >
> > > So we have a similar caller which is receive_small_build_skb(). Any
> > > chance to reuse that?
> >
> > receive_small_build_skb works with build_skb.
>
> RIght.
>
> >
> > Here we need to copy the packet from the xsk buffer to the skb buffer.
> > So I do not think we can reuse it.
> >
>
> Let's rename this to xsk_construct_skb() ?
>
> >
> > >
> > > > + unsigned int metasize = xdp->data - xdp->data_meta;
> > > > + struct sk_buff *skb;
> > > > + unsigned int size;
> > > > +
> > > > + size = xdp->data_end - xdp->data_hard_start;
> > > > + skb = napi_alloc_skb(&rq->napi, size);
> > > > + if (unlikely(!skb)) {
> > > > + xsk_buff_free(xdp);
> > > > + return NULL;
> > > > + }
> > > > +
> > > > + skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
> > > > +
> > > > + size = xdp->data_end - xdp->data_meta;
> > > > + memcpy(__skb_put(skb, size), xdp->data_meta, size);
> > > > +
> > > > + if (metasize) {
> > > > + __skb_pull(skb, metasize);
> > > > + skb_metadata_set(skb, metasize);
> > > > + }
> > > > +
> > > > + xsk_buff_free(xdp);
> > > > +
> > > > + return skb;
> > > > +}
> > > > +
> > > > +static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct virtnet_info *vi,
> > > > + struct receive_queue *rq, struct xdp_buff *xdp,
> > > > + unsigned int *xdp_xmit,
> > > > + struct virtnet_rq_stats *stats)
> > > > +{
> > > > + struct bpf_prog *prog;
> > > > + u32 ret;
> > > > +
> > > > + ret = XDP_PASS;
> > > > + rcu_read_lock();
> > > > + prog = rcu_dereference(rq->xdp_prog);
> > > > + if (prog)
> > > > + ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
> > > > + rcu_read_unlock();
> > > > +
> > > > + switch (ret) {
> > > > + case XDP_PASS:
> > > > + return xdp_construct_skb(rq, xdp);
> > > > +
> > > > + case XDP_TX:
> > > > + case XDP_REDIRECT:
> > > > + return NULL;
> > > > +
> > > > + default:
> > > > + /* drop packet */
> > > > + xsk_buff_free(xdp);
> > > > + u64_stats_inc(&stats->drops);
> > > > + return NULL;
> > > > + }
> > > > +}
> > > > +
> > > > +static struct sk_buff *virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
> > > > + void *buf, u32 len,
> > > > + unsigned int *xdp_xmit,
> > > > + struct virtnet_rq_stats *stats)
> > > > +{
> > > > + struct net_device *dev = vi->dev;
> > > > + struct sk_buff *skb = NULL;
> > > > + struct xdp_buff *xdp;
> > > > +
> > > > + len -= vi->hdr_len;
> > > > +
> > > > + u64_stats_add(&stats->bytes, len);
> > > > +
> > > > + xdp = buf_to_xdp(vi, rq, buf, len);
> > > > + if (!xdp)
> > > > + return NULL;
> > > > +
> > > > + if (unlikely(len < ETH_HLEN)) {
> > > > + pr_debug("%s: short packet %i\n", dev->name, len);
> > > > + DEV_STATS_INC(dev, rx_length_errors);
> > > > + xsk_buff_free(xdp);
> > > > + return NULL;
> > > > + }
> > > > +
> > > > + if (!vi->mergeable_rx_bufs)
> > > > + skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
> > > > +
> > > > + return skb;
> > > > +}
> > > > +
> > > > static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> > > > struct xsk_buff_pool *pool, gfp_t gfp)
> > > > {
> > > > @@ -2363,9 +2481,22 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > void *buf;
> > > > int i;
> > > >
> > > > - if (!vi->big_packets || vi->mergeable_rx_bufs) {
> > > > - void *ctx;
> > > > + if (rq->xsk.pool) {
> > > > + struct sk_buff *skb;
> > > > +
> > > > + while (packets < budget) {
> > > > + buf = virtqueue_get_buf(rq->vq, &len);
> > > > + if (!buf)
> > > > + break;
> > > >
> > > > + skb = virtnet_receive_xsk_buf(vi, rq, buf, len, xdp_xmit, &stats);
> > > > + if (skb)
> > > > + virtnet_receive_done(vi, rq, skb);
> > > > +
> > > > + packets++;
> > > > + }
> > >
> > > If reusing turns out to be hard, I'd rather add new paths in receive_small().
> >
> > The exist function is called after virtnet_rq_get_buf(), that will do dma unmap.
> > But for xsk, the dma unmap is not need. So xsk receive handle should use
> > virtqueue_get_buf directly.
>
> Probably but if it's just virtnet_rq_get_buf() we can simply did:
>
> static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> {
> void *buf;
>
> buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> if (buf && rq->xsk.pool)
> virtnet_rq_unmap(rq, buf, *len);
>
> return buf;
> }
Or maybe it would be much more clearer if we did:
static int virtnet_receive()
{
if (rq->xsk.pool)
virtnet_receive_xsk()
else
virtnet_receive_xxx()
...
}
Thanks
>
> Thanks
>
> >
> > Thanks.
> >
> > >
> > > > + } else if (!vi->big_packets || vi->mergeable_rx_bufs) {
> > > > + void *ctx;
> > > > while (packets < budget &&
> > > > (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
> > > Thanks
> > >
> >
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode
2024-07-01 3:25 ` Jason Wang
@ 2024-07-01 8:32 ` Xuan Zhuo
0 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-07-01 8:32 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Mon, 1 Jul 2024 11:25:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 1, 2024 at 11:20 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Fri, Jun 28, 2024 at 1:48 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Fri, 28 Jun 2024 10:19:41 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > In the process:
> > > > > 1. We may need to copy data to create skb for XDP_PASS.
> > > > > 2. We may need to call xsk_buff_free() to release the buffer.
> > > > > 3. The handle for xdp_buff is difference from the buffer.
> > > > >
> > > > > If we pushed this logic into existing receive handle(merge and small),
> > > > > we would have to maintain code scattered inside merge and small (and big).
> > > > > So I think it is a good choice for us to put the xsk code into an
> > > > > independent function.
> > > >
> > > > I think it's better to try to reuse the existing functions.
> > > >
> > > > More below:
> > > >
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > > drivers/net/virtio_net.c | 135 ++++++++++++++++++++++++++++++++++++++-
> > > > > 1 file changed, 133 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 2ac5668a94ce..06608d696e2e 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -500,6 +500,10 @@ struct virtio_net_common_hdr {
> > > > > };
> > > > >
> > > > > static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > > > > +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> > > > > + struct net_device *dev,
> > > > > + unsigned int *xdp_xmit,
> > > > > + struct virtnet_rq_stats *stats);
> > > > >
> > > > > static bool is_xdp_frame(void *ptr)
> > > > > {
> > > > > @@ -1040,6 +1044,120 @@ static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > > sg->length = len;
> > > > > }
> > > > >
> > > > > +static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
> > > > > + struct receive_queue *rq, void *buf, u32 len)
> > > > > +{
> > > > > + struct xdp_buff *xdp;
> > > > > + u32 bufsize;
> > > > > +
> > > > > + xdp = (struct xdp_buff *)buf;
> > > > > +
> > > > > + bufsize = xsk_pool_get_rx_frame_size(rq->xsk.pool) + vi->hdr_len;
> > > > > +
> > > > > + if (unlikely(len > bufsize)) {
> > > > > + pr_debug("%s: rx error: len %u exceeds truesize %u\n",
> > > > > + vi->dev->name, len, bufsize);
> > > > > + DEV_STATS_INC(vi->dev, rx_length_errors);
> > > > > + xsk_buff_free(xdp);
> > > > > + return NULL;
> > > > > + }
> > > > > +
> > > > > + xsk_buff_set_size(xdp, len);
> > > > > + xsk_buff_dma_sync_for_cpu(xdp);
> > > > > +
> > > > > + return xdp;
> > > > > +}
> > > > > +
> > > > > +static struct sk_buff *xdp_construct_skb(struct receive_queue *rq,
> > > > > + struct xdp_buff *xdp)
> > > > > +{
> > > >
> > > > So we have a similar caller which is receive_small_build_skb(). Any
> > > > chance to reuse that?
> > >
> > > receive_small_build_skb works with build_skb.
> >
> > RIght.
> >
> > >
> > > Here we need to copy the packet from the xsk buffer to the skb buffer.
> > > So I do not think we can reuse it.
> > >
> >
> > Let's rename this to xsk_construct_skb() ?
> >
> > >
> > > >
> > > > > + unsigned int metasize = xdp->data - xdp->data_meta;
> > > > > + struct sk_buff *skb;
> > > > > + unsigned int size;
> > > > > +
> > > > > + size = xdp->data_end - xdp->data_hard_start;
> > > > > + skb = napi_alloc_skb(&rq->napi, size);
> > > > > + if (unlikely(!skb)) {
> > > > > + xsk_buff_free(xdp);
> > > > > + return NULL;
> > > > > + }
> > > > > +
> > > > > + skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
> > > > > +
> > > > > + size = xdp->data_end - xdp->data_meta;
> > > > > + memcpy(__skb_put(skb, size), xdp->data_meta, size);
> > > > > +
> > > > > + if (metasize) {
> > > > > + __skb_pull(skb, metasize);
> > > > > + skb_metadata_set(skb, metasize);
> > > > > + }
> > > > > +
> > > > > + xsk_buff_free(xdp);
> > > > > +
> > > > > + return skb;
> > > > > +}
> > > > > +
> > > > > +static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct virtnet_info *vi,
> > > > > + struct receive_queue *rq, struct xdp_buff *xdp,
> > > > > + unsigned int *xdp_xmit,
> > > > > + struct virtnet_rq_stats *stats)
> > > > > +{
> > > > > + struct bpf_prog *prog;
> > > > > + u32 ret;
> > > > > +
> > > > > + ret = XDP_PASS;
> > > > > + rcu_read_lock();
> > > > > + prog = rcu_dereference(rq->xdp_prog);
> > > > > + if (prog)
> > > > > + ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
> > > > > + rcu_read_unlock();
> > > > > +
> > > > > + switch (ret) {
> > > > > + case XDP_PASS:
> > > > > + return xdp_construct_skb(rq, xdp);
> > > > > +
> > > > > + case XDP_TX:
> > > > > + case XDP_REDIRECT:
> > > > > + return NULL;
> > > > > +
> > > > > + default:
> > > > > + /* drop packet */
> > > > > + xsk_buff_free(xdp);
> > > > > + u64_stats_inc(&stats->drops);
> > > > > + return NULL;
> > > > > + }
> > > > > +}
> > > > > +
> > > > > +static struct sk_buff *virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > + void *buf, u32 len,
> > > > > + unsigned int *xdp_xmit,
> > > > > + struct virtnet_rq_stats *stats)
> > > > > +{
> > > > > + struct net_device *dev = vi->dev;
> > > > > + struct sk_buff *skb = NULL;
> > > > > + struct xdp_buff *xdp;
> > > > > +
> > > > > + len -= vi->hdr_len;
> > > > > +
> > > > > + u64_stats_add(&stats->bytes, len);
> > > > > +
> > > > > + xdp = buf_to_xdp(vi, rq, buf, len);
> > > > > + if (!xdp)
> > > > > + return NULL;
> > > > > +
> > > > > + if (unlikely(len < ETH_HLEN)) {
> > > > > + pr_debug("%s: short packet %i\n", dev->name, len);
> > > > > + DEV_STATS_INC(dev, rx_length_errors);
> > > > > + xsk_buff_free(xdp);
> > > > > + return NULL;
> > > > > + }
> > > > > +
> > > > > + if (!vi->mergeable_rx_bufs)
> > > > > + skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
> > > > > +
> > > > > + return skb;
> > > > > +}
> > > > > +
> > > > > static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > struct xsk_buff_pool *pool, gfp_t gfp)
> > > > > {
> > > > > @@ -2363,9 +2481,22 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > void *buf;
> > > > > int i;
> > > > >
> > > > > - if (!vi->big_packets || vi->mergeable_rx_bufs) {
> > > > > - void *ctx;
> > > > > + if (rq->xsk.pool) {
> > > > > + struct sk_buff *skb;
> > > > > +
> > > > > + while (packets < budget) {
> > > > > + buf = virtqueue_get_buf(rq->vq, &len);
> > > > > + if (!buf)
> > > > > + break;
> > > > >
> > > > > + skb = virtnet_receive_xsk_buf(vi, rq, buf, len, xdp_xmit, &stats);
> > > > > + if (skb)
> > > > > + virtnet_receive_done(vi, rq, skb);
> > > > > +
> > > > > + packets++;
> > > > > + }
> > > >
> > > > If reusing turns out to be hard, I'd rather add new paths in receive_small().
> > >
> > > The exist function is called after virtnet_rq_get_buf(), that will do dma unmap.
> > > But for xsk, the dma unmap is not need. So xsk receive handle should use
> > > virtqueue_get_buf directly.
> >
> > Probably but if it's just virtnet_rq_get_buf() we can simply did:
> >
> > static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > {
> > void *buf;
> >
> > buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > if (buf && rq->xsk.pool)
> > virtnet_rq_unmap(rq, buf, *len);
> >
> > return buf;
> > }
>
> Or maybe it would be much more clearer if we did:
>
> static int virtnet_receive()
> {
>
> if (rq->xsk.pool)
> virtnet_receive_xsk()
> else
> virtnet_receive_xxx()
> ...
> }
I like this.
Thanks.
>
> Thanks
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > > >
> > > > > + } else if (!vi->big_packets || vi->mergeable_rx_bufs) {
> > > > > + void *ctx;
> > > > > while (packets < budget &&
> > > > > (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > > >
> > > >
> > > > Thanks
> > > >
> > >
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-07-01 3:05 ` Jason Wang
@ 2024-07-01 8:34 ` Xuan Zhuo
0 siblings, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2024-07-01 8:34 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Mon, 1 Jul 2024 11:05:33 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Jun 28, 2024 at 1:44 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 28 Jun 2024 10:19:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Jun 18, 2024 at 3:57 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > Implement the logic of filling rq with XSK buffers.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > > drivers/net/virtio_net.c | 68 ++++++++++++++++++++++++++++++++++++++--
> > > > 1 file changed, 66 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 2bbc715f22c6..2ac5668a94ce 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -355,6 +355,8 @@ struct receive_queue {
> > > >
> > > > /* xdp rxq used by xsk */
> > > > struct xdp_rxq_info xdp_rxq;
> > > > +
> > > > + struct xdp_buff **xsk_buffs;
> > > > } xsk;
> > > > };
> > > >
> > > > @@ -1032,6 +1034,53 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
> > > > }
> > > > }
> > > >
> > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > > +{
> > > > + sg->dma_address = addr;
> > > > + sg->length = len;
> > > > +}
> > > > +
> > > > +static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> > > > + struct xsk_buff_pool *pool, gfp_t gfp)
> > > > +{
> > > > + struct xdp_buff **xsk_buffs;
> > > > + dma_addr_t addr;
> > > > + u32 len, i;
> > > > + int err = 0;
> > > > + int num;
> > > > +
> > > > + xsk_buffs = rq->xsk.xsk_buffs;
> > > > +
> > > > + num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
> > > > + if (!num)
> > > > + return -ENOMEM;
> > > > +
> > > > + len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
> > > > +
> > > > + for (i = 0; i < num; ++i) {
> > > > + /* use the part of XDP_PACKET_HEADROOM as the virtnet hdr space */
> > > > + addr = xsk_buff_xdp_get_dma(xsk_buffs[i]) - vi->hdr_len;
> > >
> > > We had VIRTIO_XDP_HEADROOM, can we reuse it? Or if it's redundant
> > > let's send a patch to switch to XDP_PACKET_HEADROOM.
> >
> > Do you mean replace it inside the comment?
>
> I meant a patch to s/VIRTIO_XDP_HEADROOM/XDP_PACKET_HEADROOM/g.
I see.
>
> >
> > I want to describe use the headroom of xsk, the size of the headroom is
> > XDP_PACKET_HEADROOM.
> >
> > >
> > > Btw, the code assumes vi->hdr_len < xsk_pool_get_headroom(). It's
> > > better to fail if it's not true when enabling xsk.
> >
> > It is ok.
>
> I mean do we need a check to fail xsk binding if vi->hdr_len >
> xsk_pool_get_headroom() or it has been guaranteed by the code already.
YES.
Thanks.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2024-07-01 8:34 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-18 7:56 [PATCH net-next v6 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 01/10] virtio_net: separate virtnet_rx_resize() Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 02/10] virtio_net: separate virtnet_tx_resize() Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 03/10] virtio_net: separate receive_buf Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 04/10] virtio_net: separate receive_mergeable Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 05/10] virtio_net: xsk: bind/unbind xsk for rx Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
2024-06-28 5:42 ` Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 06/10] virtio_net: xsk: support wakeup Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 07/10] virtio_net: xsk: rx: support fill with xsk buffer Xuan Zhuo
2024-06-20 10:20 ` Paolo Abeni
2024-06-20 10:37 ` Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
2024-06-28 5:42 ` Xuan Zhuo
2024-07-01 3:05 ` Jason Wang
2024-07-01 8:34 ` Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 08/10] virtio_net: xsk: rx: support recv small mode Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
2024-06-28 5:44 ` Xuan Zhuo
2024-07-01 3:20 ` Jason Wang
2024-07-01 3:25 ` Jason Wang
2024-07-01 8:32 ` Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 09/10] virtio_net: xsk: rx: support recv merge mode Xuan Zhuo
2024-06-20 10:37 ` Paolo Abeni
2024-06-20 10:57 ` Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
2024-06-28 5:50 ` Xuan Zhuo
2024-06-18 7:56 ` [PATCH net-next v6 10/10] virtio_net: xsk: rx: free the unused xsk buffer Xuan Zhuo
2024-06-20 10:46 ` Paolo Abeni
2024-06-20 10:48 ` Michael S. Tsirkin
2024-06-20 10:54 ` Xuan Zhuo
2024-06-28 2:19 ` Jason Wang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.