* [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy
@ 2024-07-05 7:37 Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 01/10] virtio_net: replace VIRTIO_XDP_HEADROOM by XDP_PACKET_HEADROOM Xuan Zhuo
` (10 more replies)
0 siblings, 11 replies; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
v6:
1. start from supporting rx zero-copy

v5:
1. address the comments on the last version
http://lore.kernel.org/all/20240611114147.31320-1-xuanzhuo@linux.alibaba.com

v4:
1. remove the commits that introduce the independent directory
2. remove support for the rx merge mode (due to the 15-commit
limit of net-next). Let's start with the small mode.
3. merge some commits and drop some unimportant ones
## AF_XDP

XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The
zero-copy feature of xsk (XDP socket) needs to be supported by the driver,
and its performance is very good. mlx5 and Intel ixgbe already support this
feature. This patch set allows virtio-net to support xsk's zero-copy receive
feature.
At present, we have completed some preparations:

1. vq-reset (virtio spec and kernel code)
2. virtio-core premapped dma
3. virtio-net xdp refactor

So it is time for virtio-net to complete the support for XDP socket
zero-copy.
Virtio-net cannot increase the number of queues at will, so the xsk queue is
shared with the kernel.
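For background, this is what a zero-copy bind looks like from user space (a
minimal sketch using the raw AF_XDP UAPI; the interface name is hypothetical
and the UMEM/ring setup is omitted):

	#include <linux/if_xdp.h>
	#include <net/if.h>
	#include <sys/socket.h>

	int fd = socket(AF_XDP, SOCK_RAW, 0);
	struct sockaddr_xdp sxdp = {
		.sxdp_family = AF_XDP,
		.sxdp_ifindex = if_nametoindex("eth0"), /* hypothetical iface */
		.sxdp_queue_id = 0,          /* this queue is shared with the kernel */
		.sxdp_flags = XDP_ZEROCOPY,  /* bind() fails if the driver lacks ZC */
	};
	/* UMEM registration and rx/tx ring mmap are omitted for brevity;
	 * they must be done before bind() can succeed.
	 */
	bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp));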
On the other hand, virtio-net does not support generating an interrupt from
the driver manually, so we use some tricks to wake up the tx xmit path: if
TX NAPI last ran on a different CPU, an IPI is used to wake up NAPI on that
remote CPU; if it last ran on the local CPU, NAPI is woken up directly.
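(Per the v2 changelog below, the remote-IPI variant was later replaced by
GVE-style local scheduling; the v7 wakeup path, condensed from patch 07's
virtnet_xsk_wakeup(), boils down to:)

	sq = &vi->sq[qid];

	/* If tx NAPI is already scheduled, just mark that it missed an event. */
	if (napi_if_scheduled_mark_missed(&sq->napi))
		return 0;

	/* Otherwise schedule tx NAPI on the local cpu. */
	local_bh_disable();
	virtqueue_napi_schedule(&sq->napi, sq->vq);
	local_bh_enable();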
This patch set also includes some refactoring of virtio-net to support
AF_XDP.
## performance
ENV: QEMU with vhost-user (polling mode).
Host CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
### virtio PMD in guest with testpmd
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 19531092064 RX-missed: 0 RX-bytes: 1093741155584
RX-errors: 0
RX-nombuf: 0
TX-packets: 5959955552 TX-errors: 0 TX-bytes: 371030645664
Throughput (since last show)
Rx-pps: 8861574 Rx-bps: 3969985208
Tx-pps: 8861493 Tx-bps: 3969962736
############################################################################
### AF_XDP PMD in guest with testpmd
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 68152727 RX-missed: 0 RX-bytes: 3816552712
RX-errors: 0
RX-nombuf: 0
TX-packets: 68114967 TX-errors: 33216 TX-bytes: 3814438152
Throughput (since last show)
Rx-pps: 6333196 Rx-bps: 2837272088
Tx-pps: 6333227 Tx-bps: 2837285936
############################################################################
However, AF_XDP consumes more CPU for the tx and rx NAPI (100% and 86%,
respectively).
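For context, the AF_XDP numbers above were gathered with DPDK's AF_XDP PMD;
a typical testpmd invocation looks like the following (the core list and
interface name are illustrative, not taken from the original report):

	dpdk-testpmd -l 0-1 --no-pci \
		--vdev net_af_xdp0,iface=eth0 \
		-- -i --nb-cores=1
	testpmd> start tx_first
	testpmd> show port stats all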
## maintain
I am currently a reviewer for virtio-net, and I commit to maintaining AF_XDP
support in virtio-net.
Please review.
Thanks.
v3:
1. virtio introduces helpers for the virtio-net sq to use premapped dma
2. xsk has more complete support for merge mode
3. fix some problems

v2:
1. wakeup uses the GVE approach: no IPI is sent to wake up napi on a
remote cpu.
2. remove rcu. Because we synchronize all operations, rcu is not needed.
3. split the commit "move to virtio_net.h" from the last patch set; only
move the structs/APIs to the header when we use them.
4. add comments for some code

v1:
1. remove two virtio commits. Push this patchset to net-next.
2. squash "virtio_net: virtnet_poll_tx support rescheduled" into "xsk:
support tx"
3. fix some warnings
Xuan Zhuo (10):
virtio_net: replace VIRTIO_XDP_HEADROOM by XDP_PACKET_HEADROOM
virtio_net: separate virtnet_rx_resize()
virtio_net: separate virtnet_tx_resize()
virtio_net: separate receive_buf
virtio_net: separate receive_mergeable
virtio_net: xsk: bind/unbind xsk for rx
virtio_net: xsk: support wakeup
virtio_net: xsk: rx: support fill with xsk buffer
virtio_net: xsk: rx: support recv small mode
virtio_net: xsk: rx: support recv merge mode
drivers/net/virtio_net.c | 770 ++++++++++++++++++++++++++++++++++-----
1 file changed, 676 insertions(+), 94 deletions(-)
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 01/10] virtio_net: replace VIRTIO_XDP_HEADROOM by XDP_PACKET_HEADROOM
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-08 6:18 ` Jason Wang
2024-07-05 7:37 ` [PATCH net-next v7 02/10] virtio_net: separate virtnet_rx_resize() Xuan Zhuo
` (9 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
virtio-net has VIRTIO_XDP_HEADROOM, which is equal to
XDP_PACKET_HEADROOM, to calculate the headroom for xdp.
But we should use the XDP_PACKET_HEADROOM macro from bpf.h directly to
calculate the headroom for xdp. So remove VIRTIO_XDP_HEADROOM and
replace it with XDP_PACKET_HEADROOM.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
drivers/net/virtio_net.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0b4747e81464..d99898e44456 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -40,9 +40,6 @@ module_param(napi_tx, bool, 0644);
#define VIRTNET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
-/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
-#define VIRTIO_XDP_HEADROOM 256
-
/* Separating two types of XDP xmit */
#define VIRTIO_XDP_TX BIT(0)
#define VIRTIO_XDP_REDIR BIT(1)
@@ -1268,7 +1265,7 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
{
- return vi->xdp_enabled ? VIRTIO_XDP_HEADROOM : 0;
+ return vi->xdp_enabled ? XDP_PACKET_HEADROOM : 0;
}
/* We copy the packet for XDP in the following cases:
@@ -1332,7 +1329,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
}
/* Headroom does not contribute to packet length */
- *len = page_off - VIRTIO_XDP_HEADROOM;
+ *len = page_off - XDP_PACKET_HEADROOM;
return page;
err_buf:
__free_pages(page, 0);
@@ -1619,8 +1616,8 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
void *ctx;
xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
- xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
- VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, true);
+ xdp_prepare_buff(xdp, buf - XDP_PACKET_HEADROOM,
+ XDP_PACKET_HEADROOM + vi->hdr_len, len - vi->hdr_len, true);
if (!*num_buf)
return 0;
@@ -1737,12 +1734,12 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
/* linearize data for XDP */
xdp_page = xdp_linearize_page(rq, num_buf,
*page, offset,
- VIRTIO_XDP_HEADROOM,
+ XDP_PACKET_HEADROOM,
len);
if (!xdp_page)
return NULL;
} else {
- xdp_room = SKB_DATA_ALIGN(VIRTIO_XDP_HEADROOM +
+ xdp_room = SKB_DATA_ALIGN(XDP_PACKET_HEADROOM +
sizeof(struct skb_shared_info));
if (*len + xdp_room > PAGE_SIZE)
return NULL;
@@ -1751,7 +1748,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
if (!xdp_page)
return NULL;
- memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
+ memcpy(page_address(xdp_page) + XDP_PACKET_HEADROOM,
page_address(*page) + offset, *len);
}
@@ -1761,7 +1758,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
*page = xdp_page;
- return page_address(*page) + VIRTIO_XDP_HEADROOM;
+ return page_address(*page) + XDP_PACKET_HEADROOM;
}
static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
@@ -4971,7 +4968,7 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
struct netlink_ext_ack *extack)
{
- unsigned int room = SKB_DATA_ALIGN(VIRTIO_XDP_HEADROOM +
+ unsigned int room = SKB_DATA_ALIGN(XDP_PACKET_HEADROOM +
sizeof(struct skb_shared_info));
unsigned int max_sz = PAGE_SIZE - room - ETH_HLEN;
struct virtnet_info *vi = netdev_priv(dev);
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 02/10] virtio_net: separate virtnet_rx_resize()
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 01/10] virtio_net: replace VIRTIO_XDP_HEADROOM by XDP_PACKET_HEADROOM Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 03/10] virtio_net: separate virtnet_tx_resize() Xuan Zhuo
` (8 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This patch separates two sub-functions from virtnet_rx_resize():
* virtnet_rx_pause
* virtnet_rx_resume
Then the subsequent rx reset for xsk can share these two functions, as
sketched below.
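For reference, the xsk bind path added later in this series uses the pair
exactly this way (condensed from patch 06):

	virtnet_rx_pause(vi, rq);

	err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
	if (err)
		netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);

	virtnet_rx_resume(vi, rq);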
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d99898e44456..df5b23374c53 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2665,28 +2665,41 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
-static int virtnet_rx_resize(struct virtnet_info *vi,
- struct receive_queue *rq, u32 ring_num)
+static void virtnet_rx_pause(struct virtnet_info *vi, struct receive_queue *rq)
{
bool running = netif_running(vi->dev);
- int err, qindex;
-
- qindex = rq - vi->rq;
if (running) {
napi_disable(&rq->napi);
virtnet_cancel_dim(vi, &rq->dim);
}
+}
- err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
- if (err)
- netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
+static void virtnet_rx_resume(struct virtnet_info *vi, struct receive_queue *rq)
+{
+ bool running = netif_running(vi->dev);
if (!try_fill_recv(vi, rq, GFP_KERNEL))
schedule_delayed_work(&vi->refill, 0);
if (running)
virtnet_napi_enable(rq->vq, &rq->napi);
+}
+
+static int virtnet_rx_resize(struct virtnet_info *vi,
+ struct receive_queue *rq, u32 ring_num)
+{
+ int err, qindex;
+
+ qindex = rq - vi->rq;
+
+ virtnet_rx_pause(vi, rq);
+
+ err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
+ if (err)
+ netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
+
+ virtnet_rx_resume(vi, rq);
return err;
}
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 03/10] virtio_net: separate virtnet_tx_resize()
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 01/10] virtio_net: replace VIRTIO_XDP_HEADROOM by XDP_PACKET_HEADROOM Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 02/10] virtio_net: separate virtnet_rx_resize() Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 04/10] virtio_net: separate receive_buf Xuan Zhuo
` (7 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This patch separates two sub-functions from virtnet_tx_resize():
* virtnet_tx_pause
* virtnet_tx_resume
Then the subsequent virtnet_tx_reset() can share these two functions; see
the sketch below.
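By analogy with the rx side, the expected reset pattern would be the
following (a sketch only; virtnet_tx_reset() itself is not part of this
series, so the exact body is an assumption):

	virtnet_tx_pause(vi, sq);

	/* hypothetical: reset the tx virtqueue while it is paused */
	err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf);
	if (err)
		netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);

	virtnet_tx_resume(vi, sq);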
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index df5b23374c53..7d762614113b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2703,12 +2703,11 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
return err;
}
-static int virtnet_tx_resize(struct virtnet_info *vi,
- struct send_queue *sq, u32 ring_num)
+static void virtnet_tx_pause(struct virtnet_info *vi, struct send_queue *sq)
{
bool running = netif_running(vi->dev);
struct netdev_queue *txq;
- int err, qindex;
+ int qindex;
qindex = sq - vi->sq;
@@ -2729,10 +2728,17 @@ static int virtnet_tx_resize(struct virtnet_info *vi,
netif_stop_subqueue(vi->dev, qindex);
__netif_tx_unlock_bh(txq);
+}
- err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
- if (err)
- netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
+static void virtnet_tx_resume(struct virtnet_info *vi, struct send_queue *sq)
+{
+ bool running = netif_running(vi->dev);
+ struct netdev_queue *txq;
+ int qindex;
+
+ qindex = sq - vi->sq;
+
+ txq = netdev_get_tx_queue(vi->dev, qindex);
__netif_tx_lock_bh(txq);
sq->reset = false;
@@ -2741,6 +2747,23 @@ static int virtnet_tx_resize(struct virtnet_info *vi,
if (running)
virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
+}
+
+static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq,
+ u32 ring_num)
+{
+ int qindex, err;
+
+ qindex = sq - vi->sq;
+
+ virtnet_tx_pause(vi, sq);
+
+ err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
+ if (err)
+ netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
+
+ virtnet_tx_resume(vi, sq);
+
return err;
}
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 04/10] virtio_net: separate receive_buf
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (2 preceding siblings ...)
2024-07-05 7:37 ` [PATCH net-next v7 03/10] virtio_net: separate virtnet_tx_resize() Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 05/10] virtio_net: separate receive_mergeable Xuan Zhuo
` (6 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This commit splits receive_buf(): the logic of handling the skb is
wrapped into an independent function, virtnet_receive_done().
The subsequent commit will reuse it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 62 +++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 27 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7d762614113b..abfc84af90ce 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1968,6 +1968,40 @@ static void virtio_skb_set_hash(const struct virtio_net_hdr_v1_hash *hdr_hash,
skb_set_hash(skb, __le32_to_cpu(hdr_hash->hash_value), rss_hash_type);
}
+static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
+ struct sk_buff *skb, u8 flags)
+{
+ struct virtio_net_common_hdr *hdr;
+ struct net_device *dev = vi->dev;
+
+ hdr = skb_vnet_common_hdr(skb);
+ if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
+ virtio_skb_set_hash(&hdr->hash_v1_hdr, skb);
+
+ if (flags & VIRTIO_NET_HDR_F_DATA_VALID)
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+ if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
+ virtio_is_little_endian(vi->vdev))) {
+ net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n",
+ dev->name, hdr->hdr.gso_type,
+ hdr->hdr.gso_size);
+ goto frame_err;
+ }
+
+ skb_record_rx_queue(skb, vq2rxq(rq->vq));
+ skb->protocol = eth_type_trans(skb, dev);
+ pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
+ ntohs(skb->protocol), skb->len, skb->pkt_type);
+
+ napi_gro_receive(&rq->napi, skb);
+ return;
+
+frame_err:
+ DEV_STATS_INC(dev, rx_frame_errors);
+ dev_kfree_skb(skb);
+}
+
static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
void *buf, unsigned int len, void **ctx,
unsigned int *xdp_xmit,
@@ -1975,7 +2009,6 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
{
struct net_device *dev = vi->dev;
struct sk_buff *skb;
- struct virtio_net_common_hdr *hdr;
u8 flags;
if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
@@ -2005,32 +2038,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
if (unlikely(!skb))
return;
- hdr = skb_vnet_common_hdr(skb);
- if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report)
- virtio_skb_set_hash(&hdr->hash_v1_hdr, skb);
-
- if (flags & VIRTIO_NET_HDR_F_DATA_VALID)
- skb->ip_summed = CHECKSUM_UNNECESSARY;
-
- if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
- virtio_is_little_endian(vi->vdev))) {
- net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n",
- dev->name, hdr->hdr.gso_type,
- hdr->hdr.gso_size);
- goto frame_err;
- }
-
- skb_record_rx_queue(skb, vq2rxq(rq->vq));
- skb->protocol = eth_type_trans(skb, dev);
- pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
- ntohs(skb->protocol), skb->len, skb->pkt_type);
-
- napi_gro_receive(&rq->napi, skb);
- return;
-
-frame_err:
- DEV_STATS_INC(dev, rx_frame_errors);
- dev_kfree_skb(skb);
+ virtnet_receive_done(vi, rq, skb, flags);
}
/* Unlike mergeable buffers, all buffers are allocated to the
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 05/10] virtio_net: separate receive_mergeable
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (3 preceding siblings ...)
2024-07-05 7:37 ` [PATCH net-next v7 04/10] virtio_net: separate receive_buf Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 06/10] virtio_net: xsk: bind/unbind xsk for rx Xuan Zhuo
` (5 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This commit splits receive_mergeable(), moving the logic of appending a
frag to the skb into an independent function.
The subsequent commit will reuse it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 77 ++++++++++++++++++++++++----------------
1 file changed, 47 insertions(+), 30 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index abfc84af90ce..3c828cdd438b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1821,6 +1821,49 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
return NULL;
}
+static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
+ struct sk_buff *curr_skb,
+ struct page *page, void *buf,
+ int len, int truesize)
+{
+ int num_skb_frags;
+ int offset;
+
+ num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
+ if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
+ struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
+
+ if (unlikely(!nskb))
+ return NULL;
+
+ if (curr_skb == head_skb)
+ skb_shinfo(curr_skb)->frag_list = nskb;
+ else
+ curr_skb->next = nskb;
+ curr_skb = nskb;
+ head_skb->truesize += nskb->truesize;
+ num_skb_frags = 0;
+ }
+
+ if (curr_skb != head_skb) {
+ head_skb->data_len += len;
+ head_skb->len += len;
+ head_skb->truesize += truesize;
+ }
+
+ offset = buf - page_address(page);
+ if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
+ put_page(page);
+ skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
+ len, truesize);
+ } else {
+ skb_add_rx_frag(curr_skb, num_skb_frags, page,
+ offset, len, truesize);
+ }
+
+ return curr_skb;
+}
+
static struct sk_buff *receive_mergeable(struct net_device *dev,
struct virtnet_info *vi,
struct receive_queue *rq,
@@ -1870,8 +1913,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
if (unlikely(!curr_skb))
goto err_skb;
while (--num_buf) {
- int num_skb_frags;
-
buf = virtnet_rq_get_buf(rq, &len, &ctx);
if (unlikely(!buf)) {
pr_debug("%s: rx error: %d buffers out of %d missing\n",
@@ -1896,34 +1937,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
goto err_skb;
}
- num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
- if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
- struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
-
- if (unlikely(!nskb))
- goto err_skb;
- if (curr_skb == head_skb)
- skb_shinfo(curr_skb)->frag_list = nskb;
- else
- curr_skb->next = nskb;
- curr_skb = nskb;
- head_skb->truesize += nskb->truesize;
- num_skb_frags = 0;
- }
- if (curr_skb != head_skb) {
- head_skb->data_len += len;
- head_skb->len += len;
- head_skb->truesize += truesize;
- }
- offset = buf - page_address(page);
- if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
- put_page(page);
- skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
- len, truesize);
- } else {
- skb_add_rx_frag(curr_skb, num_skb_frags, page,
- offset, len, truesize);
- }
+ curr_skb = virtnet_skb_append_frag(head_skb, curr_skb, page,
+ buf, len, truesize);
+ if (!curr_skb)
+ goto err_skb;
}
ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 06/10] virtio_net: xsk: bind/unbind xsk for rx
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (4 preceding siblings ...)
2024-07-05 7:37 ` [PATCH net-next v7 05/10] virtio_net: separate receive_mergeable Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-08 6:36 ` Jason Wang
2024-07-05 7:37 ` [PATCH net-next v7 07/10] virtio_net: xsk: support wakeup Xuan Zhuo
` (4 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
This patch implements the logic of binding/unbinding an xsk pool to/from
an rq.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
v7:
1. remove a container struct for xsk
2. update comments
3. add check between hdr_len and xsk headroom
drivers/net/virtio_net.c | 134 +++++++++++++++++++++++++++++++++++++++
1 file changed, 134 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3c828cdd438b..cd87b39600d4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -25,6 +25,7 @@
#include <net/net_failover.h>
#include <net/netdev_rx_queue.h>
#include <net/netdev_queues.h>
+#include <net/xdp_sock_drv.h>
static int napi_weight = NAPI_POLL_WEIGHT;
module_param(napi_weight, int, 0444);
@@ -348,6 +349,11 @@ struct receive_queue {
/* Record the last dma info to free after new pages is allocated. */
struct virtnet_rq_dma *last_dma;
+
+ struct xsk_buff_pool *xsk_pool;
+
+ /* xdp rxq used by xsk */
+ struct xdp_rxq_info xsk_rxq_info;
};
/* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -5026,6 +5032,132 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
return virtnet_set_guest_offloads(vi, offloads);
}
+static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queue *rq,
+ struct xsk_buff_pool *pool)
+{
+ int err, qindex;
+
+ qindex = rq - vi->rq;
+
+ if (pool) {
+ err = xdp_rxq_info_reg(&rq->xsk_rxq_info, vi->dev, qindex, rq->napi.napi_id);
+ if (err < 0)
+ return err;
+
+ err = xdp_rxq_info_reg_mem_model(&rq->xsk_rxq_info,
+ MEM_TYPE_XSK_BUFF_POOL, NULL);
+ if (err < 0)
+ goto unreg;
+
+ xsk_pool_set_rxq_info(pool, &rq->xsk_rxq_info);
+ }
+
+ virtnet_rx_pause(vi, rq);
+
+ err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
+ if (err) {
+ netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
+
+ pool = NULL;
+ }
+
+ rq->xsk_pool = pool;
+
+ virtnet_rx_resume(vi, rq);
+
+ if (pool)
+ return 0;
+
+unreg:
+ xdp_rxq_info_unreg(&rq->xsk_rxq_info);
+ return err;
+}
+
+static int virtnet_xsk_pool_enable(struct net_device *dev,
+ struct xsk_buff_pool *pool,
+ u16 qid)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ struct receive_queue *rq;
+ struct device *dma_dev;
+ struct send_queue *sq;
+ int err;
+
+ if (vi->hdr_len > xsk_pool_get_headroom(pool))
+ return -EINVAL;
+
+ /* In big_packets mode, xdp cannot work, so there is no need to
+ * initialize xsk of rq.
+ */
+ if (vi->big_packets && !vi->mergeable_rx_bufs)
+ return -ENOENT;
+
+ if (qid >= vi->curr_queue_pairs)
+ return -EINVAL;
+
+ sq = &vi->sq[qid];
+ rq = &vi->rq[qid];
+
+ /* xsk assumes that tx and rx must have the same dma device. The af-xdp
+ * may use one buffer to receive from the rx and reuse this buffer to
+ * send by the tx. So the dma dev of sq and rq must be the same one.
+ *
+ * But vq->dma_dev allows every vq has the respective dma dev. So I
+ * check the dma dev of vq and sq is the same dev.
+ */
+ if (virtqueue_dma_dev(rq->vq) != virtqueue_dma_dev(sq->vq))
+ return -EPERM;
+
+ dma_dev = virtqueue_dma_dev(rq->vq);
+ if (!dma_dev)
+ return -EPERM;
+
+ err = xsk_pool_dma_map(pool, dma_dev, 0);
+ if (err)
+ goto err_xsk_map;
+
+ err = virtnet_rq_bind_xsk_pool(vi, rq, pool);
+ if (err)
+ goto err_rq;
+
+ return 0;
+
+err_rq:
+ xsk_pool_dma_unmap(pool, 0);
+err_xsk_map:
+ return err;
+}
+
+static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ struct xsk_buff_pool *pool;
+ struct receive_queue *rq;
+ int err;
+
+ if (qid >= vi->curr_queue_pairs)
+ return -EINVAL;
+
+ rq = &vi->rq[qid];
+
+ pool = rq->xsk_pool;
+
+ err = virtnet_rq_bind_xsk_pool(vi, rq, NULL);
+
+ xsk_pool_dma_unmap(pool, 0);
+
+ return err;
+}
+
+static int virtnet_xsk_pool_setup(struct net_device *dev, struct netdev_bpf *xdp)
+{
+ if (xdp->xsk.pool)
+ return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
+ xdp->xsk.queue_id);
+ else
+ return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
+}
+
static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
struct netlink_ext_ack *extack)
{
@@ -5151,6 +5283,8 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
switch (xdp->command) {
case XDP_SETUP_PROG:
return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
+ case XDP_SETUP_XSK_POOL:
+ return virtnet_xsk_pool_setup(dev, xdp);
default:
return -EINVAL;
}
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 07/10] virtio_net: xsk: support wakeup
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (5 preceding siblings ...)
2024-07-05 7:37 ` [PATCH net-next v7 06/10] virtio_net: xsk: bind/unbind xsk for rx Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 08/10] virtio_net: xsk: rx: support fill with xsk buffer Xuan Zhuo
` (3 subsequent siblings)
10 siblings, 0 replies; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
xsk wakeup is used by the xsk framework or the user to trigger the xsk
xmit logic.
Virtio-net does not support actively generating an interrupt, so it
tries to trigger tx NAPI on the local cpu.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cd87b39600d4..29fa25ce1a7f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1054,6 +1054,29 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
}
}
+static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ struct send_queue *sq;
+
+ if (!netif_running(dev))
+ return -ENETDOWN;
+
+ if (qid >= vi->curr_queue_pairs)
+ return -EINVAL;
+
+ sq = &vi->sq[qid];
+
+ if (napi_if_scheduled_mark_missed(&sq->napi))
+ return 0;
+
+ local_bh_disable();
+ virtqueue_napi_schedule(&sq->napi, sq->vq);
+ local_bh_enable();
+
+ return 0;
+}
+
static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
struct send_queue *sq,
struct xdp_frame *xdpf)
@@ -5399,6 +5422,7 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
.ndo_bpf = virtnet_xdp,
.ndo_xdp_xmit = virtnet_xdp_xmit,
+ .ndo_xsk_wakeup = virtnet_xsk_wakeup,
.ndo_features_check = passthru_features_check,
.ndo_get_phys_port_name = virtnet_get_phys_port_name,
.ndo_set_features = virtnet_set_features,
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 08/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (6 preceding siblings ...)
2024-07-05 7:37 ` [PATCH net-next v7 07/10] virtio_net: xsk: support wakeup Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-08 6:49 ` Jason Wang
2024-07-05 7:37 ` [PATCH net-next v7 09/10] virtio_net: xsk: rx: support recv small mode Xuan Zhuo
` (2 subsequent siblings)
10 siblings, 1 reply; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
Implement the logic of filling the rq with XSK buffers.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
v7:
1. some small fixes
drivers/net/virtio_net.c | 70 +++++++++++++++++++++++++++++++++++++---
1 file changed, 66 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 29fa25ce1a7f..2b27f5ada64a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -354,6 +354,8 @@ struct receive_queue {
/* xdp rxq used by xsk */
struct xdp_rxq_info xsk_rxq_info;
+
+ struct xdp_buff **xsk_buffs;
};
/* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -1054,6 +1056,53 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
}
}
+static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
+{
+ sg->dma_address = addr;
+ sg->length = len;
+}
+
+static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
+ struct xsk_buff_pool *pool, gfp_t gfp)
+{
+ struct xdp_buff **xsk_buffs;
+ dma_addr_t addr;
+ int err = 0;
+ u32 len, i;
+ int num;
+
+ xsk_buffs = rq->xsk_buffs;
+
+ num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
+ if (!num)
+ return -ENOMEM;
+
+ len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
+
+ for (i = 0; i < num; ++i) {
+ /* use the part of XDP_PACKET_HEADROOM as the virtnet hdr space */
+ addr = xsk_buff_xdp_get_dma(xsk_buffs[i]) - vi->hdr_len;
+
+ sg_init_table(rq->sg, 1);
+ sg_fill_dma(rq->sg, addr, len);
+
+ err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, xsk_buffs[i], gfp);
+ if (err)
+ goto err;
+ }
+
+ return num;
+
+err:
+ if (i)
+ err = i;
+
+ for (; i < num; ++i)
+ xsk_buff_free(xsk_buffs[i]);
+
+ return err;
+}
+
static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
{
struct virtnet_info *vi = netdev_priv(dev);
@@ -2245,7 +2294,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
gfp_t gfp)
{
int err;
- bool oom;
+
+ if (rq->xsk_pool) {
+ err = virtnet_add_recvbuf_xsk(vi, rq, rq->xsk_pool, gfp);
+ goto kick;
+ }
do {
if (vi->mergeable_rx_bufs)
@@ -2255,10 +2308,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
else
err = add_recvbuf_small(vi, rq, gfp);
- oom = err == -ENOMEM;
if (err)
break;
} while (rq->vq->num_free);
+
+kick:
if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
unsigned long flags;
@@ -2267,7 +2321,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
u64_stats_update_end_irqrestore(&rq->stats.syncp, flags);
}
- return !oom;
+ return err != -ENOMEM;
}
static void skb_recv_done(struct virtqueue *rvq)
@@ -5104,7 +5158,7 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
struct receive_queue *rq;
struct device *dma_dev;
struct send_queue *sq;
- int err;
+ int err, size;
if (vi->hdr_len > xsk_pool_get_headroom(pool))
return -EINVAL;
@@ -5135,6 +5189,12 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
if (!dma_dev)
return -EPERM;
+ size = virtqueue_get_vring_size(rq->vq);
+
+ rq->xsk_buffs = kvcalloc(size, sizeof(*rq->xsk_buffs), GFP_KERNEL);
+ if (!rq->xsk_buffs)
+ return -ENOMEM;
+
err = xsk_pool_dma_map(pool, dma_dev, 0);
if (err)
goto err_xsk_map;
@@ -5169,6 +5229,8 @@ static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
xsk_pool_dma_unmap(pool, 0);
+ kvfree(rq->xsk_buffs);
+
return err;
}
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 09/10] virtio_net: xsk: rx: support recv small mode
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (7 preceding siblings ...)
2024-07-05 7:37 ` [PATCH net-next v7 08/10] virtio_net: xsk: rx: support fill with xsk buffer Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-08 7:00 ` Jason Wang
2024-07-05 7:37 ` [PATCH net-next v7 10/10] virtio_net: xsk: rx: support recv merge mode Xuan Zhuo
2024-07-05 14:14 ` [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Michal Kubiak
10 siblings, 1 reply; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
In the process:
1. We may need to copy data to create an skb for XDP_PASS.
2. We may need to call xsk_buff_free() to release the buffer.
3. The handle for the xdp_buff is different from the buffer.
If we pushed this logic into the existing receive handlers (merge and small),
we would have to maintain code scattered across merge and small (and big).
So I think it is a good choice to put the xsk code into an
independent function.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
v7:
1. rename xdp_construct_skb to xsk_construct_skb
2. refactor virtnet_receive()
drivers/net/virtio_net.c | 176 +++++++++++++++++++++++++++++++++++++--
1 file changed, 168 insertions(+), 8 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2b27f5ada64a..64d8cd481890 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -498,6 +498,12 @@ struct virtio_net_common_hdr {
};
static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
+static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
+ struct net_device *dev,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats);
+static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
+ struct sk_buff *skb, u8 flags);
static bool is_xdp_frame(void *ptr)
{
@@ -1062,6 +1068,124 @@ static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
sg->length = len;
}
+static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
+ struct receive_queue *rq, void *buf, u32 len)
+{
+ struct xdp_buff *xdp;
+ u32 bufsize;
+
+ xdp = (struct xdp_buff *)buf;
+
+ bufsize = xsk_pool_get_rx_frame_size(rq->xsk_pool) + vi->hdr_len;
+
+ if (unlikely(len > bufsize)) {
+ pr_debug("%s: rx error: len %u exceeds truesize %u\n",
+ vi->dev->name, len, bufsize);
+ DEV_STATS_INC(vi->dev, rx_length_errors);
+ xsk_buff_free(xdp);
+ return NULL;
+ }
+
+ xsk_buff_set_size(xdp, len);
+ xsk_buff_dma_sync_for_cpu(xdp);
+
+ return xdp;
+}
+
+static struct sk_buff *xsk_construct_skb(struct receive_queue *rq,
+ struct xdp_buff *xdp)
+{
+ unsigned int metasize = xdp->data - xdp->data_meta;
+ struct sk_buff *skb;
+ unsigned int size;
+
+ size = xdp->data_end - xdp->data_hard_start;
+ skb = napi_alloc_skb(&rq->napi, size);
+ if (unlikely(!skb)) {
+ xsk_buff_free(xdp);
+ return NULL;
+ }
+
+ skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
+
+ size = xdp->data_end - xdp->data_meta;
+ memcpy(__skb_put(skb, size), xdp->data_meta, size);
+
+ if (metasize) {
+ __skb_pull(skb, metasize);
+ skb_metadata_set(skb, metasize);
+ }
+
+ xsk_buff_free(xdp);
+
+ return skb;
+}
+
+static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct virtnet_info *vi,
+ struct receive_queue *rq, struct xdp_buff *xdp,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats)
+{
+ struct bpf_prog *prog;
+ u32 ret;
+
+ ret = XDP_PASS;
+ rcu_read_lock();
+ prog = rcu_dereference(rq->xdp_prog);
+ if (prog)
+ ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
+ rcu_read_unlock();
+
+ switch (ret) {
+ case XDP_PASS:
+ return xsk_construct_skb(rq, xdp);
+
+ case XDP_TX:
+ case XDP_REDIRECT:
+ return NULL;
+
+ default:
+ /* drop packet */
+ xsk_buff_free(xdp);
+ u64_stats_inc(&stats->drops);
+ return NULL;
+ }
+}
+
+static void virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
+ void *buf, u32 len,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats)
+{
+ struct net_device *dev = vi->dev;
+ struct sk_buff *skb = NULL;
+ struct xdp_buff *xdp;
+ u8 flags;
+
+ len -= vi->hdr_len;
+
+ u64_stats_add(&stats->bytes, len);
+
+ xdp = buf_to_xdp(vi, rq, buf, len);
+ if (!xdp)
+ return;
+
+ if (unlikely(len < ETH_HLEN)) {
+ pr_debug("%s: short packet %i\n", dev->name, len);
+ DEV_STATS_INC(dev, rx_length_errors);
+ xsk_buff_free(xdp);
+ return;
+ }
+
+ flags = ((struct virtio_net_common_hdr *)(xdp->data - vi->hdr_len))->hdr.flags;
+
+ if (!vi->mergeable_rx_bufs)
+ skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
+
+ if (skb)
+ virtnet_receive_done(vi, rq, skb, flags);
+}
+
static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
struct xsk_buff_pool *pool, gfp_t gfp)
{
@@ -2392,32 +2516,68 @@ static void refill_work(struct work_struct *work)
}
}
-static int virtnet_receive(struct receive_queue *rq, int budget,
- unsigned int *xdp_xmit)
+static int virtnet_receive_xsk_bufs(struct virtnet_info *vi,
+ struct receive_queue *rq,
+ int budget,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats)
+{
+ unsigned int len;
+ int packets = 0;
+ void *buf;
+
+ while (packets < budget) {
+ buf = virtqueue_get_buf(rq->vq, &len);
+ if (!buf)
+ break;
+
+ virtnet_receive_xsk_buf(vi, rq, buf, len, xdp_xmit, stats);
+ packets++;
+ }
+
+ return packets;
+}
+
+static int virtnet_receive_packets(struct virtnet_info *vi,
+ struct receive_queue *rq,
+ int budget,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats)
{
- struct virtnet_info *vi = rq->vq->vdev->priv;
- struct virtnet_rq_stats stats = {};
unsigned int len;
int packets = 0;
void *buf;
- int i;
if (!vi->big_packets || vi->mergeable_rx_bufs) {
void *ctx;
-
while (packets < budget &&
(buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
- receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
+ receive_buf(vi, rq, buf, len, ctx, xdp_xmit, stats);
packets++;
}
} else {
while (packets < budget &&
(buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
- receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
+ receive_buf(vi, rq, buf, len, NULL, xdp_xmit, stats);
packets++;
}
}
+ return packets;
+}
+
+static int virtnet_receive(struct receive_queue *rq, int budget,
+ unsigned int *xdp_xmit)
+{
+ struct virtnet_info *vi = rq->vq->vdev->priv;
+ struct virtnet_rq_stats stats = {};
+ int i, packets;
+
+ if (rq->xsk_pool)
+ packets = virtnet_receive_xsk_bufs(vi, rq, budget, xdp_xmit, &stats);
+ else
+ packets = virtnet_receive_packets(vi, rq, budget, xdp_xmit, &stats);
+
if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
if (!try_fill_recv(vi, rq, GFP_ATOMIC)) {
spin_lock(&vi->refill_lock);
--
2.32.0.3.g01195cf9f
* [PATCH net-next v7 10/10] virtio_net: xsk: rx: support recv merge mode
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (8 preceding siblings ...)
2024-07-05 7:37 ` [PATCH net-next v7 09/10] virtio_net: xsk: rx: support recv small mode Xuan Zhuo
@ 2024-07-05 7:37 ` Xuan Zhuo
2024-07-08 8:10 ` Jason Wang
2024-07-05 14:14 ` [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Michal Kubiak
10 siblings, 1 reply; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-05 7:37 UTC (permalink / raw)
To: netdev
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
Support AF_XDP for merge mode.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
v7:
1. include the handle for unused buffers
drivers/net/virtio_net.c | 144 +++++++++++++++++++++++++++++++++++++++
1 file changed, 144 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 64d8cd481890..67724e7ab5e8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -504,6 +504,10 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
struct virtnet_rq_stats *stats);
static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
struct sk_buff *skb, u8 flags);
+static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
+ struct sk_buff *curr_skb,
+ struct page *page, void *buf,
+ int len, int truesize);
static bool is_xdp_frame(void *ptr)
{
@@ -984,6 +988,11 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
rq = &vi->rq[i];
+ if (rq->xsk_pool) {
+ xsk_buff_free((struct xdp_buff *)buf);
+ return;
+ }
+
if (!vi->big_packets || vi->mergeable_rx_bufs)
virtnet_rq_unmap(rq, buf, 0);
@@ -1152,6 +1161,139 @@ static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct
}
}
+static void xsk_drop_follow_bufs(struct net_device *dev,
+ struct receive_queue *rq,
+ u32 num_buf,
+ struct virtnet_rq_stats *stats)
+{
+ struct xdp_buff *xdp;
+ u32 len;
+
+ while (num_buf-- > 1) {
+ xdp = virtqueue_get_buf(rq->vq, &len);
+ if (unlikely(!xdp)) {
+ pr_debug("%s: rx error: %d buffers missing\n",
+ dev->name, num_buf);
+ DEV_STATS_INC(dev, rx_length_errors);
+ break;
+ }
+ u64_stats_add(&stats->bytes, len);
+ xsk_buff_free(xdp);
+ }
+}
+
+static int xsk_append_merge_buffer(struct virtnet_info *vi,
+ struct receive_queue *rq,
+ struct sk_buff *head_skb,
+ u32 num_buf,
+ struct virtio_net_hdr_mrg_rxbuf *hdr,
+ struct virtnet_rq_stats *stats)
+{
+ struct sk_buff *curr_skb;
+ struct xdp_buff *xdp;
+ u32 len, truesize;
+ struct page *page;
+ void *buf;
+
+ curr_skb = head_skb;
+
+ while (--num_buf) {
+ buf = virtqueue_get_buf(rq->vq, &len);
+ if (unlikely(!buf)) {
+ pr_debug("%s: rx error: %d buffers out of %d missing\n",
+ vi->dev->name, num_buf,
+ virtio16_to_cpu(vi->vdev,
+ hdr->num_buffers));
+ DEV_STATS_INC(vi->dev, rx_length_errors);
+ return -EINVAL;
+ }
+
+ u64_stats_add(&stats->bytes, len);
+
+ xdp = buf_to_xdp(vi, rq, buf, len);
+ if (!xdp)
+ goto err;
+
+ buf = napi_alloc_frag(len);
+ if (!buf) {
+ xsk_buff_free(xdp);
+ goto err;
+ }
+
+ memcpy(buf, xdp->data - vi->hdr_len, len);
+
+ xsk_buff_free(xdp);
+
+ page = virt_to_page(buf);
+
+ truesize = len;
+
+ curr_skb = virtnet_skb_append_frag(head_skb, curr_skb, page,
+ buf, len, truesize);
+ if (!curr_skb) {
+ put_page(page);
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ xsk_drop_follow_bufs(vi->dev, rq, num_buf, stats);
+ return -EINVAL;
+}
+
+static struct sk_buff *virtnet_receive_xsk_merge(struct net_device *dev, struct virtnet_info *vi,
+ struct receive_queue *rq, struct xdp_buff *xdp,
+ unsigned int *xdp_xmit,
+ struct virtnet_rq_stats *stats)
+{
+ struct virtio_net_hdr_mrg_rxbuf *hdr;
+ struct bpf_prog *prog;
+ struct sk_buff *skb;
+ u32 ret, num_buf;
+
+ hdr = xdp->data - vi->hdr_len;
+ num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
+
+ ret = XDP_PASS;
+ rcu_read_lock();
+ prog = rcu_dereference(rq->xdp_prog);
+ /* TODO: support multi buffer. */
+ if (prog && num_buf == 1)
+ ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
+ rcu_read_unlock();
+
+ switch (ret) {
+ case XDP_PASS:
+ skb = xsk_construct_skb(rq, xdp);
+ if (!skb)
+ goto drop_bufs;
+
+ if (xsk_append_merge_buffer(vi, rq, skb, num_buf, hdr, stats)) {
+ dev_kfree_skb(skb);
+ goto drop;
+ }
+
+ return skb;
+
+ case XDP_TX:
+ case XDP_REDIRECT:
+ return NULL;
+
+ default:
+ /* drop packet */
+ xsk_buff_free(xdp);
+ }
+
+drop_bufs:
+ xsk_drop_follow_bufs(dev, rq, num_buf, stats);
+
+drop:
+ u64_stats_inc(&stats->drops);
+ return NULL;
+}
+
static void virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
void *buf, u32 len,
unsigned int *xdp_xmit,
@@ -1181,6 +1323,8 @@ static void virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queu
if (!vi->mergeable_rx_bufs)
skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
+ else
+ skb = virtnet_receive_xsk_merge(dev, vi, rq, xdp, xdp_xmit, stats);
if (skb)
virtnet_receive_done(vi, rq, skb, flags);
--
2.32.0.3.g01195cf9f
* Re: [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
` (9 preceding siblings ...)
2024-07-05 7:37 ` [PATCH net-next v7 10/10] virtio_net: xsk: rx: support recv merge mode Xuan Zhuo
@ 2024-07-05 14:14 ` Michal Kubiak
2024-07-08 1:11 ` Xuan Zhuo
10 siblings, 1 reply; 22+ messages in thread
From: Michal Kubiak @ 2024-07-05 14:14 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
On Fri, Jul 05, 2024 at 03:37:24PM +0800, Xuan Zhuo wrote:
> v6:
> 1. start from supporting rx zero-copy
>
> v5:
> 1. address the comments on the last version
> http://lore.kernel.org/all/20240611114147.31320-1-xuanzhuo@linux.alibaba.com
> v4:
> 1. remove the commits that introduce the independent directory
> 2. remove support for the rx merge mode (due to the 15-commit
> limit of net-next). Let's start with the small mode.
> 3. merge some commits and drop some unimportant ones
>
> ## AF_XDP
>
> XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The
> zero-copy feature of xsk (XDP socket) needs to be supported by the driver,
> and its performance is very good. mlx5 and Intel ixgbe already support this
> feature. This patch set allows virtio-net to support xsk's zero-copy receive
> feature.
>
> At present, we have completed some preparations:
>
> 1. vq-reset (virtio spec and kernel code)
> 2. virtio-core premapped dma
> 3. virtio-net xdp refactor
>
> So it is time for virtio-net to complete the support for XDP socket
> zero-copy.
>
>
After taking a look at this series, I haven't found where the
NETDEV_XDP_ACT_XSK_ZEROCOPY flag is added to netdev->xdp_features.
Is it intentional or just an oversight?
Thanks,
Michal
* Re: [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy
2024-07-05 14:14 ` [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Michal Kubiak
@ 2024-07-08 1:11 ` Xuan Zhuo
0 siblings, 0 replies; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-08 1:11 UTC (permalink / raw)
To: Michal Kubiak
Cc: netdev, Michael S. Tsirkin, Jason Wang, Eugenio Pérez,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, virtualization, bpf
On Fri, 5 Jul 2024 16:14:04 +0200, Michal Kubiak <michal.kubiak@intel.com> wrote:
> On Fri, Jul 05, 2024 at 03:37:24PM +0800, Xuan Zhuo wrote:
> > v6:
> > 1. start from supporting rx zero-copy
> >
> > v5:
> > 1. address the comments on the last version
> > http://lore.kernel.org/all/20240611114147.31320-1-xuanzhuo@linux.alibaba.com
> > v4:
> > 1. remove the commits that introduce the independent directory
> > 2. remove support for the rx merge mode (due to the 15-commit
> > limit of net-next). Let's start with the small mode.
> > 3. merge some commits and drop some unimportant ones
> >
> > ## AF_XDP
> >
> > XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The
> > zero-copy feature of xsk (XDP socket) needs to be supported by the driver,
> > and its performance is very good. mlx5 and Intel ixgbe already support this
> > feature. This patch set allows virtio-net to support xsk's zero-copy receive
> > feature.
> >
> > At present, we have completed some preparations:
> >
> > 1. vq-reset (virtio spec and kernel code)
> > 2. virtio-core premapped dma
> > 3. virtio-net xdp refactor
> >
> > So it is time for virtio-net to complete the support for XDP socket
> > zero-copy.
> >
> >
>
> After taking a look at this series, I haven't found where the
> NETDEV_XDP_ACT_XSK_ZEROCOPY flag is added to netdev->xdp_features.
> Is it intentional or just an oversight?
Because there are too many commits, the work of supporting AF_XDP in
virtio-net is split into an rx part and a tx part. This patch set is the
rx part. The flag will be updated after the tx part.
Thanks.
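(For illustration only: once the tx part lands, advertising zero-copy would
presumably be a small addition along these lines; this is an assumption, not
code from this series.)

	/* Hypothetical follow-up once tx support is merged: */
	xdp_set_features_flag(dev, NETDEV_XDP_ACT_BASIC |
				   NETDEV_XDP_ACT_REDIRECT |
				   NETDEV_XDP_ACT_XSK_ZEROCOPY);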
>
> Thanks,
> Michal
* Re: [PATCH net-next v7 01/10] virtio_net: replace VIRTIO_XDP_HEADROOM by XDP_PACKET_HEADROOM
2024-07-05 7:37 ` [PATCH net-next v7 01/10] virtio_net: replace VIRTIO_XDP_HEADROOM by XDP_PACKET_HEADROOM Xuan Zhuo
@ 2024-07-08 6:18 ` Jason Wang
0 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2024-07-08 6:18 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, Jul 5, 2024 at 3:37 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> virtio-net has VIRTIO_XDP_HEADROOM, which is equal to
> XDP_PACKET_HEADROOM, to calculate the headroom for xdp.
>
> But we should use the XDP_PACKET_HEADROOM macro from bpf.h directly to
> calculate the headroom for xdp. So remove VIRTIO_XDP_HEADROOM and
> replace it with XDP_PACKET_HEADROOM.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
* Re: [PATCH net-next v7 06/10] virtio_net: xsk: bind/unbind xsk for rx
2024-07-05 7:37 ` [PATCH net-next v7 06/10] virtio_net: xsk: bind/unbind xsk for rx Xuan Zhuo
@ 2024-07-08 6:36 ` Jason Wang
0 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2024-07-08 6:36 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, Jul 5, 2024 at 3:37 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> This patch implements the logic of binding/unbinding an xsk pool to/from
> an rq.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>
> v7:
> 1. remove a container struct for xsk
> 2. update comments
> 3. add check between hdr_len and xsk headroom
>
> drivers/net/virtio_net.c | 134 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 134 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3c828cdd438b..cd87b39600d4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -25,6 +25,7 @@
> #include <net/net_failover.h>
> #include <net/netdev_rx_queue.h>
> #include <net/netdev_queues.h>
> +#include <net/xdp_sock_drv.h>
>
> static int napi_weight = NAPI_POLL_WEIGHT;
> module_param(napi_weight, int, 0444);
> @@ -348,6 +349,11 @@ struct receive_queue {
>
> /* Record the last dma info to free after new pages is allocated. */
> struct virtnet_rq_dma *last_dma;
> +
> + struct xsk_buff_pool *xsk_pool;
> +
> + /* xdp rxq used by xsk */
> + struct xdp_rxq_info xsk_rxq_info;
> };
>
> /* This structure can contain rss message with maximum settings for indirection table and keysize
> @@ -5026,6 +5032,132 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> return virtnet_set_guest_offloads(vi, offloads);
> }
>
> +static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queue *rq,
> + struct xsk_buff_pool *pool)
> +{
> + int err, qindex;
> +
> + qindex = rq - vi->rq;
> +
> + if (pool) {
> + err = xdp_rxq_info_reg(&rq->xsk_rxq_info, vi->dev, qindex, rq->napi.napi_id);
> + if (err < 0)
> + return err;
> +
> + err = xdp_rxq_info_reg_mem_model(&rq->xsk_rxq_info,
> + MEM_TYPE_XSK_BUFF_POOL, NULL);
> + if (err < 0)
> + goto unreg;
> +
> + xsk_pool_set_rxq_info(pool, &rq->xsk_rxq_info);
> + }
> +
> + virtnet_rx_pause(vi, rq);
> +
> + err = virtqueue_reset(rq->vq, virtnet_rq_unmap_free_buf);
> + if (err) {
> + netdev_err(vi->dev, "reset rx fail: rx queue index: %d err: %d\n", qindex, err);
> +
> + pool = NULL;
> + }
> +
> + rq->xsk_pool = pool;
> +
> + virtnet_rx_resume(vi, rq);
> +
> + if (pool)
> + return 0;
> +
> +unreg:
> + xdp_rxq_info_unreg(&rq->xsk_rxq_info);
> + return err;
> +}
> +
> +static int virtnet_xsk_pool_enable(struct net_device *dev,
> + struct xsk_buff_pool *pool,
> + u16 qid)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> + struct receive_queue *rq;
> + struct device *dma_dev;
> + struct send_queue *sq;
> + int err;
> +
> + if (vi->hdr_len > xsk_pool_get_headroom(pool))
> + return -EINVAL;
> +
> + /* In big_packets mode, xdp cannot work, so there is no need to
> + * initialize xsk of rq.
> + */
> + if (vi->big_packets && !vi->mergeable_rx_bufs)
> + return -ENOENT;
> +
> + if (qid >= vi->curr_queue_pairs)
> + return -EINVAL;
> +
> + sq = &vi->sq[qid];
> + rq = &vi->rq[qid];
> +
> + /* xsk assumes that tx and rx must have the same dma device. The af-xdp
> + * may use one buffer to receive from the rx and reuse this buffer to
> + * send by the tx. So the dma dev of sq and rq must be the same one.
> + *
> + * But vq->dma_dev allows every vq has the respective dma dev. So I
> + * check the dma dev of vq and sq is the same dev.
> + */
> + if (virtqueue_dma_dev(rq->vq) != virtqueue_dma_dev(sq->vq))
> + return -EPERM;
I think -EINVAL is better.
> +
> + dma_dev = virtqueue_dma_dev(rq->vq);
> + if (!dma_dev)
> + return -EPERM;
-EINVAL seems to be better.
With those fixed.
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
* Re: [PATCH net-next v7 08/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-07-05 7:37 ` [PATCH net-next v7 08/10] virtio_net: xsk: rx: support fill with xsk buffer Xuan Zhuo
@ 2024-07-08 6:49 ` Jason Wang
2024-07-08 7:57 ` Xuan Zhuo
0 siblings, 1 reply; 22+ messages in thread
From: Jason Wang @ 2024-07-08 6:49 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, Jul 5, 2024 at 3:37 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Implement the logic of filling the rq with XSK buffers.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>
> v7:
> 1. some small fixes
>
> drivers/net/virtio_net.c | 70 +++++++++++++++++++++++++++++++++++++---
> 1 file changed, 66 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 29fa25ce1a7f..2b27f5ada64a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -354,6 +354,8 @@ struct receive_queue {
>
> /* xdp rxq used by xsk */
> struct xdp_rxq_info xsk_rxq_info;
> +
> + struct xdp_buff **xsk_buffs;
> };
>
> /* This structure can contain rss message with maximum settings for indirection table and keysize
> @@ -1054,6 +1056,53 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
> }
> }
>
> +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> +{
> + sg->dma_address = addr;
> + sg->length = len;
> +}
> +
> +static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue *rq,
> + struct xsk_buff_pool *pool, gfp_t gfp)
> +{
> + struct xdp_buff **xsk_buffs;
> + dma_addr_t addr;
> + int err = 0;
> + u32 len, i;
> + int num;
> +
> + xsk_buffs = rq->xsk_buffs;
> +
> + num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
> + if (!num)
> + return -ENOMEM;
> +
> + len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
> +
> + for (i = 0; i < num; ++i) {
> + /* use the part of XDP_PACKET_HEADROOM as the virtnet hdr space */
It's better to also say we assume XDP_PACKET_HEADROOM is larger than
hdr_len (see the check in virtnet_xsk_pool_enable()).
> + addr = xsk_buff_xdp_get_dma(xsk_buffs[i]) - vi->hdr_len;
> +
> + sg_init_table(rq->sg, 1);
> + sg_fill_dma(rq->sg, addr, len);
> +
> + err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, xsk_buffs[i], gfp);
> + if (err)
> + goto err;
> + }
> +
> + return num;
> +
> +err:
> + if (i)
> + err = i;
Any reason to assign an index to err here?
Others look good.
Thanks
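
For readers following the headroom arithmetic above, here is a sketch of the buffer layout the fill path assumes. This is an annotation added for clarity, not part of the patch:

/* The virtio-net header is stashed in the tail of the XSK headroom,
 * immediately before the frame data:
 *
 *   |<-------------- headroom -------------->|
 *   +--------------------+--------------------+----------------+
 *   | unused headroom    | virtio_net_hdr     | frame data ... |
 *   +--------------------+--------------------+----------------+
 *                        ^
 *                        addr = xsk_buff_xdp_get_dma(xdp) - vi->hdr_len
 *
 * This only works if the pool headroom is at least vi->hdr_len bytes,
 * which virtnet_xsk_pool_enable() enforces via xsk_pool_get_headroom().
 */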
* Re: [PATCH net-next v7 09/10] virtio_net: xsk: rx: support recv small mode
2024-07-05 7:37 ` [PATCH net-next v7 09/10] virtio_net: xsk: rx: support recv small mode Xuan Zhuo
@ 2024-07-08 7:00 ` Jason Wang
2024-07-08 7:42 ` Xuan Zhuo
0 siblings, 1 reply; 22+ messages in thread
From: Jason Wang @ 2024-07-08 7:00 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, Jul 5, 2024 at 3:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> In the process:
> 1. We may need to copy data to create an skb for XDP_PASS.
> 2. We may need to call xsk_buff_free() to release the buffer.
> 3. The handle for the xdp_buff is different from the buffer.
>
> If we pushed this logic into the existing receive handlers (merge and small),
> we would have to maintain code scattered inside merge and small (and big).
> So I think it is a good choice for us to put the xsk code into an
> independent function.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>
> v7:
> 1. rename xdp_construct_skb to xsk_construct_skb
> 2. refactor virtnet_receive()
>
> drivers/net/virtio_net.c | 176 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 168 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2b27f5ada64a..64d8cd481890 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -498,6 +498,12 @@ struct virtio_net_common_hdr {
> };
>
> static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
> + struct net_device *dev,
> + unsigned int *xdp_xmit,
> + struct virtnet_rq_stats *stats);
> +static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
> + struct sk_buff *skb, u8 flags);
>
> static bool is_xdp_frame(void *ptr)
> {
> @@ -1062,6 +1068,124 @@ static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> sg->length = len;
> }
>
> +static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
> + struct receive_queue *rq, void *buf, u32 len)
> +{
> + struct xdp_buff *xdp;
> + u32 bufsize;
> +
> + xdp = (struct xdp_buff *)buf;
> +
> + bufsize = xsk_pool_get_rx_frame_size(rq->xsk_pool) + vi->hdr_len;
> +
> + if (unlikely(len > bufsize)) {
> + pr_debug("%s: rx error: len %u exceeds truesize %u\n",
> + vi->dev->name, len, bufsize);
> + DEV_STATS_INC(vi->dev, rx_length_errors);
> + xsk_buff_free(xdp);
> + return NULL;
> + }
> +
> + xsk_buff_set_size(xdp, len);
> + xsk_buff_dma_sync_for_cpu(xdp);
> +
> + return xdp;
> +}
> +
> +static struct sk_buff *xsk_construct_skb(struct receive_queue *rq,
> + struct xdp_buff *xdp)
> +{
> + unsigned int metasize = xdp->data - xdp->data_meta;
> + struct sk_buff *skb;
> + unsigned int size;
> +
> + size = xdp->data_end - xdp->data_hard_start;
> + skb = napi_alloc_skb(&rq->napi, size);
> + if (unlikely(!skb)) {
> + xsk_buff_free(xdp);
> + return NULL;
> + }
> +
> + skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
> +
> + size = xdp->data_end - xdp->data_meta;
> + memcpy(__skb_put(skb, size), xdp->data_meta, size);
> +
> + if (metasize) {
> + __skb_pull(skb, metasize);
> + skb_metadata_set(skb, metasize);
> + }
> +
> + xsk_buff_free(xdp);
> +
> + return skb;
> +}
> +
> +static struct sk_buff *virtnet_receive_xsk_small(struct net_device *dev, struct virtnet_info *vi,
> + struct receive_queue *rq, struct xdp_buff *xdp,
> + unsigned int *xdp_xmit,
> + struct virtnet_rq_stats *stats)
> +{
> + struct bpf_prog *prog;
> + u32 ret;
> +
> + ret = XDP_PASS;
> + rcu_read_lock();
> + prog = rcu_dereference(rq->xdp_prog);
> + if (prog)
> + ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
> + rcu_read_unlock();
> +
> + switch (ret) {
> + case XDP_PASS:
> + return xsk_construct_skb(rq, xdp);
> +
> + case XDP_TX:
> + case XDP_REDIRECT:
> + return NULL;
> +
> + default:
> + /* drop packet */
> + xsk_buff_free(xdp);
> + u64_stats_inc(&stats->drops);
> + return NULL;
> + }
> +}
> +
> +static void virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queue *rq,
> + void *buf, u32 len,
> + unsigned int *xdp_xmit,
> + struct virtnet_rq_stats *stats)
> +{
> + struct net_device *dev = vi->dev;
> + struct sk_buff *skb = NULL;
> + struct xdp_buff *xdp;
> + u8 flags;
> +
> + len -= vi->hdr_len;
> +
> + u64_stats_add(&stats->bytes, len);
> +
> + xdp = buf_to_xdp(vi, rq, buf, len);
> + if (!xdp)
> + return;
> +
> + if (unlikely(len < ETH_HLEN)) {
> + pr_debug("%s: short packet %i\n", dev->name, len);
> + DEV_STATS_INC(dev, rx_length_errors);
> + xsk_buff_free(xdp);
> + return;
> + }
> +
> + flags = ((struct virtio_net_common_hdr *)(xdp->data - vi->hdr_len))->hdr.flags;
> +
> + if (!vi->mergeable_rx_bufs)
> + skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
I wonder, if we add the mergeable support in the next patch, would it be
better to re-order the patches? For example, the xsk binding needs to be
moved to the last patch, otherwise we break xsk with a mergeable
buffer here?
Or anything I missed here?
Thanks
* Re: [PATCH net-next v7 09/10] virtio_net: xsk: rx: support recv small mode
2024-07-08 7:00 ` Jason Wang
@ 2024-07-08 7:42 ` Xuan Zhuo
2024-07-08 8:08 ` Jason Wang
0 siblings, 1 reply; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-08 7:42 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Mon, 8 Jul 2024 15:00:50 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Jul 5, 2024 at 3:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > [...]
> > + if (!vi->mergeable_rx_bufs)
> > + skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
>
> I wonder, if we add the mergeable support in the next patch, would it be
> better to re-order the patches? For example, the xsk binding needs to be
> moved to the last patch, otherwise we break xsk with a mergeable
> buffer here?
If you are worried that a user could run with only this commit applied, I
want to say you need not worry.
Because the flag NETDEV_XDP_ACT_XSK_ZEROCOPY is not added yet. I plan to add
it after the tx support is completed.
I did test by adding this flag locally.
Thanks.
>
> Or anything I missed here?
>
> Thanks
>
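
For context on the flag discussed above, this is roughly how the capability would be advertised once the whole zero-copy datapath is ready. This is a sketch, not code from the series; the exact flag set and its placement (e.g. in virtnet_probe()) are assumptions:

/* Sketch: advertise AF_XDP zero copy via the netdev XDP features API.
 * Until NETDEV_XDP_ACT_XSK_ZEROCOPY is set, the core rejects
 * XDP_ZEROCOPY binds, so users cannot reach the partially complete
 * rx path added by this series.
 */
xdp_set_features_flag(dev, NETDEV_XDP_ACT_BASIC |
			   NETDEV_XDP_ACT_REDIRECT |
			   NETDEV_XDP_ACT_XSK_ZEROCOPY);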
* Re: [PATCH net-next v7 08/10] virtio_net: xsk: rx: support fill with xsk buffer
2024-07-08 6:49 ` Jason Wang
@ 2024-07-08 7:57 ` Xuan Zhuo
0 siblings, 0 replies; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-08 7:57 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Mon, 8 Jul 2024 14:49:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Jul 5, 2024 at 3:37 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > [...]
> > +err:
> > + if (i)
> > + err = i;
>
> Any reason to assign an index to err here?
I tried to return the number of bufs added to the ring.
But rethinking this, we should return the error from virtqueue_add_inbuf() directly.
Will fix.
Thanks.
>
> Others look good.
>
> Thanks
>
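
Following the "Will fix" above, the reworked error path could look like the sketch below. This is a reconstruction of the agreed direction, not the literal v8 code; in particular, freeing the batch-allocated buffers that were never queued is an assumption:

err:
	/* Propagate the virtqueue_add_inbuf() error directly instead of
	 * returning the partial count, and release the buffers from
	 * xsk_buff_alloc_batch() that were never posted to the vq.
	 */
	for (; i < num; ++i)
		xsk_buff_free(xsk_buffs[i]);

	return err;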
* Re: [PATCH net-next v7 09/10] virtio_net: xsk: rx: support recv small mode
2024-07-08 7:42 ` Xuan Zhuo
@ 2024-07-08 8:08 ` Jason Wang
2024-07-08 8:09 ` Xuan Zhuo
0 siblings, 1 reply; 22+ messages in thread
From: Jason Wang @ 2024-07-08 8:08 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Mon, Jul 8, 2024 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 8 Jul 2024 15:00:50 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Fri, Jul 5, 2024 at 3:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > [...]
> > > + if (!vi->mergeable_rx_bufs)
> > > + skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
> >
> > I wonder, if we add the mergeable support in the next patch, would it be
> > better to re-order the patches? For example, the xsk binding needs to be
> > moved to the last patch, otherwise we break xsk with a mergeable
> > buffer here?
>
> If you are worried that a user could run with only this commit applied, I
> want to say you need not worry.
>
> Because the flag NETDEV_XDP_ACT_XSK_ZEROCOPY is not added yet. I plan to add
> it after the tx support is completed.
Ok, this is something I missed. It would be better to mention it
somewhere (or it is already there and I missed it).
>
> I did test by adding this flag locally.
>
> Thanks.
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
>
> >
> > Or anything I missed here?
> >
> > Thanks
> >
>
* Re: [PATCH net-next v7 09/10] virtio_net: xsk: rx: support recv small mode
2024-07-08 8:08 ` Jason Wang
@ 2024-07-08 8:09 ` Xuan Zhuo
0 siblings, 0 replies; 22+ messages in thread
From: Xuan Zhuo @ 2024-07-08 8:09 UTC (permalink / raw)
To: Jason Wang
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Mon, 8 Jul 2024 16:08:44 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 8, 2024 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 8 Jul 2024 15:00:50 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Jul 5, 2024 at 3:38 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > [...]
> > > > + if (!vi->mergeable_rx_bufs)
> > > > + skb = virtnet_receive_xsk_small(dev, vi, rq, xdp, xdp_xmit, stats);
> > >
> > > I wonder, if we add the mergeable support in the next patch, would it be
> > > better to re-order the patches? For example, the xsk binding needs to be
> > > moved to the last patch, otherwise we break xsk with a mergeable
> > > buffer here?
> >
> > If you are worried that a user could run with only this commit applied, I
> > want to say you need not worry.
> >
> > Because the flag NETDEV_XDP_ACT_XSK_ZEROCOPY is not added yet. I plan to add
> > it after the tx support is completed.
>
> Ok, this is something I missed. It would be better to mention it
> somewhere (or it is already there and I missed it).
OK. I will add it to the cover letter of the next version.
Thanks.
>
> >
> > I did test by adding this flag locally.
> >
> > Thanks.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
> Thanks
>
> >
> > >
> > > Or anything I missed here?
> > >
> > > Thanks
> > >
> >
>
* Re: [PATCH net-next v7 10/10] virtio_net: xsk: rx: support recv merge mode
2024-07-05 7:37 ` [PATCH net-next v7 10/10] virtio_net: xsk: rx: support recv merge mode Xuan Zhuo
@ 2024-07-08 8:10 ` Jason Wang
0 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2024-07-08 8:10 UTC (permalink / raw)
To: Xuan Zhuo
Cc: netdev, Michael S. Tsirkin, Eugenio Pérez, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
virtualization, bpf
On Fri, Jul 5, 2024 at 3:37 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Support AF-XDP for merge mode.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
Thread overview: 22+ messages
2024-07-05 7:37 [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 01/10] virtio_net: replace VIRTIO_XDP_HEADROOM by XDP_PACKET_HEADROOM Xuan Zhuo
2024-07-08 6:18 ` Jason Wang
2024-07-05 7:37 ` [PATCH net-next v7 02/10] virtio_net: separate virtnet_rx_resize() Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 03/10] virtio_net: separate virtnet_tx_resize() Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 04/10] virtio_net: separate receive_buf Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 05/10] virtio_net: separate receive_mergeable Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 06/10] virtio_net: xsk: bind/unbind xsk for rx Xuan Zhuo
2024-07-08 6:36 ` Jason Wang
2024-07-05 7:37 ` [PATCH net-next v7 07/10] virtio_net: xsk: support wakeup Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 08/10] virtio_net: xsk: rx: support fill with xsk buffer Xuan Zhuo
2024-07-08 6:49 ` Jason Wang
2024-07-08 7:57 ` Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 09/10] virtio_net: xsk: rx: support recv small mode Xuan Zhuo
2024-07-08 7:00 ` Jason Wang
2024-07-08 7:42 ` Xuan Zhuo
2024-07-08 8:08 ` Jason Wang
2024-07-08 8:09 ` Xuan Zhuo
2024-07-05 7:37 ` [PATCH net-next v7 10/10] virtio_net: xsk: rx: support recv merge mode Xuan Zhuo
2024-07-08 8:10 ` Jason Wang
2024-07-05 14:14 ` [PATCH net-next v7 00/10] virtio-net: support AF_XDP zero copy Michal Kubiak
2024-07-08 1:11 ` Xuan Zhuo