* [PATCH net-next v3 0/5] remove page frag implementation in vhost_net
From: Yunsheng Lin @ 2024-01-23 10:42 UTC
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Matthias Brugger,
AngeloGioacchino Del Regno, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-arm-kernel,
linux-mediatek, bpf
Currently there are three implementations for page frag:
1. mm/page_alloc.c: net stack seems to be using it in the
rx part with 'struct page_frag_cache' and the main API
being page_frag_alloc_align().
2. net/core/sock.c: net stack seems to be using it in the
tx part with 'struct page_frag' and the main API being
skb_page_frag_refill().
3. drivers/vhost/net.c: vhost seems to be using it to build
xdp frame, and its implementation seems to be a mix of
the above two.
This patchset tries to unify the page frag implementation a
little bit by unifying the gfp bits for order 3 page allocation
and replacing the page frag implementation in drivers/vhost/net.c
with the one in page_alloc.c.
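To make the allocation pattern easier to picture, here is a rough
userspace model of a page-frag-style cache. It is only a sketch under
stated assumptions, not the mm/page_alloc.c code: the names
frag_cache_model, frag_model_alloc_align(), frag_model_drain() and
FRAG_MODEL_SIZE are hypothetical, malloc() stands in for the order 3
page allocation, and the refcount/bias handling is left out. It assumes
fragments are carved from the end of the buffer downwards and that
alignment is applied by masking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: 32 KiB, the size of an order 3 allocation of 4 KiB pages. */
#define FRAG_MODEL_SIZE 32768u

struct frag_cache_model {
	unsigned char *va;	/* backing buffer, NULL while the cache is empty */
	size_t offset;		/* start of the last fragment; counts down to 0 */
};

/*
 * Return a fragment of fragsz bytes whose address is a multiple of align,
 * or NULL. A real cache would grab a fresh page when the current one is
 * exhausted; this model simply fails.
 */
static void *frag_model_alloc_align(struct frag_cache_model *c,
				    size_t fragsz, size_t align)
{
	uintptr_t base, addr;

	/* align must be a non-zero power of two */
	assert(align && !(align & (align - 1)));

	if (!c->va) {
		c->va = malloc(FRAG_MODEL_SIZE);
		if (!c->va)
			return NULL;
		c->offset = FRAG_MODEL_SIZE;
	}

	if (fragsz > c->offset)
		return NULL;

	/* Step down by fragsz, then round the address down to align. */
	base = (uintptr_t)c->va;
	addr = (base + c->offset - fragsz) & ~((uintptr_t)align - 1);
	if (addr < base)
		return NULL;

	c->offset = addr - base;
	return c->va + c->offset;
}

/* Release the backing buffer; fragments are not freed individually here. */
static void frag_model_drain(struct frag_cache_model *c)
{
	free(c->va);
	c->va = NULL;
	c->offset = 0;
}
```

In this model a caller zero-initializes the struct, allocates
cache-line-aligned buffers with frag_model_alloc_align(&c, len, 64) and
calls frag_model_drain(&c) once at teardown, which is the same shape of
usage the vhost patch ends up with.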
After this patchset, the page frag implementation is not only
a little more unified, but testing with the vhost_net_test
introduced in the last patch also shows about a 0.5%
performance boost.
Before this patchset:
 Performance counter stats for './vhost_net_test' (10 runs):
     305325.78 msec task-clock                #    1.738 CPUs utilized            ( +-  0.12% )
       1048668      context-switches          #    3.435 K/sec                    ( +-  0.00% )
            11      cpu-migrations            #    0.036 /sec                     ( +- 17.64% )
            33      page-faults               #    0.108 /sec                     ( +-  0.49% )
  244651819491      cycles                    #    0.801 GHz                      ( +-  0.43% )  (64)
   64714638024      stalled-cycles-frontend   #   26.45% frontend cycles idle     ( +-  2.19% )  (67)
   30774313491      stalled-cycles-backend    #   12.58% backend cycles idle      ( +-  7.68% )  (70)
  201749748680      instructions              #    0.82  insn per cycle
                                              #    0.32  stalled cycles per insn  ( +-  0.41% )  (66.76%)
   65494787909      branches                  #  214.508 M/sec                    ( +-  0.35% )  (64)
    4284111313      branch-misses             #    6.54% of all branches          ( +-  0.45% )  (66)
       175.699 +- 0.189 seconds time elapsed  ( +-  0.11% )
After this patchset:
 Performance counter stats for './vhost_net_test' (10 runs):
     303974.38 msec task-clock                #    1.739 CPUs utilized            ( +-  0.14% )
       1048807      context-switches          #    3.450 K/sec                    ( +-  0.00% )
            14      cpu-migrations            #    0.046 /sec                     ( +- 12.86% )
            33      page-faults               #    0.109 /sec                     ( +-  0.46% )
  251289376347      cycles                    #    0.827 GHz                      ( +-  0.32% )  (60)
   67885175415      stalled-cycles-frontend   #   27.01% frontend cycles idle     ( +-  0.48% )  (63)
   27809282600      stalled-cycles-backend    #   11.07% backend cycles idle      ( +-  0.36% )  (71)
  195543234672      instructions              #    0.78  insn per cycle
                                              #    0.35  stalled cycles per insn  ( +-  0.29% )  (69.04%)
   62423183552      branches                  #  205.357 M/sec                    ( +-  0.48% )  (67)
    4135666632      branch-misses             #    6.63% of all branches          ( +-  0.63% )  (67)
       174.764 +- 0.214 seconds time elapsed  ( +-  0.12% )
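For reference, the "about 0.5%" figure can be rechecked from the two
runs above. The helper below is only an illustrative sketch (the name
rel_reduction_pct() is hypothetical, not part of the patchset); it
computes the relative reduction between a before and an after value.

```c
#include <assert.h>

/* Relative reduction from before to after, in percent of before. */
static double rel_reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

With the elapsed times above, rel_reduction_pct(175.699, 174.764) comes
out at roughly 0.53, and with the task-clock values,
rel_reduction_pct(305325.78, 303974.38) comes out at roughly 0.44. The
elapsed-time difference of 0.935 seconds is larger than the reported
spreads of 0.189 and 0.214 seconds, but not by a wide margin.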
Changelog:
V3:
1. Add __page_frag_alloc_align(), which takes the align mask that
the original function expected, as suggested by Alexander.
2. Drop patch 3 in v2, as suggested by Alexander.
3. Reorder patches 4 and 5 in v2, as suggested by Alexander.
Note that placing the gfp flags handling for order 3 pages in an inline
function is not considered here, as we may be able to unify the page_frag
and page_frag_cache handling.
V2: Change 'xor'd' to 'masked off', add vhost tx testing for
vhost_net_test.
V1: Fix some typos, drop RFC tag and rebase on latest net-next.
Yunsheng Lin (5):
mm/page_alloc: modify page_frag_alloc_align() to accept align as an
argument
page_frag: unify gfp bits for order 3 page allocation
net: introduce page_frag_cache_drain()
vhost/net: remove vhost_net_page_frag_refill()
tools: virtio: introduce vhost_net_test
drivers/net/ethernet/google/gve/gve_main.c | 11 +-
drivers/net/ethernet/mediatek/mtk_wed_wo.c | 17 +-
drivers/nvme/host/tcp.c | 7 +-
drivers/nvme/target/tcp.c | 4 +-
drivers/vhost/net.c | 91 +---
include/linux/gfp.h | 16 +-
mm/page_alloc.c | 22 +-
net/core/skbuff.c | 6 +-
net/core/sock.c | 2 +-
tools/virtio/.gitignore | 1 +
tools/virtio/Makefile | 8 +-
tools/virtio/vhost_net_test.c | 576 +++++++++++++++++++++
12 files changed, 647 insertions(+), 114 deletions(-)
create mode 100644 tools/virtio/vhost_net_test.c
--
2.33.0
* [PATCH net-next v3 4/5] vhost/net: remove vhost_net_page_frag_refill()
From: Yunsheng Lin @ 2024-01-23 10:42 UTC
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Jason Wang,
Michael S. Tsirkin, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, kvm, virtualization, bpf
The page frag in vhost_net_page_frag_refill() uses the
'struct page_frag' from skb_page_frag_refill(), but its
implementation is similar to page_frag_alloc_align() now.
This patch removes vhost_net_page_frag_refill() by using
'struct page_frag_cache' instead of 'struct page_frag',
and allocating frag using page_frag_alloc_align().
An added benefit is that this not only unifies the page frag
implementation a little, but also gives about a 0.5% performance
boost when tested with the vhost_net_test introduced in the
last patch.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 91 ++++++++++++++-------------------------------
1 file changed, 27 insertions(+), 64 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e574e21cc0ca..4b2fcb228a0a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -141,10 +141,8 @@ struct vhost_net {
 	unsigned tx_zcopy_err;
 	/* Flush in progress. Protected by tx vq lock. */
 	bool tx_flush;
-	/* Private page frag */
-	struct page_frag page_frag;
-	/* Refcount bias of page frag */
-	int refcnt_bias;
+	/* Private page frag cache */
+	struct page_frag_cache pf_cache;
 };

 static unsigned vhost_net_zcopy_mask __read_mostly;
@@ -655,41 +653,6 @@ static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len)
 	       !vhost_vq_avail_empty(vq->dev, vq);
 }

-static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
-				       struct page_frag *pfrag, gfp_t gfp)
-{
-	if (pfrag->page) {
-		if (pfrag->offset + sz <= pfrag->size)
-			return true;
-		__page_frag_cache_drain(pfrag->page, net->refcnt_bias);
-	}
-
-	pfrag->offset = 0;
-	net->refcnt_bias = 0;
-	if (SKB_FRAG_PAGE_ORDER) {
-		/* Avoid direct reclaim but allow kswapd to wake */
-		pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
-					  __GFP_COMP | __GFP_NOWARN |
-					  __GFP_NORETRY | __GFP_NOMEMALLOC,
-					  SKB_FRAG_PAGE_ORDER);
-		if (likely(pfrag->page)) {
-			pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
-			goto done;
-		}
-	}
-	pfrag->page = alloc_page(gfp);
-	if (likely(pfrag->page)) {
-		pfrag->size = PAGE_SIZE;
-		goto done;
-	}
-	return false;
-
-done:
-	net->refcnt_bias = USHRT_MAX;
-	page_ref_add(pfrag->page, USHRT_MAX - 1);
-	return true;
-}
-
 #define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)

 static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
@@ -699,7 +662,6 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 	struct vhost_net *net = container_of(vq->dev, struct vhost_net,
 					     dev);
 	struct socket *sock = vhost_vq_get_backend(vq);
-	struct page_frag *alloc_frag = &net->page_frag;
 	struct virtio_net_hdr *gso;
 	struct xdp_buff *xdp = &nvq->xdp[nvq->batched_xdp];
 	struct tun_xdp_hdr *hdr;
@@ -710,6 +672,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 	int sock_hlen = nvq->sock_hlen;
 	void *buf;
 	int copied;
+	int ret;

 	if (unlikely(len < nvq->sock_hlen))
 		return -EFAULT;
@@ -719,18 +682,17 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 		return -ENOSPC;

 	buflen += SKB_DATA_ALIGN(len + pad);
-	alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
-	if (unlikely(!vhost_net_page_frag_refill(net, buflen,
-						 alloc_frag, GFP_KERNEL)))
+	buf = page_frag_alloc_align(&net->pf_cache, buflen, GFP_KERNEL,
+				    SMP_CACHE_BYTES);
+	if (unlikely(!buf))
 		return -ENOMEM;

-	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
-	copied = copy_page_from_iter(alloc_frag->page,
-				     alloc_frag->offset +
-				     offsetof(struct tun_xdp_hdr, gso),
-				     sock_hlen, from);
-	if (copied != sock_hlen)
-		return -EFAULT;
+	copied = copy_from_iter(buf + offsetof(struct tun_xdp_hdr, gso),
+				sock_hlen, from);
+	if (copied != sock_hlen) {
+		ret = -EFAULT;
+		goto err;
+	}

 	hdr = buf;
 	gso = &hdr->gso;
@@ -743,27 +705,30 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 			       vhost16_to_cpu(vq, gso->csum_start) +
 			       vhost16_to_cpu(vq, gso->csum_offset) + 2);

-		if (vhost16_to_cpu(vq, gso->hdr_len) > len)
-			return -EINVAL;
+		if (vhost16_to_cpu(vq, gso->hdr_len) > len) {
+			ret = -EINVAL;
+			goto err;
+		}
 	}

 	len -= sock_hlen;
-	copied = copy_page_from_iter(alloc_frag->page,
-				     alloc_frag->offset + pad,
-				     len, from);
-	if (copied != len)
-		return -EFAULT;
+	copied = copy_from_iter(buf + pad, len, from);
+	if (copied != len) {
+		ret = -EFAULT;
+		goto err;
+	}

 	xdp_init_buff(xdp, buflen, NULL);
 	xdp_prepare_buff(xdp, buf, pad, len, true);
 	hdr->buflen = buflen;

-	--net->refcnt_bias;
-	alloc_frag->offset += buflen;
-
 	++nvq->batched_xdp;

 	return 0;
+
+err:
+	page_frag_free(buf);
+	return ret;
 }

 static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
@@ -1353,8 +1318,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		       vqs[VHOST_NET_VQ_RX]);

 	f->private_data = n;
-	n->page_frag.page = NULL;
-	n->refcnt_bias = 0;
+	n->pf_cache.va = NULL;

 	return 0;
 }
@@ -1422,8 +1386,7 @@ static int vhost_net_release(struct inode *inode, struct file *f)
 	kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
 	kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
 	kfree(n->dev.vqs);
-	if (n->page_frag.page)
-		__page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
+	page_frag_cache_drain(&n->pf_cache);
 	kvfree(n);
 	return 0;
 }
--
2.33.0
* Re: [PATCH net-next v3 0/5] remove page frag implementation in vhost_net
From: Yunsheng Lin @ 2024-01-29 12:40 UTC
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Matthias Brugger,
AngeloGioacchino Del Regno, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-arm-kernel,
linux-mediatek, bpf, Michael S. Tsirkin, Jason Wang
+cc Michael and Jason
On 2024/1/23 18:42, Yunsheng Lin wrote:
Hi, Michael and Jason
Is this patchset supposed to go through vhost tree instead of net-next?
As the state is changed to 'Not applicable' in the netdevbpf patchwork,
according to maintainer-netdev.rst:
Not applicable patch is expected to be applied outside of the networking
subsystem
* Re: [PATCH net-next v3 0/5] remove page frag implementation in vhost_net
From: Jakub Kicinski @ 2024-01-30 2:08 UTC
To: Yunsheng Lin
Cc: davem, pabeni, netdev, linux-kernel, Matthias Brugger,
AngeloGioacchino Del Regno, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-arm-kernel,
linux-mediatek, bpf, Michael S. Tsirkin, Jason Wang
On Mon, 29 Jan 2024 20:40:37 +0800 Yunsheng Lin wrote:
> Is this patchset supposed to go through vhost tree instead of net-next?
> As the state is changed to 'Not applicable' in the netdevbpf patchwork,
> according to maintainer-netdev.rst:
>
> Not applicable patch is expected to be applied outside of the networking
> subsystem
Sorry about the confusion, DaveM changed the way he uses the states
since they were documented. There were concurrent changes to the gve
driver, patches no longer apply. Could you rebase?
* Re: [PATCH net-next v3 0/5] remove page frag implementation in vhost_net
From: Yunsheng Lin @ 2024-01-30 10:38 UTC
To: Jakub Kicinski
Cc: davem, pabeni, netdev, linux-kernel, Matthias Brugger,
AngeloGioacchino Del Regno, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, linux-arm-kernel,
linux-mediatek, bpf, Michael S. Tsirkin, Jason Wang
On 2024/1/30 10:08, Jakub Kicinski wrote:
> On Mon, 29 Jan 2024 20:40:37 +0800 Yunsheng Lin wrote:
>> Is this patchset supposed to go through vhost tree instead of net-next?
>> As the state is changed to 'Not applicable' in the netdevbpf patchwork,
>> according to maintainer-netdev.rst:
>>
>> Not applicable patch is expected to be applied outside of the networking
>> subsystem
>
> Sorry about the confusion, DaveM changed the way he uses the states
> since they were documented. There were concurrent changes to the gve
> driver, patches no longer apply. Could you rebase?
Sure.
Thanks for clarifying.