* Re: [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache
2025-05-06 14:55 [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache Jon Kohler
@ 2025-05-06 14:54 ` Willem de Bruijn
2025-05-06 19:11 ` Jon Kohler
2025-05-06 14:55 ` [PATCH net-next 1/4] tun: rcu_dereference xdp_prog only once per batch Jon Kohler
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Willem de Bruijn @ 2025-05-06 14:54 UTC (permalink / raw)
To: Jon Kohler, ast, daniel, davem, kuba, hawk, john.fastabend,
netdev, bpf, jon, aleksander.lobakin
Jon Kohler wrote:
> Use the per-CPU NAPI cache for SKB allocation, leveraging bulk
> allocation since the batch size is known at submission time. This
> improves efficiency by reducing allocation overhead, particularly when
> using IFF_NAPI and GRO, which can replenish the cache in a tight loop.
Do you have experimental data?
> Additionally, utilize napi_build_skb and napi_consume_skb to further
> benefit from the NAPI cache.
>
> Note: This series does not address the large payload path in
> tun_alloc_skb, which spans sock.c and skbuff.c. A separate series will
> handle privatizing the allocation code in tun and integrating the NAPI
> cache for that path.
>
> Thanks all,
> Jon
>
> Jon Kohler (4):
> tun: rcu_dereference xdp_prog only once per batch
> tun: optimize skb allocation in tun_xdp_one
> tun: use napi_build_skb in __tun_build_skb
> tun: use napi_consume_skb in tun_do_read
>
> drivers/net/tun.c | 60 +++++++++++++++++++++++++++++++++--------------
> 1 file changed, 42 insertions(+), 18 deletions(-)
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache
@ 2025-05-06 14:55 Jon Kohler
2025-05-06 14:54 ` Willem de Bruijn
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Jon Kohler @ 2025-05-06 14:55 UTC (permalink / raw)
To: ast, daniel, davem, kuba, hawk, john.fastabend, netdev, bpf, jon,
aleksander.lobakin
Use the per-CPU NAPI cache for SKB allocation, leveraging bulk
allocation since the batch size is known at submission time. This
improves efficiency by reducing allocation overhead, particularly when
using IFF_NAPI and GRO, which can replenish the cache in a tight loop.
Additionally, utilize napi_build_skb and napi_consume_skb to further
benefit from the NAPI cache.
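At a high level, the TUN_MSG_PTR batch path after this series looks roughly
like the sketch below (illustrative only; the exact code is in the
individual patches):

	/* tun_sendmsg(), TUN_MSG_PTR path - rough sketch, not literal code */
	num_skbs = napi_skb_cache_get_bulk(skbs, ctl->num);
	for (i = 0; i < num_skbs; i++) {
		/* tun_xdp_one() finishes the skb with build_skb_around() */
		ret = tun_xdp_one(tun, tfile, &xdp[i], &flush, &tpage,
				  xdp_prog, skbs[i]);
		...
	}
	/* entries not covered by the bulk allocation are dropped and counted */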
Note: This series does not address the large payload path in
tun_alloc_skb, which spans sock.c and skbuff.c. A separate series will
handle privatizing the allocation code in tun and integrating the NAPI
cache for that path.
Thanks all,
Jon
Jon Kohler (4):
tun: rcu_dereference xdp_prog only once per batch
tun: optimize skb allocation in tun_xdp_one
tun: use napi_build_skb in __tun_build_skb
tun: use napi_consume_skb in tun_do_read
drivers/net/tun.c | 60 +++++++++++++++++++++++++++++++++--------------
1 file changed, 42 insertions(+), 18 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH net-next 1/4] tun: rcu_dereference xdp_prog only once per batch
2025-05-06 14:55 [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache Jon Kohler
2025-05-06 14:54 ` Willem de Bruijn
@ 2025-05-06 14:55 ` Jon Kohler
2025-05-07 20:43 ` Willem de Bruijn
2025-05-06 14:55 ` [PATCH net-next 2/4] tun: optimize skb allocation in tun_xdp_one Jon Kohler
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2025-05-06 14:55 UTC (permalink / raw)
To: ast, daniel, davem, kuba, hawk, john.fastabend, netdev, bpf, jon,
aleksander.lobakin, Willem de Bruijn, Jason Wang, Andrew Lunn,
Eric Dumazet, Paolo Abeni, open list
Hoist rcu_dereference(tun->xdp_prog) out of tun_xdp_one, so that
rcu_dereference is called once during batch processing.
No functional change intended.
Signed-off-by: Jon Kohler <jon@nutanix.com>
---
drivers/net/tun.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7babd1e9a378..87fc51916fce 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2353,12 +2353,12 @@ static void tun_put_page(struct tun_page *tpage)
static int tun_xdp_one(struct tun_struct *tun,
struct tun_file *tfile,
struct xdp_buff *xdp, int *flush,
- struct tun_page *tpage)
+ struct tun_page *tpage,
+ struct bpf_prog *xdp_prog)
{
unsigned int datasize = xdp->data_end - xdp->data;
struct tun_xdp_hdr *hdr = xdp->data_hard_start;
struct virtio_net_hdr *gso = &hdr->gso;
- struct bpf_prog *xdp_prog;
struct sk_buff *skb = NULL;
struct sk_buff_head *queue;
u32 rxhash = 0, act;
@@ -2371,7 +2371,6 @@ static int tun_xdp_one(struct tun_struct *tun,
if (unlikely(datasize < ETH_HLEN))
return -EINVAL;
- xdp_prog = rcu_dereference(tun->xdp_prog);
if (xdp_prog) {
if (gso->gso_type) {
skb_xdp = true;
@@ -2494,6 +2493,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
if (m->msg_controllen == sizeof(struct tun_msg_ctl) &&
ctl && ctl->type == TUN_MSG_PTR) {
struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
+ struct bpf_prog *xdp_prog;
struct tun_page tpage;
int n = ctl->num;
int flush = 0, queued = 0;
@@ -2503,10 +2503,12 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
local_bh_disable();
rcu_read_lock();
bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
+ xdp_prog = rcu_dereference(tun->xdp_prog);
for (i = 0; i < n; i++) {
xdp = &((struct xdp_buff *)ctl->ptr)[i];
- ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+ ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage,
+ xdp_prog);
if (ret > 0)
queued += ret;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next 2/4] tun: optimize skb allocation in tun_xdp_one
2025-05-06 14:55 [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache Jon Kohler
2025-05-06 14:54 ` Willem de Bruijn
2025-05-06 14:55 ` [PATCH net-next 1/4] tun: rcu_dereference xdp_prog only once per batch Jon Kohler
@ 2025-05-06 14:55 ` Jon Kohler
2025-05-07 20:50 ` Willem de Bruijn
2025-05-06 14:55 ` [PATCH net-next 3/4] tun: use napi_build_skb in __tun_build_skb Jon Kohler
2025-05-06 14:55 ` [PATCH net-next 4/4] tun: use napi_consume_skb in tun_do_read Jon Kohler
4 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2025-05-06 14:55 UTC (permalink / raw)
To: ast, daniel, davem, kuba, hawk, john.fastabend, netdev, bpf, jon,
aleksander.lobakin, Willem de Bruijn, Jason Wang, Andrew Lunn,
Eric Dumazet, Paolo Abeni, open list
Enhance TUN_MSG_PTR batch processing by leveraging bulk allocation from
the per-CPU NAPI cache via napi_skb_cache_get_bulk. This improves
efficiency by reducing allocation overhead and is especially useful
when using IFF_NAPI and GRO is able to feed the cache entries back.
Handle scenarios where full preallocation of SKBs is not possible by
gracefully dropping only the uncovered portion of the batch payload.
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Jon Kohler <jon@nutanix.com>
---
drivers/net/tun.c | 39 +++++++++++++++++++++++++++------------
1 file changed, 27 insertions(+), 12 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 87fc51916fce..f7f7490e78dc 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2354,12 +2354,12 @@ static int tun_xdp_one(struct tun_struct *tun,
struct tun_file *tfile,
struct xdp_buff *xdp, int *flush,
struct tun_page *tpage,
- struct bpf_prog *xdp_prog)
+ struct bpf_prog *xdp_prog,
+ struct sk_buff *skb)
{
unsigned int datasize = xdp->data_end - xdp->data;
struct tun_xdp_hdr *hdr = xdp->data_hard_start;
struct virtio_net_hdr *gso = &hdr->gso;
- struct sk_buff *skb = NULL;
struct sk_buff_head *queue;
u32 rxhash = 0, act;
int buflen = hdr->buflen;
@@ -2381,16 +2381,15 @@ static int tun_xdp_one(struct tun_struct *tun,
act = bpf_prog_run_xdp(xdp_prog, xdp);
ret = tun_xdp_act(tun, xdp_prog, xdp, act);
- if (ret < 0) {
- put_page(virt_to_head_page(xdp->data));
+ if (ret < 0)
return ret;
- }
switch (ret) {
case XDP_REDIRECT:
*flush = true;
fallthrough;
case XDP_TX:
+ napi_consume_skb(skb, 1);
return 0;
case XDP_PASS:
break;
@@ -2403,13 +2402,14 @@ static int tun_xdp_one(struct tun_struct *tun,
tpage->page = page;
tpage->count = 1;
}
+ napi_consume_skb(skb, 1);
return 0;
}
}
build:
- skb = build_skb(xdp->data_hard_start, buflen);
- if (!skb) {
+ skb = build_skb_around(skb, xdp->data_hard_start, buflen);
+ if (unlikely(!skb)) {
ret = -ENOMEM;
goto out;
}
@@ -2427,7 +2427,6 @@ static int tun_xdp_one(struct tun_struct *tun,
if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
atomic_long_inc(&tun->rx_frame_errors);
- kfree_skb(skb);
ret = -EINVAL;
goto out;
}
@@ -2455,7 +2454,6 @@ static int tun_xdp_one(struct tun_struct *tun,
if (unlikely(tfile->detached)) {
spin_unlock(&queue->lock);
- kfree_skb(skb);
return -EBUSY;
}
@@ -2496,7 +2494,9 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
struct bpf_prog *xdp_prog;
struct tun_page tpage;
int n = ctl->num;
- int flush = 0, queued = 0;
+ int flush = 0, queued = 0, num_skbs = 0;
+ /* Max size of VHOST_NET_BATCH */
+ void *skbs[64];
memset(&tpage, 0, sizeof(tpage));
@@ -2505,12 +2505,27 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
xdp_prog = rcu_dereference(tun->xdp_prog);
- for (i = 0; i < n; i++) {
+ num_skbs = napi_skb_cache_get_bulk(skbs, n);
+
+ for (i = 0; i < num_skbs; i++) {
+ struct sk_buff *skb = skbs[i];
xdp = &((struct xdp_buff *)ctl->ptr)[i];
ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage,
- xdp_prog);
+ xdp_prog, skb);
if (ret > 0)
queued += ret;
+ else if (ret < 0) {
+ dev_core_stats_rx_dropped_inc(tun->dev);
+ napi_consume_skb(skb, 1);
+ put_page(virt_to_head_page(xdp->data));
+ }
+ }
+
+ /* Handle remaining xdp_buff entries if num_skbs < ctl->num */
+ for (i = num_skbs; i < ctl->num; i++) {
+ xdp = &((struct xdp_buff *)ctl->ptr)[i];
+ dev_core_stats_rx_dropped_inc(tun->dev);
+ put_page(virt_to_head_page(xdp->data));
}
if (flush)
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next 3/4] tun: use napi_build_skb in __tun_build_skb
2025-05-06 14:55 [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache Jon Kohler
` (2 preceding siblings ...)
2025-05-06 14:55 ` [PATCH net-next 2/4] tun: optimize skb allocation in tun_xdp_one Jon Kohler
@ 2025-05-06 14:55 ` Jon Kohler
2025-05-07 20:50 ` Willem de Bruijn
2025-05-06 14:55 ` [PATCH net-next 4/4] tun: use napi_consume_skb in tun_do_read Jon Kohler
4 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2025-05-06 14:55 UTC (permalink / raw)
To: ast, daniel, davem, kuba, hawk, john.fastabend, netdev, bpf, jon,
aleksander.lobakin, Willem de Bruijn, Jason Wang, Andrew Lunn,
Eric Dumazet, Paolo Abeni, open list
Use napi_build_skb for small payload SKBs that end up using the
tun_build_skb path.
Signed-off-by: Jon Kohler <jon@nutanix.com>
---
drivers/net/tun.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f7f7490e78dc..7b13d4bf5374 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1538,7 +1538,11 @@ static struct sk_buff *__tun_build_skb(struct tun_file *tfile,
int buflen, int len, int pad,
int metasize)
{
- struct sk_buff *skb = build_skb(buf, buflen);
+ struct sk_buff *skb;
+
+ local_bh_disable();
+ skb = napi_build_skb(buf, buflen);
+ local_bh_enable();
if (!skb)
return ERR_PTR(-ENOMEM);
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next 4/4] tun: use napi_consume_skb in tun_do_read
2025-05-06 14:55 [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache Jon Kohler
` (3 preceding siblings ...)
2025-05-06 14:55 ` [PATCH net-next 3/4] tun: use napi_build_skb in __tun_build_skb Jon Kohler
@ 2025-05-06 14:55 ` Jon Kohler
4 siblings, 0 replies; 16+ messages in thread
From: Jon Kohler @ 2025-05-06 14:55 UTC (permalink / raw)
To: ast, daniel, davem, kuba, hawk, john.fastabend, netdev, bpf, jon,
aleksander.lobakin, Willem de Bruijn, Jason Wang, Andrew Lunn,
Eric Dumazet, Paolo Abeni, open list
Now that the build_skb paths use the local NAPI cache, use
napi_consume_skb in tun_do_read so that the local cache gets refilled
on read. This is especially useful in the vhost worker use case, where
the RX and TX paths run on the same worker kthread.
Signed-off-by: Jon Kohler <jon@nutanix.com>
---
drivers/net/tun.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7b13d4bf5374..f85115383667 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2163,10 +2163,13 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
struct sk_buff *skb = ptr;
ret = tun_put_user(tun, tfile, skb, to);
- if (unlikely(ret < 0))
+ if (ret >= 0) {
+ local_bh_disable();
+ napi_consume_skb(skb, 1);
+ local_bh_enable();
+ } else {
kfree_skb(skb);
- else
- consume_skb(skb);
+ }
}
return ret;
--
2.43.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache
2025-05-06 14:54 ` Willem de Bruijn
@ 2025-05-06 19:11 ` Jon Kohler
2025-05-07 13:25 ` Willem de Bruijn
0 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2025-05-06 19:11 UTC (permalink / raw)
To: Willem de Bruijn
Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com,
netdev@vger.kernel.org, bpf@vger.kernel.org,
aleksander.lobakin@intel.com, Jason Wang, Michael S. Tsirkin
> On May 6, 2025, at 10:54 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> Jon Kohler wrote:
>> Use the per-CPU NAPI cache for SKB allocation, leveraging bulk
>> allocation since the batch size is known at submission time. This
>> improves efficiency by reducing allocation overhead, particularly when
>> using IFF_NAPI and GRO, which can replenish the cache in a tight loop.
>
> Do you have experimental data?
Yes! Sorry, I forgot to paste it into the cover letter. For the GRO case, I
turned TSO off in the guest, which, when using iperf3 + TCP, puts all of the
traffic down the tun_xdp_one() path, so we get good batching and GRO
aggregates the payloads back up again.
cmds:
ethtool -K eth0 tso off
taskset -c 2 iperf3 -c other-vm-here -t 30 -p 5200 --bind local-add-here --cport 4200 -b 0 -i 30
Before this series: ~14.4 Gbits/sec
After this series: ~15.2 Gbits/sec
So roughly a 5% speedup in that case.
In the UDP case (same syntax, just add a -u), there isn’t any GRO but
we do get a wee bump on pure TX of about ~1%
In mixed TX/RX, where tun_do_read feeds the cache, we get a bit of a bump
too, since the bulk allocation doesn't need to work as hard.
On the pure RX side, there is a bit of a benefit in that path because of the
bulk deallocation, so it seems to be a net win all around from what I’ve seen thus far.
Happy to grab more details if there are other aspects you’re curious about.
Note: In both the TCP non-GSO case and UDP cases, we’d get even more
of a bump if we can figure out the overhead of vhost get_tx_bufs, which is
a ~37% overhead per flame graph. Adding Jason/Michael as FYI on that. I
suspect we could separately do some sort of batched reads there, which
would give us even more room for this series to scream.
>
>> Additionally, utilize napi_build_skb and napi_consume_skb to further
>> benefit from the NAPI cache.
>>
>> Note: This series does not address the large payload path in
>> tun_alloc_skb, which spans sock.c and skbuff.c. A separate series will
>> handle privatizing the allocation code in tun and integrating the NAPI
>> cache for that path.
>>
>> Thanks all,
>> Jon
>>
>> Jon Kohler (4):
>> tun: rcu_dereference xdp_prog only once per batch
>> tun: optimize skb allocation in tun_xdp_one
>> tun: use napi_build_skb in __tun_build_skb
>> tun: use napi_consume_skb in tun_do_read
>>
>> drivers/net/tun.c | 60 +++++++++++++++++++++++++++++++++--------------
>> 1 file changed, 42 insertions(+), 18 deletions(-)
>>
>> --
>> 2.43.0
>>
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache
2025-05-06 19:11 ` Jon Kohler
@ 2025-05-07 13:25 ` Willem de Bruijn
0 siblings, 0 replies; 16+ messages in thread
From: Willem de Bruijn @ 2025-05-07 13:25 UTC (permalink / raw)
To: Jon Kohler, Willem de Bruijn
Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com,
netdev@vger.kernel.org, bpf@vger.kernel.org,
aleksander.lobakin@intel.com, Jason Wang, Michael S. Tsirkin
Jon Kohler wrote:
>
>
> > On May 6, 2025, at 10:54 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jon Kohler wrote:
> >> Use the per-CPU NAPI cache for SKB allocation, leveraging bulk
> >> allocation since the batch size is known at submission time. This
> >> improves efficiency by reducing allocation overhead, particularly when
> >> using IFF_NAPI and GRO, which can replenish the cache in a tight loop.
> >
> > Do you have experimental data?
>
> Yes! Sorry, I forgot to paste it into the cover letter. For the GRO case, I
> turned TSO off in the guest, which, when using iperf3 + TCP, puts all of the
> traffic down the tun_xdp_one() path, so we get good batching and GRO
> aggregates the payloads back up again.
>
> cmds:
> ethtool -K eth0 tso off
> taskset -c 2 iperf3 -c other-vm-here -t 30 -p 5200 --bind local-add-here --cport 4200 -b 0 -i 30
>
> Before this series: ~14.4 Gbits/sec
>
> After this series: ~15.2 Gbits/sec
>
> So roughly a 5% speedup in that case.
>
> In the UDP case (same syntax, just add a -u), there isn’t any GRO but
> we do get a wee bump on pure TX of about ~1%
>
> In mixed TX/RX, where tun_do_read feeds the cache, we get a bit of a bump
> too, since the bulk allocation doesn't need to work as hard.
>
> On the pure RX side, there is a bit of a benefit in that path because of the
> bulk deallocation, so it seems to be a net win all around from what I’ve seen thus far.
>
> Happy to grab more details if there are other aspects you’re curious about.
Thanks! No this is great. Let's definitely capture this in the
relevant patch or commit message. I'll take a closer look at the
implementation now.
> Note: In both the TCP non-GSO case and UDP cases, we’d get even more
> of a bump if we can figure out the overhead of vhost get_tx_bufs, which is
> a ~37% overhead per flame graph. Adding Jason/Michael as FYI on that. I
> suspect we could separately do some sort of batched reads there, which
> would give us even more room for this series to scream.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 1/4] tun: rcu_dereference xdp_prog only once per batch
2025-05-06 14:55 ` [PATCH net-next 1/4] tun: rcu_dereference xdp_prog only once per batch Jon Kohler
@ 2025-05-07 20:43 ` Willem de Bruijn
2025-05-08 3:13 ` Jon Kohler
0 siblings, 1 reply; 16+ messages in thread
From: Willem de Bruijn @ 2025-05-07 20:43 UTC (permalink / raw)
To: Jon Kohler, ast, daniel, davem, kuba, hawk, john.fastabend,
netdev, bpf, jon, aleksander.lobakin, Willem de Bruijn,
Jason Wang, Andrew Lunn, Eric Dumazet, Paolo Abeni, open list
Jon Kohler wrote:
> Hoist rcu_dereference(tun->xdp_prog) out of tun_xdp_one, so that
> rcu_dereference is called once during batch processing.
I'm skeptical that this does anything.
The compiler can inline tun_xdp_one and indeed seems to do so. And
then it can cache the read in a register if that is the best use of
a register.
>
> No functional change intended.
>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 3/4] tun: use napi_build_skb in __tun_build_skb
2025-05-06 14:55 ` [PATCH net-next 3/4] tun: use napi_build_skb in __tun_build_skb Jon Kohler
@ 2025-05-07 20:50 ` Willem de Bruijn
2025-05-08 3:08 ` Jon Kohler
0 siblings, 1 reply; 16+ messages in thread
From: Willem de Bruijn @ 2025-05-07 20:50 UTC (permalink / raw)
To: Jon Kohler, ast, daniel, davem, kuba, hawk, john.fastabend,
netdev, bpf, jon, aleksander.lobakin, Willem de Bruijn,
Jason Wang, Andrew Lunn, Eric Dumazet, Paolo Abeni, open list,
hawk
Jon Kohler wrote:
> Use napi_build_skb for small payload SKBs that end up using the
> tun_build_skb path.
>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> ---
> drivers/net/tun.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index f7f7490e78dc..7b13d4bf5374 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1538,7 +1538,11 @@ static struct sk_buff *__tun_build_skb(struct tun_file *tfile,
> int buflen, int len, int pad,
> int metasize)
> {
> - struct sk_buff *skb = build_skb(buf, buflen);
> + struct sk_buff *skb;
> +
> + local_bh_disable();
> + skb = napi_build_skb(buf, buflen);
> + local_bh_enable();
The goal of this whole series seems to be to use the percpu skb cache
for bulk alloc.
As all these helpers' prefix indicates, they are meant to be used with
NAPI. Not sure using them on a tun write() datapath is deemed
acceptable. Or even correct. Perhaps the infrastructure authors have
an opinion.
From commit 795bb1c00dd3 ("net: bulk free infrastructure for NAPI
context, use napi_consume_skb") it does appear that technically all
that is needed is to be called in softirq context.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 2/4] tun: optimize skb allocation in tun_xdp_one
2025-05-06 14:55 ` [PATCH net-next 2/4] tun: optimize skb allocation in tun_xdp_one Jon Kohler
@ 2025-05-07 20:50 ` Willem de Bruijn
2025-05-08 3:02 ` Jon Kohler
0 siblings, 1 reply; 16+ messages in thread
From: Willem de Bruijn @ 2025-05-07 20:50 UTC (permalink / raw)
To: Jon Kohler, ast, daniel, davem, kuba, hawk, john.fastabend,
netdev, bpf, jon, aleksander.lobakin, Willem de Bruijn,
Jason Wang, Andrew Lunn, Eric Dumazet, Paolo Abeni, open list
Jon Kohler wrote:
> Enhance TUN_MSG_PTR batch processing by leveraging bulk allocation from
> the per-CPU NAPI cache via napi_skb_cache_get_bulk. This improves
> efficiency by reducing allocation overhead and is especially useful
> when using IFF_NAPI and GRO is able to feed the cache entries back.
>
> Handle scenarios where full preallocation of SKBs is not possible by
> gracefully dropping only the uncovered portion of the batch payload.
>
> Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> ---
> drivers/net/tun.c | 39 +++++++++++++++++++++++++++------------
> 1 file changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 87fc51916fce..f7f7490e78dc 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2354,12 +2354,12 @@ static int tun_xdp_one(struct tun_struct *tun,
> struct tun_file *tfile,
> struct xdp_buff *xdp, int *flush,
> struct tun_page *tpage,
> - struct bpf_prog *xdp_prog)
> + struct bpf_prog *xdp_prog,
> + struct sk_buff *skb)
> {
> unsigned int datasize = xdp->data_end - xdp->data;
> struct tun_xdp_hdr *hdr = xdp->data_hard_start;
> struct virtio_net_hdr *gso = &hdr->gso;
> - struct sk_buff *skb = NULL;
> struct sk_buff_head *queue;
> u32 rxhash = 0, act;
> int buflen = hdr->buflen;
> @@ -2381,16 +2381,15 @@ static int tun_xdp_one(struct tun_struct *tun,
>
> act = bpf_prog_run_xdp(xdp_prog, xdp);
> ret = tun_xdp_act(tun, xdp_prog, xdp, act);
> - if (ret < 0) {
> - put_page(virt_to_head_page(xdp->data));
> + if (ret < 0)
> return ret;
> - }
>
> switch (ret) {
> case XDP_REDIRECT:
> *flush = true;
> fallthrough;
> case XDP_TX:
> + napi_consume_skb(skb, 1);
> return 0;
> case XDP_PASS:
> break;
> @@ -2403,13 +2402,14 @@ static int tun_xdp_one(struct tun_struct *tun,
> tpage->page = page;
> tpage->count = 1;
> }
> + napi_consume_skb(skb, 1);
> return 0;
> }
> }
>
> build:
> - skb = build_skb(xdp->data_hard_start, buflen);
> - if (!skb) {
> + skb = build_skb_around(skb, xdp->data_hard_start, buflen);
> + if (unlikely(!skb)) {
> ret = -ENOMEM;
> goto out;
> }
> @@ -2427,7 +2427,6 @@ static int tun_xdp_one(struct tun_struct *tun,
>
> if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
> atomic_long_inc(&tun->rx_frame_errors);
> - kfree_skb(skb);
> ret = -EINVAL;
> goto out;
> }
> @@ -2455,7 +2454,6 @@ static int tun_xdp_one(struct tun_struct *tun,
>
> if (unlikely(tfile->detached)) {
> spin_unlock(&queue->lock);
> - kfree_skb(skb);
> return -EBUSY;
> }
>
> @@ -2496,7 +2494,9 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> struct bpf_prog *xdp_prog;
> struct tun_page tpage;
> int n = ctl->num;
> - int flush = 0, queued = 0;
> + int flush = 0, queued = 0, num_skbs = 0;
> + /* Max size of VHOST_NET_BATCH */
> + void *skbs[64];
>
> memset(&tpage, 0, sizeof(tpage));
>
> @@ -2505,12 +2505,27 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
> xdp_prog = rcu_dereference(tun->xdp_prog);
>
> - for (i = 0; i < n; i++) {
> + num_skbs = napi_skb_cache_get_bulk(skbs, n);
> +
> + for (i = 0; i < num_skbs; i++) {
> + struct sk_buff *skb = skbs[i];
> xdp = &((struct xdp_buff *)ctl->ptr)[i];
> ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage,
> - xdp_prog);
> + xdp_prog, skb);
> if (ret > 0)
> queued += ret;
> + else if (ret < 0) {
> + dev_core_stats_rx_dropped_inc(tun->dev);
> + napi_consume_skb(skb, 1);
> + put_page(virt_to_head_page(xdp->data));
> + }
> + }
> +
> + /* Handle remaining xdp_buff entries if num_skbs < ctl->num */
> + for (i = num_skbs; i < ctl->num; i++) {
> + xdp = &((struct xdp_buff *)ctl->ptr)[i];
> + dev_core_stats_rx_dropped_inc(tun->dev);
> + put_page(virt_to_head_page(xdp->data));
The code should attempt to send out packets rather than drop them.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 2/4] tun: optimize skb allocation in tun_xdp_one
2025-05-07 20:50 ` Willem de Bruijn
@ 2025-05-08 3:02 ` Jon Kohler
0 siblings, 0 replies; 16+ messages in thread
From: Jon Kohler @ 2025-05-08 3:02 UTC (permalink / raw)
To: Willem de Bruijn
Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com,
netdev@vger.kernel.org, bpf@vger.kernel.org,
aleksander.lobakin@intel.com, Jason Wang, Andrew Lunn,
Eric Dumazet, Paolo Abeni, open list
> On May 7, 2025, at 4:50 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> Jon Kohler wrote:
>> Enhance TUN_MSG_PTR batch processing by leveraging bulk allocation from
>> the per-CPU NAPI cache via napi_skb_cache_get_bulk. This improves
>> efficiency by reducing allocation overhead and is especially useful
>> when using IFF_NAPI and GRO is able to feed the cache entries back.
>>
>> Handle scenarios where full preallocation of SKBs is not possible by
>> gracefully dropping only the uncovered portion of the batch payload.
>>
>> Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Signed-off-by: Jon Kohler <jon@nutanix.com>
>> ---
>> drivers/net/tun.c | 39 +++++++++++++++++++++++++++------------
>> 1 file changed, 27 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 87fc51916fce..f7f7490e78dc 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -2354,12 +2354,12 @@ static int tun_xdp_one(struct tun_struct *tun,
>> struct tun_file *tfile,
>> struct xdp_buff *xdp, int *flush,
>> struct tun_page *tpage,
>> - struct bpf_prog *xdp_prog)
>> + struct bpf_prog *xdp_prog,
>> + struct sk_buff *skb)
>> {
>> unsigned int datasize = xdp->data_end - xdp->data;
>> struct tun_xdp_hdr *hdr = xdp->data_hard_start;
>> struct virtio_net_hdr *gso = &hdr->gso;
>> - struct sk_buff *skb = NULL;
>> struct sk_buff_head *queue;
>> u32 rxhash = 0, act;
>> int buflen = hdr->buflen;
>> @@ -2381,16 +2381,15 @@ static int tun_xdp_one(struct tun_struct *tun,
>>
>> act = bpf_prog_run_xdp(xdp_prog, xdp);
>> ret = tun_xdp_act(tun, xdp_prog, xdp, act);
>> - if (ret < 0) {
>> - put_page(virt_to_head_page(xdp->data));
>> + if (ret < 0)
>> return ret;
>> - }
>>
>> switch (ret) {
>> case XDP_REDIRECT:
>> *flush = true;
>> fallthrough;
>> case XDP_TX:
>> + napi_consume_skb(skb, 1);
>> return 0;
>> case XDP_PASS:
>> break;
>> @@ -2403,13 +2402,14 @@ static int tun_xdp_one(struct tun_struct *tun,
>> tpage->page = page;
>> tpage->count = 1;
>> }
>> + napi_consume_skb(skb, 1);
>> return 0;
>> }
>> }
>>
>> build:
>> - skb = build_skb(xdp->data_hard_start, buflen);
>> - if (!skb) {
>> + skb = build_skb_around(skb, xdp->data_hard_start, buflen);
>> + if (unlikely(!skb)) {
>> ret = -ENOMEM;
>> goto out;
>> }
>> @@ -2427,7 +2427,6 @@ static int tun_xdp_one(struct tun_struct *tun,
>>
>> if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
>> atomic_long_inc(&tun->rx_frame_errors);
>> - kfree_skb(skb);
>> ret = -EINVAL;
>> goto out;
>> }
>> @@ -2455,7 +2454,6 @@ static int tun_xdp_one(struct tun_struct *tun,
>>
>> if (unlikely(tfile->detached)) {
>> spin_unlock(&queue->lock);
>> - kfree_skb(skb);
>> return -EBUSY;
>> }
>>
>> @@ -2496,7 +2494,9 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>> struct bpf_prog *xdp_prog;
>> struct tun_page tpage;
>> int n = ctl->num;
>> - int flush = 0, queued = 0;
>> + int flush = 0, queued = 0, num_skbs = 0;
>> + /* Max size of VHOST_NET_BATCH */
>> + void *skbs[64];
>>
>> memset(&tpage, 0, sizeof(tpage));
>>
>> @@ -2505,12 +2505,27 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>> bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
>> xdp_prog = rcu_dereference(tun->xdp_prog);
>>
>> - for (i = 0; i < n; i++) {
>> + num_skbs = napi_skb_cache_get_bulk(skbs, n);
>> +
>> + for (i = 0; i < num_skbs; i++) {
>> + struct sk_buff *skb = skbs[i];
>> xdp = &((struct xdp_buff *)ctl->ptr)[i];
>> ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage,
>> - xdp_prog);
>> + xdp_prog, skb);
>> if (ret > 0)
>> queued += ret;
>> + else if (ret < 0) {
>> + dev_core_stats_rx_dropped_inc(tun->dev);
>> + napi_consume_skb(skb, 1);
>> + put_page(virt_to_head_page(xdp->data));
>> + }
>> + }
>> +
>> + /* Handle remaining xdp_buff entries if num_skbs < ctl->num */
>> + for (i = num_skbs; i < ctl->num; i++) {
>> + xdp = &((struct xdp_buff *)ctl->ptr)[i];
>> + dev_core_stats_rx_dropped_inc(tun->dev);
>> + put_page(virt_to_head_page(xdp->data));
>
> The code should attempt to send out packets rather than drop them.
>
I took a hint from the other two places that currently do bulk SKB
allocation; they drop at least some of their payloads on allocation
failure. See the other call sites of napi_skb_cache_get_bulk for
reference.
Also, this is similar to what the code already does today, because
if build_skb fails, tun_xdp_one just returns ENOMEM and doesn’t go
any further.
This code is at least somewhat better than today, because it
increments the drop counters instead of dropping silently.
Happy to take a pointer if there is a suggested retry mechanism of
sorts?
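One rough, untested idea (it assumes tun_xdp_one() is taught to accept a
NULL skb and fall back to a plain build_skb() internally, which is not
part of this series) would be to keep processing the tail entries instead
of dropping them, roughly:

	/* hypothetical fallback for entries the bulk allocation didn't cover */
	for (i = num_skbs; i < ctl->num; i++) {
		xdp = &((struct xdp_buff *)ctl->ptr)[i];
		/* skb == NULL -> tun_xdp_one() would use plain build_skb() */
		ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage,
				  xdp_prog, NULL);
		if (ret > 0)
			queued += ret;
		else if (ret < 0) {
			dev_core_stats_rx_dropped_inc(tun->dev);
			put_page(virt_to_head_page(xdp->data));
		}
	}

But I'd want to sanity check that the extra slow path is worth it before
going that route.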
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 3/4] tun: use napi_build_skb in __tun_build_skb
2025-05-07 20:50 ` Willem de Bruijn
@ 2025-05-08 3:08 ` Jon Kohler
0 siblings, 0 replies; 16+ messages in thread
From: Jon Kohler @ 2025-05-08 3:08 UTC (permalink / raw)
To: Willem de Bruijn
Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com,
netdev@vger.kernel.org, bpf@vger.kernel.org,
aleksander.lobakin@intel.com, Jason Wang, Andrew Lunn,
Eric Dumazet, Paolo Abeni, open list
> On May 7, 2025, at 4:50 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> Jon Kohler wrote:
>> Use napi_build_skb for small payload SKBs that end up using the
>> tun_build_skb path.
>>
>> Signed-off-by: Jon Kohler <jon@nutanix.com>
>> ---
>> drivers/net/tun.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index f7f7490e78dc..7b13d4bf5374 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1538,7 +1538,11 @@ static struct sk_buff *__tun_build_skb(struct tun_file *tfile,
>> int buflen, int len, int pad,
>> int metasize)
>> {
>> - struct sk_buff *skb = build_skb(buf, buflen);
>> + struct sk_buff *skb;
>> +
>> + local_bh_disable();
>> + skb = napi_build_skb(buf, buflen);
>> + local_bh_enable();
>
> The goal of this whole series seems to be to use the percpu skb cache
> for bulk alloc.
Yes
>
> As all these helpers' prefix indicates, they are meant to be used with
> NAPI. Not sure using them on a tun write() datapath is deemed
> acceptable. Or even correct. Perhaps the infrastructure authors have
> an opinion.
@Alexander: thoughts on this one? I followed in the footsteps of
cpu_map_kthread_run, which appears to be its own non-NAPI thing, and
"bpf: cpumap: switch to napi_skb_cache_get_bulk()" simply wrapped
that whole area with local_bh_disable.
>
> From commit 795bb1c00dd3 ("net: bulk free infrastructure for NAPI
> context, use napi_consume_skb") it does appear that technically all
> that is needed is to be called in softirq context.
I saw the same thing; it appears that it isn't NAPI-constrained (unlike,
say, napi_alloc_skb, which takes a napi argument), but simply needs to be
called in softirq context.
My initial read on all of this was that the napi_ prefix reflected the
original intent, and it morphed over time into just being softirq
protected. Always happy to learn more if I've misread the situation.
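For reference, the pattern over in cpumap is essentially this (paraphrased
from memory, so the exact code may differ):

	local_bh_disable();
	...
	m = napi_skb_cache_get_bulk(skbs, nframes);
	for (i = 0; i < m; i++)
		skbs[i] = __xdp_build_skb_from_frame(frames[i], skbs[i], dev);
	...
	local_bh_enable();

i.e. the only guard around the per-CPU cache access is the
local_bh_disable/enable pair, which is what this patch mirrors around
napi_build_skb.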
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 1/4] tun: rcu_dereference xdp_prog only once per batch
2025-05-07 20:43 ` Willem de Bruijn
@ 2025-05-08 3:13 ` Jon Kohler
2025-05-08 13:31 ` Willem de Bruijn
0 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2025-05-08 3:13 UTC (permalink / raw)
To: Willem de Bruijn
Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com,
netdev@vger.kernel.org, bpf@vger.kernel.org,
aleksander.lobakin@intel.com, Jason Wang, Andrew Lunn,
Eric Dumazet, Paolo Abeni, open list
> On May 7, 2025, at 4:43 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> Jon Kohler wrote:
>> Hoist rcu_dereference(tun->xdp_prog) out of tun_xdp_one, so that
>> rcu_dereference is called once during batch processing.
>
> I'm skeptical that this does anything.
>
> The compiler can inline tun_xdp_one and indeed seems to do so. And
> then it can cache the read in a register if that is the best use of
> a register.
The thought here is that if a compiler decided not to inline tun_xdp_one
(perhaps it grew too big, or the compiler was being sassy), the intent is
still that this only wants to be called once and only once. This change
just makes that intent clearer, and is a nice little cleanup.
I’ve got a series that stacks on top of this one to enable multi-buffer
support, and I can keep an eye on whether that gets inlined or not.
>
>>
>> No functional change intended.
>>
>> Signed-off-by: Jon Kohler <jon@nutanix.com>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 1/4] tun: rcu_dereference xdp_prog only once per batch
2025-05-08 3:13 ` Jon Kohler
@ 2025-05-08 13:31 ` Willem de Bruijn
2025-05-08 13:40 ` Jon Kohler
0 siblings, 1 reply; 16+ messages in thread
From: Willem de Bruijn @ 2025-05-08 13:31 UTC (permalink / raw)
To: Jon Kohler, Willem de Bruijn
Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com,
netdev@vger.kernel.org, bpf@vger.kernel.org,
aleksander.lobakin@intel.com, Jason Wang, Andrew Lunn,
Eric Dumazet, Paolo Abeni, open list
Jon Kohler wrote:
>
>
> > On May 7, 2025, at 4:43 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jon Kohler wrote:
> >> Hoist rcu_dereference(tun->xdp_prog) out of tun_xdp_one, so that
> >> rcu_dereference is called once during batch processing.
> >
> > I'm skeptical that this does anything.
> >
> > The compiler can inline tun_xdp_one and indeed seems to do so. And
> > then it can cache the read in a register if that is the best use of
> > a register.
>
> The thought here is that if a compiler decided not to inline tun_xdp_one
> (perhaps it grew too big, or the compiler was being sassy), the intent is
> still that this only wants to be called once and only once. This change
> just makes that intent clearer, and is a nice little cleanup.
>
> I’ve got a series that stacks on top of this one to enable multi-buffer
> support, and I can keep an eye on whether that gets inlined or not.
That will only give you one outcome with a specific compiler, platform
and build configuration.
I would just drop this and let the compiler handle such optimizations.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next 1/4] tun: rcu_dereference xdp_prog only once per batch
2025-05-08 13:31 ` Willem de Bruijn
@ 2025-05-08 13:40 ` Jon Kohler
0 siblings, 0 replies; 16+ messages in thread
From: Jon Kohler @ 2025-05-08 13:40 UTC (permalink / raw)
To: Willem de Bruijn
Cc: ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com,
netdev@vger.kernel.org, bpf@vger.kernel.org,
aleksander.lobakin@intel.com, Jason Wang, Andrew Lunn,
Eric Dumazet, Paolo Abeni, open list
> On May 8, 2025, at 9:31 AM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> Jon Kohler wrote:
>>
>>
>>> On May 7, 2025, at 4:43 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>>>
>>> Jon Kohler wrote:
>>>> Hoist rcu_dereference(tun->xdp_prog) out of tun_xdp_one, so that
>>>> rcu_dereference is called once during batch processing.
>>>
>>> I'm skeptical that this does anything.
>>>
>>> The compiler can inline tun_xdp_one and indeed seems to do so. And
>>> then it can cache the read in a register if that is the best use of
>>> a register.
>>
>> The thought here is that if a compiler decided not to inline tun_xdp_one
>> (perhaps it grew too big, or the compiler was being sassy), the intent is
>> still that this only wants to be called once and only once. This change
>> just makes that intent clearer, and is a nice little cleanup.
>>
>> I’ve got a series that stacks on top of this one to enable multi-buffer
>> support, and I can keep an eye on whether that gets inlined or not.
>
> That will only give you one outcome with a specific compiler, platform
> and build configuration.
>
> I would just drop this and let the compiler handle such optimizations.
Ok, thanks for the feedback, will do
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-05-08 13:41 UTC | newest]
Thread overview: 16+ messages
2025-05-06 14:55 [PATCH net-next 0/4] tun: optimize SKB allocation with NAPI cache Jon Kohler
2025-05-06 14:54 ` Willem de Bruijn
2025-05-06 19:11 ` Jon Kohler
2025-05-07 13:25 ` Willem de Bruijn
2025-05-06 14:55 ` [PATCH net-next 1/4] tun: rcu_dereference xdp_prog only once per batch Jon Kohler
2025-05-07 20:43 ` Willem de Bruijn
2025-05-08 3:13 ` Jon Kohler
2025-05-08 13:31 ` Willem de Bruijn
2025-05-08 13:40 ` Jon Kohler
2025-05-06 14:55 ` [PATCH net-next 2/4] tun: optimize skb allocation in tun_xdp_one Jon Kohler
2025-05-07 20:50 ` Willem de Bruijn
2025-05-08 3:02 ` Jon Kohler
2025-05-06 14:55 ` [PATCH net-next 3/4] tun: use napi_build_skb in __tun_build_skb Jon Kohler
2025-05-07 20:50 ` Willem de Bruijn
2025-05-08 3:08 ` Jon Kohler
2025-05-06 14:55 ` [PATCH net-next 4/4] tun: use napi_consume_skb in tun_do_read Jon Kohler