* [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels
       [not found] <20260312155657.25676-1-mmietus97.ref@yahoo.com>
@ 2026-03-12 15:56 ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 01/11] net: dst_cache: add noref versions for dst_cache Marek Mietus
                     ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

Currently, tunnel xmit flows always take a reference on the dst_entry
for each xmitted packet. These atomic operations are redundant in some
flows.

This patchset introduces the infrastructure required for converting
the tunnel xmit flows to noref, and converts them where possible.

These changes improve tunnel performance, since fewer atomic operations
are used.

There are already noref optimizations in both ipv4 and ipv6
(see __ip_queue_xmit and inet6_csk_xmit).
This patchset implements similar optimizations in ip and udp tunnels.
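
To make the cost difference concrete, here is a minimal userspace sketch
that counts the atomic operations spent per packet in a refcounted flow
versus a noref flow. It is a toy model, not kernel code: toy_dst,
toy_xmit_ref, toy_xmit_noref and toy_count_atomic_ops are hypothetical
names invented for this illustration, and the "protected section" that
keeps the dst alive in the noref case is only implied by a comment.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy stand-in for a dst_entry; not the kernel structure. */
struct toy_dst {
	atomic_int refcnt;
	unsigned long atomic_ops;	/* bookkeeping for this demo only */
};

static void toy_dst_hold(struct toy_dst *dst)
{
	atomic_fetch_add(&dst->refcnt, 1);
	dst->atomic_ops++;
}

static void toy_dst_release(struct toy_dst *dst)
{
	int old = atomic_fetch_sub(&dst->refcnt, 1);

	assert(old > 0);
	dst->atomic_ops++;
}

/* Refcounted flow: every packet pins the dst and unpins it afterwards. */
static void toy_xmit_ref(struct toy_dst *dst)
{
	toy_dst_hold(dst);
	/* ... build headers and hand the packet down the stack ... */
	toy_dst_release(dst);
}

/*
 * Noref flow: the caller guarantees that the dst outlives the call
 * (an RCU read-side section in the kernel), so the refcount is not touched.
 */
static void toy_xmit_noref(struct toy_dst *dst)
{
	(void)dst;
	/* ... build headers and hand the packet down the stack ... */
}

/* Returns how many atomic operations were spent sending @packets packets. */
unsigned long toy_count_atomic_ops(int noref, unsigned int packets)
{
	struct toy_dst dst;
	unsigned int i;

	atomic_init(&dst.refcnt, 1);	/* the reference owned by the cache */
	dst.atomic_ops = 0;

	for (i = 0; i < packets; i++) {
		if (noref)
			toy_xmit_noref(&dst);
		else
			toy_xmit_ref(&dst);
	}

	assert(atomic_load(&dst.refcnt) == 1);	/* balanced either way */
	return dst.atomic_ops;
}
```

In this model the refcounted flow pays two atomic operations per packet
and the noref flow pays none. The real saving depends on how contended
the dst's cache line is.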

Benchmarks:
I used a vxlan tunnel over a pair of veth peers and measured the average
throughput over multiple samples.

I ran 100 samples on a clean build, and another 100 on a patched
build. Each sample ran for 120 seconds. These were my results:

clean:      72.52 gb/sec, stddev = 1.39
patched:    75.39 gb/sec, stddev = 0.94

TL;DR - This patchset results in a ~4% improvement in throughput for
vxlan. Only vxlan was benchmarked; similar results are expected for
other tunnels.
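
For reference, the percentage can be derived from the two averages
quoted above. The helper below is just the arithmetic;
relative_gain_percent is a hypothetical name used for this illustration.

```c
#include <assert.h>

/* Relative improvement of @patched over @base, in percent. */
double relative_gain_percent(double base, double patched)
{
	assert(base > 0.0);
	return (patched - base) / base * 100.0;
}
```

Calling relative_gain_percent(72.52, 75.39) gives roughly 3.96, which
rounds to the 4% quoted above.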

Changes in v8:
 - Removed repetitive "if (!noref) dst_release()" flows by updating the
output routing lookup flow to decrement the refcount for the dst, essentially
making all dsts noref.

Link to v7: https://lore.kernel.org/netdev/20260127070452.6581-1-mmietus97@yahoo.com/

Marek Mietus (11):
  net: dst_cache: add noref versions for dst_cache
  net: tunnel: convert iptunnel_xmit to noref
  net: tunnel: convert udp_tunnel{6,}_xmit_skb to noref
  net: tunnel: return noref dsts in udp_tunnel{,6}_dst_lookup
  net: ovpn: convert ovpn_udp{4,6}_output to use a noref dst
  wireguard: socket: convert send{4,6} to use a noref dst when possible
  net: tunnel: convert ip_md_tunnel_xmit to use noref dsts
  net: tunnel: convert ip_tunnel_xmit to use a noref dst when possible
  net: sctp: convert sctp_v{4,6}_xmit to use a noref dst when possible
  net: sit: convert ipip6_tunnel_xmit to use a noref dst
  net: tipc: convert tipc_udp_xmit to use a noref dst

 drivers/net/amt.c              |   3 +
 drivers/net/bareudp.c          |   4 -
 drivers/net/geneve.c           |  11 ---
 drivers/net/gtp.c              |   7 ++
 drivers/net/ovpn/udp.c         |   8 +-
 drivers/net/vxlan/vxlan_core.c |   6 --
 drivers/net/wireguard/socket.c |  12 ++-
 include/net/dst_cache.h        |  71 ++++++++++++++++++
 net/core/dst_cache.c           | 133 ++++++++++++++++++++++++++++++---
 net/ipv4/ip_tunnel.c           |  32 ++++----
 net/ipv4/ip_tunnel_core.c      |   2 +-
 net/ipv4/udp_tunnel_core.c     |   6 +-
 net/ipv6/ip6_udp_tunnel.c      |  10 ++-
 net/ipv6/sit.c                 |  14 +---
 net/sctp/ipv6.c                |   6 +-
 net/sctp/protocol.c            |   7 +-
 net/tipc/udp_media.c           |   6 +-
 17 files changed, 258 insertions(+), 80 deletions(-)

-- 
2.51.0



* [PATCH net-next v8 01/11] net: dst_cache: add noref versions for dst_cache
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 02/11] net: tunnel: convert iptunnel_xmit to noref Marek Mietus
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

Implement noref variants for existing dst_cache helpers
interacting with dst_entry. This is required for implementing
noref flows, which avoid redundant atomic operations.
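
The difference between the existing helpers and the noref variants can
be sketched with a small userspace model. This is an illustration of the
reference accounting only, under simplifying assumptions: there is no
per-cpu storage, locking, cookie or source address, and every toy_*
name is hypothetical rather than part of the kernel API.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dst { int refcnt; };
struct toy_cache { struct toy_dst *dst; };

static void toy_dst_hold(struct toy_dst *d)
{
	d->refcnt++;
}

static void toy_dst_release(struct toy_dst *d)
{
	if (!d)
		return;
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* "set": the cache takes its own reference; the caller keeps theirs. */
static void toy_cache_set(struct toy_cache *c, struct toy_dst *d)
{
	toy_dst_release(c->dst);
	if (d)
		toy_dst_hold(d);
	c->dst = d;
}

/* "steal": the cache takes over the caller's reference. */
static void toy_cache_steal(struct toy_cache *c, struct toy_dst *d)
{
	toy_dst_release(c->dst);
	c->dst = d;
}

/* "get": returns the dst with an extra reference for the caller. */
static struct toy_dst *toy_cache_get(struct toy_cache *c)
{
	if (c->dst)
		toy_dst_hold(c->dst);
	return c->dst;
}

/* "get noref": returns a borrowed pointer; the refcount is untouched. */
static struct toy_dst *toy_cache_get_noref(struct toy_cache *c)
{
	return c->dst;
}

/* Refcount after storing a freshly looked-up dst (which carries one ref). */
int toy_refs_after_store(int steal)
{
	struct toy_dst dst = { .refcnt = 1 };
	struct toy_cache cache = { .dst = NULL };

	if (steal)
		toy_cache_steal(&cache, &dst);
	else
		toy_cache_set(&cache, &dst);
	return dst.refcnt;
}

/* Refcount after a cache hit, when the cache owns the only reference. */
int toy_refs_after_lookup(int noref)
{
	struct toy_dst dst = { .refcnt = 1 };
	struct toy_cache cache = { .dst = &dst };
	struct toy_dst *found;

	found = noref ? toy_cache_get_noref(&cache) : toy_cache_get(&cache);
	return found->refcnt;
}
```

In this model, "set" followed by the caller's release and "steal" alone
end in the same state, and the borrowed lookup leaves the count at one.
A borrowed pointer is only safe while something else keeps the dst
alive, which is why the real helpers require an RCU read-side section.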

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 include/net/dst_cache.h |  71 +++++++++++++++++++++
 net/core/dst_cache.c    | 133 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 194 insertions(+), 10 deletions(-)

diff --git a/include/net/dst_cache.h b/include/net/dst_cache.h
index 1961699598e2..8d425cd75fd3 100644
--- a/include/net/dst_cache.h
+++ b/include/net/dst_cache.h
@@ -23,6 +23,23 @@ struct dst_cache {
  */
 struct dst_entry *dst_cache_get(struct dst_cache *dst_cache);
 
+/**
+ * dst_cache_get_rcu - perform cache lookup under RCU
+ * @dst_cache: the cache
+ *
+ * Perform cache lookup without taking a reference on the dst.
+ * Must be called with local BH disabled, and within an rcu read side
+ * critical section.
+ *
+ * The caller should use dst_cache_get_ip4_rcu() if it needs to retrieve the
+ * source address to be used when xmitting to the cached dst.
+ * local BH must be disabled.
+ *
+ * Return: Pointer to retrieved dst_entry if cache is initialized and
+ * cached dst is valid, NULL otherwise.
+ */
+struct dst_entry *dst_cache_get_rcu(struct dst_cache *dst_cache);
+
 /**
  *	dst_cache_get_ip4 - perform cache lookup and fetch ipv4 source address
  *	@dst_cache: the cache
@@ -32,6 +49,21 @@ struct dst_entry *dst_cache_get(struct dst_cache *dst_cache);
  */
 struct rtable *dst_cache_get_ip4(struct dst_cache *dst_cache, __be32 *saddr);
 
+/**
+ * dst_cache_get_ip4_rcu - lookup cache and ipv4 source under RCU
+ * @dst_cache: the cache
+ * @saddr: return value for the retrieved source address
+ *
+ * Perform cache lookup and fetch ipv4 source without taking a
+ * reference on the dst.
+ * Must be called with local BH disabled, and within an rcu read side
+ * critical section.
+ *
+ * Return: Pointer to retrieved rtable if cache is initialized and
+ * cached dst is valid, NULL otherwise.
+ */
+struct rtable *dst_cache_get_ip4_rcu(struct dst_cache *dst_cache, __be32 *saddr);
+
 /**
  *	dst_cache_set_ip4 - store the ipv4 dst into the cache
  *	@dst_cache: the cache
@@ -43,6 +75,17 @@ struct rtable *dst_cache_get_ip4(struct dst_cache *dst_cache, __be32 *saddr);
 void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
 		       __be32 saddr);
 
+/**
+ * dst_cache_steal_ip4 - store the ipv4 dst into the cache and steal its
+ * reference
+ * @dst_cache: the cache
+ * @dst: the entry to be cached whose reference will be stolen
+ * @saddr: the source address to be stored inside the cache
+ *
+ * local BH must be disabled
+ */
+void dst_cache_steal_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
+			 __be32 saddr);
 #if IS_ENABLED(CONFIG_IPV6)
 
 /**
@@ -56,6 +99,18 @@ void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
 void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
 		       const struct in6_addr *saddr);
 
+/**
+ * dst_cache_steal_ip6 - store the ipv6 dst into the cache and steal its
+ * reference
+ * @dst_cache: the cache
+ * @dst: the entry to be cached whose reference will be stolen
+ * @saddr: the source address to be stored inside the cache
+ *
+ * local BH must be disabled
+ */
+void dst_cache_steal_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
+			 const struct in6_addr *saddr);
+
 /**
  *	dst_cache_get_ip6 - perform cache lookup and fetch ipv6 source address
  *	@dst_cache: the cache
@@ -65,6 +120,22 @@ void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
  */
 struct dst_entry *dst_cache_get_ip6(struct dst_cache *dst_cache,
 				    struct in6_addr *saddr);
+
+/**
+ * dst_cache_get_ip6_rcu - lookup cache and ipv6 source under RCU
+ * @dst_cache: the cache
+ * @saddr: return value for the retrieved source address
+ *
+ * Perform cache lookup and fetch ipv6 source without taking a
+ * reference on the dst.
+ * Must be called with local BH disabled, and within an rcu read side
+ * critical section.
+ *
+ * Return: Pointer to retrieved dst_entry if cache is initialized and
+ * cached dst is valid, NULL otherwise.
+ */
+struct dst_entry *dst_cache_get_ip6_rcu(struct dst_cache *dst_cache,
+					struct in6_addr *saddr);
 #endif
 
 /**
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index 9ab4902324e1..52418cfb9b8a 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -25,20 +25,27 @@ struct dst_cache_pcpu {
 	};
 };
 
-static void dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
-				      struct dst_entry *dst, u32 cookie)
+static void __dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
+					struct dst_entry *dst, u32 cookie)
 {
 	DEBUG_NET_WARN_ON_ONCE(!in_softirq());
 	dst_release(dst_cache->dst);
-	if (dst)
-		dst_hold(dst);
 
 	dst_cache->cookie = cookie;
 	dst_cache->dst = dst;
 }
 
-static struct dst_entry *dst_cache_per_cpu_get(struct dst_cache *dst_cache,
-					       struct dst_cache_pcpu *idst)
+static void dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
+				      struct dst_entry *dst, u32 cookie)
+{
+	if (dst)
+		dst_hold(dst);
+
+	__dst_cache_per_cpu_dst_set(dst_cache, dst, cookie);
+}
+
+static struct dst_entry *__dst_cache_per_cpu_get(struct dst_cache *dst_cache,
+						 struct dst_cache_pcpu *idst)
 {
 	struct dst_entry *dst;
 
@@ -47,14 +54,10 @@ static struct dst_entry *dst_cache_per_cpu_get(struct dst_cache *dst_cache,
 	if (!dst)
 		goto fail;
 
-	/* the cache already hold a dst reference; it can't go away */
-	dst_hold(dst);
-
 	if (unlikely(!time_after(idst->refresh_ts,
 				 READ_ONCE(dst_cache->reset_ts)) ||
 		     (READ_ONCE(dst->obsolete) && !dst->ops->check(dst, idst->cookie)))) {
 		dst_cache_per_cpu_dst_set(idst, NULL, 0);
-		dst_release(dst);
 		goto fail;
 	}
 	return dst;
@@ -64,6 +67,18 @@ static struct dst_entry *dst_cache_per_cpu_get(struct dst_cache *dst_cache,
 	return NULL;
 }
 
+static struct dst_entry *dst_cache_per_cpu_get(struct dst_cache *dst_cache,
+					       struct dst_cache_pcpu *idst)
+{
+	struct dst_entry *dst;
+
+	dst = __dst_cache_per_cpu_get(dst_cache, idst);
+	if (dst)
+		/* the cache already holds a dst reference; it can't go away */
+		dst_hold(dst);
+	return dst;
+}
+
 struct dst_entry *dst_cache_get(struct dst_cache *dst_cache)
 {
 	struct dst_entry *dst;
@@ -78,6 +93,20 @@ struct dst_entry *dst_cache_get(struct dst_cache *dst_cache)
 }
 EXPORT_SYMBOL_GPL(dst_cache_get);
 
+struct dst_entry *dst_cache_get_rcu(struct dst_cache *dst_cache)
+{
+	struct dst_entry *dst;
+
+	if (!dst_cache->cache)
+		return NULL;
+
+	local_lock_nested_bh(&dst_cache->cache->bh_lock);
+	dst = __dst_cache_per_cpu_get(dst_cache, this_cpu_ptr(dst_cache->cache));
+	local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+	return dst;
+}
+EXPORT_SYMBOL_GPL(dst_cache_get_rcu);
+
 struct rtable *dst_cache_get_ip4(struct dst_cache *dst_cache, __be32 *saddr)
 {
 	struct dst_cache_pcpu *idst;
@@ -100,6 +129,28 @@ struct rtable *dst_cache_get_ip4(struct dst_cache *dst_cache, __be32 *saddr)
 }
 EXPORT_SYMBOL_GPL(dst_cache_get_ip4);
 
+struct rtable *dst_cache_get_ip4_rcu(struct dst_cache *dst_cache, __be32 *saddr)
+{
+	struct dst_cache_pcpu *idst;
+	struct dst_entry *dst;
+
+	if (!dst_cache->cache)
+		return NULL;
+
+	local_lock_nested_bh(&dst_cache->cache->bh_lock);
+	idst = this_cpu_ptr(dst_cache->cache);
+	dst = __dst_cache_per_cpu_get(dst_cache, idst);
+	if (!dst) {
+		local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+		return NULL;
+	}
+
+	*saddr = idst->in_saddr.s_addr;
+	local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+	return dst_rtable(dst);
+}
+EXPORT_SYMBOL_GPL(dst_cache_get_ip4_rcu);
+
 void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
 		       __be32 saddr)
 {
@@ -116,6 +167,24 @@ void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
 }
 EXPORT_SYMBOL_GPL(dst_cache_set_ip4);
 
+void dst_cache_steal_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
+			 __be32 saddr)
+{
+	struct dst_cache_pcpu *idst;
+
+	if (!dst_cache->cache) {
+		dst_release(dst);
+		return;
+	}
+
+	local_lock_nested_bh(&dst_cache->cache->bh_lock);
+	idst = this_cpu_ptr(dst_cache->cache);
+	__dst_cache_per_cpu_dst_set(idst, dst, 0);
+	idst->in_saddr.s_addr = saddr;
+	local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+}
+EXPORT_SYMBOL_GPL(dst_cache_steal_ip4);
+
 #if IS_ENABLED(CONFIG_IPV6)
 void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
 		       const struct in6_addr *saddr)
@@ -135,6 +204,26 @@ void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
 }
 EXPORT_SYMBOL_GPL(dst_cache_set_ip6);
 
+void dst_cache_steal_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
+			 const struct in6_addr *saddr)
+{
+	struct dst_cache_pcpu *idst;
+
+	if (!dst_cache->cache) {
+		dst_release(dst);
+		return;
+	}
+
+	local_lock_nested_bh(&dst_cache->cache->bh_lock);
+
+	idst = this_cpu_ptr(dst_cache->cache);
+	__dst_cache_per_cpu_dst_set(idst, dst,
+				    rt6_get_cookie(dst_rt6_info(dst)));
+	idst->in6_saddr = *saddr;
+	local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+}
+EXPORT_SYMBOL_GPL(dst_cache_steal_ip6);
+
 struct dst_entry *dst_cache_get_ip6(struct dst_cache *dst_cache,
 				    struct in6_addr *saddr)
 {
@@ -158,6 +247,30 @@ struct dst_entry *dst_cache_get_ip6(struct dst_cache *dst_cache,
 	return dst;
 }
 EXPORT_SYMBOL_GPL(dst_cache_get_ip6);
+
+struct dst_entry *dst_cache_get_ip6_rcu(struct dst_cache *dst_cache,
+					struct in6_addr *saddr)
+{
+	struct dst_cache_pcpu *idst;
+	struct dst_entry *dst;
+
+	if (!dst_cache->cache)
+		return NULL;
+
+	local_lock_nested_bh(&dst_cache->cache->bh_lock);
+
+	idst = this_cpu_ptr(dst_cache->cache);
+	dst = __dst_cache_per_cpu_get(dst_cache, idst);
+	if (!dst) {
+		local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+		return NULL;
+	}
+
+	*saddr = idst->in6_saddr;
+	local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+	return dst;
+}
+EXPORT_SYMBOL_GPL(dst_cache_get_ip6_rcu);
 #endif
 
 int dst_cache_init(struct dst_cache *dst_cache, gfp_t gfp)
-- 
2.51.0



* [PATCH net-next v8 02/11] net: tunnel: convert iptunnel_xmit to noref
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 01/11] net: dst_cache: add noref versions for dst_cache Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 03/11] net: tunnel: convert udp_tunnel{6,}_xmit_skb " Marek Mietus
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

iptunnel_xmit assumes that a reference was taken on the dst passed to it,
and uses that reference.

This forces callers to reference the dst, preventing noref optimizations.

Convert iptunnel_xmit to be noref and drop the requirement that a ref be
taken on the dst.
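
As a rough illustration of what changes for the skb, the sketch below
models a dst pointer that carries a "noref" tag in its low bit, so that
dropping the skb's dst only releases a reference when one was actually
handed over. It is a simplified userspace model: the toy_* names and
TOY_DST_* macros are hypothetical, and the refcount is a plain int.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DST_NOREF	((uintptr_t)1)
#define TOY_DST_PTRMASK	(~TOY_DST_NOREF)

struct toy_dst { int refcnt; };
struct toy_skb { uintptr_t refdst; };	/* pointer plus noref tag bit */

/* Refcounted attach: the skb takes over the caller's reference. */
static void toy_skb_dst_set(struct toy_skb *skb, struct toy_dst *dst)
{
	skb->refdst = (uintptr_t)dst;
}

/* Noref attach: the skb only borrows the dst; the low bit records that. */
static void toy_skb_dst_set_noref(struct toy_skb *skb, struct toy_dst *dst)
{
	skb->refdst = (uintptr_t)dst | TOY_DST_NOREF;
}

/* Detach: release a reference only if the skb actually owns one. */
static void toy_skb_dst_drop(struct toy_skb *skb)
{
	if (skb->refdst && !(skb->refdst & TOY_DST_NOREF)) {
		struct toy_dst *dst;

		dst = (struct toy_dst *)(skb->refdst & TOY_DST_PTRMASK);
		assert(dst->refcnt > 0);
		dst->refcnt--;
	}
	skb->refdst = 0;
}

/*
 * The caller starts with one reference. Returns how many references are
 * left once the skb has been consumed, before the caller does anything.
 */
int toy_refs_left_after_drop(int noref)
{
	struct toy_dst dst = { .refcnt = 1 };
	struct toy_skb skb = { .refdst = 0 };

	if (noref)
		toy_skb_dst_set_noref(&skb, &dst);
	else
		toy_skb_dst_set(&skb, &dst);

	toy_skb_dst_drop(&skb);
	return dst.refcnt;
}
```

In the refcounted case the caller's reference is gone once the skb is
consumed. In the noref case it is still there, so a caller that holds
one must drop it itself, as the ip_rt_put() calls added in this patch
do.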

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 net/ipv4/ip_tunnel.c       | 2 ++
 net/ipv4/ip_tunnel_core.c  | 2 +-
 net/ipv4/udp_tunnel_core.c | 3 +++
 net/ipv6/sit.c             | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 50d0f5fe4e4c..2136a46bcdc5 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -655,6 +655,7 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, tos, ttl,
 		      df, !net_eq(tunnel->net, dev_net(dev)), 0);
+	ip_rt_put(rt);
 	return;
 tx_error:
 	DEV_STATS_INC(dev, tx_errors);
@@ -844,6 +845,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl,
 		      df, !net_eq(tunnel->net, dev_net(dev)), 0);
+	ip_rt_put(rt);
 	return;
 
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 2e61ac137128..70f0f123b0ba 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -61,7 +61,7 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	skb_scrub_packet(skb, xnet);
 
 	skb_clear_hash_if_not_l4(skb);
-	skb_dst_set(skb, &rt->dst);
+	skb_dst_set_noref(skb, &rt->dst);
 	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 	IPCB(skb)->flags = ipcb_flags;
 
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index b1f667c52cb2..8a91f36cc052 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -190,8 +190,11 @@ void udp_tunnel_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb
 
 	udp_set_csum(nocheck, skb, src, dst, skb->len);
 
+	rcu_read_lock();
 	iptunnel_xmit(sk, rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df, xnet,
 		      ipcb_flags);
+	rcu_read_unlock();
+	ip_rt_put(rt);
 }
 EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb);
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index ef2e5111fb3a..34d4d72dc58a 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1029,6 +1029,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 
 	iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl,
 		      df, !net_eq(tunnel->net, dev_net(dev)), 0);
+	ip_rt_put(rt);
 	return NETDEV_TX_OK;
 
 tx_error_icmp:
-- 
2.51.0



* [PATCH net-next v8 03/11] net: tunnel: convert udp_tunnel{6,}_xmit_skb to noref
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 01/11] net: dst_cache: add noref versions for dst_cache Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 02/11] net: tunnel: convert iptunnel_xmit to noref Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 04/11] net: tunnel: return noref dsts in udp_tunnel{,6}_dst_lookup Marek Mietus
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

udp_tunnel{6,}_xmit_skb assume that a reference was taken on the dst
passed to them, and use that reference.

This forces callers to reference the dst, preventing noref optimizations.

Convert udp_tunnel{6,}_xmit_skb to be noref and drop the requirement
that a ref be taken on the dst.

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 drivers/net/amt.c              | 3 +++
 drivers/net/bareudp.c          | 2 ++
 drivers/net/geneve.c           | 2 ++
 drivers/net/gtp.c              | 7 +++++++
 drivers/net/ovpn/udp.c         | 2 ++
 drivers/net/vxlan/vxlan_core.c | 2 ++
 drivers/net/wireguard/socket.c | 2 ++
 net/ipv4/udp_tunnel_core.c     | 3 ---
 net/ipv6/ip6_udp_tunnel.c      | 2 +-
 net/sctp/ipv6.c                | 3 +++
 net/sctp/protocol.c            | 4 ++++
 net/tipc/udp_media.c           | 2 ++
 12 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index f2f3139e38a5..a9fd2b864a1a 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -1050,6 +1050,7 @@ static bool amt_send_membership_update(struct amt_dev *amt,
 			    false,
 			    false,
 			    0);
+	ip_rt_put(rt);
 	amt_update_gw_status(amt, AMT_STATUS_SENT_UPDATE, true);
 	return false;
 }
@@ -1108,6 +1109,7 @@ static void amt_send_multicast_data(struct amt_dev *amt,
 			    false,
 			    false,
 			    0);
+	ip_rt_put(rt);
 }
 
 static bool amt_send_membership_query(struct amt_dev *amt,
@@ -1167,6 +1169,7 @@ static bool amt_send_membership_query(struct amt_dev *amt,
 			    false,
 			    false,
 			    0);
+	ip_rt_put(rt);
 	amt_update_relay_status(tunnel, AMT_STATUS_SENT_QUERY, true);
 	return false;
 }
diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index 0df3208783ad..92ee4a36f86f 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -364,6 +364,7 @@ static int bareudp_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			    !net_eq(bareudp->net, dev_net(bareudp->dev)),
 			    !test_bit(IP_TUNNEL_CSUM_BIT, info->key.tun_flags),
 			    0);
+	ip_rt_put(rt);
 	return 0;
 
 free_dst:
@@ -433,6 +434,7 @@ static int bareudp6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			     !test_bit(IP_TUNNEL_CSUM_BIT,
 				       info->key.tun_flags),
 			     0);
+	dst_release(dst);
 	return 0;
 
 free_dst:
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 01cdd06102e0..4216a8ffd591 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1396,6 +1396,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			    !net_eq(geneve->net, dev_net(geneve->dev)),
 			    !test_bit(IP_TUNNEL_CSUM_BIT, info->key.tun_flags),
 			    0);
+	ip_rt_put(rt);
 	return 0;
 }
 
@@ -1487,6 +1488,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			     !test_bit(IP_TUNNEL_CSUM_BIT,
 				       info->key.tun_flags),
 			     0);
+	dst_release(dst);
 	return 0;
 }
 #endif
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index e8949f556209..09b774eed6c7 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -449,6 +449,7 @@ static int gtp0_send_echo_resp_ip(struct gtp_dev *gtp, struct sk_buff *skb)
 				    dev_net(gtp->dev)),
 			    false,
 			    0);
+	ip_rt_put(rt);
 
 	return 0;
 }
@@ -708,6 +709,7 @@ static int gtp1u_send_echo_resp(struct gtp_dev *gtp, struct sk_buff *skb)
 				    dev_net(gtp->dev)),
 			    false,
 			    0);
+	ip_rt_put(rt);
 	return 0;
 }
 
@@ -1308,6 +1310,7 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 				    !net_eq(sock_net(pktinfo.pctx->sk),
 					    dev_net(dev)),
 				    false, 0);
+		ip_rt_put(pktinfo.rt);
 		break;
 	case AF_INET6:
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1318,6 +1321,7 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 				     0,
 				     pktinfo.gtph_port, pktinfo.gtph_port,
 				     false, 0);
+		dst_release(&pktinfo.rt6->dst);
 #else
 		goto tx_err;
 #endif
@@ -2400,6 +2404,7 @@ static int gtp_genl_send_echo_req(struct sk_buff *skb, struct genl_info *info)
 		return -ENODEV;
 	}
 
+	rcu_read_lock();
 	udp_tunnel_xmit_skb(rt, sk, skb_to_send,
 			    fl4.saddr, fl4.daddr,
 			    inet_dscp_to_dsfield(fl4.flowi4_dscp),
@@ -2409,6 +2414,8 @@ static int gtp_genl_send_echo_req(struct sk_buff *skb, struct genl_info *info)
 			    !net_eq(sock_net(sk),
 				    dev_net(gtp->dev)),
 			    false, 0);
+	rcu_read_unlock();
+	ip_rt_put(rt);
 	return 0;
 }
 
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
index 272b535ecaad..2e202bd2b73f 100644
--- a/drivers/net/ovpn/udp.c
+++ b/drivers/net/ovpn/udp.c
@@ -200,6 +200,7 @@ static int ovpn_udp4_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 	udp_tunnel_xmit_skb(rt, sk, skb, fl.saddr, fl.daddr, 0,
 			    ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
 			    fl.fl4_dport, false, sk->sk_no_check_tx, 0);
+	ip_rt_put(rt);
 	ret = 0;
 err:
 	local_bh_enable();
@@ -275,6 +276,7 @@ static int ovpn_udp6_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 	udp_tunnel6_xmit_skb(dst, sk, skb, skb->dev, &fl.saddr, &fl.daddr, 0,
 			     ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
 			     fl.fl6_dport, udp_get_no_check6_tx(sk), 0);
+	dst_release(dst);
 	ret = 0;
 err:
 	local_bh_enable();
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 17c941aac32d..4482a47dbe15 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -2545,6 +2545,7 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				    pkey->u.ipv4.dst, tos, ttl, df,
 				    src_port, dst_port, xnet, !udp_sum,
 				    ipcb_flags);
+		ip_rt_put(rt);
 #if IS_ENABLED(CONFIG_IPV6)
 	} else {
 		struct vxlan_sock *sock6;
@@ -2620,6 +2621,7 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				     &saddr, &pkey->u.ipv6.dst, tos, ttl,
 				     pkey->label, src_port, dst_port, !udp_sum,
 				     ip6cb_flags);
+		dst_release(ndst);
 #endif
 	}
 	vxlan_vnifilter_count(vxlan, vni, NULL, VXLAN_VNI_STATS_TX, pkt_len);
diff --git a/drivers/net/wireguard/socket.c b/drivers/net/wireguard/socket.c
index 253488f8c00f..ee7d9c675909 100644
--- a/drivers/net/wireguard/socket.c
+++ b/drivers/net/wireguard/socket.c
@@ -85,6 +85,7 @@ static int send4(struct wg_device *wg, struct sk_buff *skb,
 	udp_tunnel_xmit_skb(rt, sock, skb, fl.saddr, fl.daddr, ds,
 			    ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
 			    fl.fl4_dport, false, false, 0);
+	ip_rt_put(rt);
 	goto out;
 
 err:
@@ -152,6 +153,7 @@ static int send6(struct wg_device *wg, struct sk_buff *skb,
 	udp_tunnel6_xmit_skb(dst, sock, skb, skb->dev, &fl.saddr, &fl.daddr, ds,
 			     ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
 			     fl.fl6_dport, false, 0);
+	dst_release(dst);
 	goto out;
 
 err:
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index 8a91f36cc052..b1f667c52cb2 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -190,11 +190,8 @@ void udp_tunnel_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb
 
 	udp_set_csum(nocheck, skb, src, dst, skb->len);
 
-	rcu_read_lock();
 	iptunnel_xmit(sk, rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df, xnet,
 		      ipcb_flags);
-	rcu_read_unlock();
-	ip_rt_put(rt);
 }
 EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb);
 
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index cef3e0210744..d58815db8182 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -95,7 +95,7 @@ void udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sock *sk,
 
 	uh->len = htons(skb->len);
 
-	skb_dst_set(skb, dst);
+	skb_dst_set_noref(skb, dst);
 
 	udp6_set_csum(nocheck, skb, saddr, daddr, skb->len);
 
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 53a5c027f8e3..43340eac8ec5 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -261,9 +261,12 @@ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *t)
 	skb_set_inner_ipproto(skb, IPPROTO_SCTP);
 	label = ip6_make_flowlabel(sock_net(sk), skb, fl6->flowlabel, true, fl6);
 
+	rcu_read_lock();
 	udp_tunnel6_xmit_skb(dst, sk, skb, NULL, &fl6->saddr, &fl6->daddr,
 			     tclass, ip6_dst_hoplimit(dst), label,
 			     sctp_sk(sk)->udp_port, t->encap_port, false, 0);
+	rcu_read_unlock();
+	dst_release(dst);
 	return 0;
 }
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 828a59b8e7bf..815279410bf9 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1070,10 +1070,14 @@ static inline int sctp_v4_xmit(struct sk_buff *skb, struct sctp_transport *t)
 	skb_reset_inner_mac_header(skb);
 	skb_reset_inner_transport_header(skb);
 	skb_set_inner_ipproto(skb, IPPROTO_SCTP);
+
+	rcu_read_lock();
 	udp_tunnel_xmit_skb(dst_rtable(dst), sk, skb, fl4->saddr,
 			    fl4->daddr, dscp, ip4_dst_hoplimit(dst), df,
 			    sctp_sk(sk)->udp_port, t->encap_port, false, false,
 			    0);
+	rcu_read_unlock();
+	dst_release(dst);
 	return 0;
 }
 
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 2b8e385d1e51..df0d82fed02e 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -198,6 +198,7 @@ static int tipc_udp_xmit(struct net *net, struct sk_buff *skb,
 		udp_tunnel_xmit_skb(rt, ub->ubsock->sk, skb, src->ipv4.s_addr,
 				    dst->ipv4.s_addr, 0, ttl, 0, src->port,
 				    dst->port, false, true, 0);
+		ip_rt_put(rt);
 #if IS_ENABLED(CONFIG_IPV6)
 	} else {
 		if (!ndst) {
@@ -220,6 +221,7 @@ static int tipc_udp_xmit(struct net *net, struct sk_buff *skb,
 		udp_tunnel6_xmit_skb(ndst, ub->ubsock->sk, skb, NULL,
 				     &src->ipv6, &dst->ipv6, 0, ttl, 0,
 				     src->port, dst->port, false, 0);
+		dst_release(ndst);
 #endif
 	}
 	local_bh_enable();
-- 
2.51.0



* [PATCH net-next v8 04/11] net: tunnel: return noref dsts in udp_tunnel{,6}_dst_lookup
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (2 preceding siblings ...)
  2026-03-12 15:56   ` [PATCH net-next v8 03/11] net: tunnel: convert udp_tunnel{6,}_xmit_skb " Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 05/11] net: ovpn: convert ovpn_udp{4,6}_output to use a noref dst Marek Mietus
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

Update udp_tunnel{,6}_dst_lookup to always return noref dsts. The dst
is only valid inside the RCU read-side critical section in which it was
queried.

The dst is fetched from the dst cache (as noref) or returned by a routing
lookup operation, in which case its refcount is either stolen into the
cache, or decremented shortly before returning (in case the cache can't
be used). This is safe, since this code runs in an RCU read-side critical
section, and the dst only lingers until the end of said section.

Update all callers to use the new convention (of no longer calling
dst_release, since all dsts are now noref). This affects the bareudp,
geneve and vxlan tunnels.
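
The lookup flow described above can be sketched as a small userspace
model. It only tracks the references created by this path; the route
starts at zero for counting purposes, there is no RCU, and all toy_*
names are hypothetical. A NULL cache stands for "the cache can't be
used".

```c
#include <assert.h>
#include <stddef.h>

struct toy_dst { int refcnt; };
struct toy_cache { struct toy_dst *dst; };

/* Pretend routing lookup: returns the dst with a reference for the caller. */
static struct toy_dst *toy_route_lookup(struct toy_dst *route)
{
	route->refcnt++;
	return route;
}

/*
 * Always returns a borrowed (noref) dst. In the kernel the pointer would
 * only be valid until the end of the enclosing RCU read-side section.
 */
static struct toy_dst *toy_tunnel_dst_lookup(struct toy_cache *cache,
					     struct toy_dst *route)
{
	struct toy_dst *dst;

	if (cache && cache->dst)
		return cache->dst;	/* cache hit: nothing to account for */

	dst = toy_route_lookup(route);
	if (cache) {
		cache->dst = dst;	/* the cache steals the lookup's ref */
	} else {
		assert(dst->refcnt > 0);
		dst->refcnt--;		/* no cache: drop the ref right away */
	}
	return dst;
}

/* References held on the route after @lookups consecutive lookups. */
int toy_refs_after_lookups(int use_cache, int lookups)
{
	struct toy_dst route = { .refcnt = 0 };
	struct toy_cache cache = { .dst = NULL };
	int i;

	for (i = 0; i < lookups; i++)
		toy_tunnel_dst_lookup(use_cache ? &cache : NULL, &route);
	return route.refcnt;
}
```

With a cache, the only reference left is the one the cache owns, no
matter how many lookups ran. Without one, nothing is left, so callers
have nothing to release in either case.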

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 drivers/net/bareudp.c          |  6 ------
 drivers/net/geneve.c           | 13 -------------
 drivers/net/vxlan/vxlan_core.c |  8 --------
 net/ipv4/udp_tunnel_core.c     |  6 ++++--
 net/ipv6/ip6_udp_tunnel.c      |  8 ++++++--
 5 files changed, 10 insertions(+), 31 deletions(-)

diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index 92ee4a36f86f..456bc17c352d 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -364,11 +364,9 @@ static int bareudp_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			    !net_eq(bareudp->net, dev_net(bareudp->dev)),
 			    !test_bit(IP_TUNNEL_CSUM_BIT, info->key.tun_flags),
 			    0);
-	ip_rt_put(rt);
 	return 0;
 
 free_dst:
-	dst_release(&rt->dst);
 	return err;
 }
 
@@ -434,11 +432,9 @@ static int bareudp6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			     !test_bit(IP_TUNNEL_CSUM_BIT,
 				       info->key.tun_flags),
 			     0);
-	dst_release(dst);
 	return 0;
 
 free_dst:
-	dst_release(dst);
 	return err;
 }
 
@@ -524,7 +520,6 @@ static int bareudp_fill_metadata_dst(struct net_device *dev,
 		if (IS_ERR(rt))
 			return PTR_ERR(rt);
 
-		ip_rt_put(rt);
 		info->key.u.ipv4.src = saddr;
 	} else if (ip_tunnel_info_af(info) == AF_INET6) {
 		struct dst_entry *dst;
@@ -538,7 +533,6 @@ static int bareudp_fill_metadata_dst(struct net_device *dev,
 		if (IS_ERR(dst))
 			return PTR_ERR(dst);
 
-		dst_release(dst);
 		info->key.u.ipv6.src = saddr;
 	} else {
 		return -EINVAL;
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 4216a8ffd591..8a918bd009dc 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1268,7 +1268,6 @@ static int geneve_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 	return 0;
 
 free_dst:
-	dst_release(dst);
 	return err;
 }
 
@@ -1327,7 +1326,6 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 				    geneve_build_gro_hint_opt(geneve, skb),
 				    netif_is_any_bridge_port(dev));
 	if (err < 0) {
-		dst_release(&rt->dst);
 		return err;
 	} else if (err) {
 		struct ip_tunnel_info *info;
@@ -1338,7 +1336,6 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 
 			unclone = skb_tunnel_info_unclone(skb);
 			if (unlikely(!unclone)) {
-				dst_release(&rt->dst);
 				return -ENOMEM;
 			}
 
@@ -1347,13 +1344,11 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		if (!pskb_may_pull(skb, ETH_HLEN)) {
-			dst_release(&rt->dst);
 			return -EINVAL;
 		}
 
 		skb->protocol = eth_type_trans(skb, geneve->dev);
 		__netif_rx(skb);
-		dst_release(&rt->dst);
 		return -EMSGSIZE;
 	}
 
@@ -1396,7 +1391,6 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			    !net_eq(geneve->net, dev_net(geneve->dev)),
 			    !test_bit(IP_TUNNEL_CSUM_BIT, info->key.tun_flags),
 			    0);
-	ip_rt_put(rt);
 	return 0;
 }
 
@@ -1439,7 +1433,6 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 				    geneve_build_gro_hint_opt(geneve, skb),
 				    netif_is_any_bridge_port(dev));
 	if (err < 0) {
-		dst_release(dst);
 		return err;
 	} else if (err) {
 		struct ip_tunnel_info *info = skb_tunnel_info(skb);
@@ -1449,7 +1442,6 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 
 			unclone = skb_tunnel_info_unclone(skb);
 			if (unlikely(!unclone)) {
-				dst_release(dst);
 				return -ENOMEM;
 			}
 
@@ -1458,13 +1450,11 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		if (!pskb_may_pull(skb, ETH_HLEN)) {
-			dst_release(dst);
 			return -EINVAL;
 		}
 
 		skb->protocol = eth_type_trans(skb, geneve->dev);
 		__netif_rx(skb);
-		dst_release(dst);
 		return -EMSGSIZE;
 	}
 
@@ -1488,7 +1478,6 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			     !test_bit(IP_TUNNEL_CSUM_BIT,
 				       info->key.tun_flags),
 			     0);
-	dst_release(dst);
 	return 0;
 }
 #endif
@@ -1576,7 +1565,6 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 		if (IS_ERR(rt))
 			return PTR_ERR(rt);
 
-		ip_rt_put(rt);
 		info->key.u.ipv4.src = saddr;
 #if IS_ENABLED(CONFIG_IPV6)
 	} else if (ip_tunnel_info_af(info) == AF_INET6) {
@@ -1602,7 +1590,6 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 		if (IS_ERR(dst))
 			return PTR_ERR(dst);
 
-		dst_release(dst);
 		info->key.u.ipv6.src = saddr;
 #endif
 	} else {
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 4482a47dbe15..39fb2e6df6c4 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -2320,7 +2320,6 @@ static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
 	    vxlan->cfg.flags & VXLAN_F_LOCALBYPASS) {
 		struct vxlan_dev *dst_vxlan;
 
-		dst_release(dst);
 		dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, vni,
 					   addr_family, dst_port,
 					   vxlan->cfg.flags);
@@ -2528,7 +2527,6 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				unclone->key.u.ipv4.dst = saddr;
 			}
 			vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
-			dst_release(ndst);
 			goto out_unlock;
 		}
 
@@ -2545,7 +2543,6 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				    pkey->u.ipv4.dst, tos, ttl, df,
 				    src_port, dst_port, xnet, !udp_sum,
 				    ipcb_flags);
-		ip_rt_put(rt);
 #if IS_ENABLED(CONFIG_IPV6)
 	} else {
 		struct vxlan_sock *sock6;
@@ -2603,7 +2600,6 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 			}
 
 			vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
-			dst_release(ndst);
 			goto out_unlock;
 		}
 
@@ -2621,7 +2617,6 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				     &saddr, &pkey->u.ipv6.dst, tos, ttl,
 				     pkey->label, src_port, dst_port, !udp_sum,
 				     ip6cb_flags);
-		dst_release(ndst);
 #endif
 	}
 	vxlan_vnifilter_count(vxlan, vni, NULL, VXLAN_VNI_STATS_TX, pkt_len);
@@ -2641,7 +2636,6 @@ void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		DEV_STATS_INC(dev, collisions);
 	else if (err == -ENETUNREACH)
 		DEV_STATS_INC(dev, tx_carrier_errors);
-	dst_release(ndst);
 	DEV_STATS_INC(dev, tx_errors);
 	vxlan_vnifilter_count(vxlan, vni, NULL, VXLAN_VNI_STATS_TX_ERRORS, 0);
 	kfree_skb_reason(skb, reason);
@@ -3248,7 +3242,6 @@ static int vxlan_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 					   &info->dst_cache);
 		if (IS_ERR(rt))
 			return PTR_ERR(rt);
-		ip_rt_put(rt);
 	} else {
 #if IS_ENABLED(CONFIG_IPV6)
 		struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
@@ -3264,7 +3257,6 @@ static int vxlan_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 					      &info->dst_cache);
 		if (IS_ERR(ndst))
 			return PTR_ERR(ndst);
-		dst_release(ndst);
 #else /* !CONFIG_IPV6 */
 		return -EPFNOSUPPORT;
 #endif
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index b1f667c52cb2..c9c3fe8f0158 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -240,7 +240,7 @@ struct rtable *udp_tunnel_dst_lookup(struct sk_buff *skb,
 
 #ifdef CONFIG_DST_CACHE
 	if (dst_cache) {
-		rt = dst_cache_get_ip4(dst_cache, saddr);
+		rt = dst_cache_get_ip4_rcu(dst_cache, saddr);
 		if (rt)
 			return rt;
 	}
@@ -269,8 +269,10 @@ struct rtable *udp_tunnel_dst_lookup(struct sk_buff *skb,
 	}
 #ifdef CONFIG_DST_CACHE
 	if (dst_cache)
-		dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr);
+		dst_cache_steal_ip4(dst_cache, &rt->dst, fl4.saddr);
+	else
 #endif
+		ip_rt_put(rt);
 	*saddr = fl4.saddr;
 	return rt;
 }
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index d58815db8182..94901935c9e9 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -130,6 +130,8 @@ EXPORT_SYMBOL_GPL(udp_tunnel6_xmit_skb);
  *
  *      It returns a valid dst pointer and stores src address to be used in
  *      tunnel in param saddr on success, else a pointer encoded error code.
+ *      The returned dst pointer is noref and must only be used in the RCU
+ *      read-side critical section in which it was queried.
  */
 
 struct dst_entry *udp_tunnel6_dst_lookup(struct sk_buff *skb,
@@ -147,7 +149,7 @@ struct dst_entry *udp_tunnel6_dst_lookup(struct sk_buff *skb,
 
 #ifdef CONFIG_DST_CACHE
 	if (dst_cache) {
-		dst = dst_cache_get_ip6(dst_cache, saddr);
+		dst = dst_cache_get_ip6_rcu(dst_cache, saddr);
 		if (dst)
 			return dst;
 	}
@@ -175,8 +177,10 @@ struct dst_entry *udp_tunnel6_dst_lookup(struct sk_buff *skb,
 	}
 #ifdef CONFIG_DST_CACHE
 	if (dst_cache)
-		dst_cache_set_ip6(dst_cache, dst, &fl6.saddr);
+		dst_cache_steal_ip6(dst_cache, dst, &fl6.saddr);
+	else
 #endif
+		dst_release(dst);
 	*saddr = fl6.saddr;
 	return dst;
 }
-- 
2.51.0



* [PATCH net-next v8 05/11] net: ovpn: convert ovpn_udp{4,6}_output to use a noref dst
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (3 preceding siblings ...)
  2026-03-12 15:56   ` [PATCH net-next v8 04/11] net: tunnel: return noref dsts in udp_tunnel{,6}_dst_lookup Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 06/11] wireguard: socket: convert send{4,6} to use a noref dst when possible Marek Mietus
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC (permalink / raw)
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

ovpn_udp{4,6}_output unnecessarily reference the dst_entry from the
dst_cache when interacting with the cache.

Reduce this overhead by avoiding the redundant refcount increments.

These changes are safe because both ipv4 and ip6 support noref xmit under
RCU, and ovpn already performs its xmit under RCU.

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 drivers/net/ovpn/udp.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
index 2e202bd2b73f..d2900a3e96ac 100644
--- a/drivers/net/ovpn/udp.c
+++ b/drivers/net/ovpn/udp.c
@@ -158,7 +158,7 @@ static int ovpn_udp4_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 	int ret;
 
 	local_bh_disable();
-	rt = dst_cache_get_ip4(cache, &fl.saddr);
+	rt = dst_cache_get_ip4_rcu(cache, &fl.saddr);
 	if (rt)
 		goto transmit;
 
@@ -194,13 +194,12 @@ static int ovpn_udp4_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 				    ret);
 		goto err;
 	}
-	dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
+	dst_cache_steal_ip4(cache, &rt->dst, fl.saddr);
 
 transmit:
 	udp_tunnel_xmit_skb(rt, sk, skb, fl.saddr, fl.daddr, 0,
 			    ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
 			    fl.fl4_dport, false, sk->sk_no_check_tx, 0);
-	ip_rt_put(rt);
 	ret = 0;
 err:
 	local_bh_enable();
@@ -236,7 +235,7 @@ static int ovpn_udp6_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 	};
 
 	local_bh_disable();
-	dst = dst_cache_get_ip6(cache, &fl.saddr);
+	dst = dst_cache_get_ip6_rcu(cache, &fl.saddr);
 	if (dst)
 		goto transmit;
 
@@ -260,7 +259,7 @@ static int ovpn_udp6_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 				    &bind->remote.in6, ret);
 		goto err;
 	}
-	dst_cache_set_ip6(cache, dst, &fl.saddr);
+	dst_cache_steal_ip6(cache, dst, &fl.saddr);
 
 transmit:
 	/* user IPv6 packets may be larger than the transport interface
@@ -276,7 +275,6 @@ static int ovpn_udp6_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 	udp_tunnel6_xmit_skb(dst, sk, skb, skb->dev, &fl.saddr, &fl.daddr, 0,
 			     ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
 			     fl.fl6_dport, udp_get_no_check6_tx(sk), 0);
-	dst_release(dst);
 	ret = 0;
 err:
 	local_bh_enable();
-- 
2.51.0



* [PATCH net-next v8 06/11] wireguard: socket: convert send{4,6} to use a noref dst when possible
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (4 preceding siblings ...)
  2026-03-12 15:56   ` [PATCH net-next v8 05/11] net: ovpn: convert ovpn_udp{4,6}_output to use a noref dst Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 07/11] net: tunnel: convert ip_md_tunnel_xmit to use noref dsts Marek Mietus
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC (permalink / raw)
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

send{4,6} unnecessarily reference the dst_entry from the
dst_cache when interacting with the cache.

Reduce this overhead by avoiding the redundant refcount increments.

This is only possible in flows where the cache is used. Otherwise, we
fall back to a referenced dst.

These changes are safe because both ipv4 and ip6 support noref xmit under
RCU, and the wireguard send{4,6} functions already run under RCU.

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 drivers/net/wireguard/socket.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireguard/socket.c b/drivers/net/wireguard/socket.c
index ee7d9c675909..b311965269a1 100644
--- a/drivers/net/wireguard/socket.c
+++ b/drivers/net/wireguard/socket.c
@@ -46,7 +46,7 @@ static int send4(struct wg_device *wg, struct sk_buff *skb,
 	fl.fl4_sport = inet_sk(sock)->inet_sport;
 
 	if (cache)
-		rt = dst_cache_get_ip4(cache, &fl.saddr);
+		rt = dst_cache_get_ip4_rcu(cache, &fl.saddr);
 
 	if (!rt) {
 		security_sk_classify_flow(sock, flowi4_to_flowi_common(&fl));
@@ -78,14 +78,15 @@ static int send4(struct wg_device *wg, struct sk_buff *skb,
 			goto err;
 		}
 		if (cache)
-			dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
+			dst_cache_steal_ip4(cache, &rt->dst, fl.saddr);
 	}
 
 	skb->ignore_df = 1;
 	udp_tunnel_xmit_skb(rt, sock, skb, fl.saddr, fl.daddr, ds,
 			    ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
 			    fl.fl4_dport, false, false, 0);
-	ip_rt_put(rt);
+	if (!cache)
+		ip_rt_put(rt);
 	goto out;
 
 err:
@@ -127,7 +128,7 @@ static int send6(struct wg_device *wg, struct sk_buff *skb,
 	fl.fl6_sport = inet_sk(sock)->inet_sport;
 
 	if (cache)
-		dst = dst_cache_get_ip6(cache, &fl.saddr);
+		dst = dst_cache_get_ip6_rcu(cache, &fl.saddr);
 
 	if (!dst) {
 		security_sk_classify_flow(sock, flowi6_to_flowi_common(&fl));
@@ -146,14 +147,15 @@ static int send6(struct wg_device *wg, struct sk_buff *skb,
 			goto err;
 		}
 		if (cache)
-			dst_cache_set_ip6(cache, dst, &fl.saddr);
+			dst_cache_steal_ip6(cache, dst, &fl.saddr);
 	}
 
 	skb->ignore_df = 1;
 	udp_tunnel6_xmit_skb(dst, sock, skb, skb->dev, &fl.saddr, &fl.daddr, ds,
 			     ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
 			     fl.fl6_dport, false, 0);
-	dst_release(dst);
+	if (!cache)
+		dst_release(dst);
 	goto out;
 
 err:
-- 
2.51.0



* [PATCH net-next v8 07/11] net: tunnel: convert ip_md_tunnel_xmit to use noref dsts
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (5 preceding siblings ...)
  2026-03-12 15:56   ` [PATCH net-next v8 06/11] wireguard: socket: convert send{4,6} to use a noref dst when possible Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 08/11] net: tunnel: convert ip_tunnel_xmit to use a noref dst when possible Marek Mietus
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC (permalink / raw)
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

ip_md_tunnel_xmit unnecessarily references the dst_entry from the
dst_cache when interacting with the cache.

Reduce this overhead by avoiding the redundant refcount increments.

When using the cache, the found dst is either taken noref from the
cache or gets stolen into it. So that the cached and uncached lookup
flows both end up with a noref dst, we drop the reference on the found
dst immediately after the lookup in the uncached case.

This change is safe because ipv4 supports noref xmit under RCU, and
ip_md_tunnel_xmit already runs under RCU.

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 net/ipv4/ip_tunnel.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 2136a46bcdc5..d0e8fb7f040b 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -609,7 +609,7 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	use_cache = ip_tunnel_dst_cache_usable(skb, tun_info);
 	if (use_cache)
-		rt = dst_cache_get_ip4(&tun_info->dst_cache, &fl4.saddr);
+		rt = dst_cache_get_ip4_rcu(&tun_info->dst_cache, &fl4.saddr);
 	if (!rt) {
 		rt = ip_route_output_key(tunnel->net, &fl4);
 		if (IS_ERR(rt)) {
@@ -617,11 +617,12 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 			goto tx_error;
 		}
 		if (use_cache)
-			dst_cache_set_ip4(&tun_info->dst_cache, &rt->dst,
-					  fl4.saddr);
+			dst_cache_steal_ip4(&tun_info->dst_cache, &rt->dst,
+					    fl4.saddr);
+		else
+			ip_rt_put(rt);
 	}
 	if (rt->dst.dev == dev) {
-		ip_rt_put(rt);
 		DEV_STATS_INC(dev, collisions);
 		goto tx_error;
 	}
@@ -630,7 +631,6 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 		df = htons(IP_DF);
 	if (tnl_update_pmtu(dev, skb, rt, df, inner_iph, tunnel_hlen,
 			    key->u.ipv4.dst, true)) {
-		ip_rt_put(rt);
 		goto tx_error;
 	}
 
@@ -647,7 +647,6 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len;
 	if (skb_cow_head(skb, headroom)) {
-		ip_rt_put(rt);
 		goto tx_dropped;
 	}
 
@@ -655,7 +654,6 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, tos, ttl,
 		      df, !net_eq(tunnel->net, dev_net(dev)), 0);
-	ip_rt_put(rt);
 	return;
 tx_error:
 	DEV_STATS_INC(dev, tx_errors);
-- 
2.51.0



* [PATCH net-next v8 08/11] net: tunnel: convert ip_tunnel_xmit to use a noref dst when possible
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (6 preceding siblings ...)
  2026-03-12 15:56   ` [PATCH net-next v8 07/11] net: tunnel: convert ip_md_tunnel_xmit to use noref dsts Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 09/11] net: sctp: convert sctp_v{4,6}_xmit " Marek Mietus
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC (permalink / raw)
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

ip_tunnel_xmit unnecessarily references the dst_entry from the
dst_cache when interacting with the cache.

Reduce this overhead by avoiding the redundant refcount increments.

When using the cache, the found dst is either taken noref from the
cache or gets stolen into it. So that the cached and uncached lookup
flows both end up with a noref dst, we drop the reference on the found
dst immediately after the lookup in the uncached case.

This change is safe because ipv4 supports noref xmit under RCU, and
ip_tunnel_xmit already runs under RCU.

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 net/ipv4/ip_tunnel.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index d0e8fb7f040b..ec5d5bb74aae 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -769,11 +769,11 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 	if (connected && md) {
 		use_cache = ip_tunnel_dst_cache_usable(skb, tun_info);
 		if (use_cache)
-			rt = dst_cache_get_ip4(&tun_info->dst_cache,
-					       &fl4.saddr);
+			rt = dst_cache_get_ip4_rcu(&tun_info->dst_cache,
+						   &fl4.saddr);
 	} else {
-		rt = connected ? dst_cache_get_ip4(&tunnel->dst_cache,
-						&fl4.saddr) : NULL;
+		rt = connected ? dst_cache_get_ip4_rcu(&tunnel->dst_cache,
+						       &fl4.saddr) : NULL;
 	}
 
 	if (!rt) {
@@ -784,15 +784,16 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 			goto tx_error;
 		}
 		if (use_cache)
-			dst_cache_set_ip4(&tun_info->dst_cache, &rt->dst,
-					  fl4.saddr);
+			dst_cache_steal_ip4(&tun_info->dst_cache, &rt->dst,
+					    fl4.saddr);
 		else if (!md && connected)
-			dst_cache_set_ip4(&tunnel->dst_cache, &rt->dst,
-					  fl4.saddr);
+			dst_cache_steal_ip4(&tunnel->dst_cache, &rt->dst,
+					    fl4.saddr);
+		else
+			ip_rt_put(rt);
 	}
 
 	if (rt->dst.dev == dev) {
-		ip_rt_put(rt);
 		DEV_STATS_INC(dev, collisions);
 		goto tx_error;
 	}
@@ -802,7 +803,6 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 		df |= (inner_iph->frag_off & htons(IP_DF));
 
 	if (tnl_update_pmtu(dev, skb, rt, df, inner_iph, 0, 0, false)) {
-		ip_rt_put(rt);
 		goto tx_error;
 	}
 
@@ -833,7 +833,6 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 			+ rt->dst.header_len + ip_encap_hlen(&tunnel->encap);
 
 	if (skb_cow_head(skb, max_headroom)) {
-		ip_rt_put(rt);
 		DEV_STATS_INC(dev, tx_dropped);
 		kfree_skb(skb);
 		return;
@@ -843,7 +842,6 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl,
 		      df, !net_eq(tunnel->net, dev_net(dev)), 0);
-	ip_rt_put(rt);
 	return;
 
 #if IS_ENABLED(CONFIG_IPV6)
-- 
2.51.0



* [PATCH net-next v8 09/11] net: sctp: convert sctp_v{4,6}_xmit to use a noref dst when possible
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (7 preceding siblings ...)
  2026-03-12 15:56   ` [PATCH net-next v8 08/11] net: tunnel: convert ip_tunnel_xmit to use a noref dst when possible Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 10/11] net: sit: convert ipip6_tunnel_xmit to use a noref dst Marek Mietus
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC (permalink / raw)
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

sctp_v{4,6}_xmit unnecessarily clone the dst from the transport when
sending an encapsulated skb.

Reduce this overhead by avoiding the refcount increment introduced by
cloning the dst.

Since t->dst is already assumed to be valid throughout both functions,
it's safe to use the dst without incrementing the refcount.

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 net/sctp/ipv6.c     | 5 ++---
 net/sctp/protocol.c | 5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 43340eac8ec5..5c8f72cfb7de 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -219,7 +219,7 @@ int sctp_udp_v6_err(struct sock *sk, struct sk_buff *skb)
 
 static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *t)
 {
-	struct dst_entry *dst = dst_clone(t->dst);
+	struct dst_entry *dst = t->dst;
 	struct flowi6 *fl6 = &t->fl.u.ip6;
 	struct sock *sk = skb->sk;
 	struct ipv6_pinfo *np = inet6_sk(sk);
@@ -243,7 +243,7 @@ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *t)
 	if (!t->encap_port || !sctp_sk(sk)->udp_port) {
 		int res;
 
-		skb_dst_set(skb, dst);
+		skb_dst_set(skb, dst_clone(dst));
 		rcu_read_lock();
 		res = ip6_xmit(sk, skb, fl6, sk->sk_mark,
 			       rcu_dereference(np->opt),
@@ -266,7 +266,6 @@ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *t)
 			     tclass, ip6_dst_hoplimit(dst), label,
 			     sctp_sk(sk)->udp_port, t->encap_port, false, 0);
 	rcu_read_unlock();
-	dst_release(dst);
 	return 0;
 }
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 815279410bf9..00e6b607ebd5 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1038,7 +1038,7 @@ static int sctp_inet_supported_addrs(const struct sctp_sock *opt,
 /* Wrapper routine that calls the ip transmit routine. */
 static inline int sctp_v4_xmit(struct sk_buff *skb, struct sctp_transport *t)
 {
-	struct dst_entry *dst = dst_clone(t->dst);
+	struct dst_entry *dst = t->dst;
 	struct flowi4 *fl4 = &t->fl.u.ip4;
 	struct sock *sk = skb->sk;
 	struct inet_sock *inet = inet_sk(sk);
@@ -1056,7 +1056,7 @@ static inline int sctp_v4_xmit(struct sk_buff *skb, struct sctp_transport *t)
 	SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS);
 
 	if (!t->encap_port || !sctp_sk(sk)->udp_port) {
-		skb_dst_set(skb, dst);
+		skb_dst_set(skb, dst_clone(dst));
 		return __ip_queue_xmit(sk, skb, &t->fl, dscp);
 	}
 
@@ -1077,7 +1077,6 @@ static inline int sctp_v4_xmit(struct sk_buff *skb, struct sctp_transport *t)
 			    sctp_sk(sk)->udp_port, t->encap_port, false, false,
 			    0);
 	rcu_read_unlock();
-	dst_release(dst);
 	return 0;
 }
 
-- 
2.51.0



* [PATCH net-next v8 10/11] net: sit: convert ipip6_tunnel_xmit to use a noref dst
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (8 preceding siblings ...)
  2026-03-12 15:56   ` [PATCH net-next v8 09/11] net: sctp: convert sctp_v{4,6}_xmit " Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-12 15:56   ` [PATCH net-next v8 11/11] net: tipc: convert tipc_udp_xmit " Marek Mietus
  2026-03-17 11:37   ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Paolo Abeni
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC (permalink / raw)
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

ipip6_tunnel_xmit unnecessarily references the dst_entry from the
dst_cache when interacting with the cache.

Reduce this overhead by avoiding the redundant refcount increments.

This change is safe because ipv4 supports noref xmit under RCU, and
ipip6_tunnel_xmit already runs under RCU.

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 net/ipv6/sit.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 34d4d72dc58a..2dd43d472e5d 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -934,31 +934,28 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 			   IPPROTO_IPV6, 0, dst, tiph->saddr, 0, 0,
 			   sock_net_uid(tunnel->net, NULL));
 
-	rt = dst_cache_get_ip4(&tunnel->dst_cache, &fl4.saddr);
+	rt = dst_cache_get_ip4_rcu(&tunnel->dst_cache, &fl4.saddr);
 	if (!rt) {
 		rt = ip_route_output_flow(tunnel->net, &fl4, NULL);
 		if (IS_ERR(rt)) {
 			DEV_STATS_INC(dev, tx_carrier_errors);
 			goto tx_error_icmp;
 		}
-		dst_cache_set_ip4(&tunnel->dst_cache, &rt->dst, fl4.saddr);
+		dst_cache_steal_ip4(&tunnel->dst_cache, &rt->dst, fl4.saddr);
 	}
 
 	if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
-		ip_rt_put(rt);
 		DEV_STATS_INC(dev, tx_carrier_errors);
 		goto tx_error_icmp;
 	}
 	tdev = rt->dst.dev;
 
 	if (tdev == dev) {
-		ip_rt_put(rt);
 		DEV_STATS_INC(dev, collisions);
 		goto tx_error;
 	}
 
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP4)) {
-		ip_rt_put(rt);
 		goto tx_error;
 	}
 
@@ -967,7 +964,6 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 
 		if (mtu < IPV4_MIN_MTU) {
 			DEV_STATS_INC(dev, collisions);
-			ip_rt_put(rt);
 			goto tx_error;
 		}
 
@@ -981,7 +977,6 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 
 		if (skb->len > mtu && !skb_is_gso(skb)) {
 			icmpv6_ndo_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
-			ip_rt_put(rt);
 			goto tx_error;
 		}
 	}
@@ -1004,7 +999,6 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 	    (skb_cloned(skb) && !skb_clone_writable(skb, 0))) {
 		struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom);
 		if (!new_skb) {
-			ip_rt_put(rt);
 			DEV_STATS_INC(dev, tx_dropped);
 			kfree_skb(skb);
 			return NETDEV_TX_OK;
@@ -1020,16 +1014,13 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 		ttl = iph6->hop_limit;
 	tos = INET_ECN_encapsulate(tos, ipv6_get_dsfield(iph6));
 
-	if (ip_tunnel_encap(skb, &tunnel->encap, &protocol, &fl4) < 0) {
-		ip_rt_put(rt);
+	if (ip_tunnel_encap(skb, &tunnel->encap, &protocol, &fl4) < 0)
 		goto tx_error;
-	}
 
 	skb_set_inner_ipproto(skb, IPPROTO_IPV6);
 
 	iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl,
 		      df, !net_eq(tunnel->net, dev_net(dev)), 0);
-	ip_rt_put(rt);
 	return NETDEV_TX_OK;
 
 tx_error_icmp:
-- 
2.51.0



* [PATCH net-next v8 11/11] net: tipc: convert tipc_udp_xmit to use a noref dst
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (9 preceding siblings ...)
  2026-03-12 15:56   ` [PATCH net-next v8 10/11] net: sit: convert ipip6_tunnel_xmit to use a noref dst Marek Mietus
@ 2026-03-12 15:56   ` Marek Mietus
  2026-03-17 11:37   ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Paolo Abeni
  11 siblings, 0 replies; 13+ messages in thread
From: Marek Mietus @ 2026-03-12 15:56 UTC (permalink / raw)
  To: netdev, sd, kuba, pabeni; +Cc: Jason, Marek Mietus

tipc_udp_xmit unnecessarily references the dst_entry from the
dst_cache when interacting with the cache.

Reduce this overhead by avoiding the redundant refcount increments.

This change is safe because both ipv4 and ip6 support noref xmit under
RCU, and tipc_udp_xmit already runs under RCU.

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 net/tipc/udp_media.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index df0d82fed02e..4e1db2915437 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -175,7 +175,7 @@ static int tipc_udp_xmit(struct net *net, struct sk_buff *skb,
 	int ttl, err;
 
 	local_bh_disable();
-	ndst = dst_cache_get(cache);
+	ndst = dst_cache_get_rcu(cache);
 	if (dst->proto == htons(ETH_P_IP)) {
 		struct rtable *rt = dst_rtable(ndst);
 
@@ -191,14 +191,13 @@ static int tipc_udp_xmit(struct net *net, struct sk_buff *skb,
 				err = PTR_ERR(rt);
 				goto tx_error;
 			}
-			dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
+			dst_cache_steal_ip4(cache, &rt->dst, fl.saddr);
 		}
 
 		ttl = ip4_dst_hoplimit(&rt->dst);
 		udp_tunnel_xmit_skb(rt, ub->ubsock->sk, skb, src->ipv4.s_addr,
 				    dst->ipv4.s_addr, 0, ttl, 0, src->port,
 				    dst->port, false, true, 0);
-		ip_rt_put(rt);
 #if IS_ENABLED(CONFIG_IPV6)
 	} else {
 		if (!ndst) {
@@ -215,13 +214,12 @@ static int tipc_udp_xmit(struct net *net, struct sk_buff *skb,
 				err = PTR_ERR(ndst);
 				goto tx_error;
 			}
-			dst_cache_set_ip6(cache, ndst, &fl6.saddr);
+			dst_cache_steal_ip6(cache, ndst, &fl6.saddr);
 		}
 		ttl = ip6_dst_hoplimit(ndst);
 		udp_tunnel6_xmit_skb(ndst, ub->ubsock->sk, skb, NULL,
 				     &src->ipv6, &dst->ipv6, 0, ttl, 0,
 				     src->port, dst->port, false, 0);
-		dst_release(ndst);
 #endif
 	}
 	local_bh_enable();
-- 
2.51.0



* Re: [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels
  2026-03-12 15:56 ` [PATCH net-next v8 00/11] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (10 preceding siblings ...)
  2026-03-12 15:56   ` [PATCH net-next v8 11/11] net: tipc: convert tipc_udp_xmit " Marek Mietus
@ 2026-03-17 11:37   ` Paolo Abeni
  11 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2026-03-17 11:37 UTC (permalink / raw)
  To: Marek Mietus, netdev, sd, kuba; +Cc: Jason

On 3/12/26 4:56 PM, Marek Mietus wrote:
> Currently, tunnel xmit flows always take a reference on the dst_entry
> for each xmitted packet. These atomic operations are redundant in some
> flows.
> 
> This patchset introduces the infrastructure required for converting
> the tunnel xmit flows to noref, and converts them where possible.
> 
> These changes improve tunnel performance, since less atomic operations
> are used.
> 
> There are already noref optimizations in both ipv4 and ip6.
> (See __ip_queue_xmit, inet6_csk_xmit)
> This patchset implements similar optimizations in ip and udp tunnels.
> 
> Benchmarks:
> I used a vxlan tunnel over a pair of veth peers and measured the average
> throughput over multiple samples.
> 
> I ran 100 samples on a clean build, and another 100 on a patched
> build. Each sample ran for 120 seconds. These were my results:
> 
> clean:      72.52 gb/sec, stddev = 1.39
> patched:    75.39 gb/sec, stddev = 0.94
> 
> TL;DR - This patchset results in a 4% improvement in throughput for
> vxlan. It's safe to assume that we might see similar results when testing
> other tunnels.

Sabrina noted I wrongly replied to an old revision. Repeating my
statements here for completeness.

IMHO this performance delta is not enough to justify this amount of changes.

Additionally, the measured impact of removing the dst_hold/dst_release
does not fit with my direct experience on the same matter: it should be
below noise level in practice, as dsts are per-cpu and no
contention/false sharing is expected in a good setup.

I think you are observing larger impact because in the veth test
dst_release can happen on a remote CPU. Note that this setup (vxlan over
veth) is not very relevant in practice.

I'm sorry I'm not applying this series.

Side note: if you are interested in improving (UDP) tunnel
performance, have a look at the big TCP support work from Alice Mikityanska:

https://lore.kernel.org/netdev/20260226201600.222044-1-alice.kernel@fastmail.im/

/P

