* [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels
       [not found] <20250922110622.10368-1-mmietus97.ref@yahoo.com>
@ 2025-09-22 11:06 ` Marek Mietus
  2025-09-22 11:06   ` [PATCH net-next v3 1/3] net: dst_cache: implement RCU variants for dst_cache helpers Marek Mietus
                     ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Marek Mietus @ 2025-09-22 11:06 UTC (permalink / raw)
  To: netdev, sd, antonio; +Cc: openvpn-devel

Currently, all xmit flows use dst_cache in a way that takes a
reference on its dst_entry for every transmitted packet. These atomic
operations are redundant in some flows.

This patchset introduces new noref xmit helpers and incorporates
them in the OpenVPN driver. A similar improvement can also be
applied to other tunnel code in the future. The implementation
for OpenVPN is a good starting point as it doesn't use the
udp_tunnel_dst_lookup helper which adds some complexity.

There are already noref optimizations in both IPv4 and IPv6
(see __ip_queue_xmit, inet6_csk_xmit). This patchset allows for
similar optimizations in UDP tunnels: since the entire xmit flow is
protected by RCU, taking a reference on the dst_entry is redundant,
so the new noref helpers skip it.
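
For illustration, the intended caller pattern looks roughly like the
sketch below. It is modelled on the ovpn conversion in patch 3; error
handling is omitted, and "cache", "fl", "net", "sk" and "skb" are
assumed to be set up by the surrounding code:

    /* The whole flow runs inside an RCU read-side critical section,
     * so the dst cannot be freed under us and no per-packet refcount
     * is needed.
     */
    rcu_read_lock();
    local_bh_disable();

    rt = dst_cache_get_ip4_rcu(cache, &fl.saddr); /* no dst_hold() */
    if (!rt) {
            rt = ip_route_output_flow(net, &fl, sk);
            /* the cache consumes the route's reference */
            dst_cache_steal_ip4(cache, &rt->dst, fl.saddr);
    }

    /* sets the skb's dst via skb_dst_set_noref(), i.e. no refcount */
    udp_tunnel_xmit_skb_noref(rt, sk, skb, fl.saddr, fl.daddr, 0,
                              ip4_dst_hoplimit(&rt->dst), 0,
                              fl.fl4_sport, fl.fl4_dport,
                              false, sk->sk_no_check_tx, 0);

    local_bh_enable();
    rcu_read_unlock();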

With this patchset, I observed a 4% decrease in the total time spent
in ovpn_udp_send_skb, as measured with perf.

Changes in v3:
 - Added implementation for ip6
 - Updated cover letter and commit messages

Link to v2: https://lore.kernel.org/netdev/20250912112420.4394-1-mmietus97@yahoo.com/
-- 
2.51.0


* [PATCH net-next v3 1/3] net: dst_cache: implement RCU variants for dst_cache helpers
  2025-09-22 11:06 ` [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
@ 2025-09-22 11:06   ` Marek Mietus
  2025-09-22 11:06   ` [PATCH net-next v3 2/3] net: tunnel: implement noref flows in udp_tunnel{,6}_xmit_skb Marek Mietus
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Marek Mietus @ 2025-09-22 11:06 UTC (permalink / raw)
  To: netdev, sd, antonio; +Cc: openvpn-devel, Marek Mietus

Implement RCU variants of the existing dst_cache helpers that
interact with the dst_entry: get helpers that return the cached dst
without taking a reference on it, and set helpers that steal the
caller's reference instead of taking a new one.

The new helpers spare unnecessary atomic operations on the dst
refcount. They must only be used in flows that are guaranteed to run
inside an RCU read-side critical section.
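
As an illustrative sketch (not part of the diff below), the
difference in reference handling between the existing and the new set
helpers is roughly the following; route_lookup() is a hypothetical
stand-in for the caller's own route lookup:

    /* dst_cache_set_ip4(): the cache takes its own reference, the
     * caller keeps (and later drops) the one returned by the lookup.
     */
    rt = route_lookup(&fl);
    dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
    /* ... caller uses rt ... */
    ip_rt_put(rt);

    /* dst_cache_steal_ip4(): the cache consumes the lookup's
     * reference, so the caller must not drop it. The caller may keep
     * using rt until it leaves its RCU read-side critical section,
     * since dsts are freed via call_rcu().
     */
    rt = route_lookup(&fl);
    dst_cache_steal_ip4(cache, &rt->dst, fl.saddr);
    /* ... caller uses rt, no ip_rt_put() ... */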

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 include/net/dst_cache.h |  54 ++++++++++++++++++
 net/core/dst_cache.c    | 119 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 163 insertions(+), 10 deletions(-)

diff --git a/include/net/dst_cache.h b/include/net/dst_cache.h
index 1961699598e2..2c98451469ef 100644
--- a/include/net/dst_cache.h
+++ b/include/net/dst_cache.h
@@ -32,6 +32,21 @@ struct dst_entry *dst_cache_get(struct dst_cache *dst_cache);
  */
 struct rtable *dst_cache_get_ip4(struct dst_cache *dst_cache, __be32 *saddr);
 
+/**
+ * dst_cache_get_ip4_rcu - lookup cache and ipv4 source under RCU
+ * @dst_cache: the cache
+ * @saddr: return value for the retrieved source address
+ *
+ * Perform cache lookup and fetch ipv4 source without taking a
+ * reference on the dst.
+ * Must be called with local BH disabled, and within an RCU read-side
+ * critical section.
+ *
+ * Return: Pointer to retrieved rtable if cache is initialized and
+ * cached dst is valid, NULL otherwise.
+ */
+struct rtable *dst_cache_get_ip4_rcu(struct dst_cache *dst_cache, __be32 *saddr);
+
 /**
  *	dst_cache_set_ip4 - store the ipv4 dst into the cache
  *	@dst_cache: the cache
@@ -43,6 +58,17 @@ struct rtable *dst_cache_get_ip4(struct dst_cache *dst_cache, __be32 *saddr);
 void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
 		       __be32 saddr);
 
+/**
+ * dst_cache_steal_ip4 - store the ipv4 dst into the cache and steal its
+ * reference
+ * @dst_cache: the cache
+ * @dst: the entry to be cached whose reference will be stolen
+ * @saddr: the source address to be stored inside the cache
+ *
+ * local BH must be disabled
+ */
+void dst_cache_steal_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
+			 __be32 saddr);
 #if IS_ENABLED(CONFIG_IPV6)
 
 /**
@@ -56,6 +82,18 @@ void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
 void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
 		       const struct in6_addr *saddr);
 
+/**
+ * dst_cache_steal_ip6 - store the ipv6 dst into the cache and steal its
+ * reference
+ * @dst_cache: the cache
+ * @dst: the entry to be cached whose reference will be stolen
+ * @saddr: the source address to be stored inside the cache
+ *
+ * local BH must be disabled
+ */
+void dst_cache_steal_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
+			 const struct in6_addr *saddr);
+
 /**
  *	dst_cache_get_ip6 - perform cache lookup and fetch ipv6 source address
  *	@dst_cache: the cache
@@ -65,6 +103,22 @@ void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
  */
 struct dst_entry *dst_cache_get_ip6(struct dst_cache *dst_cache,
 				    struct in6_addr *saddr);
+
+/**
+ * dst_cache_get_ip6_rcu - lookup cache and ipv6 source under RCU
+ * @dst_cache: the cache
+ * @saddr: return value for the retrieved source address
+ *
+ * Perform cache lookup and fetch ipv6 source without taking a
+ * reference on the dst.
+ * Must be called with local BH disabled, and within an RCU read-side
+ * critical section.
+ *
+ * Return: Pointer to retrieved dst_entry if cache is initialized and
+ * cached dst is valid, NULL otherwise.
+ */
+struct dst_entry *dst_cache_get_ip6_rcu(struct dst_cache *dst_cache,
+					struct in6_addr *saddr);
 #endif
 
 /**
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index 9ab4902324e1..7d00745ac7d3 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -25,20 +25,27 @@ struct dst_cache_pcpu {
 	};
 };
 
-static void dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
-				      struct dst_entry *dst, u32 cookie)
+static void __dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
+					struct dst_entry *dst, u32 cookie)
 {
 	DEBUG_NET_WARN_ON_ONCE(!in_softirq());
 	dst_release(dst_cache->dst);
-	if (dst)
-		dst_hold(dst);
 
 	dst_cache->cookie = cookie;
 	dst_cache->dst = dst;
 }
 
-static struct dst_entry *dst_cache_per_cpu_get(struct dst_cache *dst_cache,
-					       struct dst_cache_pcpu *idst)
+static void dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
+				      struct dst_entry *dst, u32 cookie)
+{
+	if (dst)
+		dst_hold(dst);
+
+	__dst_cache_per_cpu_dst_set(dst_cache, dst, cookie);
+}
+
+static struct dst_entry *__dst_cache_per_cpu_get(struct dst_cache *dst_cache,
+						 struct dst_cache_pcpu *idst)
 {
 	struct dst_entry *dst;
 
@@ -47,14 +54,10 @@ static struct dst_entry *dst_cache_per_cpu_get(struct dst_cache *dst_cache,
 	if (!dst)
 		goto fail;
 
-	/* the cache already hold a dst reference; it can't go away */
-	dst_hold(dst);
-
 	if (unlikely(!time_after(idst->refresh_ts,
 				 READ_ONCE(dst_cache->reset_ts)) ||
 		     (READ_ONCE(dst->obsolete) && !dst->ops->check(dst, idst->cookie)))) {
 		dst_cache_per_cpu_dst_set(idst, NULL, 0);
-		dst_release(dst);
 		goto fail;
 	}
 	return dst;
@@ -64,6 +67,18 @@ static struct dst_entry *dst_cache_per_cpu_get(struct dst_cache *dst_cache,
 	return NULL;
 }
 
+static struct dst_entry *dst_cache_per_cpu_get(struct dst_cache *dst_cache,
+					       struct dst_cache_pcpu *idst)
+{
+	struct dst_entry *dst;
+
+	dst = __dst_cache_per_cpu_get(dst_cache, idst);
+	if (dst)
+		/* the cache already holds a dst reference; it can't go away */
+		dst_hold(dst);
+	return dst;
+}
+
 struct dst_entry *dst_cache_get(struct dst_cache *dst_cache)
 {
 	struct dst_entry *dst;
@@ -100,6 +115,28 @@ struct rtable *dst_cache_get_ip4(struct dst_cache *dst_cache, __be32 *saddr)
 }
 EXPORT_SYMBOL_GPL(dst_cache_get_ip4);
 
+struct rtable *dst_cache_get_ip4_rcu(struct dst_cache *dst_cache, __be32 *saddr)
+{
+	struct dst_cache_pcpu *idst;
+	struct dst_entry *dst;
+
+	if (!dst_cache->cache)
+		return NULL;
+
+	local_lock_nested_bh(&dst_cache->cache->bh_lock);
+	idst = this_cpu_ptr(dst_cache->cache);
+	dst = __dst_cache_per_cpu_get(dst_cache, idst);
+	if (!dst) {
+		local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+		return NULL;
+	}
+
+	*saddr = idst->in_saddr.s_addr;
+	local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+	return dst_rtable(dst);
+}
+EXPORT_SYMBOL_GPL(dst_cache_get_ip4_rcu);
+
 void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
 		       __be32 saddr)
 {
@@ -116,6 +153,24 @@ void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
 }
 EXPORT_SYMBOL_GPL(dst_cache_set_ip4);
 
+void dst_cache_steal_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
+			 __be32 saddr)
+{
+	struct dst_cache_pcpu *idst;
+
+	if (!dst_cache->cache) {
+		dst_release(dst);
+		return;
+	}
+
+	local_lock_nested_bh(&dst_cache->cache->bh_lock);
+	idst = this_cpu_ptr(dst_cache->cache);
+	__dst_cache_per_cpu_dst_set(idst, dst, 0);
+	idst->in_saddr.s_addr = saddr;
+	local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+}
+EXPORT_SYMBOL_GPL(dst_cache_steal_ip4);
+
 #if IS_ENABLED(CONFIG_IPV6)
 void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
 		       const struct in6_addr *saddr)
@@ -135,6 +190,26 @@ void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
 }
 EXPORT_SYMBOL_GPL(dst_cache_set_ip6);
 
+void dst_cache_steal_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
+			 const struct in6_addr *saddr)
+{
+	struct dst_cache_pcpu *idst;
+
+	if (!dst_cache->cache) {
+		dst_release(dst);
+		return;
+	}
+
+	local_lock_nested_bh(&dst_cache->cache->bh_lock);
+
+	idst = this_cpu_ptr(dst_cache->cache);
+	__dst_cache_per_cpu_dst_set(idst, dst,
+				    rt6_get_cookie(dst_rt6_info(dst)));
+	idst->in6_saddr = *saddr;
+	local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+}
+EXPORT_SYMBOL_GPL(dst_cache_steal_ip6);
+
 struct dst_entry *dst_cache_get_ip6(struct dst_cache *dst_cache,
 				    struct in6_addr *saddr)
 {
@@ -158,6 +233,30 @@ struct dst_entry *dst_cache_get_ip6(struct dst_cache *dst_cache,
 	return dst;
 }
 EXPORT_SYMBOL_GPL(dst_cache_get_ip6);
+
+struct dst_entry *dst_cache_get_ip6_rcu(struct dst_cache *dst_cache,
+					struct in6_addr *saddr)
+{
+	struct dst_cache_pcpu *idst;
+	struct dst_entry *dst;
+
+	if (!dst_cache->cache)
+		return NULL;
+
+	local_lock_nested_bh(&dst_cache->cache->bh_lock);
+
+	idst = this_cpu_ptr(dst_cache->cache);
+	dst = __dst_cache_per_cpu_get(dst_cache, idst);
+	if (!dst) {
+		local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+		return NULL;
+	}
+
+	*saddr = idst->in6_saddr;
+	local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+	return dst;
+}
+EXPORT_SYMBOL_GPL(dst_cache_get_ip6_rcu);
 #endif
 
 int dst_cache_init(struct dst_cache *dst_cache, gfp_t gfp)
-- 
2.51.0



* [PATCH net-next v3 2/3] net: tunnel: implement noref flows in udp_tunnel{,6}_xmit_skb
  2025-09-22 11:06 ` [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
  2025-09-22 11:06   ` [PATCH net-next v3 1/3] net: dst_cache: implement RCU variants for dst_cache helpers Marek Mietus
@ 2025-09-22 11:06   ` Marek Mietus
  2025-09-22 11:06   ` [PATCH net-next v3 3/3] net: ovpn: use new noref xmit flow in ovpn_udp{4,6}_output Marek Mietus
  2025-09-24  1:48   ` [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels Jakub Kicinski
  3 siblings, 0 replies; 7+ messages in thread
From: Marek Mietus @ 2025-09-22 11:06 UTC (permalink / raw)
  To: netdev, sd, antonio; +Cc: openvpn-devel, Marek Mietus

In cases where udp_tunnel{,6}_xmit_skb is called inside an RCU
read-side critical section, we can avoid taking a reference on the
dst_entry.

Implement noref variants for udp_tunnel{,6}_xmit_skb and
iptunnel_xmit to be used in noref flows.
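
For illustration (not part of the diff below), a caller whose whole
xmit path already runs under rcu_read_lock() can switch from the
refcounted call to the noref one; the argument list is identical (the
names below simply mirror the function parameters), only the way the
skb takes ownership of the dst changes:

    /* before: skb_dst_set() takes a reference on rt->dst for the skb */
    udp_tunnel_xmit_skb(rt, sk, skb, src, dst, tos, ttl, df,
                        src_port, dst_port, xnet, nocheck, 0);

    /* after: skb_dst_set_noref() only lets the skb borrow rt->dst,
     * which is safe because the skb is handed to the lower layers
     * before rcu_read_unlock() and dsts are freed via call_rcu().
     */
    udp_tunnel_xmit_skb_noref(rt, sk, skb, src, dst, tos, ttl, df,
                              src_port, dst_port, xnet, nocheck, 0);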

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 include/net/ip_tunnels.h   |  3 +++
 include/net/udp_tunnel.h   | 13 ++++++++++
 net/ipv4/ip_tunnel_core.c  | 34 +++++++++++++++++++------
 net/ipv4/udp_tunnel_core.c | 29 +++++++++++++++++++---
 net/ipv6/ip6_udp_tunnel.c  | 51 ++++++++++++++++++++++++++++++--------
 5 files changed, 109 insertions(+), 21 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 8cf1380f3656..6dcea237bf21 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -604,6 +604,9 @@ static inline int iptunnel_pull_header(struct sk_buff *skb, int hdr_len,
 void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, u8 proto,
 		   u8 tos, u8 ttl, __be16 df, bool xnet, u16 ipcb_flags);
+void iptunnel_xmit_noref(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
+			 __be32 src, __be32 dst, u8 proto,
+			 u8 tos, u8 ttl, __be16 df, bool xnet, u16 ipcb_flags);
 struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
 					     gfp_t flags);
 int skb_tunnel_check_pmtu(struct sk_buff *skb, struct dst_entry *encap_dst,
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index 9acef2fbd2fd..b35e0267e318 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -135,6 +135,10 @@ void udp_tunnel_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb
 			 __be32 src, __be32 dst, __u8 tos, __u8 ttl,
 			 __be16 df, __be16 src_port, __be16 dst_port,
 			 bool xnet, bool nocheck, u16 ipcb_flags);
+void udp_tunnel_xmit_skb_noref(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
+			       __be32 src, __be32 dst, __u8 tos, __u8 ttl,
+			       __be16 df, __be16 src_port, __be16 dst_port,
+			       bool xnet, bool nocheck, u16 ipcb_flags);
 
 void udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sock *sk,
 			  struct sk_buff *skb,
@@ -145,6 +149,15 @@ void udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sock *sk,
 			  __be16 src_port, __be16 dst_port, bool nocheck,
 			  u16 ip6cb_flags);
 
+void udp_tunnel6_xmit_skb_noref(struct dst_entry *dst,
+				struct sock *sk, struct sk_buff *skb,
+				struct net_device *dev,
+				const struct in6_addr *saddr,
+				const struct in6_addr *daddr,
+				__u8 prio, __u8 ttl, __be32 label,
+				__be16 src_port, __be16 dst_port,
+				bool nocheck, u16 ip6cb_flags);
+
 void udp_tunnel_sock_release(struct socket *sock);
 
 struct rtable *udp_tunnel_dst_lookup(struct sk_buff *skb,
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index cc9915543637..8b03fb380397 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -47,10 +47,10 @@ const struct ip6_tnl_encap_ops __rcu *
 		ip6tun_encaps[MAX_IPTUN_ENCAP_OPS] __read_mostly;
 EXPORT_SYMBOL(ip6tun_encaps);
 
-void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
-		   __be32 src, __be32 dst, __u8 proto,
-		   __u8 tos, __u8 ttl, __be16 df, bool xnet,
-		   u16 ipcb_flags)
+static void __iptunnel_xmit(struct sock *sk, struct rtable *rt,
+			    struct sk_buff *skb, __be32 src, __be32 dst,
+			    __u8 proto, __u8 tos, __u8 ttl, __be16 df,
+			    u16 ipcb_flags)
 {
 	int pkt_len = skb->len - skb_inner_network_offset(skb);
 	struct net *net = dev_net(rt->dst.dev);
@@ -58,10 +58,7 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	struct iphdr *iph;
 	int err;
 
-	skb_scrub_packet(skb, xnet);
-
 	skb_clear_hash_if_not_l4(skb);
-	skb_dst_set(skb, &rt->dst);
 	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 	IPCB(skb)->flags = ipcb_flags;
 
@@ -89,8 +86,31 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 		iptunnel_xmit_stats(dev, pkt_len);
 	}
 }
+
+void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
+		   __be32 src, __be32 dst, __u8 proto,
+		   __u8 tos, __u8 ttl, __be16 df, bool xnet,
+		   u16 ipcb_flags)
+{
+	skb_scrub_packet(skb, xnet);
+	skb_dst_set(skb, &rt->dst);
+
+	__iptunnel_xmit(sk, rt, skb, src, dst, proto, tos, ttl, df, ipcb_flags);
+}
 EXPORT_SYMBOL_GPL(iptunnel_xmit);
 
+void iptunnel_xmit_noref(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
+			 __be32 src, __be32 dst, __u8 proto,
+			 __u8 tos, __u8 ttl, __be16 df, bool xnet,
+			 u16 ipcb_flags)
+{
+	skb_scrub_packet(skb, xnet);
+	skb_dst_set_noref(skb, &rt->dst);
+
+	__iptunnel_xmit(sk, rt, skb, src, dst, proto, tos, ttl, df, ipcb_flags);
+}
+EXPORT_SYMBOL_GPL(iptunnel_xmit_noref);
+
 int __iptunnel_pull_header(struct sk_buff *skb, int hdr_len,
 			   __be16 inner_proto, bool raw_proto, bool xnet)
 {
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index fce945f23069..c0e9007cf081 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -170,10 +170,9 @@ void udp_tunnel_notify_del_rx_port(struct socket *sock, unsigned short type)
 }
 EXPORT_SYMBOL_GPL(udp_tunnel_notify_del_rx_port);
 
-void udp_tunnel_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
-			 __be32 src, __be32 dst, __u8 tos, __u8 ttl,
-			 __be16 df, __be16 src_port, __be16 dst_port,
-			 bool xnet, bool nocheck, u16 ipcb_flags)
+static void udp_tunnel_add_hdr(struct sk_buff *skb, __be32 src,
+			       __be32 dst, __be16 src_port,
+			       __be16 dst_port, bool nocheck)
 {
 	struct udphdr *uh;
 
@@ -188,12 +187,34 @@ void udp_tunnel_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb
 	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 
 	udp_set_csum(nocheck, skb, src, dst, skb->len);
+}
+
+void udp_tunnel_xmit_skb(struct rtable *rt, struct sock *sk,
+			 struct sk_buff *skb, __be32 src, __be32 dst,
+			 __u8 tos, __u8 ttl, __be16 df, __be16 src_port,
+			 __be16 dst_port, bool xnet, bool nocheck,
+			 u16 ipcb_flags)
+{
+	udp_tunnel_add_hdr(skb, src, dst, src_port, dst_port, nocheck);
 
 	iptunnel_xmit(sk, rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df, xnet,
 		      ipcb_flags);
 }
 EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb);
 
+void udp_tunnel_xmit_skb_noref(struct rtable *rt, struct sock *sk,
+			       struct sk_buff *skb, __be32 src, __be32 dst,
+			       __u8 tos, __u8 ttl, __be16 df, __be16 src_port,
+			       __be16 dst_port, bool xnet, bool nocheck,
+			       u16 ipcb_flags)
+{
+	udp_tunnel_add_hdr(skb, src, dst, src_port, dst_port, nocheck);
+
+	iptunnel_xmit_noref(sk, rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df,
+			    xnet, ipcb_flags);
+}
+EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb_noref);
+
 void udp_tunnel_sock_release(struct socket *sock)
 {
 	rcu_assign_sk_user_data(sock->sk, NULL);
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index 0ff547a4bff7..d262d0a0a178 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -74,14 +74,16 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
 }
 EXPORT_SYMBOL_GPL(udp_sock_create6);
 
-void udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sock *sk,
-			  struct sk_buff *skb,
-			  struct net_device *dev,
-			  const struct in6_addr *saddr,
-			  const struct in6_addr *daddr,
-			  __u8 prio, __u8 ttl, __be32 label,
-			  __be16 src_port, __be16 dst_port, bool nocheck,
-			  u16 ip6cb_flags)
+static void __udp_tunnel6_xmit_skb(struct dst_entry *dst,
+				   struct sock *sk,
+				   struct sk_buff *skb,
+				   struct net_device *dev,
+				   const struct in6_addr *saddr,
+				   const struct in6_addr *daddr,
+				   __u8 prio, __u8 ttl, __be32 label,
+				   __be16 src_port, __be16 dst_port,
+				   bool nocheck,
+				   u16 ip6cb_flags)
 {
 	struct udphdr *uh;
 	struct ipv6hdr *ip6h;
@@ -95,8 +97,6 @@ void udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sock *sk,
 
 	uh->len = htons(skb->len);
 
-	skb_dst_set(skb, dst);
-
 	udp6_set_csum(nocheck, skb, saddr, daddr, skb->len);
 
 	__skb_push(skb, sizeof(*ip6h));
@@ -111,8 +111,39 @@ void udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sock *sk,
 
 	ip6tunnel_xmit(sk, skb, dev, ip6cb_flags);
 }
+
+void udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sock *sk,
+			  struct sk_buff *skb,
+			  struct net_device *dev,
+			  const struct in6_addr *saddr,
+			  const struct in6_addr *daddr,
+			  __u8 prio, __u8 ttl, __be32 label,
+			  __be16 src_port, __be16 dst_port, bool nocheck,
+			  u16 ip6cb_flags)
+{
+	skb_dst_set(skb, dst);
+	__udp_tunnel6_xmit_skb(dst, sk, skb, dev, saddr, daddr,
+			       prio, ttl, label, src_port, dst_port,
+			       nocheck, ip6cb_flags);
+}
 EXPORT_SYMBOL_GPL(udp_tunnel6_xmit_skb);
 
+void udp_tunnel6_xmit_skb_noref(struct dst_entry *dst,
+				struct sock *sk, struct sk_buff *skb,
+				struct net_device *dev,
+				const struct in6_addr *saddr,
+				const struct in6_addr *daddr,
+				__u8 prio, __u8 ttl, __be32 label,
+				__be16 src_port, __be16 dst_port,
+				bool nocheck, u16 ip6cb_flags)
+{
+	skb_dst_set_noref(skb, dst);
+	__udp_tunnel6_xmit_skb(dst, sk, skb, dev, saddr, daddr,
+			       prio, ttl, label, src_port, dst_port,
+			       nocheck, ip6cb_flags);
+}
+EXPORT_SYMBOL_GPL(udp_tunnel6_xmit_skb_noref);
+
 /**
  *      udp_tunnel6_dst_lookup - perform route lookup on UDP tunnel
  *      @skb: Packet for which lookup is done
-- 
2.51.0



* [PATCH net-next v3 3/3] net: ovpn: use new noref xmit flow in ovpn_udp{4,6}_output
  2025-09-22 11:06 ` [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
  2025-09-22 11:06   ` [PATCH net-next v3 1/3] net: dst_cache: implement RCU variants for dst_cache helpers Marek Mietus
  2025-09-22 11:06   ` [PATCH net-next v3 2/3] net: tunnel: implement noref flows in udp_tunnel{,6}_xmit_skb Marek Mietus
@ 2025-09-22 11:06   ` Marek Mietus
  2025-09-24  1:48   ` [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels Jakub Kicinski
  3 siblings, 0 replies; 7+ messages in thread
From: Marek Mietus @ 2025-09-22 11:06 UTC (permalink / raw)
  To: netdev, sd, antonio; +Cc: openvpn-devel, Marek Mietus

ovpn_udp{4,6}_output unnecessarily takes a reference on the dst_entry
it gets from the dst_cache.

Reduce this overhead by using the newly implemented
udp_tunnel{,6}_xmit_skb_noref functions and the new dst_cache helpers.

These changes are safe because both the IPv4 and IPv6 stacks support
noref xmit under RCU, and the ovpn xmit path already runs inside an
RCU read-side critical section.

Signed-off-by: Marek Mietus <mmietus97@yahoo.com>
---
 drivers/net/ovpn/udp.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
index d6a0f7a0b75d..917cd308a7f4 100644
--- a/drivers/net/ovpn/udp.c
+++ b/drivers/net/ovpn/udp.c
@@ -158,7 +158,7 @@ static int ovpn_udp4_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 	int ret;
 
 	local_bh_disable();
-	rt = dst_cache_get_ip4(cache, &fl.saddr);
+	rt = dst_cache_get_ip4_rcu(cache, &fl.saddr);
 	if (rt)
 		goto transmit;
 
@@ -194,12 +194,12 @@ static int ovpn_udp4_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 				    ret);
 		goto err;
 	}
-	dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
+	dst_cache_steal_ip4(cache, &rt->dst, fl.saddr);
 
 transmit:
-	udp_tunnel_xmit_skb(rt, sk, skb, fl.saddr, fl.daddr, 0,
-			    ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
-			    fl.fl4_dport, false, sk->sk_no_check_tx, 0);
+	udp_tunnel_xmit_skb_noref(rt, sk, skb, fl.saddr, fl.daddr, 0,
+				  ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
+				  fl.fl4_dport, false, sk->sk_no_check_tx, 0);
 	ret = 0;
 err:
 	local_bh_enable();
@@ -235,7 +235,7 @@ static int ovpn_udp6_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 	};
 
 	local_bh_disable();
-	dst = dst_cache_get_ip6(cache, &fl.saddr);
+	dst = dst_cache_get_ip6_rcu(cache, &fl.saddr);
 	if (dst)
 		goto transmit;
 
@@ -259,7 +259,7 @@ static int ovpn_udp6_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 				    &bind->remote.in6, ret);
 		goto err;
 	}
-	dst_cache_set_ip6(cache, dst, &fl.saddr);
+	dst_cache_steal_ip6(cache, dst, &fl.saddr);
 
 transmit:
 	/* user IPv6 packets may be larger than the transport interface
@@ -269,12 +269,12 @@ static int ovpn_udp6_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
 	 * fragment packets if needed.
 	 *
 	 * NOTE: this is not needed for IPv4 because we pass df=0 to
-	 * udp_tunnel_xmit_skb()
+	 * udp_tunnel_xmit_skb_noref()
 	 */
 	skb->ignore_df = 1;
-	udp_tunnel6_xmit_skb(dst, sk, skb, skb->dev, &fl.saddr, &fl.daddr, 0,
-			     ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
-			     fl.fl6_dport, udp_get_no_check6_tx(sk), 0);
+	udp_tunnel6_xmit_skb_noref(dst, sk, skb, skb->dev, &fl.saddr, &fl.daddr, 0,
+				   ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
+				   fl.fl6_dport, udp_get_no_check6_tx(sk), 0);
 	ret = 0;
 err:
 	local_bh_enable();
-- 
2.51.0



* Re: [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels
  2025-09-22 11:06 ` [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels Marek Mietus
                     ` (2 preceding siblings ...)
  2025-09-22 11:06   ` [PATCH net-next v3 3/3] net: ovpn: use new noref xmit flow in ovpn_udp{4,6}_output Marek Mietus
@ 2025-09-24  1:48   ` Jakub Kicinski
  2025-09-24 16:27     ` Marek Mietus
  3 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2025-09-24  1:48 UTC (permalink / raw)
  To: Marek Mietus; +Cc: netdev, sd, antonio, openvpn-devel

On Mon, 22 Sep 2025 13:06:19 +0200 Marek Mietus wrote:
> This patchset introduces new noref xmit helpers and incorporates
> them in the OpenVPN driver. A similar improvement can also be
> applied to other tunnel code in the future. The implementation
> for OpenVPN is a good starting point as it doesn't use the
> udp_tunnel_dst_lookup helper which adds some complexity.

You're basically refactoring an API, it's fairly unusual to leave both
APIs in place upstream. Unless the number of callers is really huge,
say >100, or complexity very high. Not sure how others feel but IMHO
you should try to convert all the tunnels.

> There are already noref optimizations in both IPv4 and IPv6
> (see __ip_queue_xmit, inet6_csk_xmit). This patchset allows for
> similar optimizations in UDP tunnels: since the entire xmit flow is
> protected by RCU, taking a reference on the dst_entry is redundant,
> so the new noref helpers skip it.
>
> With this patchset, I observed a 4% decrease in the total time spent
> in ovpn_udp_send_skb, as measured with perf.

Please provide more meaningful perf wins. Relative change of perf in
one function doesn't tell us.. well.. anything.

Please do not remove the diff stat generated by git in the cover
letter.



* Re: [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels
  2025-09-24  1:48   ` [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels Jakub Kicinski
@ 2025-09-24 16:27     ` Marek Mietus
  2025-09-24 23:31       ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Marek Mietus @ 2025-09-24 16:27 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, sd, antonio, openvpn-devel

On 9/24/25 at 03:48, Jakub Kicinski wrote:
> On Mon, 22 Sep 2025 13:06:19 +0200 Marek Mietus wrote:
>> This patchset introduces new noref xmit helpers and incorporates
>> them in the OpenVPN driver. A similar improvement can also be
>> applied to other tunnel code in the future. The implementation
>> for OpenVPN is a good starting point as it doesn't use the
>> udp_tunnel_dst_lookup helper which adds some complexity.
> 
> You're basically refactoring an API, it's fairly unusual to leave both
> APIs in place upstream. Unless the number of callers is really huge,
> say >100, or complexity very high. Not sure how others feel but IMHO
> you should try to convert all the tunnels.
> 

I'm introducing an opt-in API: it is useful in some cases, but not in
all of them, since it only optimizes flows that follow a specific
pattern.

Since this API is opt-in, there is no need to over-complicate code
just to integrate it. The current API is still retained and is not
made redundant by the new one. Tunnels that can benefit from the new
API with only minor complications should be converted in separate
patchsets after this one.

>> There are already noref optimizations in both IPv4 and IPv6
>> (see __ip_queue_xmit, inet6_csk_xmit). This patchset allows for
>> similar optimizations in UDP tunnels: since the entire xmit flow is
>> protected by RCU, taking a reference on the dst_entry is redundant,
>> so the new noref helpers skip it.
>>
>> With this patchset, I observed a 4% decrease in the total time spent
>> in ovpn_udp_send_skb, as measured with perf.
> 
> Please provide more meaningful perf wins. Relative change of perf in
> one function doesn't tell us.. well.. anything.
>

Okay. Currently, I'm getting a consistent 2% increase in throughput on a VM
using iperf. Is this what I should mention in the next cover letter?
 
> Please do not remove the diff stat generated by git in the cover
> letter.
> 
> 

I'll make sure to include it in the next revision.



* Re: [PATCH net-next v3 0/3] net: tunnel: introduce noref xmit flows for tunnels
  2025-09-24 16:27     ` Marek Mietus
@ 2025-09-24 23:31       ` Jakub Kicinski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2025-09-24 23:31 UTC (permalink / raw)
  To: Marek Mietus; +Cc: netdev, sd, antonio, openvpn-devel

On Wed, 24 Sep 2025 18:27:46 +0200 Marek Mietus wrote:
> > On Mon, 22 Sep 2025 13:06:19 +0200 Marek Mietus wrote:  
> >> This patchset introduces new noref xmit helpers and incorporates
> >> them in the OpenVPN driver. A similar improvement can also be
> >> applied to other tunnel code in the future. The implementation
> >> for OpenVPN is a good starting point as it doesn't use the
> >> udp_tunnel_dst_lookup helper which adds some complexity.  
> > 
> > You're basically refactoring an API, it's fairly unusual to leave both
> > APIs in place upstream. Unless the number of callers is really huge,
> > say >100, or complexity very high. Not sure how others feel but IMHO
> > you should try to convert all the tunnels.
> >   
> 
> I'm introducing an opt-in API: it is useful in some cases, but not in
> all of them, since it only optimizes flows that follow a specific
> pattern.
>
> Since this API is opt-in, there is no need to over-complicate code
> just to integrate it. The current API is still retained and is not
> made redundant by the new one. Tunnels that can benefit from the new
> API with only minor complications should be converted in separate
> patchsets after this one.

My objection stands. Unless you have a reason why some tunnels need 
to ref the dst you should just convert all. Otherwise this is just
technical debt you're pushing on posterity.

> >> There are already noref optimizations in both IPv4 and IPv6
> >> (see __ip_queue_xmit, inet6_csk_xmit). This patchset allows for
> >> similar optimizations in UDP tunnels: since the entire xmit flow is
> >> protected by RCU, taking a reference on the dst_entry is redundant,
> >> so the new noref helpers skip it.
> >>
> >> With this patchset, I observed a 4% decrease in the total time spent
> >> in ovpn_udp_send_skb, as measured with perf.
> > 
> > Please provide more meaningful perf wins. Relative change of perf in
> > one function doesn't tell us.. well.. anything.
> 
> Okay. Currently, I'm getting a consistent 2% increase in throughput on a VM
> using iperf. Is this what I should mention in the next cover letter?

Yes, that's much better! Some kind of average over multiple runs and/or
stddev would be ideal.

