* [PATCH v2 net-next 0/2] udp_tunnel: GRO optimizations @ 2025-03-07 18:13 Paolo Abeni 2025-03-07 18:13 ` [PATCH v2 net-next 1/2] udp_tunnel: create a fastpath GRO lookup Paolo Abeni 2025-03-07 18:13 ` [PATCH v2 net-next 2/2] udp_tunnel: use static call for GRO hooks when possible Paolo Abeni 0 siblings, 2 replies; 10+ messages in thread From: Paolo Abeni @ 2025-03-07 18:13 UTC (permalink / raw) To: netdev Cc: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern The UDP tunnel GRO stage is source of measurable overhead for workload based on UDP-encapsulated traffic: each incoming packets requires a full UDP socket lookup and an indirect call. In the most common setups a single UDP tunnel device is used. In such case we can optimize both the lookup and the indirect call. Patch 1 tracks per netns the active UDP tunnels and replaces the socket lookup with a single destination port comparison when possible. Patch 2 tracks the different types of UDP tunnels and replaces the indirect call with a static one when there is a single UDP tunnel type active. I measure ~5% performance improvement in TCP over UDP tunnel stream tests on top of this series. --- v1 -> v2: - fixed a couple of typos - fixed UDP_TUNNEL=n build - clarified design choices (see the individual patches changelog for more details) v1: https://lore.kernel.org/netdev/cover.1741275846.git.pabeni@redhat.com/ Paolo Abeni (2): udp_tunnel: create a fastpath GRO lookup. udp_tunnel: use static call for GRO hooks when possible include/linux/udp.h | 16 ++++ include/net/netns/ipv4.h | 11 +++ include/net/udp.h | 1 + include/net/udp_tunnel.h | 22 +++++ net/ipv4/udp.c | 13 ++- net/ipv4/udp_offload.c | 177 ++++++++++++++++++++++++++++++++++++- net/ipv4/udp_tunnel_core.c | 14 +++ net/ipv6/udp.c | 2 + net/ipv6/udp_offload.c | 5 ++ 9 files changed, 259 insertions(+), 2 deletions(-) -- 2.48.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 net-next 1/2] udp_tunnel: create a fastpath GRO lookup. 2025-03-07 18:13 [PATCH v2 net-next 0/2] udp_tunnel: GRO optimizations Paolo Abeni @ 2025-03-07 18:13 ` Paolo Abeni 2025-03-08 18:37 ` Willem de Bruijn 2025-03-07 18:13 ` [PATCH v2 net-next 2/2] udp_tunnel: use static call for GRO hooks when possible Paolo Abeni 1 sibling, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2025-03-07 18:13 UTC (permalink / raw) To: netdev Cc: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern Most UDP tunnels bind a socket to a local port, with ANY address, no peer and no interface index specified. Additionally it's quite common to have a single tunnel device per namespace. Track in each namespace the UDP tunnel socket respecting the above. When only a single one is present, store a reference in the netns. When such reference is not NULL, UDP tunnel GRO lookup just need to match the incoming packet destination port vs the socket local port. The tunnel socket never sets the reuse[port] flag[s]. When bound to no address and interface, no other socket can exist in the same netns matching the specified local port. Note that the UDP tunnel socket reference is stored into struct netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep all the fastpath-related netns fields in the same struct and allow cacheline-based optimization. Currently both the IPv4 and IPv6 socket pointer share the same cacheline as the `udp_table` field. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- v1: v2: - fix [1] -> [i] typo - avoid replacing static_branch_dec(udp_encap_needed_key) with udp_encap_disable() (no-op) - move ipv6 cleanup after encap disable - clarified the design choice in the commit message --- include/linux/udp.h | 16 ++++++++++++++++ include/net/netns/ipv4.h | 11 +++++++++++ include/net/udp.h | 1 + include/net/udp_tunnel.h | 18 ++++++++++++++++++ net/ipv4/udp.c | 13 ++++++++++++- net/ipv4/udp_offload.c | 37 +++++++++++++++++++++++++++++++++++++ net/ipv4/udp_tunnel_core.c | 12 ++++++++++++ net/ipv6/udp.c | 2 ++ net/ipv6/udp_offload.c | 5 +++++ 9 files changed, 114 insertions(+), 1 deletion(-) diff --git a/include/linux/udp.h b/include/linux/udp.h index 0807e21cfec95..895240177f4f4 100644 --- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -101,6 +101,13 @@ struct udp_sock { /* Cache friendly copy of sk->sk_peek_off >= 0 */ bool peeking_with_offset; + + /* + * Accounting for the tunnel GRO fastpath. + * Unprotected by compilers guard, as it uses space available in + * the last UDP socket cacheline. + */ + struct hlist_node tunnel_list; }; #define udp_test_bit(nr, sk) \ @@ -219,4 +226,13 @@ static inline void udp_allow_gso(struct sock *sk) #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE) +static inline struct sock *udp_tunnel_sk(const struct net *net, bool is_ipv6) +{ +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) + return rcu_dereference(net->ipv4.udp_tunnel_gro[is_ipv6].sk); +#else + return NULL; +#endif +} + #endif /* _LINUX_UDP_H */ diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 650b2dc9199f4..6373e3f17da84 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -47,6 +47,11 @@ struct sysctl_fib_multipath_hash_seed { }; #endif +struct udp_tunnel_gro { + struct sock __rcu *sk; + struct hlist_head list; +}; + struct netns_ipv4 { /* Cacheline organization can be found documented in * Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst. @@ -85,6 +90,11 @@ struct netns_ipv4 { struct inet_timewait_death_row tcp_death_row; struct udp_table *udp_table; +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) + /* Not in a pernet subsys because need to be available at GRO stage */ + struct udp_tunnel_gro udp_tunnel_gro[2]; +#endif + #ifdef CONFIG_SYSCTL struct ctl_table_header *forw_hdr; struct ctl_table_header *frags_hdr; @@ -277,4 +287,5 @@ struct netns_ipv4 { struct hlist_head *inet_addr_lst; struct delayed_work addr_chk_work; }; + #endif diff --git a/include/net/udp.h b/include/net/udp.h index 6e89520e100dc..a772510b2aa58 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -290,6 +290,7 @@ static inline void udp_lib_init_sock(struct sock *sk) struct udp_sock *up = udp_sk(sk); skb_queue_head_init(&up->reader_queue); + INIT_HLIST_NODE(&up->tunnel_list); up->forward_threshold = sk->sk_rcvbuf >> 2; set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags); } diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h index a93dc51f6323e..eda0f3e2f65fa 100644 --- a/include/net/udp_tunnel.h +++ b/include/net/udp_tunnel.h @@ -203,6 +203,24 @@ static inline void udp_tunnel_encap_enable(struct sock *sk) udp_encap_enable(); } +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) +void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add); +#else +static inline void udp_tunnel_update_gro_lookup(struct net *net, + struct sock *sk, bool add) {} +#endif + +static inline void udp_tunnel_cleanup_gro(struct sock *sk) +{ + struct udp_sock *up = udp_sk(sk); + struct net *net = sock_net(sk); + + if (!up->tunnel_list.pprev) + return; + + udp_tunnel_update_gro_lookup(net, sk, false); +} + #define UDP_TUNNEL_NIC_MAX_TABLES 4 enum udp_tunnel_nic_info_flags { diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 17c7736d83494..ba6286fff077c 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2891,8 +2891,10 @@ void udp_destroy_sock(struct sock *sk) if (encap_destroy) encap_destroy(sk); } - if (udp_test_bit(ENCAP_ENABLED, sk)) + if (udp_test_bit(ENCAP_ENABLED, sk)) { static_branch_dec(&udp_encap_needed_key); + udp_tunnel_cleanup_gro(sk); + } } } @@ -3804,6 +3806,15 @@ static void __net_init udp_set_table(struct net *net) static int __net_init udp_pernet_init(struct net *net) { +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) + int i; + + /* No tunnel is configured */ + for (i = 0; i < ARRAY_SIZE(net->ipv4.udp_tunnel_gro); ++i) { + INIT_HLIST_HEAD(&net->ipv4.udp_tunnel_gro[i].list); + rcu_assign_pointer(net->ipv4.udp_tunnel_gro[i].sk, NULL); + } +#endif udp_sysctl_init(net); udp_set_table(net); diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 2c0725583be39..054d4d4a8927f 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -12,6 +12,38 @@ #include <net/udp.h> #include <net/protocol.h> #include <net/inet_common.h> +#include <net/udp_tunnel.h> + +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) +static DEFINE_SPINLOCK(udp_tunnel_gro_lock); + +void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) +{ + bool is_ipv6 = sk->sk_family == AF_INET6; + struct udp_sock *tup, *up = udp_sk(sk); + struct udp_tunnel_gro *udp_tunnel_gro; + + spin_lock(&udp_tunnel_gro_lock); + udp_tunnel_gro = &net->ipv4.udp_tunnel_gro[is_ipv6]; + if (add) + hlist_add_head(&up->tunnel_list, &udp_tunnel_gro->list); + else + hlist_del_init(&up->tunnel_list); + + if (udp_tunnel_gro->list.first && + !udp_tunnel_gro->list.first->next) { + tup = hlist_entry(udp_tunnel_gro->list.first, struct udp_sock, + tunnel_list); + + rcu_assign_pointer(udp_tunnel_gro->sk, (struct sock *)tup); + } else { + rcu_assign_pointer(udp_tunnel_gro->sk, NULL); + } + + spin_unlock(&udp_tunnel_gro_lock); +} +EXPORT_SYMBOL_GPL(udp_tunnel_update_gro_lookup); +#endif static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb, netdev_features_t features, @@ -635,8 +667,13 @@ static struct sock *udp4_gro_lookup_skb(struct sk_buff *skb, __be16 sport, { const struct iphdr *iph = skb_gro_network_header(skb); struct net *net = dev_net_rcu(skb->dev); + struct sock *sk; int iif, sdif; + sk = udp_tunnel_sk(net, false); + if (sk && dport == htons(sk->sk_num)) + return sk; + inet_get_iif_sdif(skb, &iif, &sdif); return __udp4_lib_lookup(net, iph->saddr, sport, diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c index 619a53eb672da..b969c997c89c7 100644 --- a/net/ipv4/udp_tunnel_core.c +++ b/net/ipv4/udp_tunnel_core.c @@ -58,6 +58,15 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg, } EXPORT_SYMBOL(udp_sock_create4); +static inline bool sk_saddr_any(struct sock *sk) +{ +#if IS_ENABLED(CONFIG_IPV6) + return ipv6_addr_any(&sk->sk_v6_rcv_saddr); +#else + return !sk->sk_rcv_saddr; +#endif +} + void setup_udp_tunnel_sock(struct net *net, struct socket *sock, struct udp_tunnel_sock_cfg *cfg) { @@ -80,6 +89,9 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, udp_sk(sk)->gro_complete = cfg->gro_complete; udp_tunnel_encap_enable(sk); + + if (!sk->sk_dport && !sk->sk_bound_dev_if && sk_saddr_any(sock->sk)) + udp_tunnel_update_gro_lookup(net, sock->sk, true); } EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock); diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 3a0d6c5a8286b..4701b0dee8c4e 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -46,6 +46,7 @@ #include <net/tcp_states.h> #include <net/ip6_checksum.h> #include <net/ip6_tunnel.h> +#include <net/udp_tunnel.h> #include <net/xfrm.h> #include <net/inet_hashtables.h> #include <net/inet6_hashtables.h> @@ -1825,6 +1826,7 @@ void udpv6_destroy_sock(struct sock *sk) if (udp_test_bit(ENCAP_ENABLED, sk)) { static_branch_dec(&udpv6_encap_needed_key); udp_encap_disable(); + udp_tunnel_cleanup_gro(sk); } } } diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index 404212dfc99ab..d8445ac1b2e43 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -118,8 +118,13 @@ static struct sock *udp6_gro_lookup_skb(struct sk_buff *skb, __be16 sport, { const struct ipv6hdr *iph = skb_gro_network_header(skb); struct net *net = dev_net_rcu(skb->dev); + struct sock *sk; int iif, sdif; + sk = udp_tunnel_sk(net, true); + if (sk && dport == htons(sk->sk_num)) + return sk; + inet6_get_iif_sdif(skb, &iif, &sdif); return __udp6_lib_lookup(net, &iph->saddr, sport, -- 2.48.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 net-next 1/2] udp_tunnel: create a fastpath GRO lookup. 2025-03-07 18:13 ` [PATCH v2 net-next 1/2] udp_tunnel: create a fastpath GRO lookup Paolo Abeni @ 2025-03-08 18:37 ` Willem de Bruijn 2025-03-09 15:55 ` Paolo Abeni 0 siblings, 1 reply; 10+ messages in thread From: Willem de Bruijn @ 2025-03-08 18:37 UTC (permalink / raw) To: Paolo Abeni, netdev Cc: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern Paolo Abeni wrote: > Most UDP tunnels bind a socket to a local port, with ANY address, no > peer and no interface index specified. > Additionally it's quite common to have a single tunnel device per > namespace. > > Track in each namespace the UDP tunnel socket respecting the above. > When only a single one is present, store a reference in the netns. > > When such reference is not NULL, UDP tunnel GRO lookup just need to > match the incoming packet destination port vs the socket local port. > > The tunnel socket never sets the reuse[port] flag[s]. When bound to no > address and interface, no other socket can exist in the same netns > matching the specified local port. > > Note that the UDP tunnel socket reference is stored into struct > netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep > all the fastpath-related netns fields in the same struct and allow > cacheline-based optimization. Currently both the IPv4 and IPv6 socket > pointer share the same cacheline as the `udp_table` field. > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > v1: v2: > - fix [1] -> [i] typo > - avoid replacing static_branch_dec(udp_encap_needed_key) with > udp_encap_disable() (no-op) > - move ipv6 cleanup after encap disable > - clarified the design choice in the commit message > --- > include/linux/udp.h | 16 ++++++++++++++++ > include/net/netns/ipv4.h | 11 +++++++++++ > include/net/udp.h | 1 + > include/net/udp_tunnel.h | 18 ++++++++++++++++++ > net/ipv4/udp.c | 13 ++++++++++++- > net/ipv4/udp_offload.c | 37 +++++++++++++++++++++++++++++++++++++ > net/ipv4/udp_tunnel_core.c | 12 ++++++++++++ > net/ipv6/udp.c | 2 ++ > net/ipv6/udp_offload.c | 5 +++++ > 9 files changed, 114 insertions(+), 1 deletion(-) > > diff --git a/include/linux/udp.h b/include/linux/udp.h > index 0807e21cfec95..895240177f4f4 100644 > --- a/include/linux/udp.h > +++ b/include/linux/udp.h > @@ -101,6 +101,13 @@ struct udp_sock { > > /* Cache friendly copy of sk->sk_peek_off >= 0 */ > bool peeking_with_offset; > + > + /* > + * Accounting for the tunnel GRO fastpath. > + * Unprotected by compilers guard, as it uses space available in > + * the last UDP socket cacheline. > + */ > + struct hlist_node tunnel_list; > }; > > #define udp_test_bit(nr, sk) \ > @@ -219,4 +226,13 @@ static inline void udp_allow_gso(struct sock *sk) > > #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE) > > +static inline struct sock *udp_tunnel_sk(const struct net *net, bool is_ipv6) > +{ > +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) > + return rcu_dereference(net->ipv4.udp_tunnel_gro[is_ipv6].sk); > +#else > + return NULL; > +#endif > +} > + > #endif /* _LINUX_UDP_H */ > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h > index 650b2dc9199f4..6373e3f17da84 100644 > --- a/include/net/netns/ipv4.h > +++ b/include/net/netns/ipv4.h > @@ -47,6 +47,11 @@ struct sysctl_fib_multipath_hash_seed { > }; > #endif > > +struct udp_tunnel_gro { > + struct sock __rcu *sk; > + struct hlist_head list; > +}; > + > struct netns_ipv4 { > /* Cacheline organization can be found documented in > * Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst. > @@ -85,6 +90,11 @@ struct netns_ipv4 { > struct inet_timewait_death_row tcp_death_row; > struct udp_table *udp_table; > > +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) > + /* Not in a pernet subsys because need to be available at GRO stage */ > + struct udp_tunnel_gro udp_tunnel_gro[2]; > +#endif > + > #ifdef CONFIG_SYSCTL > struct ctl_table_header *forw_hdr; > struct ctl_table_header *frags_hdr; > @@ -277,4 +287,5 @@ struct netns_ipv4 { > struct hlist_head *inet_addr_lst; > struct delayed_work addr_chk_work; > }; > + > #endif > diff --git a/include/net/udp.h b/include/net/udp.h > index 6e89520e100dc..a772510b2aa58 100644 > --- a/include/net/udp.h > +++ b/include/net/udp.h > @@ -290,6 +290,7 @@ static inline void udp_lib_init_sock(struct sock *sk) > struct udp_sock *up = udp_sk(sk); > > skb_queue_head_init(&up->reader_queue); > + INIT_HLIST_NODE(&up->tunnel_list); > up->forward_threshold = sk->sk_rcvbuf >> 2; > set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags); > } > diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h > index a93dc51f6323e..eda0f3e2f65fa 100644 > --- a/include/net/udp_tunnel.h > +++ b/include/net/udp_tunnel.h > @@ -203,6 +203,24 @@ static inline void udp_tunnel_encap_enable(struct sock *sk) > udp_encap_enable(); > } > > +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) > +void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add); > +#else > +static inline void udp_tunnel_update_gro_lookup(struct net *net, > + struct sock *sk, bool add) {} > +#endif > + > +static inline void udp_tunnel_cleanup_gro(struct sock *sk) > +{ > + struct udp_sock *up = udp_sk(sk); > + struct net *net = sock_net(sk); > + > + if (!up->tunnel_list.pprev) > + return; > + > + udp_tunnel_update_gro_lookup(net, sk, false); > +} > + > #define UDP_TUNNEL_NIC_MAX_TABLES 4 > > enum udp_tunnel_nic_info_flags { > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index 17c7736d83494..ba6286fff077c 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -2891,8 +2891,10 @@ void udp_destroy_sock(struct sock *sk) > if (encap_destroy) > encap_destroy(sk); > } > - if (udp_test_bit(ENCAP_ENABLED, sk)) > + if (udp_test_bit(ENCAP_ENABLED, sk)) { > static_branch_dec(&udp_encap_needed_key); > + udp_tunnel_cleanup_gro(sk); > + } > } > } > > @@ -3804,6 +3806,15 @@ static void __net_init udp_set_table(struct net *net) > > static int __net_init udp_pernet_init(struct net *net) > { > +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) > + int i; > + > + /* No tunnel is configured */ > + for (i = 0; i < ARRAY_SIZE(net->ipv4.udp_tunnel_gro); ++i) { > + INIT_HLIST_HEAD(&net->ipv4.udp_tunnel_gro[i].list); > + rcu_assign_pointer(net->ipv4.udp_tunnel_gro[i].sk, NULL); nit: RCU_INIT_POINTER suffices > + } > +#endif > udp_sysctl_init(net); > udp_set_table(net); > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > index 2c0725583be39..054d4d4a8927f 100644 > --- a/net/ipv4/udp_offload.c > +++ b/net/ipv4/udp_offload.c > @@ -12,6 +12,38 @@ > #include <net/udp.h> > #include <net/protocol.h> > #include <net/inet_common.h> > +#include <net/udp_tunnel.h> > + > +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) > +static DEFINE_SPINLOCK(udp_tunnel_gro_lock); > + > +void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) > +{ > + bool is_ipv6 = sk->sk_family == AF_INET6; > + struct udp_sock *tup, *up = udp_sk(sk); > + struct udp_tunnel_gro *udp_tunnel_gro; > + > + spin_lock(&udp_tunnel_gro_lock); > + udp_tunnel_gro = &net->ipv4.udp_tunnel_gro[is_ipv6]; > + if (add) > + hlist_add_head(&up->tunnel_list, &udp_tunnel_gro->list); > + else > + hlist_del_init(&up->tunnel_list); > + > + if (udp_tunnel_gro->list.first && > + !udp_tunnel_gro->list.first->next) { > + tup = hlist_entry(udp_tunnel_gro->list.first, struct udp_sock, > + tunnel_list); > + > + rcu_assign_pointer(udp_tunnel_gro->sk, (struct sock *)tup); If the targeted case is a single tunnel, is it worth maintaining the list? If I understand correctly, it is only there to choose a fall-back when the current tup is removed. But complicates the code quite a bit. Just curious: what does tup stand for? > + } else { > + rcu_assign_pointer(udp_tunnel_gro->sk, NULL); > + } > + > + spin_unlock(&udp_tunnel_gro_lock); > +} > +EXPORT_SYMBOL_GPL(udp_tunnel_update_gro_lookup); > +#endif > > static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb, > netdev_features_t features, > @@ -635,8 +667,13 @@ static struct sock *udp4_gro_lookup_skb(struct sk_buff *skb, __be16 sport, > { > const struct iphdr *iph = skb_gro_network_header(skb); > struct net *net = dev_net_rcu(skb->dev); > + struct sock *sk; > int iif, sdif; > > + sk = udp_tunnel_sk(net, false); > + if (sk && dport == htons(sk->sk_num)) > + return sk; > + > inet_get_iif_sdif(skb, &iif, &sdif); > > return __udp4_lib_lookup(net, iph->saddr, sport, > diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c > index 619a53eb672da..b969c997c89c7 100644 > --- a/net/ipv4/udp_tunnel_core.c > +++ b/net/ipv4/udp_tunnel_core.c > @@ -58,6 +58,15 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg, > } > EXPORT_SYMBOL(udp_sock_create4); > > +static inline bool sk_saddr_any(struct sock *sk) > +{ > +#if IS_ENABLED(CONFIG_IPV6) > + return ipv6_addr_any(&sk->sk_v6_rcv_saddr); > +#else > + return !sk->sk_rcv_saddr; > +#endif > +} > + > void setup_udp_tunnel_sock(struct net *net, struct socket *sock, > struct udp_tunnel_sock_cfg *cfg) > { > @@ -80,6 +89,9 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, > udp_sk(sk)->gro_complete = cfg->gro_complete; > > udp_tunnel_encap_enable(sk); > + > + if (!sk->sk_dport && !sk->sk_bound_dev_if && sk_saddr_any(sock->sk)) > + udp_tunnel_update_gro_lookup(net, sock->sk, true); > } > EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock); > > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c > index 3a0d6c5a8286b..4701b0dee8c4e 100644 > --- a/net/ipv6/udp.c > +++ b/net/ipv6/udp.c > @@ -46,6 +46,7 @@ > #include <net/tcp_states.h> > #include <net/ip6_checksum.h> > #include <net/ip6_tunnel.h> > +#include <net/udp_tunnel.h> > #include <net/xfrm.h> > #include <net/inet_hashtables.h> > #include <net/inet6_hashtables.h> > @@ -1825,6 +1826,7 @@ void udpv6_destroy_sock(struct sock *sk) > if (udp_test_bit(ENCAP_ENABLED, sk)) { > static_branch_dec(&udpv6_encap_needed_key); > udp_encap_disable(); > + udp_tunnel_cleanup_gro(sk); > } > } > } > diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c > index 404212dfc99ab..d8445ac1b2e43 100644 > --- a/net/ipv6/udp_offload.c > +++ b/net/ipv6/udp_offload.c > @@ -118,8 +118,13 @@ static struct sock *udp6_gro_lookup_skb(struct sk_buff *skb, __be16 sport, > { > const struct ipv6hdr *iph = skb_gro_network_header(skb); > struct net *net = dev_net_rcu(skb->dev); > + struct sock *sk; > int iif, sdif; > > + sk = udp_tunnel_sk(net, true); > + if (sk && dport == htons(sk->sk_num)) > + return sk; > + > inet6_get_iif_sdif(skb, &iif, &sdif); > > return __udp6_lib_lookup(net, &iph->saddr, sport, > -- > 2.48.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 net-next 1/2] udp_tunnel: create a fastpath GRO lookup. 2025-03-08 18:37 ` Willem de Bruijn @ 2025-03-09 15:55 ` Paolo Abeni 2025-03-10 3:51 ` Willem de Bruijn 0 siblings, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2025-03-09 15:55 UTC (permalink / raw) To: Willem de Bruijn, netdev Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern On 3/8/25 7:37 PM, Willem de Bruijn wrote: > Paolo Abeni wrote: >> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c >> index 2c0725583be39..054d4d4a8927f 100644 >> --- a/net/ipv4/udp_offload.c >> +++ b/net/ipv4/udp_offload.c >> @@ -12,6 +12,38 @@ >> #include <net/udp.h> >> #include <net/protocol.h> >> #include <net/inet_common.h> >> +#include <net/udp_tunnel.h> >> + >> +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) >> +static DEFINE_SPINLOCK(udp_tunnel_gro_lock); >> + >> +void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) >> +{ >> + bool is_ipv6 = sk->sk_family == AF_INET6; >> + struct udp_sock *tup, *up = udp_sk(sk); >> + struct udp_tunnel_gro *udp_tunnel_gro; >> + >> + spin_lock(&udp_tunnel_gro_lock); >> + udp_tunnel_gro = &net->ipv4.udp_tunnel_gro[is_ipv6]; >> + if (add) >> + hlist_add_head(&up->tunnel_list, &udp_tunnel_gro->list); >> + else >> + hlist_del_init(&up->tunnel_list); >> + >> + if (udp_tunnel_gro->list.first && >> + !udp_tunnel_gro->list.first->next) { >> + tup = hlist_entry(udp_tunnel_gro->list.first, struct udp_sock, >> + tunnel_list); >> + >> + rcu_assign_pointer(udp_tunnel_gro->sk, (struct sock *)tup); > > If the targeted case is a single tunnel, is it worth maintaining the list? > > If I understand correctly, it is only there to choose a fall-back when the > current tup is removed. But complicates the code quite a bit. I'll try to answer the questions on both patches here. I guess in the end there is a relevant amount of personal preferences. Overall accounting is ~20 lines, IMHO it's not much. I think we should at least preserve the optimization when the relevant tunnel is deleted and re-created, and the minimal accounting required for that will drop just a bunch of lines from udp_tunnel_update_gro_lookup(), while keeping all the hooking. Additionally I think it would be surprising transiently applying some unusual configuration and as a side effect get lower performances up to the next reboot (lacking complete accounting). > Just curious: what does tup stand for? Tunnel Udp Pointer. Suggestion for better name welcome! Thanks, Paolo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 net-next 1/2] udp_tunnel: create a fastpath GRO lookup. 2025-03-09 15:55 ` Paolo Abeni @ 2025-03-10 3:51 ` Willem de Bruijn 2025-03-10 11:29 ` Paolo Abeni 0 siblings, 1 reply; 10+ messages in thread From: Willem de Bruijn @ 2025-03-10 3:51 UTC (permalink / raw) To: Paolo Abeni, Willem de Bruijn, netdev Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern Paolo Abeni wrote: > On 3/8/25 7:37 PM, Willem de Bruijn wrote: > > Paolo Abeni wrote: > >> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > >> index 2c0725583be39..054d4d4a8927f 100644 > >> --- a/net/ipv4/udp_offload.c > >> +++ b/net/ipv4/udp_offload.c > >> @@ -12,6 +12,38 @@ > >> #include <net/udp.h> > >> #include <net/protocol.h> > >> #include <net/inet_common.h> > >> +#include <net/udp_tunnel.h> > >> + > >> +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) > >> +static DEFINE_SPINLOCK(udp_tunnel_gro_lock); > >> + > >> +void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) > >> +{ > >> + bool is_ipv6 = sk->sk_family == AF_INET6; > >> + struct udp_sock *tup, *up = udp_sk(sk); > >> + struct udp_tunnel_gro *udp_tunnel_gro; > >> + > >> + spin_lock(&udp_tunnel_gro_lock); > >> + udp_tunnel_gro = &net->ipv4.udp_tunnel_gro[is_ipv6]; > >> + if (add) > >> + hlist_add_head(&up->tunnel_list, &udp_tunnel_gro->list); > >> + else > >> + hlist_del_init(&up->tunnel_list); > >> + > >> + if (udp_tunnel_gro->list.first && > >> + !udp_tunnel_gro->list.first->next) { > >> + tup = hlist_entry(udp_tunnel_gro->list.first, struct udp_sock, > >> + tunnel_list); > >> + > >> + rcu_assign_pointer(udp_tunnel_gro->sk, (struct sock *)tup); > > > > If the targeted case is a single tunnel, is it worth maintaining the list? > > > > If I understand correctly, it is only there to choose a fall-back when the > > current tup is removed. But complicates the code quite a bit. > > I'll try to answer the questions on both patches here. > > I guess in the end there is a relevant amount of personal preferences. > Overall accounting is ~20 lines, IMHO it's not much. In the next patch almost the entire body of udp_tunnel_update_gro_rcv is there to maintain the refcount and list of tunnels. Agreed that in the end it is subjective. Just that both patches mention optimizing for the common case of a single tunnel type. If you feel strongly, keep the list, of course. Specific to the implementation + if (enabled && !old_enabled) { Does enabled imply !old_enabled, once we get here? All paths that do not modify udp_tunnel_gro_type_nr goto out. + for (i = 0; i < UDP_MAX_TUNNEL_TYPES; i++) { + cur = &udp_tunnel_gro_types[i]; + if (refcount_read(&cur->count)) { + static_call_update(udp_tunnel_gro_rcv, + cur->gro_receive); + static_branch_enable(&udp_tunnel_static_call); + } + } Can you use avail, rather than walk the list again? > I think we should at least preserve the optimization when the relevant > tunnel is deleted and re-created, and the minimal accounting required > for that will drop just a bunch of lines from > udp_tunnel_update_gro_lookup(), while keeping all the hooking. > > Additionally I think it would be surprising transiently applying some > unusual configuration and as a side effect get lower performances up to > the next reboot (lacking complete accounting). > > > Just curious: what does tup stand for? > > Tunnel Udp Pointer. Suggestion for better name welcome! > > Thanks, > > Paolo > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 net-next 1/2] udp_tunnel: create a fastpath GRO lookup. 2025-03-10 3:51 ` Willem de Bruijn @ 2025-03-10 11:29 ` Paolo Abeni 0 siblings, 0 replies; 10+ messages in thread From: Paolo Abeni @ 2025-03-10 11:29 UTC (permalink / raw) To: Willem de Bruijn, netdev Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern On 3/10/25 4:51 AM, Willem de Bruijn wrote: > Paolo Abeni wrote: >> On 3/8/25 7:37 PM, Willem de Bruijn wrote: >>> Paolo Abeni wrote: >>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c >>>> index 2c0725583be39..054d4d4a8927f 100644 >>>> --- a/net/ipv4/udp_offload.c >>>> +++ b/net/ipv4/udp_offload.c >>>> @@ -12,6 +12,38 @@ >>>> #include <net/udp.h> >>>> #include <net/protocol.h> >>>> #include <net/inet_common.h> >>>> +#include <net/udp_tunnel.h> >>>> + >>>> +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) >>>> +static DEFINE_SPINLOCK(udp_tunnel_gro_lock); >>>> + >>>> +void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) >>>> +{ >>>> + bool is_ipv6 = sk->sk_family == AF_INET6; >>>> + struct udp_sock *tup, *up = udp_sk(sk); >>>> + struct udp_tunnel_gro *udp_tunnel_gro; >>>> + >>>> + spin_lock(&udp_tunnel_gro_lock); >>>> + udp_tunnel_gro = &net->ipv4.udp_tunnel_gro[is_ipv6]; >>>> + if (add) >>>> + hlist_add_head(&up->tunnel_list, &udp_tunnel_gro->list); >>>> + else >>>> + hlist_del_init(&up->tunnel_list); >>>> + >>>> + if (udp_tunnel_gro->list.first && >>>> + !udp_tunnel_gro->list.first->next) { >>>> + tup = hlist_entry(udp_tunnel_gro->list.first, struct udp_sock, >>>> + tunnel_list); >>>> + >>>> + rcu_assign_pointer(udp_tunnel_gro->sk, (struct sock *)tup); >>> >>> If the targeted case is a single tunnel, is it worth maintaining the list? >>> >>> If I understand correctly, it is only there to choose a fall-back when the >>> current tup is removed. But complicates the code quite a bit. >> >> I'll try to answer the questions on both patches here. >> >> I guess in the end there is a relevant amount of personal preferences. >> Overall accounting is ~20 lines, IMHO it's not much. > > In the next patch almost the entire body of udp_tunnel_update_gro_rcv > is there to maintain the refcount and list of tunnels. > > Agreed that in the end it is subjective. Just that both patches > mention optimizing for the common case of a single tunnel type. > If you feel strongly, keep the list, of course. > > Specific to the implementation > > + if (enabled && !old_enabled) { > > Does enabled imply !old_enabled, once we get here? All paths > that do not modify udp_tunnel_gro_type_nr goto out. You are right, this can be simplified a bit just checking for the current `udp_tunnel_gro_type_nr` value. > > + for (i = 0; i < UDP_MAX_TUNNEL_TYPES; i++) { > + cur = &udp_tunnel_gro_types[i]; > + if (refcount_read(&cur->count)) { > + static_call_update(udp_tunnel_gro_rcv, > + cur->gro_receive); > + static_branch_enable(&udp_tunnel_static_call); > + } > + } > > Can you use avail, rather than walk the list again? When we reach the above code due to a tunnel deletion, `avail` will point to an unused array entry - that is "available" to store new tunnel info. Instead we are looking for the only entry currently in use. WRT the code complexity in patch 2/2, I attempted a few simplified tracking approaches, and the only one leading to measurably simpler code requires permanently (up to the next reboot) disabling the optimization as soon as any additional tunnel of any type is created in the system (child netns included). I think the code complexity delta is worthy avoiding such restriction. Thanks, Paolo ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 net-next 2/2] udp_tunnel: use static call for GRO hooks when possible 2025-03-07 18:13 [PATCH v2 net-next 0/2] udp_tunnel: GRO optimizations Paolo Abeni 2025-03-07 18:13 ` [PATCH v2 net-next 1/2] udp_tunnel: create a fastpath GRO lookup Paolo Abeni @ 2025-03-07 18:13 ` Paolo Abeni 2025-03-08 18:40 ` Willem de Bruijn 1 sibling, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2025-03-07 18:13 UTC (permalink / raw) To: netdev Cc: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern It's quite common to have a single UDP tunnel type active in the whole system. In such a case we can replace the indirect call for the UDP tunnel GRO callback with a static call. Add the related accounting in the control path and switch to static call when possible. To keep the code simple use a static array for the registered tunnel types, and size such array based on the kernel config. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- v1 -> v2: - fix UDP_TUNNEL=n build --- include/net/udp_tunnel.h | 4 ++ net/ipv4/udp_offload.c | 140 ++++++++++++++++++++++++++++++++++++- net/ipv4/udp_tunnel_core.c | 2 + 3 files changed, 145 insertions(+), 1 deletion(-) diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h index eda0f3e2f65fa..a7b230867eb14 100644 --- a/include/net/udp_tunnel.h +++ b/include/net/udp_tunnel.h @@ -205,9 +205,11 @@ static inline void udp_tunnel_encap_enable(struct sock *sk) #if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add); +void udp_tunnel_update_gro_rcv(struct sock *sk, bool add); #else static inline void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) {} +static inline void udp_tunnel_update_gro_rcv(struct sock *sk, bool add) {} #endif static inline void udp_tunnel_cleanup_gro(struct sock *sk) @@ -215,6 +217,8 @@ static inline void udp_tunnel_cleanup_gro(struct sock *sk) struct udp_sock *up = udp_sk(sk); struct net *net = sock_net(sk); + udp_tunnel_update_gro_rcv(sk, false); + if (!up->tunnel_list.pprev) return; diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 054d4d4a8927f..f06dd82d28562 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -15,6 +15,39 @@ #include <net/udp_tunnel.h> #if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) + +/* + * Dummy GRO tunnel callback; should never be invoked, exists + * mainly to avoid dangling/NULL values for the udp tunnel + * static call. + */ +static struct sk_buff *dummy_gro_rcv(struct sock *sk, + struct list_head *head, + struct sk_buff *skb) +{ + WARN_ON_ONCE(1); + NAPI_GRO_CB(skb)->flush = 1; + return NULL; +} + +typedef struct sk_buff *(*udp_tunnel_gro_rcv_t)(struct sock *sk, + struct list_head *head, + struct sk_buff *skb); + +struct udp_tunnel_type_entry { + udp_tunnel_gro_rcv_t gro_receive; + refcount_t count; +}; + +#define UDP_MAX_TUNNEL_TYPES (IS_ENABLED(CONFIG_GENEVE) + \ + IS_ENABLED(CONFIG_VXLAN) * 2 + \ + IS_ENABLED(CONFIG_FOE) * 2) + +DEFINE_STATIC_CALL(udp_tunnel_gro_rcv, dummy_gro_rcv); +static DEFINE_STATIC_KEY_FALSE(udp_tunnel_static_call); +static struct mutex udp_tunnel_gro_type_lock; +static struct udp_tunnel_type_entry udp_tunnel_gro_types[UDP_MAX_TUNNEL_TYPES]; +static unsigned int udp_tunnel_gro_type_nr; static DEFINE_SPINLOCK(udp_tunnel_gro_lock); void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) @@ -43,6 +76,109 @@ void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) spin_unlock(&udp_tunnel_gro_lock); } EXPORT_SYMBOL_GPL(udp_tunnel_update_gro_lookup); + +void udp_tunnel_update_gro_rcv(struct sock *sk, bool add) +{ + struct udp_tunnel_type_entry *cur = NULL, *avail = NULL; + struct udp_sock *up = udp_sk(sk); + bool enabled, old_enabled; + int i; + + if (!up->gro_receive) + return; + + mutex_lock(&udp_tunnel_gro_type_lock); + for (i = 0; i < UDP_MAX_TUNNEL_TYPES; i++) { + if (!refcount_read(&udp_tunnel_gro_types[i].count)) + avail = &udp_tunnel_gro_types[i]; + else if (udp_tunnel_gro_types[i].gro_receive == up->gro_receive) + cur = &udp_tunnel_gro_types[i]; + } + old_enabled = udp_tunnel_gro_type_nr == 1; + if (add) { + /* + * Update the matching entry, if found, or add a new one + * if needed + */ + if (cur) { + refcount_inc(&cur->count); + goto out; + } + + if (unlikely(!avail)) { + /* Ensure static call will never be enabled */ + pr_err_once("Unexpected amount of UDP tunnel types, please update UDP_MAX_TUNNEL_TYPES\n"); + udp_tunnel_gro_type_nr = UDP_MAX_TUNNEL_TYPES + 1; + goto out; + } + + refcount_set(&avail->count, 1); + avail->gro_receive = up->gro_receive; + udp_tunnel_gro_type_nr++; + } else { + /* + * The stack cleanups only successfully added tunnel, the + * lookup on removal should never fail. + */ + if (WARN_ON_ONCE(!cur)) + goto out; + + if (!refcount_dec_and_test(&cur->count)) + goto out; + udp_tunnel_gro_type_nr--; + } + + /* Update the static call only when switching status */ + enabled = udp_tunnel_gro_type_nr == 1; + if (enabled && !old_enabled) { + for (i = 0; i < UDP_MAX_TUNNEL_TYPES; i++) { + cur = &udp_tunnel_gro_types[i]; + if (refcount_read(&cur->count)) { + static_call_update(udp_tunnel_gro_rcv, + cur->gro_receive); + static_branch_enable(&udp_tunnel_static_call); + } + } + } else if (!enabled && old_enabled) { + static_branch_disable(&udp_tunnel_static_call); + static_call_update(udp_tunnel_gro_rcv, dummy_gro_rcv); + } + +out: + mutex_unlock(&udp_tunnel_gro_type_lock); +} +EXPORT_SYMBOL_GPL(udp_tunnel_update_gro_rcv); + +static void udp_tunnel_gro_init(void) +{ + mutex_init(&udp_tunnel_gro_type_lock); +} + +static struct sk_buff *udp_tunnel_gro_rcv(struct sock *sk, + struct list_head *head, + struct sk_buff *skb) +{ + if (static_branch_likely(&udp_tunnel_static_call)) { + if (unlikely(gro_recursion_inc_test(skb))) { + NAPI_GRO_CB(skb)->flush |= 1; + return NULL; + } + return static_call(udp_tunnel_gro_rcv)(sk, head, skb); + } + return call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb); +} + +#else + +static void udp_tunnel_gro_init(void) {} + +static struct sk_buff *udp_tunnel_gro_rcv(struct sock *sk, + struct list_head *head, + struct sk_buff *skb) +{ + return call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb); +} + #endif static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb, @@ -654,7 +790,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb, skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */ skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr)); - pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb); + pp = udp_tunnel_gro_rcv(sk, head, skb); out: skb_gro_flush_final(skb, pp, flush); @@ -804,5 +940,7 @@ int __init udpv4_offload_init(void) .gro_complete = udp4_gro_complete, }, }; + + udp_tunnel_gro_init(); return inet_add_offload(&net_hotdata.udpv4_offload, IPPROTO_UDP); } diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c index b969c997c89c7..1ebc5daff5bc8 100644 --- a/net/ipv4/udp_tunnel_core.c +++ b/net/ipv4/udp_tunnel_core.c @@ -90,6 +90,8 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, udp_tunnel_encap_enable(sk); + udp_tunnel_update_gro_rcv(sock->sk, true); + if (!sk->sk_dport && !sk->sk_bound_dev_if && sk_saddr_any(sock->sk)) udp_tunnel_update_gro_lookup(net, sock->sk, true); } -- 2.48.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 net-next 2/2] udp_tunnel: use static call for GRO hooks when possible 2025-03-07 18:13 ` [PATCH v2 net-next 2/2] udp_tunnel: use static call for GRO hooks when possible Paolo Abeni @ 2025-03-08 18:40 ` Willem de Bruijn 2025-03-09 15:57 ` Paolo Abeni 0 siblings, 1 reply; 10+ messages in thread From: Willem de Bruijn @ 2025-03-08 18:40 UTC (permalink / raw) To: Paolo Abeni, netdev Cc: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern Paolo Abeni wrote: > It's quite common to have a single UDP tunnel type active in the > whole system. In such a case we can replace the indirect call for > the UDP tunnel GRO callback with a static call. > > Add the related accounting in the control path and switch to static > call when possible. To keep the code simple use a static array for > the registered tunnel types, and size such array based on the kernel > config. > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > v1 -> v2: > - fix UDP_TUNNEL=n build > --- > include/net/udp_tunnel.h | 4 ++ > net/ipv4/udp_offload.c | 140 ++++++++++++++++++++++++++++++++++++- > net/ipv4/udp_tunnel_core.c | 2 + > 3 files changed, 145 insertions(+), 1 deletion(-) > > diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h > index eda0f3e2f65fa..a7b230867eb14 100644 > --- a/include/net/udp_tunnel.h > +++ b/include/net/udp_tunnel.h > @@ -205,9 +205,11 @@ static inline void udp_tunnel_encap_enable(struct sock *sk) > > #if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) > void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add); > +void udp_tunnel_update_gro_rcv(struct sock *sk, bool add); > #else > static inline void udp_tunnel_update_gro_lookup(struct net *net, > struct sock *sk, bool add) {} > +static inline void udp_tunnel_update_gro_rcv(struct sock *sk, bool add) {} > #endif > > static inline void udp_tunnel_cleanup_gro(struct sock *sk) > @@ -215,6 +217,8 @@ static inline void udp_tunnel_cleanup_gro(struct sock *sk) > struct udp_sock *up = udp_sk(sk); > struct net *net = sock_net(sk); > > + udp_tunnel_update_gro_rcv(sk, false); > + > if (!up->tunnel_list.pprev) > return; > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > index 054d4d4a8927f..f06dd82d28562 100644 > --- a/net/ipv4/udp_offload.c > +++ b/net/ipv4/udp_offload.c > @@ -15,6 +15,39 @@ > #include <net/udp_tunnel.h> > > #if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) > + > +/* > + * Dummy GRO tunnel callback; should never be invoked, exists > + * mainly to avoid dangling/NULL values for the udp tunnel > + * static call. > + */ > +static struct sk_buff *dummy_gro_rcv(struct sock *sk, > + struct list_head *head, > + struct sk_buff *skb) > +{ > + WARN_ON_ONCE(1); > + NAPI_GRO_CB(skb)->flush = 1; > + return NULL; > +} > + > +typedef struct sk_buff *(*udp_tunnel_gro_rcv_t)(struct sock *sk, > + struct list_head *head, > + struct sk_buff *skb); > + > +struct udp_tunnel_type_entry { > + udp_tunnel_gro_rcv_t gro_receive; > + refcount_t count; > +}; > + > +#define UDP_MAX_TUNNEL_TYPES (IS_ENABLED(CONFIG_GENEVE) + \ > + IS_ENABLED(CONFIG_VXLAN) * 2 + \ > + IS_ENABLED(CONFIG_FOE) * 2) CONFIG_BAREUDP > + > +DEFINE_STATIC_CALL(udp_tunnel_gro_rcv, dummy_gro_rcv); > +static DEFINE_STATIC_KEY_FALSE(udp_tunnel_static_call); > +static struct mutex udp_tunnel_gro_type_lock; > +static struct udp_tunnel_type_entry udp_tunnel_gro_types[UDP_MAX_TUNNEL_TYPES]; > +static unsigned int udp_tunnel_gro_type_nr; > static DEFINE_SPINLOCK(udp_tunnel_gro_lock); > > void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) > @@ -43,6 +76,109 @@ void udp_tunnel_update_gro_lookup(struct net *net, struct sock *sk, bool add) > spin_unlock(&udp_tunnel_gro_lock); > } > EXPORT_SYMBOL_GPL(udp_tunnel_update_gro_lookup); > + > +void udp_tunnel_update_gro_rcv(struct sock *sk, bool add) > +{ > + struct udp_tunnel_type_entry *cur = NULL, *avail = NULL; > + struct udp_sock *up = udp_sk(sk); > + bool enabled, old_enabled; > + int i; > + > + if (!up->gro_receive) > + return; > + > + mutex_lock(&udp_tunnel_gro_type_lock); > + for (i = 0; i < UDP_MAX_TUNNEL_TYPES; i++) { > + if (!refcount_read(&udp_tunnel_gro_types[i].count)) > + avail = &udp_tunnel_gro_types[i]; > + else if (udp_tunnel_gro_types[i].gro_receive == up->gro_receive) > + cur = &udp_tunnel_gro_types[i]; > + } > + old_enabled = udp_tunnel_gro_type_nr == 1; > + if (add) { > + /* > + * Update the matching entry, if found, or add a new one > + * if needed > + */ > + if (cur) { > + refcount_inc(&cur->count); > + goto out; > + } > + > + if (unlikely(!avail)) { > + /* Ensure static call will never be enabled */ > + pr_err_once("Unexpected amount of UDP tunnel types, please update UDP_MAX_TUNNEL_TYPES\n"); > + udp_tunnel_gro_type_nr = UDP_MAX_TUNNEL_TYPES + 1; > + goto out; > + } > + > + refcount_set(&avail->count, 1); > + avail->gro_receive = up->gro_receive; > + udp_tunnel_gro_type_nr++; > + } else { > + /* > + * The stack cleanups only successfully added tunnel, the > + * lookup on removal should never fail. > + */ > + if (WARN_ON_ONCE(!cur)) > + goto out; > + > + if (!refcount_dec_and_test(&cur->count)) > + goto out; > + udp_tunnel_gro_type_nr--; > + } > + > + /* Update the static call only when switching status */ > + enabled = udp_tunnel_gro_type_nr == 1; > + if (enabled && !old_enabled) { > + for (i = 0; i < UDP_MAX_TUNNEL_TYPES; i++) { > + cur = &udp_tunnel_gro_types[i]; > + if (refcount_read(&cur->count)) { > + static_call_update(udp_tunnel_gro_rcv, > + cur->gro_receive); > + static_branch_enable(&udp_tunnel_static_call); > + } > + } > + } else if (!enabled && old_enabled) { > + static_branch_disable(&udp_tunnel_static_call); > + static_call_update(udp_tunnel_gro_rcv, dummy_gro_rcv); > + } Same is patch 1: is the added complexity of tracking all tunnels warranted, over just optimistically claiming the static_call for the first tunnel and leaving it NULL when that is removed, even if other tunnels were added in the meantime. > + > +out: > + mutex_unlock(&udp_tunnel_gro_type_lock); > +} > +EXPORT_SYMBOL_GPL(udp_tunnel_update_gro_rcv); > + > +static void udp_tunnel_gro_init(void) > +{ > + mutex_init(&udp_tunnel_gro_type_lock); > +} > + > +static struct sk_buff *udp_tunnel_gro_rcv(struct sock *sk, > + struct list_head *head, > + struct sk_buff *skb) > +{ > + if (static_branch_likely(&udp_tunnel_static_call)) { > + if (unlikely(gro_recursion_inc_test(skb))) { > + NAPI_GRO_CB(skb)->flush |= 1; > + return NULL; > + } > + return static_call(udp_tunnel_gro_rcv)(sk, head, skb); > + } > + return call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb); > +} > + > +#else > + > +static void udp_tunnel_gro_init(void) {} > + > +static struct sk_buff *udp_tunnel_gro_rcv(struct sock *sk, > + struct list_head *head, > + struct sk_buff *skb) > +{ > + return call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb); > +} > + > #endif > > static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb, > @@ -654,7 +790,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb, > > skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */ > skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr)); > - pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb); > + pp = udp_tunnel_gro_rcv(sk, head, skb); > > out: > skb_gro_flush_final(skb, pp, flush); > @@ -804,5 +940,7 @@ int __init udpv4_offload_init(void) > .gro_complete = udp4_gro_complete, > }, > }; > + > + udp_tunnel_gro_init(); > return inet_add_offload(&net_hotdata.udpv4_offload, IPPROTO_UDP); > } > diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c > index b969c997c89c7..1ebc5daff5bc8 100644 > --- a/net/ipv4/udp_tunnel_core.c > +++ b/net/ipv4/udp_tunnel_core.c > @@ -90,6 +90,8 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, > > udp_tunnel_encap_enable(sk); > > + udp_tunnel_update_gro_rcv(sock->sk, true); > + > if (!sk->sk_dport && !sk->sk_bound_dev_if && sk_saddr_any(sock->sk)) > udp_tunnel_update_gro_lookup(net, sock->sk, true); > } > -- > 2.48.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 net-next 2/2] udp_tunnel: use static call for GRO hooks when possible 2025-03-08 18:40 ` Willem de Bruijn @ 2025-03-09 15:57 ` Paolo Abeni 2025-03-10 3:30 ` Willem de Bruijn 0 siblings, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2025-03-09 15:57 UTC (permalink / raw) To: Willem de Bruijn, netdev Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern On 3/8/25 7:40 PM, Willem de Bruijn wrote: > Paolo Abeni wrote: >> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c >> index 054d4d4a8927f..f06dd82d28562 100644 >> --- a/net/ipv4/udp_offload.c >> +++ b/net/ipv4/udp_offload.c >> @@ -15,6 +15,39 @@ >> #include <net/udp_tunnel.h> >> >> #if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) >> + >> +/* >> + * Dummy GRO tunnel callback; should never be invoked, exists >> + * mainly to avoid dangling/NULL values for the udp tunnel >> + * static call. >> + */ >> +static struct sk_buff *dummy_gro_rcv(struct sock *sk, >> + struct list_head *head, >> + struct sk_buff *skb) >> +{ >> + WARN_ON_ONCE(1); >> + NAPI_GRO_CB(skb)->flush = 1; >> + return NULL; >> +} >> + >> +typedef struct sk_buff *(*udp_tunnel_gro_rcv_t)(struct sock *sk, >> + struct list_head *head, >> + struct sk_buff *skb); >> + >> +struct udp_tunnel_type_entry { >> + udp_tunnel_gro_rcv_t gro_receive; >> + refcount_t count; >> +}; >> + >> +#define UDP_MAX_TUNNEL_TYPES (IS_ENABLED(CONFIG_GENEVE) + \ >> + IS_ENABLED(CONFIG_VXLAN) * 2 + \ >> + IS_ENABLED(CONFIG_FOE) * 2) > > CONFIG_BAREUDP Why? AFAICS BAREUDP does not implement the gro_receive callback. UDP tunnel without such callback are irrelevant here. Thanks, Paolo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 net-next 2/2] udp_tunnel: use static call for GRO hooks when possible 2025-03-09 15:57 ` Paolo Abeni @ 2025-03-10 3:30 ` Willem de Bruijn 0 siblings, 0 replies; 10+ messages in thread From: Willem de Bruijn @ 2025-03-10 3:30 UTC (permalink / raw) To: Paolo Abeni, Willem de Bruijn, netdev Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman, David Ahern Paolo Abeni wrote: > On 3/8/25 7:40 PM, Willem de Bruijn wrote: > > Paolo Abeni wrote: > >> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > >> index 054d4d4a8927f..f06dd82d28562 100644 > >> --- a/net/ipv4/udp_offload.c > >> +++ b/net/ipv4/udp_offload.c > >> @@ -15,6 +15,39 @@ > >> #include <net/udp_tunnel.h> > >> > >> #if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) > >> + > >> +/* > >> + * Dummy GRO tunnel callback; should never be invoked, exists > >> + * mainly to avoid dangling/NULL values for the udp tunnel > >> + * static call. > >> + */ > >> +static struct sk_buff *dummy_gro_rcv(struct sock *sk, > >> + struct list_head *head, > >> + struct sk_buff *skb) > >> +{ > >> + WARN_ON_ONCE(1); > >> + NAPI_GRO_CB(skb)->flush = 1; > >> + return NULL; > >> +} > >> + > >> +typedef struct sk_buff *(*udp_tunnel_gro_rcv_t)(struct sock *sk, > >> + struct list_head *head, > >> + struct sk_buff *skb); > >> + > >> +struct udp_tunnel_type_entry { > >> + udp_tunnel_gro_rcv_t gro_receive; > >> + refcount_t count; > >> +}; > >> + > >> +#define UDP_MAX_TUNNEL_TYPES (IS_ENABLED(CONFIG_GENEVE) + \ > >> + IS_ENABLED(CONFIG_VXLAN) * 2 + \ > >> + IS_ENABLED(CONFIG_FOE) * 2) > > > > CONFIG_BAREUDP > > Why? AFAICS BAREUDP does not implement the gro_receive callback. UDP > tunnel without such callback are irrelevant here. You're right, of course. I looked at the wrong callback. Sorry. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-03-10 11:29 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-03-07 18:13 [PATCH v2 net-next 0/2] udp_tunnel: GRO optimizations Paolo Abeni 2025-03-07 18:13 ` [PATCH v2 net-next 1/2] udp_tunnel: create a fastpath GRO lookup Paolo Abeni 2025-03-08 18:37 ` Willem de Bruijn 2025-03-09 15:55 ` Paolo Abeni 2025-03-10 3:51 ` Willem de Bruijn 2025-03-10 11:29 ` Paolo Abeni 2025-03-07 18:13 ` [PATCH v2 net-next 2/2] udp_tunnel: use static call for GRO hooks when possible Paolo Abeni 2025-03-08 18:40 ` Willem de Bruijn 2025-03-09 15:57 ` Paolo Abeni 2025-03-10 3:30 ` Willem de Bruijn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).