* Re: [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one
@ 2017-01-12 17:41 Craig Gallek
2017-01-12 18:13 ` Josef Bacik
0 siblings, 1 reply; 5+ messages in thread
From: Craig Gallek @ 2017-01-12 17:41 UTC (permalink / raw)
To: Josef Bacik
Cc: David Miller, Hannes Frederic Sowa, Eric Dumazet, Tom Herbert,
netdev, kernel-team
On Wed, Jan 11, 2017 at 3:19 PM, Josef Bacik <jbacik@fb.com> wrote:
> +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
> + bool match_wildcard)
> +{
> +#if IS_ENABLED(CONFIG_IPV6)
> + if (sk->sk_family == AF_INET6)
Still wrapping my head around this, so take it with a grain of salt,
but it's not obvious to me that sk and sk2 are guaranteed to have the
same family here (or if it even matters). Especially in the context
of the next patch which removes the bind_conflict callback... Does
this need to be an OR check for the family of either socket? Or is it
safe as-is because the first socket passed to this function is always
the existing one and the second one is the possible conflict socket?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one 2017-01-12 17:41 [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one Craig Gallek @ 2017-01-12 18:13 ` Josef Bacik 0 siblings, 0 replies; 5+ messages in thread From: Josef Bacik @ 2017-01-12 18:13 UTC (permalink / raw) To: Craig Gallek Cc: David Miller, Hannes Frederic Sowa, Eric Dumazet, Tom Herbert, netdev, kernel-team On Thu, Jan 12, 2017 at 12:41 PM, Craig Gallek <kraigatgoog@gmail.com> wrote: > On Wed, Jan 11, 2017 at 3:19 PM, Josef Bacik <jbacik@fb.com> wrote: >> +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock >> *sk2, >> + bool match_wildcard) >> +{ >> +#if IS_ENABLED(CONFIG_IPV6) >> + if (sk->sk_family == AF_INET6) > Still wrapping my head around this, so take it with a grain of salt, > but it's not obvious to me that sk and sk2 are guaranteed to have the > same family here (or if it even matters). Especially in the context > of the next patch which removes the bind_conflict callback... Does > this need to be an OR check for the family of either socket? Or is it > safe as-is because the first socket passed to this function is always > the existing one and the second one is the possible conflict socket? It's safe as is, all we care is that sk1 is the INET6 socket. In the compare function we use inet6_rcv_saddr(sk2) which will return NULL if it isn't INET6 and then the function handles that case appropriately. This stuff is subtle so it's easy to get confused, I always made sure to run it on our production boxes to make sure I didn't break something ;). Thanks, Josef ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 0/6 net-next][V4] Rework inet_csk_get_port @ 2017-01-17 15:51 Josef Bacik 2017-01-17 15:51 ` [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one Josef Bacik 0 siblings, 1 reply; 5+ messages in thread From: Josef Bacik @ 2017-01-17 15:51 UTC (permalink / raw) To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev, kernel-team V3->V4: -Removed the random include of addrconf.h that is no longer needed. V2->V3: -Dropped the fastsock from the tb and instead just carry the saddrs, family, and ipv6 only flag. -Reworked the helper functions to deal with this change so I could still use them when checking the fast path. -Killed tb->num_owners as per Eric's request. -Attached a reproducer to the bottom of this email. V1->V2: -Added a new patch 'inet: collapse ipv4/v6 rcv_saddr_equal functions into one' at Hannes' suggestion. -Dropped ->bind_conflict and just use the new helper. -Fixed a compile bug from the original ->bind_conflict patch. The original description of the series follows ========================================================= At some point recently the guys working on our load balancer added the ability to use SO_REUSEPORT. When they restarted their app with this option enabled they immediately hit a softlockup on what appeared to be the inet_bind_bucket->lock. Eventually what all of our debugging and discussion led us to was the fact that the application comes up without SO_REUSEPORT, shuts down which creates around 100k twsk's, and then comes up and tries to open a bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket owners list under the lock. Since this lock is needed for dealing with the twsk's and basically anything else related to connections we would softlockup, and sometimes not ever recover. To solve this problem I did what you see in Path 5/5. Once we have a SO_REUSEPORT socket on the tb->owners list we know that the socket has no conflicts with any of the other sockets on that list. So we can add a copy of the sock_common (really all we need is the recv_saddr but it seemed ugly to copy just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've copied the whole common) in order to check subsequent SO_REUSEPORT sockets. If they match the previous one then we can skip the expensive inet_csk_bind_conflict check. This is what eliminated the soft lockup that we were seeing. Patches 1-4 are cleanups and re-workings. For instance when we specify port == 0 we need to find an open port, but we would do two passes through inet_csk_bind_conflict every time we found a possible port. We would also keep track of the smallest_port value in order to try and use it if we found no port our first run through. This however made no sense as it would have had to fail the first pass through inet_csk_bind_conflict, so would not actually pass the second pass through either. Finally I split the function into two functions in order to make it easier to read and to distinguish between the two behaviors. I have tested this on one of our load balancing boxes during peak traffic and it hasn't fallen over. But this is not my area, so obviously feel free to point out where I'm being stupid and I'll get it fixed up and retested. Thanks, Josef ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one 2017-01-17 15:51 [PATCH 0/6 net-next][V4] Rework inet_csk_get_port Josef Bacik @ 2017-01-17 15:51 ` Josef Bacik 0 siblings, 0 replies; 5+ messages in thread From: Josef Bacik @ 2017-01-17 15:51 UTC (permalink / raw) To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev, kernel-team We pass these per-protocol equal functions around in various places, but we can just have one function that checks the sk->sk_family and then do the right comparison function. I've also changed the ipv4 version to not cast to inet_sock since it is unneeded. Signed-off-by: Josef Bacik <jbacik@fb.com> --- include/net/addrconf.h | 4 +-- include/net/inet_hashtables.h | 5 +-- include/net/udp.h | 1 - net/ipv4/inet_connection_sock.c | 72 ++++++++++++++++++++++++++++++++++++++++ net/ipv4/inet_hashtables.c | 16 +++------ net/ipv4/udp.c | 58 +++++++------------------------- net/ipv6/inet6_connection_sock.c | 4 +-- net/ipv6/inet6_hashtables.c | 46 +------------------------ net/ipv6/udp.c | 2 +- 9 files changed, 95 insertions(+), 113 deletions(-) diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 8f998af..17c6fd8 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -88,9 +88,7 @@ int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr *addr, u32 banned_flags); int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr, u32 banned_flags); -int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, - bool match_wildcard); -int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, bool match_wildcard); void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr); void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr); diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 0574493..756ed16 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -203,10 +203,7 @@ void inet_hashinfo_init(struct inet_hashinfo *h); bool inet_ehash_insert(struct sock *sk, struct sock *osk); bool inet_ehash_nolisten(struct sock *sk, struct sock *osk); -int __inet_hash(struct sock *sk, struct sock *osk, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)); +int __inet_hash(struct sock *sk, struct sock *osk); int inet_hash(struct sock *sk); void inet_unhash(struct sock *sk); diff --git a/include/net/udp.h b/include/net/udp.h index 1661791..c9d8b8e 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -204,7 +204,6 @@ static inline void udp_lib_close(struct sock *sk, long timeout) } int udp_lib_get_port(struct sock *sk, unsigned short snum, - int (*)(const struct sock *, const struct sock *, bool), unsigned int hash2_nulladdr); u32 udp_flow_hashrnd(void); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 19ea045..ba597cb 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -31,6 +31,78 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n"; EXPORT_SYMBOL(inet_csk_timer_bug_msg); #endif +#if IS_ENABLED(CONFIG_IPV6) +/* match_wildcard == true: IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6 + * only, and any IPv4 addresses if not IPv6 only + * match_wildcard == false: addresses must be exactly the same, i.e. + * IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY, + * and 0.0.0.0 equals to 0.0.0.0 only + */ +static int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ + const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2); + int sk2_ipv6only = inet_v6_ipv6only(sk2); + int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); + int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED; + + /* if both are mapped, treat as IPv4 */ + if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) { + if (!sk2_ipv6only) { + if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) + return 1; + if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) + return match_wildcard; + } + return 0; + } + + if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY) + return 1; + + if (addr_type2 == IPV6_ADDR_ANY && match_wildcard && + !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED)) + return 1; + + if (addr_type == IPV6_ADDR_ANY && match_wildcard && + !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED)) + return 1; + + if (sk2_rcv_saddr6 && + ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6)) + return 1; + + return 0; +} +#endif + +/* match_wildcard == true: 0.0.0.0 equals to any IPv4 addresses + * match_wildcard == false: addresses must be exactly the same, i.e. + * 0.0.0.0 only equals to 0.0.0.0 + */ +static int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ + if (!ipv6_only_sock(sk2)) { + if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) + return 1; + if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) + return match_wildcard; + } + return 0; +} + +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + return ipv6_rcv_saddr_equal(sk, sk2, match_wildcard); +#endif + return ipv4_rcv_saddr_equal(sk, sk2, match_wildcard); +} +EXPORT_SYMBOL(inet_rcv_saddr_equal); + void inet_get_local_port_range(struct net *net, int *low, int *high) { unsigned int seq; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index ca97835..2ef9b01 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -435,10 +435,7 @@ bool inet_ehash_nolisten(struct sock *sk, struct sock *osk) EXPORT_SYMBOL_GPL(inet_ehash_nolisten); static int inet_reuseport_add_sock(struct sock *sk, - struct inet_listen_hashbucket *ilb, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) + struct inet_listen_hashbucket *ilb) { struct inet_bind_bucket *tb = inet_csk(sk)->icsk_bind_hash; struct sock *sk2; @@ -451,7 +448,7 @@ static int inet_reuseport_add_sock(struct sock *sk, sk2->sk_bound_dev_if == sk->sk_bound_dev_if && inet_csk(sk2)->icsk_bind_hash == tb && sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) && - saddr_same(sk, sk2, false)) + inet_rcv_saddr_equal(sk, sk2, false)) return reuseport_add_sock(sk, sk2); } @@ -461,10 +458,7 @@ static int inet_reuseport_add_sock(struct sock *sk, return 0; } -int __inet_hash(struct sock *sk, struct sock *osk, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) +int __inet_hash(struct sock *sk, struct sock *osk) { struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; struct inet_listen_hashbucket *ilb; @@ -479,7 +473,7 @@ int __inet_hash(struct sock *sk, struct sock *osk, spin_lock(&ilb->lock); if (sk->sk_reuseport) { - err = inet_reuseport_add_sock(sk, ilb, saddr_same); + err = inet_reuseport_add_sock(sk, ilb); if (err) goto unlock; } @@ -503,7 +497,7 @@ int inet_hash(struct sock *sk) if (sk->sk_state != TCP_CLOSE) { local_bh_disable(); - err = __inet_hash(sk, NULL, ipv4_rcv_saddr_equal); + err = __inet_hash(sk, NULL); local_bh_enable(); } diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 4318d72..d6dddcf 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -137,11 +137,7 @@ EXPORT_SYMBOL(udp_memory_allocated); static int udp_lib_lport_inuse(struct net *net, __u16 num, const struct udp_hslot *hslot, unsigned long *bitmap, - struct sock *sk, - int (*saddr_comp)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard), - unsigned int log) + struct sock *sk, unsigned int log) { struct sock *sk2; kuid_t uid = sock_i_uid(sk); @@ -153,7 +149,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num, (!sk2->sk_reuse || !sk->sk_reuse) && (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) && - saddr_comp(sk, sk2, true)) { + inet_rcv_saddr_equal(sk, sk2, true)) { if (sk2->sk_reuseport && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(uid, sock_i_uid(sk2))) { @@ -176,10 +172,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num, */ static int udp_lib_lport_inuse2(struct net *net, __u16 num, struct udp_hslot *hslot2, - struct sock *sk, - int (*saddr_comp)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) + struct sock *sk) { struct sock *sk2; kuid_t uid = sock_i_uid(sk); @@ -193,7 +186,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num, (!sk2->sk_reuse || !sk->sk_reuse) && (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) && - saddr_comp(sk, sk2, true)) { + inet_rcv_saddr_equal(sk, sk2, true)) { if (sk2->sk_reuseport && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(uid, sock_i_uid(sk2))) { @@ -208,10 +201,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num, return res; } -static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) +static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot) { struct net *net = sock_net(sk); kuid_t uid = sock_i_uid(sk); @@ -225,7 +215,7 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot, (udp_sk(sk2)->udp_port_hash == udp_sk(sk)->udp_port_hash) && (sk2->sk_bound_dev_if == sk->sk_bound_dev_if) && sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) && - (*saddr_same)(sk, sk2, false)) { + inet_rcv_saddr_equal(sk, sk2, false)) { return reuseport_add_sock(sk, sk2); } } @@ -241,14 +231,10 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot, * * @sk: socket struct in question * @snum: port number to look up - * @saddr_comp: AF-dependent comparison of bound local IP addresses * @hash2_nulladdr: AF-dependent hash value in secondary hash chains, * with NULL address */ int udp_lib_get_port(struct sock *sk, unsigned short snum, - int (*saddr_comp)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard), unsigned int hash2_nulladdr) { struct udp_hslot *hslot, *hslot2; @@ -277,7 +263,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, bitmap_zero(bitmap, PORTS_PER_CHAIN); spin_lock_bh(&hslot->lock); udp_lib_lport_inuse(net, snum, hslot, bitmap, sk, - saddr_comp, udptable->log); + udptable->log); snum = first; /* @@ -310,12 +296,11 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, if (hslot->count < hslot2->count) goto scan_primary_hash; - exist = udp_lib_lport_inuse2(net, snum, hslot2, - sk, saddr_comp); + exist = udp_lib_lport_inuse2(net, snum, hslot2, sk); if (!exist && (hash2_nulladdr != slot2)) { hslot2 = udp_hashslot2(udptable, hash2_nulladdr); exist = udp_lib_lport_inuse2(net, snum, hslot2, - sk, saddr_comp); + sk); } if (exist) goto fail_unlock; @@ -323,8 +308,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, goto found; } scan_primary_hash: - if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, - saddr_comp, 0)) + if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, 0)) goto fail_unlock; } found: @@ -333,7 +317,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, udp_sk(sk)->udp_portaddr_hash ^= snum; if (sk_unhashed(sk)) { if (sk->sk_reuseport && - udp_reuseport_add_sock(sk, hslot, saddr_comp)) { + udp_reuseport_add_sock(sk, hslot)) { inet_sk(sk)->inet_num = 0; udp_sk(sk)->udp_port_hash = 0; udp_sk(sk)->udp_portaddr_hash ^= snum; @@ -365,24 +349,6 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, } EXPORT_SYMBOL(udp_lib_get_port); -/* match_wildcard == true: 0.0.0.0 equals to any IPv4 addresses - * match_wildcard == false: addresses must be exactly the same, i.e. - * 0.0.0.0 only equals to 0.0.0.0 - */ -int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2, - bool match_wildcard) -{ - struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2); - - if (!ipv6_only_sock(sk2)) { - if (inet1->inet_rcv_saddr == inet2->inet_rcv_saddr) - return 1; - if (!inet1->inet_rcv_saddr || !inet2->inet_rcv_saddr) - return match_wildcard; - } - return 0; -} - static u32 udp4_portaddr_hash(const struct net *net, __be32 saddr, unsigned int port) { @@ -398,7 +364,7 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum) /* precompute partial secondary hash */ udp_sk(sk)->udp_portaddr_hash = hash2_partial; - return udp_lib_get_port(sk, snum, ipv4_rcv_saddr_equal, hash2_nulladdr); + return udp_lib_get_port(sk, snum, hash2_nulladdr); } static int compute_score(struct sock *sk, struct net *net, diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index 7396e75..55ee2ea 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -54,12 +54,12 @@ int inet6_csk_bind_conflict(const struct sock *sk, (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid((struct sock *)sk2))))) { - if (ipv6_rcv_saddr_equal(sk, sk2, true)) + if (inet_rcv_saddr_equal(sk, sk2, true)) break; } if (!relax && reuse && sk2->sk_reuse && sk2->sk_state != TCP_LISTEN && - ipv6_rcv_saddr_equal(sk, sk2, true)) + inet_rcv_saddr_equal(sk, sk2, true)) break; } } diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 02761c9..d090091 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -268,54 +268,10 @@ int inet6_hash(struct sock *sk) if (sk->sk_state != TCP_CLOSE) { local_bh_disable(); - err = __inet_hash(sk, NULL, ipv6_rcv_saddr_equal); + err = __inet_hash(sk, NULL); local_bh_enable(); } return err; } EXPORT_SYMBOL_GPL(inet6_hash); - -/* match_wildcard == true: IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6 - * only, and any IPv4 addresses if not IPv6 only - * match_wildcard == false: addresses must be exactly the same, i.e. - * IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY, - * and 0.0.0.0 equals to 0.0.0.0 only - */ -int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, - bool match_wildcard) -{ - const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2); - int sk2_ipv6only = inet_v6_ipv6only(sk2); - int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); - int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED; - - /* if both are mapped, treat as IPv4 */ - if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) { - if (!sk2_ipv6only) { - if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) - return 1; - if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) - return match_wildcard; - } - return 0; - } - - if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY) - return 1; - - if (addr_type2 == IPV6_ADDR_ANY && match_wildcard && - !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED)) - return 1; - - if (addr_type == IPV6_ADDR_ANY && match_wildcard && - !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED)) - return 1; - - if (sk2_rcv_saddr6 && - ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6)) - return 1; - - return 0; -} -EXPORT_SYMBOL_GPL(ipv6_rcv_saddr_equal); diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 4d5c4ee..05d6932 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -103,7 +103,7 @@ int udp_v6_get_port(struct sock *sk, unsigned short snum) /* precompute partial secondary hash */ udp_sk(sk)->udp_portaddr_hash = hash2_partial; - return udp_lib_get_port(sk, snum, ipv6_rcv_saddr_equal, hash2_nulladdr); + return udp_lib_get_port(sk, snum, hash2_nulladdr); } static void udp_v6_rehash(struct sock *sk) -- 2.9.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 0/6 net-next][V3] Rework inet_csk_get_port @ 2017-01-11 20:06 Josef Bacik 2017-01-11 20:19 ` [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one Josef Bacik 0 siblings, 1 reply; 5+ messages in thread From: Josef Bacik @ 2017-01-11 20:06 UTC (permalink / raw) To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev, kernel-team Sorry the holidays delayed this a bit. I've made the changes requested and I also sat down and wrote a reproducer so everybody could see what was happening. Without my patches my reproducer takes about 50 seconds to run on my 48 CPU devel box. With my patches it takes 8 seconds. V2->V3: -Dropped the fastsock from the tb and instead just carry the saddrs, family, and ipv6 only flag. -Reworked the helper functions to deal with this change so I could still use them when checking the fast path. -Killed tb->num_owners as per Eric's request. -Attached a reproducer to the bottom of this email. V1->V2: -Added a new patch 'inet: collapse ipv4/v6 rcv_saddr_equal functions into one' at Hannes' suggestion. -Dropped ->bind_conflict and just use the new helper. -Fixed a compile bug from the original ->bind_conflict patch. The original description of the series follows ========================================================= At some point recently the guys working on our load balancer added the ability to use SO_REUSEPORT. When they restarted their app with this option enabled they immediately hit a softlockup on what appeared to be the inet_bind_bucket->lock. Eventually what all of our debugging and discussion led us to was the fact that the application comes up without SO_REUSEPORT, shuts down which creates around 100k twsk's, and then comes up and tries to open a bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket owners list under the lock. Since this lock is needed for dealing with the twsk's and basically anything else related to connections we would softlockup, and sometimes not ever recover. To solve this problem I did what you see in Path 5/5. Once we have a SO_REUSEPORT socket on the tb->owners list we know that the socket has no conflicts with any of the other sockets on that list. So we can add a copy of the sock_common (really all we need is the recv_saddr but it seemed ugly to copy just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've copied the whole common) in order to check subsequent SO_REUSEPORT sockets. If they match the previous one then we can skip the expensive inet_csk_bind_conflict check. This is what eliminated the soft lockup that we were seeing. Patches 1-4 are cleanups and re-workings. For instance when we specify port == 0 we need to find an open port, but we would do two passes through inet_csk_bind_conflict every time we found a possible port. We would also keep track of the smallest_port value in order to try and use it if we found no port our first run through. This however made no sense as it would have had to fail the first pass through inet_csk_bind_conflict, so would not actually pass the second pass through either. Finally I split the function into two functions in order to make it easier to read and to distinguish between the two behaviors. I have tested this on one of our load balancing boxes during peak traffic and it hasn't fallen over. But this is not my area, so obviously feel free to point out where I'm being stupid and I'll get it fixed up and retested. Thanks, Josef #include <sys/types.h> #include <sys/socket.h> #include <sys/wait.h> #include <sys/un.h> #include <errno.h> #include <netdb.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <pthread.h> static int ready = 0; static int done = 0; static void sigint_handler(int s) { ready = 1; } static int create_sock(int socktype, int soreuseport) { struct addrinfo hints; struct addrinfo *ai, *ai_orig; const char *addrs[] = {"::", "0.0.0.0", NULL}; const char *port = "9999"; int sock; int num_socks = 0; int i, ret; memset(&hints, 0, sizeof(hints)); hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; hints.ai_socktype = SOCK_STREAM; hints.ai_family = (socktype == 0) ? AF_INET6 : AF_INET; hints.ai_protocol = IPPROTO_TCP; ret = getaddrinfo(addrs[socktype], port, &hints, &ai); if (ret < 0) { fprintf(stderr, "couldn't get addr info %d\n", errno); return -1; } ai_orig = ai; while (ai != NULL) { int yes = 1; sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol); if (sock < 0) { fprintf(stderr, "socket failed %d\n", errno); goto next; } if (soreuseport) { if (setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &yes, sizeof(yes)) < 0) { fprintf(stderr, "setsockopt failed %d\n", errno); close(sock); goto next; } } if (bind(sock, ai->ai_addr, ai->ai_addrlen)) { fprintf(stderr, "bind failed %d\n", errno); close(sock); goto next; } if (listen(sock, 1024) < 0) { fprintf(stderr, "listen failed %d\n", errno); close(sock); goto next; } num_socks++; break; next: ai = ai->ai_next; } freeaddrinfo(ai_orig); if (!num_socks) { fprintf(stderr, "failed to open any sockets\n"); return -1; } return sock; } #define SOCK_PATH "THIS_SHIT_IS_FUCKING_INSANE" #define TAKEOVER_REQUEST 1 #define CLOSE_REQUEST 2 static int *takeover_sockets(int *num_fds_ret) { struct sockaddr_un remote; struct msghdr msg = {}; struct cmsghdr *cmsg; struct iovec iov; int *fds; int req, num_fds, fd, i, len; char *buf; *num_fds_ret = 0; if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) { fprintf(stderr, "Couldn't open our unix sock\n"); return NULL; } remote.sun_family = AF_UNIX; strcpy(remote.sun_path, SOCK_PATH); len = strlen(remote.sun_path) + sizeof(remote.sun_family); if (connect(fd, (struct sockaddr *)&remote, len) < 0) { fprintf(stderr, "Couldn't connect, %d\n", errno); close(fd); return NULL; } req = TAKEOVER_REQUEST; if (send(fd, &req, sizeof(req), 0) < 0) { fprintf(stderr, "Couldn't send our request, %d\n", errno); close(fd); return NULL; } buf = malloc(CMSG_SPACE(sizeof(int) * 32)); if (!buf) { fprintf(stderr, "Failed to allocate cmsg buffer\n"); free(fds); close(fd); return NULL; } iov.iov_base = &num_fds; iov.iov_len = sizeof(num_fds); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_control = buf; msg.msg_controllen = CMSG_SPACE(sizeof(int) * 1); if (recvmsg(fd, &msg, 0) < 0) { fprintf(stderr, "Couldn't get our array of fd's, %d\n", errno); free(fds); close(fd); return NULL; } fds = malloc(sizeof(int) * num_fds); if (!fds) { fprintf(stderr, "Couldn't allocate an array for %d fd's\n", num_fds); close(fd); return NULL; } for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) { int *fdsptr = (int *)CMSG_DATA(cmsg); for (i = 0; i < num_fds; i++) { int newfd = fdsptr[i]; int yes = 1; if (newfd < 0) { fprintf(stderr, "Couldn't dup the old socket, %d\n", errno); close(fd); num_fds = i; break; } if (setsockopt(newfd, SOL_SOCKET, SO_REUSEPORT, &yes, sizeof(yes)) < 0) { fprintf(stderr, "setsockopt failed on taken over socket %d\n", errno); close(newfd); close(fd); break; } fds[i] = newfd; } } req = CLOSE_REQUEST; if (send(fd, &req, sizeof(req), 0) < 0) fprintf(stderr, "Couldn't send the close request, %d\n", errno); close(fd); *num_fds_ret = num_fds; return fds; } static int listen_for_takeover(pid_t child, int *fds, int num_fds) { struct sockaddr_un local, remote; int sock, len; if ((sock = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { fprintf(stderr, "Couldn't open unix sock %d\n", errno); kill(child, SIGINT); return -1; } local.sun_family = AF_UNIX; strcpy(local.sun_path, SOCK_PATH); unlink(local.sun_path); len = strlen(local.sun_path) + sizeof(local.sun_family); if (bind(sock, (struct sockaddr *)&local, len) < 0) { fprintf(stderr, "Couldn't bind to unix sock %d\n", errno); kill(child, SIGINT); return -1; } if (listen(sock, 1) < 0) { fprintf(stderr, "Couldn't listen on unix sock %d\n", errno); kill(child, SIGINT); return -1; } kill(child, SIGINT); while (!done) { int newsock, req, len = sizeof(remote); if ((newsock = accept(sock, (struct sockaddr *)&remote, &len)) < 0) { fprintf(stderr, "Couldn't accept connection %d\n", errno); return -1; } while (recv(newsock, &req, sizeof(req), 0) > 0) { if (req == TAKEOVER_REQUEST) { struct msghdr msg = {}; struct cmsghdr *cmsg; struct iovec iov = {.iov_base = &num_fds, .iov_len = sizeof(num_fds), }; char *buf; int *fdptr; buf = malloc(CMSG_SPACE(sizeof(int) * num_fds)); if (!buf) { fprintf(stderr, "Couldn't allocate cmsg buf\n"); return -1; } msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_control = buf; msg.msg_controllen = CMSG_SPACE(sizeof(int) * num_fds); cmsg = CMSG_FIRSTHDR(&msg); cmsg->cmsg_level = SOL_SOCKET; cmsg->cmsg_type = SCM_RIGHTS; cmsg->cmsg_len = CMSG_LEN(sizeof(int) * num_fds); fdptr = (int *)CMSG_DATA(cmsg); memcpy(fdptr, fds, num_fds * sizeof(int)); msg.msg_controllen = cmsg->cmsg_len; if (sendmsg(newsock, &msg, 0) < 0) { fprintf(stderr, "Failed to send fds %d\n", errno); return -1; } } else if (req == CLOSE_REQUEST) { break; } } done = 1; } return 0; } static void *thread_main(void *arg) { int i; for (i = 0; i < 256; i++) { int fd = create_sock(0, 1); if (fd < 0) return (void *)(unsigned long)fd; } return NULL; } static int do_spawn_threads(int num_threads) { struct sigaction sa; pthread_t *threads; int *fds; int num_fds; int error = 0; int i; threads = calloc(sizeof(pthread_t), num_threads); if (!threads) { fprintf(stderr, "Not enough memory\n"); return -1; } sa.sa_handler = sigint_handler; sigemptyset(&sa.sa_mask); sa.sa_flags = SA_RESTART; if (sigaction(SIGINT, &sa, NULL) < 0) { fprintf(stderr, "Couldn't set our sigaction\n"); return -1; } while (!ready) sleep(1); fds = takeover_sockets(&num_fds); printf("Starting new threads\n"); for (i = 0; i < num_threads; i++) { if (pthread_create(&threads[i], NULL, thread_main, NULL) < 0) { done = 1; break; } } while (i--) { void *retval; pthread_join(threads[i], &retval); if ((unsigned long)retval != 0) error = (int)(unsigned long)retval; } return error; } int main(int argc, char **argv) { int *fds; int num_fds; pthread_t *threads; unsigned long i; int ret = -1; int childstatus; pid_t child; child = fork(); if (child < 0) { fprintf(stderr, "Failed to create child! %d\n", errno); return -1; } if (!child) { return do_spawn_threads(64); } fds = malloc(sizeof(int)); if (!fds) { kill(child, SIGINT); goto out; } fds[0] = create_sock(0, 0); if (fds[0] < 0) { free(fds); kill(child, SIGINT); fprintf(stderr, "Couldn't create a socket\n"); goto out; } ret = listen_for_takeover(child, fds, 1); out: waitpid(child, &childstatus, 0); if (!ret) ret = WEXITSTATUS(childstatus); return ret; } ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one 2017-01-11 20:06 [PATCH 0/6 net-next][V3] Rework inet_csk_get_port Josef Bacik @ 2017-01-11 20:19 ` Josef Bacik 0 siblings, 0 replies; 5+ messages in thread From: Josef Bacik @ 2017-01-11 20:19 UTC (permalink / raw) To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev, kernel-team We pass these per-protocol equal functions around in various places, but we can just have one function that checks the sk->sk_family and then do the right comparison function. I've also changed the ipv4 version to not cast to inet_sock since it is unneeded. Signed-off-by: Josef Bacik <jbacik@fb.com> --- include/net/addrconf.h | 4 +-- include/net/inet_hashtables.h | 5 +-- include/net/udp.h | 1 - net/ipv4/inet_connection_sock.c | 72 ++++++++++++++++++++++++++++++++++++++++ net/ipv4/inet_hashtables.c | 16 +++------ net/ipv4/udp.c | 58 +++++++------------------------- net/ipv6/inet6_connection_sock.c | 4 +-- net/ipv6/inet6_hashtables.c | 46 +------------------------ net/ipv6/udp.c | 2 +- 9 files changed, 95 insertions(+), 113 deletions(-) diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 8f998af..17c6fd8 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -88,9 +88,7 @@ int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr *addr, u32 banned_flags); int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr, u32 banned_flags); -int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, - bool match_wildcard); -int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, bool match_wildcard); void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr); void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr); diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 0574493..756ed16 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -203,10 +203,7 @@ void inet_hashinfo_init(struct inet_hashinfo *h); bool inet_ehash_insert(struct sock *sk, struct sock *osk); bool inet_ehash_nolisten(struct sock *sk, struct sock *osk); -int __inet_hash(struct sock *sk, struct sock *osk, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)); +int __inet_hash(struct sock *sk, struct sock *osk); int inet_hash(struct sock *sk); void inet_unhash(struct sock *sk); diff --git a/include/net/udp.h b/include/net/udp.h index 1661791..c9d8b8e 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -204,7 +204,6 @@ static inline void udp_lib_close(struct sock *sk, long timeout) } int udp_lib_get_port(struct sock *sk, unsigned short snum, - int (*)(const struct sock *, const struct sock *, bool), unsigned int hash2_nulladdr); u32 udp_flow_hashrnd(void); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 19ea045..ba597cb 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -31,6 +31,78 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n"; EXPORT_SYMBOL(inet_csk_timer_bug_msg); #endif +#if IS_ENABLED(CONFIG_IPV6) +/* match_wildcard == true: IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6 + * only, and any IPv4 addresses if not IPv6 only + * match_wildcard == false: addresses must be exactly the same, i.e. + * IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY, + * and 0.0.0.0 equals to 0.0.0.0 only + */ +static int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ + const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2); + int sk2_ipv6only = inet_v6_ipv6only(sk2); + int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); + int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED; + + /* if both are mapped, treat as IPv4 */ + if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) { + if (!sk2_ipv6only) { + if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) + return 1; + if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) + return match_wildcard; + } + return 0; + } + + if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY) + return 1; + + if (addr_type2 == IPV6_ADDR_ANY && match_wildcard && + !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED)) + return 1; + + if (addr_type == IPV6_ADDR_ANY && match_wildcard && + !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED)) + return 1; + + if (sk2_rcv_saddr6 && + ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6)) + return 1; + + return 0; +} +#endif + +/* match_wildcard == true: 0.0.0.0 equals to any IPv4 addresses + * match_wildcard == false: addresses must be exactly the same, i.e. + * 0.0.0.0 only equals to 0.0.0.0 + */ +static int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ + if (!ipv6_only_sock(sk2)) { + if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) + return 1; + if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) + return match_wildcard; + } + return 0; +} + +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + return ipv6_rcv_saddr_equal(sk, sk2, match_wildcard); +#endif + return ipv4_rcv_saddr_equal(sk, sk2, match_wildcard); +} +EXPORT_SYMBOL(inet_rcv_saddr_equal); + void inet_get_local_port_range(struct net *net, int *low, int *high) { unsigned int seq; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index ca97835..2ef9b01 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -435,10 +435,7 @@ bool inet_ehash_nolisten(struct sock *sk, struct sock *osk) EXPORT_SYMBOL_GPL(inet_ehash_nolisten); static int inet_reuseport_add_sock(struct sock *sk, - struct inet_listen_hashbucket *ilb, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) + struct inet_listen_hashbucket *ilb) { struct inet_bind_bucket *tb = inet_csk(sk)->icsk_bind_hash; struct sock *sk2; @@ -451,7 +448,7 @@ static int inet_reuseport_add_sock(struct sock *sk, sk2->sk_bound_dev_if == sk->sk_bound_dev_if && inet_csk(sk2)->icsk_bind_hash == tb && sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) && - saddr_same(sk, sk2, false)) + inet_rcv_saddr_equal(sk, sk2, false)) return reuseport_add_sock(sk, sk2); } @@ -461,10 +458,7 @@ static int inet_reuseport_add_sock(struct sock *sk, return 0; } -int __inet_hash(struct sock *sk, struct sock *osk, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) +int __inet_hash(struct sock *sk, struct sock *osk) { struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; struct inet_listen_hashbucket *ilb; @@ -479,7 +473,7 @@ int __inet_hash(struct sock *sk, struct sock *osk, spin_lock(&ilb->lock); if (sk->sk_reuseport) { - err = inet_reuseport_add_sock(sk, ilb, saddr_same); + err = inet_reuseport_add_sock(sk, ilb); if (err) goto unlock; } @@ -503,7 +497,7 @@ int inet_hash(struct sock *sk) if (sk->sk_state != TCP_CLOSE) { local_bh_disable(); - err = __inet_hash(sk, NULL, ipv4_rcv_saddr_equal); + err = __inet_hash(sk, NULL); local_bh_enable(); } diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 4318d72..d6dddcf 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -137,11 +137,7 @@ EXPORT_SYMBOL(udp_memory_allocated); static int udp_lib_lport_inuse(struct net *net, __u16 num, const struct udp_hslot *hslot, unsigned long *bitmap, - struct sock *sk, - int (*saddr_comp)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard), - unsigned int log) + struct sock *sk, unsigned int log) { struct sock *sk2; kuid_t uid = sock_i_uid(sk); @@ -153,7 +149,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num, (!sk2->sk_reuse || !sk->sk_reuse) && (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) && - saddr_comp(sk, sk2, true)) { + inet_rcv_saddr_equal(sk, sk2, true)) { if (sk2->sk_reuseport && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(uid, sock_i_uid(sk2))) { @@ -176,10 +172,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num, */ static int udp_lib_lport_inuse2(struct net *net, __u16 num, struct udp_hslot *hslot2, - struct sock *sk, - int (*saddr_comp)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) + struct sock *sk) { struct sock *sk2; kuid_t uid = sock_i_uid(sk); @@ -193,7 +186,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num, (!sk2->sk_reuse || !sk->sk_reuse) && (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) && - saddr_comp(sk, sk2, true)) { + inet_rcv_saddr_equal(sk, sk2, true)) { if (sk2->sk_reuseport && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && uid_eq(uid, sock_i_uid(sk2))) { @@ -208,10 +201,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num, return res; } -static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) +static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot) { struct net *net = sock_net(sk); kuid_t uid = sock_i_uid(sk); @@ -225,7 +215,7 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot, (udp_sk(sk2)->udp_port_hash == udp_sk(sk)->udp_port_hash) && (sk2->sk_bound_dev_if == sk->sk_bound_dev_if) && sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) && - (*saddr_same)(sk, sk2, false)) { + inet_rcv_saddr_equal(sk, sk2, false)) { return reuseport_add_sock(sk, sk2); } } @@ -241,14 +231,10 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot, * * @sk: socket struct in question * @snum: port number to look up - * @saddr_comp: AF-dependent comparison of bound local IP addresses * @hash2_nulladdr: AF-dependent hash value in secondary hash chains, * with NULL address */ int udp_lib_get_port(struct sock *sk, unsigned short snum, - int (*saddr_comp)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard), unsigned int hash2_nulladdr) { struct udp_hslot *hslot, *hslot2; @@ -277,7 +263,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, bitmap_zero(bitmap, PORTS_PER_CHAIN); spin_lock_bh(&hslot->lock); udp_lib_lport_inuse(net, snum, hslot, bitmap, sk, - saddr_comp, udptable->log); + udptable->log); snum = first; /* @@ -310,12 +296,11 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, if (hslot->count < hslot2->count) goto scan_primary_hash; - exist = udp_lib_lport_inuse2(net, snum, hslot2, - sk, saddr_comp); + exist = udp_lib_lport_inuse2(net, snum, hslot2, sk); if (!exist && (hash2_nulladdr != slot2)) { hslot2 = udp_hashslot2(udptable, hash2_nulladdr); exist = udp_lib_lport_inuse2(net, snum, hslot2, - sk, saddr_comp); + sk); } if (exist) goto fail_unlock; @@ -323,8 +308,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, goto found; } scan_primary_hash: - if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, - saddr_comp, 0)) + if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, 0)) goto fail_unlock; } found: @@ -333,7 +317,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, udp_sk(sk)->udp_portaddr_hash ^= snum; if (sk_unhashed(sk)) { if (sk->sk_reuseport && - udp_reuseport_add_sock(sk, hslot, saddr_comp)) { + udp_reuseport_add_sock(sk, hslot)) { inet_sk(sk)->inet_num = 0; udp_sk(sk)->udp_port_hash = 0; udp_sk(sk)->udp_portaddr_hash ^= snum; @@ -365,24 +349,6 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, } EXPORT_SYMBOL(udp_lib_get_port); -/* match_wildcard == true: 0.0.0.0 equals to any IPv4 addresses - * match_wildcard == false: addresses must be exactly the same, i.e. - * 0.0.0.0 only equals to 0.0.0.0 - */ -int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2, - bool match_wildcard) -{ - struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2); - - if (!ipv6_only_sock(sk2)) { - if (inet1->inet_rcv_saddr == inet2->inet_rcv_saddr) - return 1; - if (!inet1->inet_rcv_saddr || !inet2->inet_rcv_saddr) - return match_wildcard; - } - return 0; -} - static u32 udp4_portaddr_hash(const struct net *net, __be32 saddr, unsigned int port) { @@ -398,7 +364,7 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum) /* precompute partial secondary hash */ udp_sk(sk)->udp_portaddr_hash = hash2_partial; - return udp_lib_get_port(sk, snum, ipv4_rcv_saddr_equal, hash2_nulladdr); + return udp_lib_get_port(sk, snum, hash2_nulladdr); } static int compute_score(struct sock *sk, struct net *net, diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index 7396e75..55ee2ea 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -54,12 +54,12 @@ int inet6_csk_bind_conflict(const struct sock *sk, (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid((struct sock *)sk2))))) { - if (ipv6_rcv_saddr_equal(sk, sk2, true)) + if (inet_rcv_saddr_equal(sk, sk2, true)) break; } if (!relax && reuse && sk2->sk_reuse && sk2->sk_state != TCP_LISTEN && - ipv6_rcv_saddr_equal(sk, sk2, true)) + inet_rcv_saddr_equal(sk, sk2, true)) break; } } diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 02761c9..d090091 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -268,54 +268,10 @@ int inet6_hash(struct sock *sk) if (sk->sk_state != TCP_CLOSE) { local_bh_disable(); - err = __inet_hash(sk, NULL, ipv6_rcv_saddr_equal); + err = __inet_hash(sk, NULL); local_bh_enable(); } return err; } EXPORT_SYMBOL_GPL(inet6_hash); - -/* match_wildcard == true: IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6 - * only, and any IPv4 addresses if not IPv6 only - * match_wildcard == false: addresses must be exactly the same, i.e. - * IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY, - * and 0.0.0.0 equals to 0.0.0.0 only - */ -int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, - bool match_wildcard) -{ - const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2); - int sk2_ipv6only = inet_v6_ipv6only(sk2); - int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); - int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED; - - /* if both are mapped, treat as IPv4 */ - if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) { - if (!sk2_ipv6only) { - if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) - return 1; - if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) - return match_wildcard; - } - return 0; - } - - if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY) - return 1; - - if (addr_type2 == IPV6_ADDR_ANY && match_wildcard && - !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED)) - return 1; - - if (addr_type == IPV6_ADDR_ANY && match_wildcard && - !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED)) - return 1; - - if (sk2_rcv_saddr6 && - ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6)) - return 1; - - return 0; -} -EXPORT_SYMBOL_GPL(ipv6_rcv_saddr_equal); diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 4d5c4ee..05d6932 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -103,7 +103,7 @@ int udp_v6_get_port(struct sock *sk, unsigned short snum) /* precompute partial secondary hash */ udp_sk(sk)->udp_portaddr_hash = hash2_partial; - return udp_lib_get_port(sk, snum, ipv6_rcv_saddr_equal, hash2_nulladdr); + return udp_lib_get_port(sk, snum, hash2_nulladdr); } static void udp_v6_rehash(struct sock *sk) -- 2.5.5 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 0/6 net-next][V2] Rework inet_csk_get_port @ 2016-12-22 21:26 Josef Bacik 2016-12-22 21:26 ` [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one Josef Bacik 0 siblings, 1 reply; 5+ messages in thread From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw) To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev, kernel-team V1->V2: -Added a new patch 'inet: collapse ipv4/v6 rcv_saddr_equal functions into one' at Hannes' suggestion. -Dropped ->bind_conflict and just use the new helper. -Fixed a compile bug from the original ->bind_conflict patch. The original description of the series follows ========================================================= At some point recently the guys working on our load balancer added the ability to use SO_REUSEPORT. When they restarted their app with this option enabled they immediately hit a softlockup on what appeared to be the inet_bind_bucket->lock. Eventually what all of our debugging and discussion led us to was the fact that the application comes up without SO_REUSEPORT, shuts down which creates around 100k twsk's, and then comes up and tries to open a bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket owners list under the lock. Since this lock is needed for dealing with the twsk's and basically anything else related to connections we would softlockup, and sometimes not ever recover. To solve this problem I did what you see in Path 5/5. Once we have a SO_REUSEPORT socket on the tb->owners list we know that the socket has no conflicts with any of the other sockets on that list. So we can add a copy of the sock_common (really all we need is the recv_saddr but it seemed ugly to copy just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've copied the whole common) in order to check subsequent SO_REUSEPORT sockets. If they match the previous one then we can skip the expensive inet_csk_bind_conflict check. This is what eliminated the soft lockup that we were seeing. Patches 1-4 are cleanups and re-workings. For instance when we specify port == 0 we need to find an open port, but we would do two passes through inet_csk_bind_conflict every time we found a possible port. We would also keep track of the smallest_port value in order to try and use it if we found no port our first run through. This however made no sense as it would have had to fail the first pass through inet_csk_bind_conflict, so would not actually pass the second pass through either. Finally I split the function into two functions in order to make it easier to read and to distinguish between the two behaviors. I have tested this on one of our load balancing boxes during peak traffic and it hasn't fallen over. But this is not my area, so obviously feel free to point out where I'm being stupid and I'll get it fixed up and retested. Thanks, Josef ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one 2016-12-22 21:26 [PATCH 0/6 net-next][V2] Rework inet_csk_get_port Josef Bacik @ 2016-12-22 21:26 ` Josef Bacik 0 siblings, 0 replies; 5+ messages in thread From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw) To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev, kernel-team We pass these per-protocol equal functions around in various places, but we can just have one function that checks the sk->sk_family and then do the right comparison function. I've also changed the ipv4 version to not cast to inet_sock since it is unneeded. Signed-off-by: Josef Bacik <jbacik@fb.com> --- include/net/addrconf.h | 4 +-- include/net/inet_hashtables.h | 5 +-- include/net/udp.h | 1 - net/ipv4/inet_connection_sock.c | 72 ++++++++++++++++++++++++++++++++++++++++ net/ipv4/inet_hashtables.c | 16 +++------ net/ipv4/udp.c | 58 +++++++------------------------- net/ipv6/inet6_connection_sock.c | 4 +-- net/ipv6/inet6_hashtables.c | 46 +------------------------ net/ipv6/udp.c | 2 +- 9 files changed, 95 insertions(+), 113 deletions(-) diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 8f998af..17c6fd8 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -88,9 +88,7 @@ int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr *addr, u32 banned_flags); int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr, u32 banned_flags); -int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, - bool match_wildcard); -int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, bool match_wildcard); void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr); void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr); diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 0574493..756ed16 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -203,10 +203,7 @@ void inet_hashinfo_init(struct inet_hashinfo *h); bool inet_ehash_insert(struct sock *sk, struct sock *osk); bool inet_ehash_nolisten(struct sock *sk, struct sock *osk); -int __inet_hash(struct sock *sk, struct sock *osk, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)); +int __inet_hash(struct sock *sk, struct sock *osk); int inet_hash(struct sock *sk); void inet_unhash(struct sock *sk); diff --git a/include/net/udp.h b/include/net/udp.h index 1661791..c9d8b8e 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -204,7 +204,6 @@ static inline void udp_lib_close(struct sock *sk, long timeout) } int udp_lib_get_port(struct sock *sk, unsigned short snum, - int (*)(const struct sock *, const struct sock *, bool), unsigned int hash2_nulladdr); u32 udp_flow_hashrnd(void); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 19ea045..ba597cb 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -31,6 +31,78 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n"; EXPORT_SYMBOL(inet_csk_timer_bug_msg); #endif +#if IS_ENABLED(CONFIG_IPV6) +/* match_wildcard == true: IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6 + * only, and any IPv4 addresses if not IPv6 only + * match_wildcard == false: addresses must be exactly the same, i.e. + * IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY, + * and 0.0.0.0 equals to 0.0.0.0 only + */ +static int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ + const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2); + int sk2_ipv6only = inet_v6_ipv6only(sk2); + int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); + int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED; + + /* if both are mapped, treat as IPv4 */ + if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) { + if (!sk2_ipv6only) { + if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) + return 1; + if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) + return match_wildcard; + } + return 0; + } + + if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY) + return 1; + + if (addr_type2 == IPV6_ADDR_ANY && match_wildcard && + !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED)) + return 1; + + if (addr_type == IPV6_ADDR_ANY && match_wildcard && + !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED)) + return 1; + + if (sk2_rcv_saddr6 && + ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6)) + return 1; + + return 0; +} +#endif + +/* match_wildcard == true: 0.0.0.0 equals to any IPv4 addresses + * match_wildcard == false: addresses must be exactly the same, i.e. + * 0.0.0.0 only equals to 0.0.0.0 + */ +static int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ + if (!ipv6_only_sock(sk2)) { + if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) + return 1; + if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) + return match_wildcard; + } + return 0; +} + +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + return ipv6_rcv_saddr_equal(sk, sk2, match_wildcard); +#endif + return ipv4_rcv_saddr_equal(sk, sk2, match_wildcard); +} +EXPORT_SYMBOL(inet_rcv_saddr_equal); + void inet_get_local_port_range(struct net *net, int *low, int *high) { unsigned int seq; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index ca97835..2ef9b01 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -435,10 +435,7 @@ bool inet_ehash_nolisten(struct sock *sk, struct sock *osk) EXPORT_SYMBOL_GPL(inet_ehash_nolisten); static int inet_reuseport_add_sock(struct sock *sk, - struct inet_listen_hashbucket *ilb, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) + struct inet_listen_hashbucket *ilb) { struct inet_bind_bucket *tb = inet_csk(sk)->icsk_bind_hash; struct sock *sk2; @@ -451,7 +448,7 @@ static int inet_reuseport_add_sock(struct sock *sk, sk2->sk_bound_dev_if == sk->sk_bound_dev_if && inet_csk(sk2)->icsk_bind_hash == tb && sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) && - saddr_same(sk, sk2, false)) + inet_rcv_saddr_equal(sk, sk2, false)) return reuseport_add_sock(sk, sk2); } @@ -461,10 +458,7 @@ static int inet_reuseport_add_sock(struct sock *sk, return 0; } -int __inet_hash(struct sock *sk, struct sock *osk, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) +int __inet_hash(struct sock *sk, struct sock *osk) { struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; struct inet_listen_hashbucket *ilb; @@ -479,7 +473,7 @@ int __inet_hash(struct sock *sk, struct sock *osk, spin_lock(&ilb->lock); if (sk->sk_reuseport) { - err = inet_reuseport_add_sock(sk, ilb, saddr_same); + err = inet_reuseport_add_sock(sk, ilb); if (err) goto unlock; } @@ -503,7 +497,7 @@ int inet_hash(struct sock *sk) if (sk->sk_state != TCP_CLOSE) { local_bh_disable(); - err = __inet_hash(sk, NULL, ipv4_rcv_saddr_equal); + err = __inet_hash(sk, NULL); local_bh_enable(); } diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 9ca279b..cffbb65 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -137,11 +137,7 @@ EXPORT_SYMBOL(udp_memory_allocated); static int udp_lib_lport_inuse(struct net *net, __u16 num, const struct udp_hslot *hslot, unsigned long *bitmap, - struct sock *sk, - int (*saddr_comp)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard), - unsigned int log) + struct sock *sk, unsigned int log) { struct sock *sk2; kuid_t uid = sock_i_uid(sk); @@ -156,7 +152,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num, (!sk2->sk_reuseport || !sk->sk_reuseport || rcu_access_pointer(sk->sk_reuseport_cb) || !uid_eq(uid, sock_i_uid(sk2))) && - saddr_comp(sk, sk2, true)) { + inet_rcv_saddr_equal(sk, sk2, true)) { if (!bitmap) return 1; __set_bit(udp_sk(sk2)->udp_port_hash >> log, bitmap); @@ -171,10 +167,7 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num, */ static int udp_lib_lport_inuse2(struct net *net, __u16 num, struct udp_hslot *hslot2, - struct sock *sk, - int (*saddr_comp)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) + struct sock *sk) { struct sock *sk2; kuid_t uid = sock_i_uid(sk); @@ -191,7 +184,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num, (!sk2->sk_reuseport || !sk->sk_reuseport || rcu_access_pointer(sk->sk_reuseport_cb) || !uid_eq(uid, sock_i_uid(sk2))) && - saddr_comp(sk, sk2, true)) { + inet_rcv_saddr_equal(sk, sk2, true)) { res = 1; break; } @@ -200,10 +193,7 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num, return res; } -static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)) +static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot) { struct net *net = sock_net(sk); kuid_t uid = sock_i_uid(sk); @@ -217,7 +207,7 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot, (udp_sk(sk2)->udp_port_hash == udp_sk(sk)->udp_port_hash) && (sk2->sk_bound_dev_if == sk->sk_bound_dev_if) && sk2->sk_reuseport && uid_eq(uid, sock_i_uid(sk2)) && - (*saddr_same)(sk, sk2, false)) { + inet_rcv_saddr_equal(sk, sk2, false)) { return reuseport_add_sock(sk, sk2); } } @@ -233,14 +223,10 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot, * * @sk: socket struct in question * @snum: port number to look up - * @saddr_comp: AF-dependent comparison of bound local IP addresses * @hash2_nulladdr: AF-dependent hash value in secondary hash chains, * with NULL address */ int udp_lib_get_port(struct sock *sk, unsigned short snum, - int (*saddr_comp)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard), unsigned int hash2_nulladdr) { struct udp_hslot *hslot, *hslot2; @@ -269,7 +255,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, bitmap_zero(bitmap, PORTS_PER_CHAIN); spin_lock_bh(&hslot->lock); udp_lib_lport_inuse(net, snum, hslot, bitmap, sk, - saddr_comp, udptable->log); + udptable->log); snum = first; /* @@ -301,12 +287,11 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, if (hslot->count < hslot2->count) goto scan_primary_hash; - exist = udp_lib_lport_inuse2(net, snum, hslot2, - sk, saddr_comp); + exist = udp_lib_lport_inuse2(net, snum, hslot2, sk); if (!exist && (hash2_nulladdr != slot2)) { hslot2 = udp_hashslot2(udptable, hash2_nulladdr); exist = udp_lib_lport_inuse2(net, snum, hslot2, - sk, saddr_comp); + sk); } if (exist) goto fail_unlock; @@ -314,8 +299,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, goto found; } scan_primary_hash: - if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, - saddr_comp, 0)) + if (udp_lib_lport_inuse(net, snum, hslot, NULL, sk, 0)) goto fail_unlock; } found: @@ -324,7 +308,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, udp_sk(sk)->udp_portaddr_hash ^= snum; if (sk_unhashed(sk)) { if (sk->sk_reuseport && - udp_reuseport_add_sock(sk, hslot, saddr_comp)) { + udp_reuseport_add_sock(sk, hslot)) { inet_sk(sk)->inet_num = 0; udp_sk(sk)->udp_port_hash = 0; udp_sk(sk)->udp_portaddr_hash ^= snum; @@ -356,24 +340,6 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, } EXPORT_SYMBOL(udp_lib_get_port); -/* match_wildcard == true: 0.0.0.0 equals to any IPv4 addresses - * match_wildcard == false: addresses must be exactly the same, i.e. - * 0.0.0.0 only equals to 0.0.0.0 - */ -int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2, - bool match_wildcard) -{ - struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2); - - if (!ipv6_only_sock(sk2)) { - if (inet1->inet_rcv_saddr == inet2->inet_rcv_saddr) - return 1; - if (!inet1->inet_rcv_saddr || !inet2->inet_rcv_saddr) - return match_wildcard; - } - return 0; -} - static u32 udp4_portaddr_hash(const struct net *net, __be32 saddr, unsigned int port) { @@ -389,7 +355,7 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum) /* precompute partial secondary hash */ udp_sk(sk)->udp_portaddr_hash = hash2_partial; - return udp_lib_get_port(sk, snum, ipv4_rcv_saddr_equal, hash2_nulladdr); + return udp_lib_get_port(sk, snum, hash2_nulladdr); } static int compute_score(struct sock *sk, struct net *net, diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index 7396e75..55ee2ea 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -54,12 +54,12 @@ int inet6_csk_bind_conflict(const struct sock *sk, (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid((struct sock *)sk2))))) { - if (ipv6_rcv_saddr_equal(sk, sk2, true)) + if (inet_rcv_saddr_equal(sk, sk2, true)) break; } if (!relax && reuse && sk2->sk_reuse && sk2->sk_state != TCP_LISTEN && - ipv6_rcv_saddr_equal(sk, sk2, true)) + inet_rcv_saddr_equal(sk, sk2, true)) break; } } diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 02761c9..d090091 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -268,54 +268,10 @@ int inet6_hash(struct sock *sk) if (sk->sk_state != TCP_CLOSE) { local_bh_disable(); - err = __inet_hash(sk, NULL, ipv6_rcv_saddr_equal); + err = __inet_hash(sk, NULL); local_bh_enable(); } return err; } EXPORT_SYMBOL_GPL(inet6_hash); - -/* match_wildcard == true: IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6 - * only, and any IPv4 addresses if not IPv6 only - * match_wildcard == false: addresses must be exactly the same, i.e. - * IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY, - * and 0.0.0.0 equals to 0.0.0.0 only - */ -int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, - bool match_wildcard) -{ - const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2); - int sk2_ipv6only = inet_v6_ipv6only(sk2); - int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); - int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED; - - /* if both are mapped, treat as IPv4 */ - if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) { - if (!sk2_ipv6only) { - if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) - return 1; - if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) - return match_wildcard; - } - return 0; - } - - if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY) - return 1; - - if (addr_type2 == IPV6_ADDR_ANY && match_wildcard && - !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED)) - return 1; - - if (addr_type == IPV6_ADDR_ANY && match_wildcard && - !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED)) - return 1; - - if (sk2_rcv_saddr6 && - ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6)) - return 1; - - return 0; -} -EXPORT_SYMBOL_GPL(ipv6_rcv_saddr_equal); diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 649efc2..9d2886f 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -103,7 +103,7 @@ int udp_v6_get_port(struct sock *sk, unsigned short snum) /* precompute partial secondary hash */ udp_sk(sk)->udp_portaddr_hash = hash2_partial; - return udp_lib_get_port(sk, snum, ipv6_rcv_saddr_equal, hash2_nulladdr); + return udp_lib_get_port(sk, snum, hash2_nulladdr); } static void udp_v6_rehash(struct sock *sk) -- 2.5.5 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-01-17 15:51 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-01-12 17:41 [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one Craig Gallek 2017-01-12 18:13 ` Josef Bacik -- strict thread matches above, loose matches on Subject: below -- 2017-01-17 15:51 [PATCH 0/6 net-next][V4] Rework inet_csk_get_port Josef Bacik 2017-01-17 15:51 ` [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one Josef Bacik 2017-01-11 20:06 [PATCH 0/6 net-next][V3] Rework inet_csk_get_port Josef Bacik 2017-01-11 20:19 ` [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one Josef Bacik 2016-12-22 21:26 [PATCH 0/6 net-next][V2] Rework inet_csk_get_port Josef Bacik 2016-12-22 21:26 ` [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one Josef Bacik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).