* [PATCH v1 net 0/2] tcp: Fix bhash2 and TIME_WAIT regression.
@ 2022-12-26 13:27 Kuniyuki Iwashima
2022-12-26 13:27 ` [PATCH v1 net 1/2] tcp: Add TIME_WAIT sockets in bhash2 Kuniyuki Iwashima
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2022-12-26 13:27 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Jiri Slaby, Joanne Koong, Kuniyuki Iwashima, Kuniyuki Iwashima,
netdev
We forgot to add twsk to bhash2. Therefore TIME_WAIT sockets cannot
prevent bind() to the same local address and port.
Changes:
v1:
* Patch 1:
* Add tw_bind2_node in inet_timewait_sock instead of
moving sk_bind2_node from struct sock to struct
sock_common.
RFC: https://lore.kernel.org/netdev/20221221151258.25748-1-kuniyu@amazon.com/
Kuniyuki Iwashima (2):
tcp: Add TIME_WAIT sockets in bhash2.
tcp: Add selftest for bind() and TIME_WAIT.
include/net/inet_hashtables.h | 4 +
include/net/inet_timewait_sock.h | 5 ++
net/ipv4/inet_connection_sock.c | 26 +++++-
net/ipv4/inet_hashtables.c | 8 +-
net/ipv4/inet_timewait_sock.c | 31 ++++++-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/bind_timewait.c | 92 +++++++++++++++++++++
7 files changed, 158 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/net/bind_timewait.c
--
2.30.2
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH v1 net 1/2] tcp: Add TIME_WAIT sockets in bhash2. 2022-12-26 13:27 [PATCH v1 net 0/2] tcp: Fix bhash2 and TIME_WAIT regression Kuniyuki Iwashima @ 2022-12-26 13:27 ` Kuniyuki Iwashima 2022-12-27 18:26 ` Joanne Koong 2022-12-26 13:27 ` [PATCH v1 net 2/2] tcp: Add selftest for bind() and TIME_WAIT Kuniyuki Iwashima 2022-12-30 7:40 ` [PATCH v1 net 0/2] tcp: Fix bhash2 and TIME_WAIT regression patchwork-bot+netdevbpf 2 siblings, 1 reply; 6+ messages in thread From: Kuniyuki Iwashima @ 2022-12-26 13:27 UTC (permalink / raw) To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Jiri Slaby, Joanne Koong, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev Jiri Slaby reported regression of bind() with a simple repro. [0] The repro creates a TIME_WAIT socket and tries to bind() a new socket with the same local address and port. Before commit 28044fc1d495 ("net: Add a bhash2 table hashed by port and address"), the bind() failed with -EADDRINUSE, but now it succeeds. The cited commit should have put TIME_WAIT sockets into bhash2; otherwise, inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind() requests if the address is not a wildcard one. The straight option is to move sk_bind2_node from struct sock to struct sock_common to add twsk to bhash2 as implemented as RFC. [1] However, the binary layout change in the struct sock could affect performances moving hot fields on different cachelines. To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check it while validating bind(). [0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/ [1]: https://lore.kernel.org/netdev/20221221151258.25748-2-kuniyu@amazon.com/ Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Reported-by: Jiri Slaby <jirislaby@kernel.org> Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> --- include/net/inet_hashtables.h | 4 ++++ include/net/inet_timewait_sock.h | 5 +++++ net/ipv4/inet_connection_sock.c | 26 ++++++++++++++++++++++---- net/ipv4/inet_hashtables.c | 8 +++++--- net/ipv4/inet_timewait_sock.c | 31 +++++++++++++++++++++++++++++-- 5 files changed, 65 insertions(+), 9 deletions(-) diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 69174093078f..99bd823e97f6 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -108,6 +108,10 @@ struct inet_bind2_bucket { struct hlist_node node; /* List of sockets hashed to this bucket */ struct hlist_head owners; + /* bhash has twsk in owners, but bhash2 has twsk in + * deathrow not to add a member in struct sock_common. + */ + struct hlist_head deathrow; }; static inline struct net *ib_net(const struct inet_bind_bucket *ib) diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h index 5b47545f22d3..4a8e578405cb 100644 --- a/include/net/inet_timewait_sock.h +++ b/include/net/inet_timewait_sock.h @@ -73,9 +73,14 @@ struct inet_timewait_sock { u32 tw_priority; struct timer_list tw_timer; struct inet_bind_bucket *tw_tb; + struct inet_bind2_bucket *tw_tb2; + struct hlist_node tw_bind2_node; }; #define tw_tclass tw_tos +#define twsk_for_each_bound_bhash2(__tw, list) \ + hlist_for_each_entry(__tw, list, tw_bind2_node) + static inline struct inet_timewait_sock *inet_twsk(const struct sock *sk) { return (struct inet_timewait_sock *)sk; diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index b366ab9148f2..848ffc3e0239 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -173,22 +173,40 @@ static bool inet_bind_conflict(const struct sock *sk, struct sock *sk2, return false; } +static bool __inet_bhash2_conflict(const struct sock *sk, struct sock *sk2, + kuid_t sk_uid, bool relax, + bool reuseport_cb_ok, bool reuseport_ok) +{ + if (sk->sk_family == AF_INET && ipv6_only_sock(sk2)) + return false; + + return inet_bind_conflict(sk, sk2, sk_uid, relax, + reuseport_cb_ok, reuseport_ok); +} + static bool inet_bhash2_conflict(const struct sock *sk, const struct inet_bind2_bucket *tb2, kuid_t sk_uid, bool relax, bool reuseport_cb_ok, bool reuseport_ok) { + struct inet_timewait_sock *tw2; struct sock *sk2; sk_for_each_bound_bhash2(sk2, &tb2->owners) { - if (sk->sk_family == AF_INET && ipv6_only_sock(sk2)) - continue; + if (__inet_bhash2_conflict(sk, sk2, sk_uid, relax, + reuseport_cb_ok, reuseport_ok)) + return true; + } - if (inet_bind_conflict(sk, sk2, sk_uid, relax, - reuseport_cb_ok, reuseport_ok)) + twsk_for_each_bound_bhash2(tw2, &tb2->deathrow) { + sk2 = (struct sock *)tw2; + + if (__inet_bhash2_conflict(sk, sk2, sk_uid, relax, + reuseport_cb_ok, reuseport_ok)) return true; } + return false; } diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index d039b4e732a3..24a38b56fab9 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -116,6 +116,7 @@ static void inet_bind2_bucket_init(struct inet_bind2_bucket *tb, #endif tb->rcv_saddr = sk->sk_rcv_saddr; INIT_HLIST_HEAD(&tb->owners); + INIT_HLIST_HEAD(&tb->deathrow); hlist_add_head(&tb->node, &head->chain); } @@ -137,7 +138,7 @@ struct inet_bind2_bucket *inet_bind2_bucket_create(struct kmem_cache *cachep, /* Caller must hold hashbucket lock for this tb with local BH disabled */ void inet_bind2_bucket_destroy(struct kmem_cache *cachep, struct inet_bind2_bucket *tb) { - if (hlist_empty(&tb->owners)) { + if (hlist_empty(&tb->owners) && hlist_empty(&tb->deathrow)) { __hlist_del(&tb->node); kmem_cache_free(cachep, tb); } @@ -1103,15 +1104,16 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, /* Head lock still held and bh's disabled */ inet_bind_hash(sk, tb, tb2, port); - spin_unlock(&head2->lock); - if (sk_unhashed(sk)) { inet_sk(sk)->inet_sport = htons(port); inet_ehash_nolisten(sk, (struct sock *)tw, NULL); } if (tw) inet_twsk_bind_unhash(tw, hinfo); + + spin_unlock(&head2->lock); spin_unlock(&head->lock); + if (tw) inet_twsk_deschedule_put(tw); local_bh_enable(); diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 66fc940f9521..1d77d992e6e7 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -29,6 +29,7 @@ void inet_twsk_bind_unhash(struct inet_timewait_sock *tw, struct inet_hashinfo *hashinfo) { + struct inet_bind2_bucket *tb2 = tw->tw_tb2; struct inet_bind_bucket *tb = tw->tw_tb; if (!tb) @@ -37,6 +38,11 @@ void inet_twsk_bind_unhash(struct inet_timewait_sock *tw, __hlist_del(&tw->tw_bind_node); tw->tw_tb = NULL; inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb); + + __hlist_del(&tw->tw_bind2_node); + tw->tw_tb2 = NULL; + inet_bind2_bucket_destroy(hashinfo->bind2_bucket_cachep, tb2); + __sock_put((struct sock *)tw); } @@ -45,7 +51,7 @@ static void inet_twsk_kill(struct inet_timewait_sock *tw) { struct inet_hashinfo *hashinfo = tw->tw_dr->hashinfo; spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash); - struct inet_bind_hashbucket *bhead; + struct inet_bind_hashbucket *bhead, *bhead2; spin_lock(lock); sk_nulls_del_node_init_rcu((struct sock *)tw); @@ -54,9 +60,13 @@ static void inet_twsk_kill(struct inet_timewait_sock *tw) /* Disassociate with bind bucket. */ bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), tw->tw_num, hashinfo->bhash_size)]; + bhead2 = inet_bhashfn_portaddr(hashinfo, (struct sock *)tw, + twsk_net(tw), tw->tw_num); spin_lock(&bhead->lock); + spin_lock(&bhead2->lock); inet_twsk_bind_unhash(tw, hashinfo); + spin_unlock(&bhead2->lock); spin_unlock(&bhead->lock); refcount_dec(&tw->tw_dr->tw_refcount); @@ -93,6 +103,12 @@ static void inet_twsk_add_bind_node(struct inet_timewait_sock *tw, hlist_add_head(&tw->tw_bind_node, list); } +static void inet_twsk_add_bind2_node(struct inet_timewait_sock *tw, + struct hlist_head *list) +{ + hlist_add_head(&tw->tw_bind2_node, list); +} + /* * Enter the time wait state. This is called with locally disabled BH. * Essentially we whip up a timewait bucket, copy the relevant info into it @@ -105,17 +121,28 @@ void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk, const struct inet_connection_sock *icsk = inet_csk(sk); struct inet_ehash_bucket *ehead = inet_ehash_bucket(hashinfo, sk->sk_hash); spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash); - struct inet_bind_hashbucket *bhead; + struct inet_bind_hashbucket *bhead, *bhead2; + /* Step 1: Put TW into bind hash. Original socket stays there too. Note, that any socket with inet->num != 0 MUST be bound in binding cache, even if it is closed. */ bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), inet->inet_num, hashinfo->bhash_size)]; + bhead2 = inet_bhashfn_portaddr(hashinfo, sk, twsk_net(tw), inet->inet_num); + spin_lock(&bhead->lock); + spin_lock(&bhead2->lock); + tw->tw_tb = icsk->icsk_bind_hash; WARN_ON(!icsk->icsk_bind_hash); inet_twsk_add_bind_node(tw, &tw->tw_tb->owners); + + tw->tw_tb2 = icsk->icsk_bind2_hash; + WARN_ON(!icsk->icsk_bind2_hash); + inet_twsk_add_bind2_node(tw, &tw->tw_tb2->deathrow); + + spin_unlock(&bhead2->lock); spin_unlock(&bhead->lock); spin_lock(lock); -- 2.30.2 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v1 net 1/2] tcp: Add TIME_WAIT sockets in bhash2. 2022-12-26 13:27 ` [PATCH v1 net 1/2] tcp: Add TIME_WAIT sockets in bhash2 Kuniyuki Iwashima @ 2022-12-27 18:26 ` Joanne Koong 0 siblings, 0 replies; 6+ messages in thread From: Joanne Koong @ 2022-12-27 18:26 UTC (permalink / raw) To: Kuniyuki Iwashima Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jiri Slaby, Kuniyuki Iwashima, netdev On Mon, Dec 26, 2022 at 5:28 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > Jiri Slaby reported regression of bind() with a simple repro. [0] > > The repro creates a TIME_WAIT socket and tries to bind() a new socket > with the same local address and port. Before commit 28044fc1d495 ("net: > Add a bhash2 table hashed by port and address"), the bind() failed with > -EADDRINUSE, but now it succeeds. > > The cited commit should have put TIME_WAIT sockets into bhash2; otherwise, > inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind() > requests if the address is not a wildcard one. > > The straight option is to move sk_bind2_node from struct sock to struct > sock_common to add twsk to bhash2 as implemented as RFC. [1] However, the > binary layout change in the struct sock could affect performances moving > hot fields on different cachelines. > > To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check > it while validating bind(). > > [0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/ > [1]: https://lore.kernel.org/netdev/20221221151258.25748-2-kuniyu@amazon.com/ > > Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") > Reported-by: Jiri Slaby <jirislaby@kernel.org> > Suggested-by: Paolo Abeni <pabeni@redhat.com> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> > --- > include/net/inet_hashtables.h | 4 ++++ > include/net/inet_timewait_sock.h | 5 +++++ > net/ipv4/inet_connection_sock.c | 26 ++++++++++++++++++++++---- > net/ipv4/inet_hashtables.c | 8 +++++--- > net/ipv4/inet_timewait_sock.c | 31 +++++++++++++++++++++++++++++-- > 5 files changed, 65 insertions(+), 9 deletions(-) > > diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h > index 69174093078f..99bd823e97f6 100644 > --- a/include/net/inet_hashtables.h > +++ b/include/net/inet_hashtables.h > @@ -108,6 +108,10 @@ struct inet_bind2_bucket { > struct hlist_node node; > /* List of sockets hashed to this bucket */ > struct hlist_head owners; > + /* bhash has twsk in owners, but bhash2 has twsk in > + * deathrow not to add a member in struct sock_common. nit: I think "but bhash2 has twsk in deathrow to avoid adding a member to struct sock_common" would be clearer. > + */ > + struct hlist_head deathrow; > }; > > static inline struct net *ib_net(const struct inet_bind_bucket *ib) > diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h > index 5b47545f22d3..4a8e578405cb 100644 > --- a/include/net/inet_timewait_sock.h > +++ b/include/net/inet_timewait_sock.h > @@ -73,9 +73,14 @@ struct inet_timewait_sock { > u32 tw_priority; > struct timer_list tw_timer; > struct inet_bind_bucket *tw_tb; > + struct inet_bind2_bucket *tw_tb2; > + struct hlist_node tw_bind2_node; > }; > #define tw_tclass tw_tos > > +#define twsk_for_each_bound_bhash2(__tw, list) \ > + hlist_for_each_entry(__tw, list, tw_bind2_node) > + > static inline struct inet_timewait_sock *inet_twsk(const struct sock *sk) > { > return (struct inet_timewait_sock *)sk; > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index b366ab9148f2..848ffc3e0239 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -173,22 +173,40 @@ static bool inet_bind_conflict(const struct sock *sk, struct sock *sk2, > return false; > } > > +static bool __inet_bhash2_conflict(const struct sock *sk, struct sock *sk2, > + kuid_t sk_uid, bool relax, > + bool reuseport_cb_ok, bool reuseport_ok) > +{ > + if (sk->sk_family == AF_INET && ipv6_only_sock(sk2)) > + return false; > + > + return inet_bind_conflict(sk, sk2, sk_uid, relax, > + reuseport_cb_ok, reuseport_ok); > +} > + > static bool inet_bhash2_conflict(const struct sock *sk, > const struct inet_bind2_bucket *tb2, > kuid_t sk_uid, > bool relax, bool reuseport_cb_ok, > bool reuseport_ok) > { > + struct inet_timewait_sock *tw2; > struct sock *sk2; > > sk_for_each_bound_bhash2(sk2, &tb2->owners) { > - if (sk->sk_family == AF_INET && ipv6_only_sock(sk2)) > - continue; > + if (__inet_bhash2_conflict(sk, sk2, sk_uid, relax, > + reuseport_cb_ok, reuseport_ok)) > + return true; > + } > > - if (inet_bind_conflict(sk, sk2, sk_uid, relax, > - reuseport_cb_ok, reuseport_ok)) > + twsk_for_each_bound_bhash2(tw2, &tb2->deathrow) { > + sk2 = (struct sock *)tw2; > + > + if (__inet_bhash2_conflict(sk, sk2, sk_uid, relax, > + reuseport_cb_ok, reuseport_ok)) > return true; > } > + > return false; > } > > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c > index d039b4e732a3..24a38b56fab9 100644 > --- a/net/ipv4/inet_hashtables.c > +++ b/net/ipv4/inet_hashtables.c > @@ -116,6 +116,7 @@ static void inet_bind2_bucket_init(struct inet_bind2_bucket *tb, > #endif > tb->rcv_saddr = sk->sk_rcv_saddr; > INIT_HLIST_HEAD(&tb->owners); > + INIT_HLIST_HEAD(&tb->deathrow); > hlist_add_head(&tb->node, &head->chain); > } > > @@ -137,7 +138,7 @@ struct inet_bind2_bucket *inet_bind2_bucket_create(struct kmem_cache *cachep, > /* Caller must hold hashbucket lock for this tb with local BH disabled */ > void inet_bind2_bucket_destroy(struct kmem_cache *cachep, struct inet_bind2_bucket *tb) > { > - if (hlist_empty(&tb->owners)) { > + if (hlist_empty(&tb->owners) && hlist_empty(&tb->deathrow)) { > __hlist_del(&tb->node); > kmem_cache_free(cachep, tb); > } > @@ -1103,15 +1104,16 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, > /* Head lock still held and bh's disabled */ > inet_bind_hash(sk, tb, tb2, port); > > - spin_unlock(&head2->lock); > - > if (sk_unhashed(sk)) { > inet_sk(sk)->inet_sport = htons(port); > inet_ehash_nolisten(sk, (struct sock *)tw, NULL); > } > if (tw) > inet_twsk_bind_unhash(tw, hinfo); > + > + spin_unlock(&head2->lock); > spin_unlock(&head->lock); > + > if (tw) > inet_twsk_deschedule_put(tw); > local_bh_enable(); > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c > index 66fc940f9521..1d77d992e6e7 100644 > --- a/net/ipv4/inet_timewait_sock.c > +++ b/net/ipv4/inet_timewait_sock.c > @@ -29,6 +29,7 @@ > void inet_twsk_bind_unhash(struct inet_timewait_sock *tw, > struct inet_hashinfo *hashinfo) > { > + struct inet_bind2_bucket *tb2 = tw->tw_tb2; > struct inet_bind_bucket *tb = tw->tw_tb; > > if (!tb) > @@ -37,6 +38,11 @@ void inet_twsk_bind_unhash(struct inet_timewait_sock *tw, > __hlist_del(&tw->tw_bind_node); > tw->tw_tb = NULL; > inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb); > + > + __hlist_del(&tw->tw_bind2_node); > + tw->tw_tb2 = NULL; > + inet_bind2_bucket_destroy(hashinfo->bind2_bucket_cachep, tb2); > + > __sock_put((struct sock *)tw); > } > > @@ -45,7 +51,7 @@ static void inet_twsk_kill(struct inet_timewait_sock *tw) > { > struct inet_hashinfo *hashinfo = tw->tw_dr->hashinfo; > spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash); > - struct inet_bind_hashbucket *bhead; > + struct inet_bind_hashbucket *bhead, *bhead2; > > spin_lock(lock); > sk_nulls_del_node_init_rcu((struct sock *)tw); > @@ -54,9 +60,13 @@ static void inet_twsk_kill(struct inet_timewait_sock *tw) > /* Disassociate with bind bucket. */ > bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), tw->tw_num, > hashinfo->bhash_size)]; > + bhead2 = inet_bhashfn_portaddr(hashinfo, (struct sock *)tw, > + twsk_net(tw), tw->tw_num); > > spin_lock(&bhead->lock); > + spin_lock(&bhead2->lock); > inet_twsk_bind_unhash(tw, hashinfo); > + spin_unlock(&bhead2->lock); > spin_unlock(&bhead->lock); > > refcount_dec(&tw->tw_dr->tw_refcount); > @@ -93,6 +103,12 @@ static void inet_twsk_add_bind_node(struct inet_timewait_sock *tw, > hlist_add_head(&tw->tw_bind_node, list); > } > > +static void inet_twsk_add_bind2_node(struct inet_timewait_sock *tw, > + struct hlist_head *list) > +{ > + hlist_add_head(&tw->tw_bind2_node, list); > +} > + > /* > * Enter the time wait state. This is called with locally disabled BH. > * Essentially we whip up a timewait bucket, copy the relevant info into it > @@ -105,17 +121,28 @@ void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk, > const struct inet_connection_sock *icsk = inet_csk(sk); > struct inet_ehash_bucket *ehead = inet_ehash_bucket(hashinfo, sk->sk_hash); > spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash); > - struct inet_bind_hashbucket *bhead; > + struct inet_bind_hashbucket *bhead, *bhead2; > + > /* Step 1: Put TW into bind hash. Original socket stays there too. > Note, that any socket with inet->num != 0 MUST be bound in > binding cache, even if it is closed. > */ > bhead = &hashinfo->bhash[inet_bhashfn(twsk_net(tw), inet->inet_num, > hashinfo->bhash_size)]; > + bhead2 = inet_bhashfn_portaddr(hashinfo, sk, twsk_net(tw), inet->inet_num); > + > spin_lock(&bhead->lock); > + spin_lock(&bhead2->lock); > + > tw->tw_tb = icsk->icsk_bind_hash; > WARN_ON(!icsk->icsk_bind_hash); > inet_twsk_add_bind_node(tw, &tw->tw_tb->owners); > + > + tw->tw_tb2 = icsk->icsk_bind2_hash; > + WARN_ON(!icsk->icsk_bind2_hash); > + inet_twsk_add_bind2_node(tw, &tw->tw_tb2->deathrow); > + > + spin_unlock(&bhead2->lock); > spin_unlock(&bhead->lock); > > spin_lock(lock); > -- > 2.30.2 > ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v1 net 2/2] tcp: Add selftest for bind() and TIME_WAIT. 2022-12-26 13:27 [PATCH v1 net 0/2] tcp: Fix bhash2 and TIME_WAIT regression Kuniyuki Iwashima 2022-12-26 13:27 ` [PATCH v1 net 1/2] tcp: Add TIME_WAIT sockets in bhash2 Kuniyuki Iwashima @ 2022-12-26 13:27 ` Kuniyuki Iwashima 2022-12-27 18:08 ` Joanne Koong 2022-12-30 7:40 ` [PATCH v1 net 0/2] tcp: Fix bhash2 and TIME_WAIT regression patchwork-bot+netdevbpf 2 siblings, 1 reply; 6+ messages in thread From: Kuniyuki Iwashima @ 2022-12-26 13:27 UTC (permalink / raw) To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Jiri Slaby, Joanne Koong, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev bhash2 split the bind() validation logic into wildcard and non-wildcard cases. Let's add a test to catch future regression. Before the previous patch: # ./bind_timewait TAP version 13 1..2 # Starting 2 tests from 3 test cases. # RUN bind_timewait.localhost.1 ... # bind_timewait.c:87:1:Expected ret (0) == -1 (-1) # 1: Test terminated by assertion # FAIL bind_timewait.localhost.1 not ok 1 bind_timewait.localhost.1 # RUN bind_timewait.addrany.1 ... # OK bind_timewait.addrany.1 ok 2 bind_timewait.addrany.1 # FAILED: 1 / 2 tests passed. # Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0 After: # ./bind_timewait TAP version 13 1..2 # Starting 2 tests from 3 test cases. # RUN bind_timewait.localhost.1 ... # OK bind_timewait.localhost.1 ok 1 bind_timewait.localhost.1 # RUN bind_timewait.addrany.1 ... # OK bind_timewait.addrany.1 ok 2 bind_timewait.addrany.1 # PASSED: 2 / 2 tests passed. # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/bind_timewait.c | 92 +++++++++++++++++++++ 2 files changed, 93 insertions(+) create mode 100644 tools/testing/selftests/net/bind_timewait.c diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 9cc84114741d..a6911cae368c 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only bind_bhash +bind_timewait csum cmsg_sender diag_uid diff --git a/tools/testing/selftests/net/bind_timewait.c b/tools/testing/selftests/net/bind_timewait.c new file mode 100644 index 000000000000..cb9fdf51ea59 --- /dev/null +++ b/tools/testing/selftests/net/bind_timewait.c @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright Amazon.com Inc. or its affiliates. */ + +#include <sys/socket.h> +#include <netinet/in.h> + +#include "../kselftest_harness.h" + +FIXTURE(bind_timewait) +{ + struct sockaddr_in addr; + socklen_t addrlen; +}; + +FIXTURE_VARIANT(bind_timewait) +{ + __u32 addr_const; +}; + +FIXTURE_VARIANT_ADD(bind_timewait, localhost) +{ + .addr_const = INADDR_LOOPBACK +}; + +FIXTURE_VARIANT_ADD(bind_timewait, addrany) +{ + .addr_const = INADDR_ANY +}; + +FIXTURE_SETUP(bind_timewait) +{ + self->addr.sin_family = AF_INET; + self->addr.sin_port = 0; + self->addr.sin_addr.s_addr = htonl(variant->addr_const); + self->addrlen = sizeof(self->addr); +} + +FIXTURE_TEARDOWN(bind_timewait) +{ +} + +void create_timewait_socket(struct __test_metadata *_metadata, + FIXTURE_DATA(bind_timewait) *self) +{ + int server_fd, client_fd, child_fd, ret; + struct sockaddr_in addr; + socklen_t addrlen; + + server_fd = socket(AF_INET, SOCK_STREAM, 0); + ASSERT_GT(server_fd, 0); + + ret = bind(server_fd, (struct sockaddr *)&self->addr, self->addrlen); + ASSERT_EQ(ret, 0); + + ret = listen(server_fd, 1); + ASSERT_EQ(ret, 0); + + ret = getsockname(server_fd, (struct sockaddr *)&self->addr, &self->addrlen); + ASSERT_EQ(ret, 0); + + client_fd = socket(AF_INET, SOCK_STREAM, 0); + ASSERT_GT(client_fd, 0); + + ret = connect(client_fd, (struct sockaddr *)&self->addr, self->addrlen); + ASSERT_EQ(ret, 0); + + addrlen = sizeof(addr); + child_fd = accept(server_fd, (struct sockaddr *)&addr, &addrlen); + ASSERT_GT(child_fd, 0); + + close(child_fd); + close(client_fd); + close(server_fd); +} + +TEST_F(bind_timewait, 1) +{ + int fd, ret; + + create_timewait_socket(_metadata, self); + + fd = socket(AF_INET, SOCK_STREAM, 0); + ASSERT_GT(fd, 0); + + ret = bind(fd, (struct sockaddr *)&self->addr, self->addrlen); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EADDRINUSE); + + close(fd); +} + +TEST_HARNESS_MAIN -- 2.30.2 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v1 net 2/2] tcp: Add selftest for bind() and TIME_WAIT. 2022-12-26 13:27 ` [PATCH v1 net 2/2] tcp: Add selftest for bind() and TIME_WAIT Kuniyuki Iwashima @ 2022-12-27 18:08 ` Joanne Koong 0 siblings, 0 replies; 6+ messages in thread From: Joanne Koong @ 2022-12-27 18:08 UTC (permalink / raw) To: Kuniyuki Iwashima Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jiri Slaby, Kuniyuki Iwashima, netdev On Mon, Dec 26, 2022 at 5:29 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > bhash2 split the bind() validation logic into wildcard and non-wildcard > cases. Let's add a test to catch future regression. > > Before the previous patch: > > # ./bind_timewait > TAP version 13 > 1..2 > # Starting 2 tests from 3 test cases. > # RUN bind_timewait.localhost.1 ... > # bind_timewait.c:87:1:Expected ret (0) == -1 (-1) > # 1: Test terminated by assertion > # FAIL bind_timewait.localhost.1 > not ok 1 bind_timewait.localhost.1 > # RUN bind_timewait.addrany.1 ... > # OK bind_timewait.addrany.1 > ok 2 bind_timewait.addrany.1 > # FAILED: 1 / 2 tests passed. > # Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0 > > After: > > # ./bind_timewait > TAP version 13 > 1..2 > # Starting 2 tests from 3 test cases. > # RUN bind_timewait.localhost.1 ... > # OK bind_timewait.localhost.1 > ok 1 bind_timewait.localhost.1 > # RUN bind_timewait.addrany.1 ... > # OK bind_timewait.addrany.1 > ok 2 bind_timewait.addrany.1 > # PASSED: 2 / 2 tests passed. > # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> > --- > tools/testing/selftests/net/.gitignore | 1 + > tools/testing/selftests/net/bind_timewait.c | 92 +++++++++++++++++++++ > 2 files changed, 93 insertions(+) > create mode 100644 tools/testing/selftests/net/bind_timewait.c > > diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore > index 9cc84114741d..a6911cae368c 100644 > --- a/tools/testing/selftests/net/.gitignore > +++ b/tools/testing/selftests/net/.gitignore > @@ -1,5 +1,6 @@ > # SPDX-License-Identifier: GPL-2.0-only > bind_bhash > +bind_timewait > csum > cmsg_sender > diag_uid > diff --git a/tools/testing/selftests/net/bind_timewait.c b/tools/testing/selftests/net/bind_timewait.c > new file mode 100644 > index 000000000000..cb9fdf51ea59 > --- /dev/null > +++ b/tools/testing/selftests/net/bind_timewait.c > @@ -0,0 +1,92 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* Copyright Amazon.com Inc. or its affiliates. */ > + > +#include <sys/socket.h> > +#include <netinet/in.h> > + > +#include "../kselftest_harness.h" > + > +FIXTURE(bind_timewait) > +{ > + struct sockaddr_in addr; > + socklen_t addrlen; > +}; > + > +FIXTURE_VARIANT(bind_timewait) > +{ > + __u32 addr_const; > +}; > + > +FIXTURE_VARIANT_ADD(bind_timewait, localhost) > +{ > + .addr_const = INADDR_LOOPBACK > +}; > + > +FIXTURE_VARIANT_ADD(bind_timewait, addrany) > +{ > + .addr_const = INADDR_ANY > +}; > + > +FIXTURE_SETUP(bind_timewait) > +{ > + self->addr.sin_family = AF_INET; > + self->addr.sin_port = 0; > + self->addr.sin_addr.s_addr = htonl(variant->addr_const); > + self->addrlen = sizeof(self->addr); > +} > + > +FIXTURE_TEARDOWN(bind_timewait) > +{ > +} > + > +void create_timewait_socket(struct __test_metadata *_metadata, > + FIXTURE_DATA(bind_timewait) *self) > +{ > + int server_fd, client_fd, child_fd, ret; > + struct sockaddr_in addr; > + socklen_t addrlen; > + > + server_fd = socket(AF_INET, SOCK_STREAM, 0); > + ASSERT_GT(server_fd, 0); > + > + ret = bind(server_fd, (struct sockaddr *)&self->addr, self->addrlen); > + ASSERT_EQ(ret, 0); > + > + ret = listen(server_fd, 1); > + ASSERT_EQ(ret, 0); > + > + ret = getsockname(server_fd, (struct sockaddr *)&self->addr, &self->addrlen); > + ASSERT_EQ(ret, 0); > + > + client_fd = socket(AF_INET, SOCK_STREAM, 0); > + ASSERT_GT(client_fd, 0); > + > + ret = connect(client_fd, (struct sockaddr *)&self->addr, self->addrlen); > + ASSERT_EQ(ret, 0); > + > + addrlen = sizeof(addr); > + child_fd = accept(server_fd, (struct sockaddr *)&addr, &addrlen); > + ASSERT_GT(child_fd, 0); > + > + close(child_fd); > + close(client_fd); > + close(server_fd); > +} > + > +TEST_F(bind_timewait, 1) > +{ > + int fd, ret; > + > + create_timewait_socket(_metadata, self); > + > + fd = socket(AF_INET, SOCK_STREAM, 0); > + ASSERT_GT(fd, 0); > + > + ret = bind(fd, (struct sockaddr *)&self->addr, self->addrlen); > + ASSERT_EQ(ret, -1); > + ASSERT_EQ(errno, EADDRINUSE); > + > + close(fd); > +} > + > +TEST_HARNESS_MAIN > -- > 2.30.2 > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v1 net 0/2] tcp: Fix bhash2 and TIME_WAIT regression. 2022-12-26 13:27 [PATCH v1 net 0/2] tcp: Fix bhash2 and TIME_WAIT regression Kuniyuki Iwashima 2022-12-26 13:27 ` [PATCH v1 net 1/2] tcp: Add TIME_WAIT sockets in bhash2 Kuniyuki Iwashima 2022-12-26 13:27 ` [PATCH v1 net 2/2] tcp: Add selftest for bind() and TIME_WAIT Kuniyuki Iwashima @ 2022-12-30 7:40 ` patchwork-bot+netdevbpf 2 siblings, 0 replies; 6+ messages in thread From: patchwork-bot+netdevbpf @ 2022-12-30 7:40 UTC (permalink / raw) To: Kuniyuki Iwashima Cc: davem, edumazet, kuba, pabeni, jirislaby, joannelkoong, kuni1840, netdev Hello: This series was applied to netdev/net.git (master) by David S. Miller <davem@davemloft.net>: On Mon, 26 Dec 2022 22:27:51 +0900 you wrote: > We forgot to add twsk to bhash2. Therefore TIME_WAIT sockets cannot > prevent bind() to the same local address and port. > > > Changes: > v1: > * Patch 1: > * Add tw_bind2_node in inet_timewait_sock instead of > moving sk_bind2_node from struct sock to struct > sock_common. > > [...] Here is the summary with links: - [v1,net,1/2] tcp: Add TIME_WAIT sockets in bhash2. https://git.kernel.org/netdev/net/c/936a192f9740 - [v1,net,2/2] tcp: Add selftest for bind() and TIME_WAIT. https://git.kernel.org/netdev/net/c/2c042e8e54ef You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-12-30 7:40 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-12-26 13:27 [PATCH v1 net 0/2] tcp: Fix bhash2 and TIME_WAIT regression Kuniyuki Iwashima 2022-12-26 13:27 ` [PATCH v1 net 1/2] tcp: Add TIME_WAIT sockets in bhash2 Kuniyuki Iwashima 2022-12-27 18:26 ` Joanne Koong 2022-12-26 13:27 ` [PATCH v1 net 2/2] tcp: Add selftest for bind() and TIME_WAIT Kuniyuki Iwashima 2022-12-27 18:08 ` Joanne Koong 2022-12-30 7:40 ` [PATCH v1 net 0/2] tcp: Fix bhash2 and TIME_WAIT regression patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).