All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pengfei Xu <pengfei.xu@intel.com>
To: dccp@vger.kernel.org
Subject: Re: [PATCH v2 net 4/4] dccp/tcp: Fixup bhash2 bucket when connect() fails.
Date: Thu, 17 Nov 2022 04:56:21 +0000	[thread overview]
Message-ID: <Y3XAce5qbbs978y4@xpf.sh.intel.com> (raw)
In-Reply-To: <20221116222805.64734-5-kuniyu@amazon.com>

Hi Kuniyuki Iwashima,

On 2022-11-16 at 19:20:14 -0800, Kuniyuki Iwashima wrote:
> From:   Pengfei Xu <pengfei.xu@intel.com>
> Date:   Thu, 17 Nov 2022 10:23:01 +0800
> > Hi Kuniyuki Iwashima,
> > 
> > If you consider bisect commit or some other info from below link is useful:
> > "https://lore.kernel.org/lkml/Y2xyHM1fcCkh9AKU@xpf.sh.intel.com/"
> > could you add one more Reported-by tag from me, if no, please ignore the
> > email.
> 
> Hi,
> 
> Thanks for providing a repro and bisecting, and sorry, I didn't subscribe
> LKML and haven't noticed the thread until Stephen forwarded it to the
> netndev mailing list today. [0]
> 
> The issue was brought up for discussion [1] about two weeks before the
> thread.  So, I would recommend that you check netdev first and send a
> report CCing netdev if it is a netwokring stuff.
> 
> The issue is reported by Mat[2], me[1], Ziyang[3], and you, and all of
> them were originally generated by syzkaller.
> 
> If we added all Reported-by tags, they would be:
> 
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
> Reported-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Reported-by: Ziyang Xuan (William) <william.xuanziyang@huawei.com>
> Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> 
> But adding my Reported-by sounds odd, so considering the order, only
> syzbot and Mat ones make sense ?
  Ah, that make sense.

> 
> Anyway, I'll leave the decision to maintainers.
> 
> [0]: https://lore.kernel.org/netdev/20221116085854.0dcfa44d@hermes.local/
> [1]: https://lore.kernel.org/netdev/20221029001249.86337-1-kuniyu@amazon.com/
> [2]: https://lore.kernel.org/netdev/4bae9df4-42c1-85c3-d350-119a151d29@linux.intel.com/
> [3]: https://lore.kernel.org/netdev/4bd122d2-d606-b71e-dbe7-63fa293f0a73@huawei.com/
> 
> Thank you.
> 
> P.S please don't top post at Linux mailing lists :)
Is this suggestion for me, I'm not sure which action I should do if there
is.
If no, please ignore.

Thanks!
BR.

> 
> 
> > 
> > Thanks!
> > BR.
> > 
> > On 2022-11-16 at 14:28:05 -0800, Kuniyuki Iwashima wrote:
> > > If a socket bound to a wildcard address fails to connect(), we
> > > only reset saddr and keep the port.  Then, we have to fix up the
> > > bhash2 bucket; otherwise, the bucket has an inconsistent address
> > > in the list.
> > > 
> > > Also, listen() for such a socket will fire the WARN_ON() in
> > > inet_csk_get_port(). [0]
> > > 
> > > Note that when a system runs out of memory, we give up fixing the
> > > bucket and unlink sk from bhash and bhash2 by inet_put_port().
> > > 
> > > [0]:
> > > WARNING: CPU: 0 PID: 207 at net/ipv4/inet_connection_sock.c:548 inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
> > > Modules linked in:
> > > CPU: 0 PID: 207 Comm: bhash2_prev_rep Not tainted 6.1.0-rc3-00799-gc8421681c845 #63
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
> > > RIP: 0010:inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
> > > Code: 74 a7 eb 93 48 8b 54 24 18 0f b7 cb 4c 89 e6 4c 89 ff e8 48 b2 ff ff 49 8b 87 18 04 00 00 e9 32 ff ff ff 0f 0b e9 34 ff ff ff <0f> 0b e9 42 ff ff ff 41 8b 7f 50 41 8b 4f 54 89 fe 81 f6 00 00 ff
> > > RSP: 0018:ffffc900003d7e50 EFLAGS: 00010202
> > > RAX: ffff8881047fb500 RBX: 0000000000004e20 RCX: 0000000000000000
> > > RDX: 000000000000000a RSI: 00000000fffffe00 RDI: 00000000ffffffff
> > > RBP: ffffffff8324dc00 R08: 0000000000000001 R09: 0000000000000001
> > > R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
> > > R13: 0000000000000001 R14: 0000000000004e20 R15: ffff8881054e1280
> > > FS:  00007f8ac04dc740(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000020001540 CR3: 00000001055fa003 CR4: 0000000000770ef0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > PKRU: 55555554
> > > Call Trace:
> > >  <TASK>
> > >  inet_csk_listen_start (net/ipv4/inet_connection_sock.c:1205)
> > >  inet_listen (net/ipv4/af_inet.c:228)
> > >  __sys_listen (net/socket.c:1810)
> > >  __x64_sys_listen (net/socket.c:1819 net/socket.c:1817 net/socket.c:1817)
> > >  do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> > >  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> > > RIP: 0033:0x7f8ac051de5d
> > > Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007ffc1c177248 EFLAGS: 00000206 ORIG_RAX: 0000000000000032
> > > RAX: ffffffffffffffda RBX: 0000000020001550 RCX: 00007f8ac051de5d
> > > RDX: ffffffffffffff80 RSI: 0000000000000000 RDI: 0000000000000004
> > > RBP: 00007ffc1c177270 R08: 0000000000000018 R09: 0000000000000007
> > > R10: 0000000020001540 R11: 0000000000000206 R12: 00007ffc1c177388
> > > R13: 0000000000401169 R14: 0000000000403e18 R15: 00007f8ac0723000
> > >  </TASK>
> > > 
> > > Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
> > > Reported-by: syzbot <syzkaller@googlegroups.com>
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > ---
> > >  include/net/inet_hashtables.h |  1 +
> > >  net/dccp/ipv4.c               |  3 +--
> > >  net/dccp/ipv6.c               |  3 +--
> > >  net/dccp/proto.c              |  3 +--
> > >  net/ipv4/inet_hashtables.c    | 38 +++++++++++++++++++++++++++++++----
> > >  net/ipv4/tcp.c                |  3 +--
> > >  net/ipv4/tcp_ipv4.c           |  3 +--
> > >  net/ipv6/tcp_ipv6.c           |  3 +--
> > >  8 files changed, 41 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> > > index ba06e8b52264..69174093078f 100644
> > > --- a/include/net/inet_hashtables.h
> > > +++ b/include/net/inet_hashtables.h
> > > @@ -282,6 +282,7 @@ inet_bhash2_addr_any_hashbucket(const struct sock *sk, const struct net *net, in
> > >   * rcv_saddr field should already have been updated when this is called.
> > >   */
> > >  int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family);
> > > +void inet_bhash2_reset_saddr(struct sock *sk);
> > >  
> > >  void inet_bind_hash(struct sock *sk, struct inet_bind_bucket *tb,
> > >  		    struct inet_bind2_bucket *tb2, unsigned short port);
> > > diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> > > index 95e376e3b911..b780827f5e0a 100644
> > > --- a/net/dccp/ipv4.c
> > > +++ b/net/dccp/ipv4.c
> > > @@ -143,8 +143,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> > >  	 * This unhashes the socket and releases the local port, if necessary.
> > >  	 */
> > >  	dccp_set_state(sk, DCCP_CLOSED);
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  	ip_rt_put(rt);
> > >  	sk->sk_route_caps = 0;
> > >  	inet->inet_dport = 0;
> > > diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
> > > index 94c101ed57a9..602f3432d80b 100644
> > > --- a/net/dccp/ipv6.c
> > > +++ b/net/dccp/ipv6.c
> > > @@ -970,8 +970,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
> > >  
> > >  late_failure:
> > >  	dccp_set_state(sk, DCCP_CLOSED);
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  	__sk_dst_reset(sk);
> > >  failure:
> > >  	inet->inet_dport = 0;
> > > diff --git a/net/dccp/proto.c b/net/dccp/proto.c
> > > index c548ca3e9b0e..85e35c5e8890 100644
> > > --- a/net/dccp/proto.c
> > > +++ b/net/dccp/proto.c
> > > @@ -279,8 +279,7 @@ int dccp_disconnect(struct sock *sk, int flags)
> > >  
> > >  	inet->inet_dport = 0;
> > >  
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  
> > >  	sk->sk_shutdown = 0;
> > >  	sock_reset_flag(sk, SOCK_DONE);
> > > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> > > index dcb6bc918966..d24a04815f20 100644
> > > --- a/net/ipv4/inet_hashtables.c
> > > +++ b/net/ipv4/inet_hashtables.c
> > > @@ -871,7 +871,7 @@ static void inet_update_saddr(struct sock *sk, void *saddr, int family)
> > >  	}
> > >  }
> > >  
> > > -int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > > +static int __inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family, bool reset)
> > >  {
> > >  	struct inet_hashinfo *hinfo = tcp_or_dccp_get_hashinfo(sk);
> > >  	struct inet_bind2_bucket *tb2, *new_tb2;
> > > @@ -882,7 +882,11 @@ int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > >  
> > >  	if (!inet_csk(sk)->icsk_bind2_hash) {
> > >  		/* Not bind()ed before. */
> > > -		inet_update_saddr(sk, saddr, family);
> > > +		if (reset)
> > > +			inet_reset_saddr(sk);
> > > +		else
> > > +			inet_update_saddr(sk, saddr, family);
> > > +
> > >  		return 0;
> > >  	}
> > >  
> > > @@ -891,8 +895,19 @@ int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > >  	 * allocation fails.
> > >  	 */
> > >  	new_tb2 = kmem_cache_alloc(hinfo->bind2_bucket_cachep, GFP_ATOMIC);
> > > -	if (!new_tb2)
> > > +	if (!new_tb2) {
> > > +		if (reset) {
> > > +			/* The (INADDR_ANY, port) bucket might have already been
> > > +			 * freed, then we cannot fixup icsk_bind2_hash, so we give
> > > +			 * up and unlink sk from bhash/bhash2 not to fire WARN_ON()
> > > +			 * in inet_csk_get_port().
> > > +			 */
> > > +			inet_put_port(sk);
> > > +			inet_reset_saddr(sk);
> > > +		}
> > > +
> > >  		return -ENOMEM;
> > > +	}
> > >  
> > >  	/* Unlink first not to show the wrong address for other threads. */
> > >  	head2 = inet_bhashfn_portaddr(hinfo, sk, net, port);
> > > @@ -902,7 +917,10 @@ int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > >  	inet_bind2_bucket_destroy(hinfo->bind2_bucket_cachep, inet_csk(sk)->icsk_bind2_hash);
> > >  	spin_unlock_bh(&head2->lock);
> > >  
> > > -	inet_update_saddr(sk, saddr, family);
> > > +	if (reset)
> > > +		inet_reset_saddr(sk);
> > > +	else
> > > +		inet_update_saddr(sk, saddr, family);
> > >  
> > >  	/* Update bhash2 bucket. */
> > >  	head2 = inet_bhashfn_portaddr(hinfo, sk, net, port);
> > > @@ -922,8 +940,20 @@ int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > >  
> > >  	return 0;
> > >  }
> > > +
> > > +int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > > +{
> > > +	return __inet_bhash2_update_saddr(sk, saddr, family, false);
> > > +}
> > >  EXPORT_SYMBOL_GPL(inet_bhash2_update_saddr);
> > >  
> > > +void inet_bhash2_reset_saddr(struct sock *sk)
> > > +{
> > > +	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > +		__inet_bhash2_update_saddr(sk, NULL, 0, true);
> > > +}
> > > +EXPORT_SYMBOL_GPL(inet_bhash2_reset_saddr);
> > > +
> > >  /* RFC 6056 3.3.4.  Algorithm 4: Double-Hash Port Selection Algorithm
> > >   * Note that we use 32bit integers (vs RFC 'short integers')
> > >   * because 2^16 is not a multiple of num_ephemeral and this
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index 54836a6b81d6..4f2205756cfe 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -3114,8 +3114,7 @@ int tcp_disconnect(struct sock *sk, int flags)
> > >  
> > >  	inet->inet_dport = 0;
> > >  
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  
> > >  	sk->sk_shutdown = 0;
> > >  	sock_reset_flag(sk, SOCK_DONE);
> > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > index 23dd7e9df2d5..da46357f501b 100644
> > > --- a/net/ipv4/tcp_ipv4.c
> > > +++ b/net/ipv4/tcp_ipv4.c
> > > @@ -331,8 +331,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> > >  	 * if necessary.
> > >  	 */
> > >  	tcp_set_state(sk, TCP_CLOSE);
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  	ip_rt_put(rt);
> > >  	sk->sk_route_caps = 0;
> > >  	inet->inet_dport = 0;
> > > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > > index 2f3ca3190d26..f0548dbcabd2 100644
> > > --- a/net/ipv6/tcp_ipv6.c
> > > +++ b/net/ipv6/tcp_ipv6.c
> > > @@ -346,8 +346,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
> > >  
> > >  late_failure:
> > >  	tcp_set_state(sk, TCP_CLOSE);
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  failure:
> > >  	inet->inet_dport = 0;
> > >  	sk->sk_route_caps = 0;
> > > -- 
> > > 2.30.2

WARNING: multiple messages have this Message-ID (diff)
From: Pengfei Xu <pengfei.xu@intel.com>
To: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: <acme@mandriva.com>, <davem@davemloft.net>,
	<dccp@vger.kernel.org>, <dsahern@kernel.org>,
	<edumazet@google.com>, <joannelkoong@gmail.com>,
	<kuba@kernel.org>, <kuni1840@gmail.com>, <martin.lau@kernel.org>,
	<mathew.j.martineau@linux.intel.com>, <netdev@vger.kernel.org>,
	<pabeni@redhat.com>, <stephen@networkplumber.org>,
	<syzkaller@googlegroups.com>, <william.xuanziyang@huawei.com>,
	<yoshfuji@linux-ipv6.org>
Subject: Re: [PATCH v2 net 4/4] dccp/tcp: Fixup bhash2 bucket when connect() fails.
Date: Thu, 17 Nov 2022 13:02:41 +0800	[thread overview]
Message-ID: <Y3XAce5qbbs978y4@xpf.sh.intel.com> (raw)
In-Reply-To: <20221117032014.80364-1-kuniyu@amazon.com>

Hi Kuniyuki Iwashima,

On 2022-11-16 at 19:20:14 -0800, Kuniyuki Iwashima wrote:
> From:   Pengfei Xu <pengfei.xu@intel.com>
> Date:   Thu, 17 Nov 2022 10:23:01 +0800
> > Hi Kuniyuki Iwashima,
> > 
> > If you consider bisect commit or some other info from below link is useful:
> > "https://lore.kernel.org/lkml/Y2xyHM1fcCkh9AKU@xpf.sh.intel.com/"
> > could you add one more Reported-by tag from me, if no, please ignore the
> > email.
> 
> Hi,
> 
> Thanks for providing a repro and bisecting, and sorry, I didn't subscribe
> LKML and haven't noticed the thread until Stephen forwarded it to the
> netndev mailing list today. [0]
> 
> The issue was brought up for discussion [1] about two weeks before the
> thread.  So, I would recommend that you check netdev first and send a
> report CCing netdev if it is a netwokring stuff.
> 
> The issue is reported by Mat[2], me[1], Ziyang[3], and you, and all of
> them were originally generated by syzkaller.
> 
> If we added all Reported-by tags, they would be:
> 
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
> Reported-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Reported-by: Ziyang Xuan (William) <william.xuanziyang@huawei.com>
> Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> 
> But adding my Reported-by sounds odd, so considering the order, only
> syzbot and Mat ones make sense ?
  Ah, that make sense.

> 
> Anyway, I'll leave the decision to maintainers.
> 
> [0]: https://lore.kernel.org/netdev/20221116085854.0dcfa44d@hermes.local/
> [1]: https://lore.kernel.org/netdev/20221029001249.86337-1-kuniyu@amazon.com/
> [2]: https://lore.kernel.org/netdev/4bae9df4-42c1-85c3-d350-119a151d29@linux.intel.com/
> [3]: https://lore.kernel.org/netdev/4bd122d2-d606-b71e-dbe7-63fa293f0a73@huawei.com/
> 
> Thank you.
> 
> P.S please don't top post at Linux mailing lists :)
Is this suggestion for me, I'm not sure which action I should do if there
is.
If no, please ignore.

Thanks!
BR.

> 
> 
> > 
> > Thanks!
> > BR.
> > 
> > On 2022-11-16 at 14:28:05 -0800, Kuniyuki Iwashima wrote:
> > > If a socket bound to a wildcard address fails to connect(), we
> > > only reset saddr and keep the port.  Then, we have to fix up the
> > > bhash2 bucket; otherwise, the bucket has an inconsistent address
> > > in the list.
> > > 
> > > Also, listen() for such a socket will fire the WARN_ON() in
> > > inet_csk_get_port(). [0]
> > > 
> > > Note that when a system runs out of memory, we give up fixing the
> > > bucket and unlink sk from bhash and bhash2 by inet_put_port().
> > > 
> > > [0]:
> > > WARNING: CPU: 0 PID: 207 at net/ipv4/inet_connection_sock.c:548 inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
> > > Modules linked in:
> > > CPU: 0 PID: 207 Comm: bhash2_prev_rep Not tainted 6.1.0-rc3-00799-gc8421681c845 #63
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
> > > RIP: 0010:inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
> > > Code: 74 a7 eb 93 48 8b 54 24 18 0f b7 cb 4c 89 e6 4c 89 ff e8 48 b2 ff ff 49 8b 87 18 04 00 00 e9 32 ff ff ff 0f 0b e9 34 ff ff ff <0f> 0b e9 42 ff ff ff 41 8b 7f 50 41 8b 4f 54 89 fe 81 f6 00 00 ff
> > > RSP: 0018:ffffc900003d7e50 EFLAGS: 00010202
> > > RAX: ffff8881047fb500 RBX: 0000000000004e20 RCX: 0000000000000000
> > > RDX: 000000000000000a RSI: 00000000fffffe00 RDI: 00000000ffffffff
> > > RBP: ffffffff8324dc00 R08: 0000000000000001 R09: 0000000000000001
> > > R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
> > > R13: 0000000000000001 R14: 0000000000004e20 R15: ffff8881054e1280
> > > FS:  00007f8ac04dc740(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000020001540 CR3: 00000001055fa003 CR4: 0000000000770ef0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > PKRU: 55555554
> > > Call Trace:
> > >  <TASK>
> > >  inet_csk_listen_start (net/ipv4/inet_connection_sock.c:1205)
> > >  inet_listen (net/ipv4/af_inet.c:228)
> > >  __sys_listen (net/socket.c:1810)
> > >  __x64_sys_listen (net/socket.c:1819 net/socket.c:1817 net/socket.c:1817)
> > >  do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> > >  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> > > RIP: 0033:0x7f8ac051de5d
> > > Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007ffc1c177248 EFLAGS: 00000206 ORIG_RAX: 0000000000000032
> > > RAX: ffffffffffffffda RBX: 0000000020001550 RCX: 00007f8ac051de5d
> > > RDX: ffffffffffffff80 RSI: 0000000000000000 RDI: 0000000000000004
> > > RBP: 00007ffc1c177270 R08: 0000000000000018 R09: 0000000000000007
> > > R10: 0000000020001540 R11: 0000000000000206 R12: 00007ffc1c177388
> > > R13: 0000000000401169 R14: 0000000000403e18 R15: 00007f8ac0723000
> > >  </TASK>
> > > 
> > > Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
> > > Reported-by: syzbot <syzkaller@googlegroups.com>
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > ---
> > >  include/net/inet_hashtables.h |  1 +
> > >  net/dccp/ipv4.c               |  3 +--
> > >  net/dccp/ipv6.c               |  3 +--
> > >  net/dccp/proto.c              |  3 +--
> > >  net/ipv4/inet_hashtables.c    | 38 +++++++++++++++++++++++++++++++----
> > >  net/ipv4/tcp.c                |  3 +--
> > >  net/ipv4/tcp_ipv4.c           |  3 +--
> > >  net/ipv6/tcp_ipv6.c           |  3 +--
> > >  8 files changed, 41 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> > > index ba06e8b52264..69174093078f 100644
> > > --- a/include/net/inet_hashtables.h
> > > +++ b/include/net/inet_hashtables.h
> > > @@ -282,6 +282,7 @@ inet_bhash2_addr_any_hashbucket(const struct sock *sk, const struct net *net, in
> > >   * rcv_saddr field should already have been updated when this is called.
> > >   */
> > >  int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family);
> > > +void inet_bhash2_reset_saddr(struct sock *sk);
> > >  
> > >  void inet_bind_hash(struct sock *sk, struct inet_bind_bucket *tb,
> > >  		    struct inet_bind2_bucket *tb2, unsigned short port);
> > > diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> > > index 95e376e3b911..b780827f5e0a 100644
> > > --- a/net/dccp/ipv4.c
> > > +++ b/net/dccp/ipv4.c
> > > @@ -143,8 +143,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> > >  	 * This unhashes the socket and releases the local port, if necessary.
> > >  	 */
> > >  	dccp_set_state(sk, DCCP_CLOSED);
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  	ip_rt_put(rt);
> > >  	sk->sk_route_caps = 0;
> > >  	inet->inet_dport = 0;
> > > diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
> > > index 94c101ed57a9..602f3432d80b 100644
> > > --- a/net/dccp/ipv6.c
> > > +++ b/net/dccp/ipv6.c
> > > @@ -970,8 +970,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
> > >  
> > >  late_failure:
> > >  	dccp_set_state(sk, DCCP_CLOSED);
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  	__sk_dst_reset(sk);
> > >  failure:
> > >  	inet->inet_dport = 0;
> > > diff --git a/net/dccp/proto.c b/net/dccp/proto.c
> > > index c548ca3e9b0e..85e35c5e8890 100644
> > > --- a/net/dccp/proto.c
> > > +++ b/net/dccp/proto.c
> > > @@ -279,8 +279,7 @@ int dccp_disconnect(struct sock *sk, int flags)
> > >  
> > >  	inet->inet_dport = 0;
> > >  
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  
> > >  	sk->sk_shutdown = 0;
> > >  	sock_reset_flag(sk, SOCK_DONE);
> > > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> > > index dcb6bc918966..d24a04815f20 100644
> > > --- a/net/ipv4/inet_hashtables.c
> > > +++ b/net/ipv4/inet_hashtables.c
> > > @@ -871,7 +871,7 @@ static void inet_update_saddr(struct sock *sk, void *saddr, int family)
> > >  	}
> > >  }
> > >  
> > > -int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > > +static int __inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family, bool reset)
> > >  {
> > >  	struct inet_hashinfo *hinfo = tcp_or_dccp_get_hashinfo(sk);
> > >  	struct inet_bind2_bucket *tb2, *new_tb2;
> > > @@ -882,7 +882,11 @@ int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > >  
> > >  	if (!inet_csk(sk)->icsk_bind2_hash) {
> > >  		/* Not bind()ed before. */
> > > -		inet_update_saddr(sk, saddr, family);
> > > +		if (reset)
> > > +			inet_reset_saddr(sk);
> > > +		else
> > > +			inet_update_saddr(sk, saddr, family);
> > > +
> > >  		return 0;
> > >  	}
> > >  
> > > @@ -891,8 +895,19 @@ int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > >  	 * allocation fails.
> > >  	 */
> > >  	new_tb2 = kmem_cache_alloc(hinfo->bind2_bucket_cachep, GFP_ATOMIC);
> > > -	if (!new_tb2)
> > > +	if (!new_tb2) {
> > > +		if (reset) {
> > > +			/* The (INADDR_ANY, port) bucket might have already been
> > > +			 * freed, then we cannot fixup icsk_bind2_hash, so we give
> > > +			 * up and unlink sk from bhash/bhash2 not to fire WARN_ON()
> > > +			 * in inet_csk_get_port().
> > > +			 */
> > > +			inet_put_port(sk);
> > > +			inet_reset_saddr(sk);
> > > +		}
> > > +
> > >  		return -ENOMEM;
> > > +	}
> > >  
> > >  	/* Unlink first not to show the wrong address for other threads. */
> > >  	head2 = inet_bhashfn_portaddr(hinfo, sk, net, port);
> > > @@ -902,7 +917,10 @@ int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > >  	inet_bind2_bucket_destroy(hinfo->bind2_bucket_cachep, inet_csk(sk)->icsk_bind2_hash);
> > >  	spin_unlock_bh(&head2->lock);
> > >  
> > > -	inet_update_saddr(sk, saddr, family);
> > > +	if (reset)
> > > +		inet_reset_saddr(sk);
> > > +	else
> > > +		inet_update_saddr(sk, saddr, family);
> > >  
> > >  	/* Update bhash2 bucket. */
> > >  	head2 = inet_bhashfn_portaddr(hinfo, sk, net, port);
> > > @@ -922,8 +940,20 @@ int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > >  
> > >  	return 0;
> > >  }
> > > +
> > > +int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
> > > +{
> > > +	return __inet_bhash2_update_saddr(sk, saddr, family, false);
> > > +}
> > >  EXPORT_SYMBOL_GPL(inet_bhash2_update_saddr);
> > >  
> > > +void inet_bhash2_reset_saddr(struct sock *sk)
> > > +{
> > > +	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > +		__inet_bhash2_update_saddr(sk, NULL, 0, true);
> > > +}
> > > +EXPORT_SYMBOL_GPL(inet_bhash2_reset_saddr);
> > > +
> > >  /* RFC 6056 3.3.4.  Algorithm 4: Double-Hash Port Selection Algorithm
> > >   * Note that we use 32bit integers (vs RFC 'short integers')
> > >   * because 2^16 is not a multiple of num_ephemeral and this
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index 54836a6b81d6..4f2205756cfe 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -3114,8 +3114,7 @@ int tcp_disconnect(struct sock *sk, int flags)
> > >  
> > >  	inet->inet_dport = 0;
> > >  
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  
> > >  	sk->sk_shutdown = 0;
> > >  	sock_reset_flag(sk, SOCK_DONE);
> > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > index 23dd7e9df2d5..da46357f501b 100644
> > > --- a/net/ipv4/tcp_ipv4.c
> > > +++ b/net/ipv4/tcp_ipv4.c
> > > @@ -331,8 +331,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> > >  	 * if necessary.
> > >  	 */
> > >  	tcp_set_state(sk, TCP_CLOSE);
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  	ip_rt_put(rt);
> > >  	sk->sk_route_caps = 0;
> > >  	inet->inet_dport = 0;
> > > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > > index 2f3ca3190d26..f0548dbcabd2 100644
> > > --- a/net/ipv6/tcp_ipv6.c
> > > +++ b/net/ipv6/tcp_ipv6.c
> > > @@ -346,8 +346,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
> > >  
> > >  late_failure:
> > >  	tcp_set_state(sk, TCP_CLOSE);
> > > -	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> > > -		inet_reset_saddr(sk);
> > > +	inet_bhash2_reset_saddr(sk);
> > >  failure:
> > >  	inet->inet_dport = 0;
> > >  	sk->sk_route_caps = 0;
> > > -- 
> > > 2.30.2

  parent reply	other threads:[~2022-11-17  4:56 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-16 22:28 [PATCH v2 net 4/4] dccp/tcp: Fixup bhash2 bucket when connect() fails Kuniyuki Iwashima
2022-11-16 22:28 ` Kuniyuki Iwashima
2022-11-17  2:23 ` Pengfei Xu
2022-11-17  2:23   ` Pengfei Xu
2022-11-17  3:20 ` Kuniyuki Iwashima
2022-11-17  3:20   ` Kuniyuki Iwashima
2022-11-17  4:56 ` Pengfei Xu [this message]
2022-11-17  5:02   ` Pengfei Xu
  -- strict thread matches above, loose matches on Subject: below --
2022-11-16 22:28 [PATCH v2 net 3/4] dccp/tcp: Don't update saddr before unlinking sk from the old bucket Kuniyuki Iwashima
2022-11-16 22:28 ` Kuniyuki Iwashima
2022-11-17 21:32 ` Joanne Koong
2022-11-17 21:32   ` Joanne Koong
2022-11-17 23:59 ` Kuniyuki Iwashima
2022-11-18  0:06   ` Kuniyuki Iwashima
2022-11-18  0:55 ` Joanne Koong
2022-11-18  0:55   ` Joanne Koong
2022-11-18  1:08 ` Kuniyuki Iwashima
2022-11-18  1:08   ` Kuniyuki Iwashima
2022-11-18 18:56 ` Joanne Koong
2022-11-18 19:02   ` Joanne Koong
2022-11-18 19:58 ` Kuniyuki Iwashima
2022-11-18 19:58   ` Kuniyuki Iwashima
2022-11-16 22:28 [PATCH v2 net 2/4] dccp/tcp: Remove NULL check for prev_saddr in inet_bhash2_update_saddr() Kuniyuki Iwashima
2022-11-16 22:28 ` Kuniyuki Iwashima
2022-11-17  0:07 ` Joanne Koong
2022-11-17  0:07   ` Joanne Koong
2022-11-16 22:28 [PATCH v2 net 1/4] dccp/tcp: Reset saddr on failure after inet6?_hash_connect() Kuniyuki Iwashima
2022-11-16 22:28 ` Kuniyuki Iwashima
2022-11-17  0:11 ` Joanne Koong
2022-11-17  0:11   ` Joanne Koong
2022-11-17  0:20 ` Kuniyuki Iwashima
2022-11-17  0:20   ` Kuniyuki Iwashima
2022-11-17  0:43 ` Joanne Koong
2022-11-17  0:43   ` Joanne Koong
2022-11-16 22:28 [PATCH v2 net 0/4] dccp/tcp: Fix bhash2 issues related to WARN_ON() in inet_csk_get_port() Kuniyuki Iwashima
2022-11-16 22:28 ` Kuniyuki Iwashima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y3XAce5qbbs978y4@xpf.sh.intel.com \
    --to=pengfei.xu@intel.com \
    --cc=dccp@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.