From: Eric Dumazet
Subject: Re: virt-manager broken by bind(0) in net-next.
Date: Fri, 30 Jan 2009 19:41:59 +0100
Message-ID: <498349F7.4050300@cosmosbay.com>
References: <20090130112125.GA9908@ioremap.net> <20090130125337.GA7155@gondor.apana.org.au> <20090130095737.103edbff@extreme>
In-Reply-To: <20090130095737.103edbff@extreme>
To: Stephen Hemminger
Cc: Herbert Xu, Evgeniy Polyakov, berrange@redhat.com, et-mgmt-tools@redhat.com, davem@davemloft.net, netdev@vger.kernel.org

Stephen Hemminger wrote:
> On Fri, 30 Jan 2009 23:53:37 +1100
> Herbert Xu wrote:
>
>> Evgeniy Polyakov wrote:
>>> So it is not explicit bind call, but port autoselection in the
>>> connect(). Can you check what errno is returned?
>>> Did I understand it right, that connect fails, you try different
>>> address, but then suddenly all those sockets become 'alive'?
>> Yes, I think a good strace vs. a bad strace would be really helpful
>> in these cases.
>>
>> Thanks,
>
> I have the strace but it comes up no different.
> What is different is that in the broken case (net-next), I see
> IPV6 being used:
>
> State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port
> ESTAB   23769   0       ::ffff:127.0.0.1:5900   ::ffff:127.0.0.1:55987
> ESTAB   0       0       127.0.0.1:55987         127.0.0.1:5900
>
> and in the working case (2.6.29-rc3), IPV4 is being used
> State   Recv-Q  Send-Q  Local Address:Port      Peer Address:Port
> ESTAB   0       0       127.0.0.1:58894         127.0.0.1:5901
> ESTAB   0       0       127.0.0.1:5901          127.0.0.1:58894
>

Reviewing commit a9d8f9110d7e953c2f2b521087a4179677843c2a, I see it uses
a hashinfo->bsockets field that:
- lacks proper locking/synchronization
- suffers from cache line ping-pong on SMP

Also, there might be a problem at line 175:

	if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
		spin_unlock(&head->lock);
		goto again;

If we entered inet_csk_get_port() with a non-null snum, we can
"goto again" even though that was not intended.

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index df8e72f..752c6b2 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -172,7 +172,8 @@ tb_found:
 		} else {
 			ret = 1;
 			if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
-				if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
+				if (sk->sk_reuse && sk->sk_state != TCP_LISTEN &&
+				    smallest_size == -1 && --attempts >= 0) {
 					spin_unlock(&head->lock);
 					goto again;
 				}
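To make the intent of the retry guard concrete, here is a small userspace model of the decision taken when bind_conflict() reports a conflict. It is only a sketch: struct bind_attempt, should_retry() and the autoselected flag are hypothetical names, and the flag stands in for whatever the kernel uses to tell port autoselection (snum == 0) apart from an explicit port. The patch tests smallest_size for this; the model deliberately does not depend on how that variable is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state consulted when bind_conflict()
 * reports a conflict on the chosen bind bucket. */
struct bind_attempt {
	bool reuse;        /* models sk->sk_reuse */
	bool listening;    /* models sk->sk_state == TCP_LISTEN */
	bool autoselected; /* assumption: true when the caller passed port 0 */
	int attempts;      /* remaining retries */
};

/* Return true if the caller should go back and pick another port.
 * A socket that asked for a specific port never retries, so it can
 * not re-enter the autoselection loop. */
static bool should_retry(struct bind_attempt *a)
{
	assert(a != NULL);
	if (!a->reuse || a->listening || !a->autoselected)
		return false;
	/* Like "--attempts >= 0" at the end of the && chain, the counter
	 * is decremented only when every other condition holds. */
	return --a->attempts >= 0;
}
```

With an explicit port the function returns false and leaves the attempt budget untouched; with an autoselected port it allows exactly `attempts` retries.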
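The two objections to bsockets (no synchronization, and one hot cache line written by every CPU) are the usual motivation for per-CPU counters, in the spirit of the kernel's percpu_counter. The following is a userspace C11 analogue, not the kernel API: NR_SLOTS, CACHE_LINE (assumed to be 64 bytes) and the function names are hypothetical. Each updater does an atomic add on its own cache-line-aligned slot, and a reader sums all slots.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SLOTS 8     /* hypothetical: one slot per CPU */
#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Each slot sits on its own cache line, so CPUs updating their own
 * slot do not keep invalidating each other's lines. */
struct counter_slot {
	_Alignas(CACHE_LINE) atomic_long value;
};

struct sharded_counter {
	struct counter_slot slot[NR_SLOTS];
};

static void counter_init(struct sharded_counter *c)
{
	for (size_t i = 0; i < NR_SLOTS; i++)
		atomic_init(&c->slot[i].value, 0);
}

/* Writers touch only their own slot. Relaxed ordering is enough for
 * a statistics counter: it needs atomic updates, not ordering. */
static void counter_add(struct sharded_counter *c, size_t cpu, long delta)
{
	assert(cpu < NR_SLOTS);
	atomic_fetch_add_explicit(&c->slot[cpu].value, delta,
				  memory_order_relaxed);
}

/* Readers pay the cost instead by summing every slot. While writers
 * are active the result is only an approximate snapshot. */
static long counter_read(struct sharded_counter *c)
{
	long sum = 0;

	for (size_t i = 0; i < NR_SLOTS; i++)
		sum += atomic_load_explicit(&c->slot[i].value,
					    memory_order_relaxed);
	return sum;
}
```

The trade-off is cheap, contention-free updates against a more expensive and possibly stale read, which suits a counter that is written on every bind but consulted less often.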