From: Willy Tarreau
Subject: Re: tcp: disallow bind() to reuse addr/port regression in 2.6.38
Date: Sat, 2 Apr 2011 21:15:16 +0200
Message-ID: <20110402191516.GG5552@1wt.eu>
References: <201104022001.48144.cyril.bonte@free.fr> <1301767848.2837.14.camel@edumazet-laptop> <201104022046.11701.cyril.bonte@free.fr>
In-Reply-To: <201104022046.11701.cyril.bonte@free.fr>
To: Cyril Bonté
Cc: Eric Dumazet, netdev@vger.kernel.org, Daniel Baluta, Gaspar Chilingarov, Charles Duffy

Hi Eric,

On Sat, Apr 02, 2011 at 08:46:11PM +0200, Cyril Bonté wrote:
> On Saturday 2 April 2011 20:10:48, Eric Dumazet wrote:
> > On Saturday 02 April 2011 at 20:01 +0200, Cyril Bonté wrote:
> > (...)
> > > > if (shutdown(listenfd, SHUT_WR) == 0 &&
> > >
> > >     listen(listenfd, 1024) == 0 &&
> > >     shutdown(listenfd, SHUT_RD) == 0) {
> > >
> > >         printf("shutdown OK\n");
> > >
> > > }
> > >
> > > }
> > > exit(0);
> > >
> > > }
> >
> > Wow, not clear what this is doing....
> >
> > for sure the listen() call is not needed ?
> >
> > And the shutdown(listenfd, SHUT_WR) is clearly useless too.
>
> Well, I'm not the best one to explain that part but from what i read in the
> comments of this part of code, both listen and SHUT_WR are used to detect
> errors on various OS (OpenBSD, Solaris, ...).
>
> > I feel you only needed the shutdown(listenfd, SHUT_RD) call.
> >
> > Why haproxy needs to setup a second listening socket on same port ?
>
> I simplified the test case, which is far from what haproxy do (just forgot to
> explain the real behaviour).
> To reload the configuration, a new haproxy process is launched, sending a
> signal to the previous one and asking it to free the ports for a while (the
> shutdown part in the test). The new process then tries to bind the ports,
> which worked until 2.6.38 (if an error occurs, a new signal is sent to the
> previous process to listen to its sockets again).

Indeed, here's what normally happens when haproxy reloads.

The new process is loaded with a new config. Once the config correctly
parses, it sends a signal to the previous process asking it to
temporarily release its listening ports so that the new one can bind,
hence the shutdown(SHUT_RD) performed in the old process.

Then the new process can grab the ports and listen to them. Once that's
OK, it sends another signal to the old process telling it that it can go
away. But if the new process failed to completely start (e.g. could not
grab one port), then it sends a third signal to the old process asking
it to rebind the ports and serve them again, and the new one dies with
an error.

That way, the service is never interrupted even if the new config fails
late, because the old process has the ability to rebind to the ports it
temporarily released.

Now with 2.6.38, as Cyril diagnosed it, the new bind() fails when the
old process has just performed its shutdown(SHUT_RD), preventing the new
process from binding to the ports until the old process has definitely
closed them.

The old behaviour is very useful because the old process might have lost
its privileges: it does not have to rebind to the socket, it just
listens on it again since it is never closed.

This is quite embarrassing, because this code used to work for the last
10 years, at least since kernel 2.2, and maybe even 2.0, I don't
remember.

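For reference, here is a rough sketch of the old-process side of the
sequence described above. It is not haproxy's actual code: the helper
names open_listener(), pause_listener() and resume_listener() are made up
for illustration, the use of the loopback address and SO_REUSEADDR is an
assumption, and the shutdown/listen/shutdown order simply mirrors Cyril's
test case.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP listener on 127.0.0.1:port.
 * A port of 0 lets the kernel pick any free port. */
int open_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* SO_REUSEADDR is an assumption of this sketch. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Old process, on the "release your ports" signal: stop listening
 * without closing the socket. Same call sequence as the quoted test;
 * per the thread, SHUT_WR and listen() are only there to detect
 * errors on other OSes. Returns 0 on success, -1 on failure. */
int pause_listener(int fd, int backlog)
{
    if (shutdown(fd, SHUT_WR) == 0 &&
        listen(fd, backlog) == 0 &&
        shutdown(fd, SHUT_RD) == 0)
        return 0;
    return -1;
}

/* Old process, on the "take your ports back" signal: the socket was
 * never closed, so no new bind() (and no privilege) is needed. */
int resume_listener(int fd, int backlog)
{
    return listen(fd, backlog);
}
```

The new process would do its own bind() and listen() on the same address
between pause_listener() and resume_listener() in the old one; that
bind() is the call which now fails on 2.6.38.
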
I'm not sure what the original intent of the patch was, nor what the
reported issue was, but maybe we could find a way to both fix the
reported issue (if any) and restore the old behaviour in order not to
break existing programs.

Best regards,
Willy