* tcp: disallow bind() to reuse addr/port regression in 2.6.38
@ 2011-04-02 18:01 Cyril Bonté
2011-04-02 18:10 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Cyril Bonté @ 2011-04-02 18:01 UTC (permalink / raw)
To: netdev
Cc: Eric Dumazet, Daniel Baluta, Gaspar Chilingarov, Charles Duffy,
Willy Tarreau
Hi All,
(2nd try to fix the mailing list address)
It has been reported that kernel 2.6.38 prevents the load balancer haproxy from
reloading. After reading the kernel Changelog, it looks like the following commit
has a negative side effect on the way haproxy "pauses" its listening
sockets to start a new process :
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c191a836a908d1dd6b40c503741f91b914de3348
Disabling the TCPF_CLOSE flag condition makes it work as before. I guess
this was done for good reasons (sorry, I haven't found the thread about that
commit in the archives yet), but other applications may also be impacted by
this change.
I have added Willy Tarreau to the CC to open the discussion.
Here is a simple test case to reproduce the issue (with kernel 2.6.38, it will
fail on the second loop whereas it works with previous kernel versions) :
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <stdio.h>
#include <strings.h>

int main(int argc, char **argv)
{
    int listenfd;
    struct sockaddr_in servaddr;
    int i;
    int one = 1;

    /* The socket from the first pass is deliberately never closed: the
     * second pass tries to bind the same port while the first socket is
     * shut down but still open. */
    for (i = 0; i < 2; i++) {
        printf("LOOP %d...\n", i + 1);

        listenfd = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, (char *)&one, sizeof(one));

        bzero(&servaddr, sizeof(servaddr));
        servaddr.sin_family = AF_INET;
        servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
        servaddr.sin_port = htons(32000);

        if (bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) != 0) {
            perror("bind");
            exit(1);
        }
        if (listen(listenfd, 1024) != 0) {
            perror("listen");
            exit(1);
        }

        /* "Pause" the listener the way haproxy does before a reload. */
        if (shutdown(listenfd, SHUT_WR) == 0 &&
            listen(listenfd, 1024) == 0 &&
            shutdown(listenfd, SHUT_RD) == 0) {
            printf("shutdown OK\n");
        }
    }
    exit(0);
}
--
Cyril Bonté
* Re: tcp: disallow bind() to reuse addr/port regression in 2.6.38
2011-04-02 18:01 tcp: disallow bind() to reuse addr/port regression in 2.6.38 Cyril Bonté
@ 2011-04-02 18:10 ` Eric Dumazet
2011-04-02 18:46 ` Cyril Bonté
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2011-04-02 18:10 UTC (permalink / raw)
To: Cyril Bonté
Cc: netdev, Daniel Baluta, Gaspar Chilingarov, Charles Duffy,
Willy Tarreau
On Saturday 2 April 2011 at 20:01 +0200, Cyril Bonté wrote:
> Hi All,
>
> (2nd try to fix the mailing list address)
>
> It has been reported that kernel 2.6.38 prevented the load balancer haproxy to
> reload. After reading the kernel Changelog, it looks like the following commit
> has a negative side effect on the the way haproxy "pauses" its listening
> sockets to start a new process :
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c191a836a908d1dd6b40c503741f91b914de3348
>
> Disabling the TCPF_CLOSE flag condition reallows to work as before. I guess
> this was done for good reasons (Sorry, I haven't found the thread about that
> commit in the archives yet) but other applications may also be impacted by
> this change.
>
> I add Willy Tarreau to the CC to open the discussion.
>
> if (shutdown(listenfd, SHUT_WR) == 0 &&
> listen(listenfd, 1024) == 0 &&
> shutdown(listenfd, SHUT_RD) == 0) {
> printf("shutdown OK\n");
> }
> }
> exit(0);
> }
>
Wow, not clear what this is doing....
for sure the listen() call is not needed ?
And the shutdown(listenfd, SHUT_WR) is clearly useless too.
I feel you only needed the shutdown(listenfd, SHUT_RD) call.
Why does haproxy need to set up a second listening socket on the same port ?
* Re: tcp: disallow bind() to reuse addr/port regression in 2.6.38
2011-04-02 18:10 ` Eric Dumazet
@ 2011-04-02 18:46 ` Cyril Bonté
2011-04-02 19:15 ` Willy Tarreau
0 siblings, 1 reply; 8+ messages in thread
From: Cyril Bonté @ 2011-04-02 18:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, Daniel Baluta, Gaspar Chilingarov, Charles Duffy,
Willy Tarreau
On Saturday 2 April 2011 20:10:48, Eric Dumazet wrote:
> On Saturday 2 April 2011 at 20:01 +0200, Cyril Bonté wrote:
> (...)
> > > if (shutdown(listenfd, SHUT_WR) == 0 &&
> >
> > listen(listenfd, 1024) == 0 &&
> > shutdown(listenfd, SHUT_RD) == 0) {
> >
> > printf("shutdown OK\n");
> >
> > }
> >
> > }
> > exit(0);
> >
> > }
>
> Wow, not clear what this is doing....
>
> for sure the listen() call is not needed ?
>
> And the shutdown(listenfd, SHUT_WR) is clearly useless too.
Well, I'm not the best one to explain that part, but from what I read in the
comments of this part of the code, both listen() and SHUT_WR are used to detect
errors on various OSes (OpenBSD, Solaris, ...).
> I feel you only needed the shutdown(listenfd, SHUT_RD) call.
>
> Why haproxy needs to setup a second listening socket on same port ?
I simplified the test case, which is far from what haproxy does (I just forgot
to explain the real behaviour).
To reload the configuration, a new haproxy process is launched; it sends a
signal to the previous one, asking it to free the ports for a while (the
shutdown part in the test). The new process then tries to bind the ports,
which worked until 2.6.38 (if an error occurs, a new signal is sent to the
previous process so that it listens on its sockets again).
--
Cyril Bonté
* Re: tcp: disallow bind() to reuse addr/port regression in 2.6.38
2011-04-02 18:46 ` Cyril Bonté
@ 2011-04-02 19:15 ` Willy Tarreau
2011-04-02 19:44 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Willy Tarreau @ 2011-04-02 19:15 UTC (permalink / raw)
To: Cyril Bonté
Cc: Eric Dumazet, netdev, Daniel Baluta, Gaspar Chilingarov,
Charles Duffy
Hi Eric,
On Sat, Apr 02, 2011 at 08:46:11PM +0200, Cyril Bonté wrote:
> Le samedi 2 avril 2011 20:10:48, Eric Dumazet a écrit :
> > Le samedi 02 avril 2011 à 20:01 +0200, Cyril Bonté a écrit :
> > (...)
> > > > if (shutdown(listenfd, SHUT_WR) == 0 &&
> > >
> > > listen(listenfd, 1024) == 0 &&
> > > shutdown(listenfd, SHUT_RD) == 0) {
> > >
> > > printf("shutdown OK\n");
> > >
> > > }
> > >
> > > }
> > > exit(0);
> > >
> > > }
> >
> > Wow, not clear what this is doing....
> >
> > for sure the listen() call is not needed ?
> >
> > And the shutdown(listenfd, SHUT_WR) is clearly useless too.
>
> Well, I'm not the best one to explain that part but from what i read in the
> comments of this part of code, both listen and SHUT_WR are used to detect
> errors on various OS (OpenBSD, Solaris, ...).
>
> > I feel you only needed the shutdown(listenfd, SHUT_RD) call.
> >
> > Why haproxy needs to setup a second listening socket on same port ?
>
> I simplified the test case, which is far from what haproxy do (just forgot to
> explain the real behaviour).
> To reload the configuration, a new haproxy process is launched, sending a
> signal to the previous one and asking it to free the ports for a while (the
> shutdown part in the test). The new process then tries to bind the ports,
> which worked until 2.6.38 (if an error occurs, a new signal is sent to the
> previous process to listen to its sockets again).
Indeed, here's what normally happens when haproxy reloads.
New process is loaded with a new config. Once the config correctly parses,
it sends a signal to the previous process asking it to temporarily release
its listening ports so that the new one can bind, hence the shutdown(SHUT_RD)
performed in the old process.
Then the new process can grab the ports and listen to them. Once that's OK,
it sends another signal to the old process telling it it can go away. But
if the new process failed to completely start (eg: could not grab one port),
then it sends a third signal to the old process asking it to listen on the
ports again and serve them, and the new one dies with an error.
That way, the service is never interrupted even if the new config fails
late, because the old process is able to resume listening on the ports it
temporarily released.
Now with 2.6.38, as Cyril diagnosed it, the new bind() fails when the
old process has just performed its shutdown(SHUT_RD), preventing the
new process from binding to the ports until the old process has
closed them for good.
The old behaviour is very useful: since the old process might have dropped
its privileges, it does not have to rebind the socket, it just listens
on it again because it was never closed.
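The pause and resume steps described above can be sketched in C roughly as follows. This is only an illustration under assumptions: the helper names pause_listener() and resume_listener() are hypothetical and are not haproxy's actual functions, and whether listen() succeeds again after shutdown(SHUT_RD) depends on the operating system.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: temporarily stop accepting on a listening socket
 * without closing it, so that another process may try to bind the port.
 * Returns 0 on success, -1 on error (errno is set by shutdown()).
 */
int pause_listener(int fd)
{
    return shutdown(fd, SHUT_RD);
}

/*
 * Hypothetical helper: start listening again on the same descriptor.
 * No new bind() call is made, which is why this can still work after
 * the process has dropped the privileges it needed to bind the port.
 * Returns 0 on success, -1 on error (errno is set by listen()).
 */
int resume_listener(int fd, int backlog)
{
    return listen(fd, backlog);
}
```

In this sketch the old process would call pause_listener() on each listening socket when asked to release its ports, and resume_listener() if the new process reports that it failed to start.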
This is quite embarrassing, because this code has worked for the
last 10 years, at least since kernel 2.2, and maybe even 2.0, I don't
remember.
I'm not sure what the original intent of the patch was, nor what the
reported issue was, but maybe we could find a way to both fix the
reported issue (if any) and restore the old behaviour in order not
to break existing programs.
Best regards,
Willy
* Re: tcp: disallow bind() to reuse addr/port regression in 2.6.38
2011-04-02 19:15 ` Willy Tarreau
@ 2011-04-02 19:44 ` Eric Dumazet
2011-04-02 20:37 ` Willy Tarreau
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2011-04-02 19:44 UTC (permalink / raw)
To: Willy Tarreau
Cc: Cyril Bonté, netdev, Daniel Baluta, Gaspar Chilingarov,
Charles Duffy
On Saturday 2 April 2011 at 21:15 +0200, Willy Tarreau wrote:
> Hi Eric,
>
> On Sat, Apr 02, 2011 at 08:46:11PM +0200, Cyril Bonté wrote:
> > Le samedi 2 avril 2011 20:10:48, Eric Dumazet a écrit :
> > > Le samedi 02 avril 2011 à 20:01 +0200, Cyril Bonté a écrit :
> > > (...)
> > > > > if (shutdown(listenfd, SHUT_WR) == 0 &&
> > > >
> > > > listen(listenfd, 1024) == 0 &&
> > > > shutdown(listenfd, SHUT_RD) == 0) {
> > > >
> > > > printf("shutdown OK\n");
> > > >
> > > > }
> > > >
> > > > }
> > > > exit(0);
> > > >
> > > > }
> > >
> > > Wow, not clear what this is doing....
> > >
> > > for sure the listen() call is not needed ?
> > >
> > > And the shutdown(listenfd, SHUT_WR) is clearly useless too.
> >
> > Well, I'm not the best one to explain that part but from what i read in the
> > comments of this part of code, both listen and SHUT_WR are used to detect
> > errors on various OS (OpenBSD, Solaris, ...).
> >
> > > I feel you only needed the shutdown(listenfd, SHUT_RD) call.
> > >
> > > Why haproxy needs to setup a second listening socket on same port ?
> >
> > I simplified the test case, which is far from what haproxy do (just forgot to
> > explain the real behaviour).
> > To reload the configuration, a new haproxy process is launched, sending a
> > signal to the previous one and asking it to free the ports for a while (the
> > shutdown part in the test). The new process then tries to bind the ports,
> > which worked until 2.6.38 (if an error occurs, a new signal is sent to the
> > previous process to listen to its sockets again).
>
> Indeed, here's what normally happens when haproxy reloads.
>
> New process is loaded with a new config. Once the config correctly parses,
> it sends a signal to the previous process asking it to temporarily release
> its listening ports so that the new one can bind, hence the shutdown(SHUT_RD)
> performed in the old process.
>
> Then the new process can grab the ports and listen to them. Once that's OK,
> it sends another signal to the old process telling it it can go away. But
> if the new process failed to completely start (eg: could not grab one port),
> then it sends a third signal to the old process asking it to rebind the port
> and serve them again, and the new one dies with an error.
>
> That way, the service is never interrupted even if the new config fails
> late, because the old process has the ability to rebind to the port it
> temporarily released.
>
> Now with 2.6.38, as Cyril diagnosed it, the new bind() fails when the
> old process has just performed its shutdown(SHUT_RD), preventing the
> new process from binding to the ports until the old process has
> definitely closed them.
>
> The behaviour is very useful, because the old process might have lost
> its privileges, it will not have to rebind to the socket, just listen
> on it again since it is never closed.
>
> This is quite embarrassing, because this code used to work for the
> last 10 years, at least since kernel 2.2, and maybe even 2.0, I don't
> remember.
>
> I'm not sure what the original intent of the patch was, not what was
> the reported issue, but maybe we could find a way to both fix the
> reported issue (if any) and restore the old behaviour in order not
> to break existing programs.
>
> Best regards,
> Willy
>
I wish it was that simple....
http://www.spinics.net/lists/netdev/msg151551.html
Is Cyril's program running OK on FreeBSD ?
* Re: tcp: disallow bind() to reuse addr/port regression in 2.6.38
2011-04-02 19:44 ` Eric Dumazet
@ 2011-04-02 20:37 ` Willy Tarreau
2011-04-02 21:00 ` Cyril Bonté
0 siblings, 1 reply; 8+ messages in thread
From: Willy Tarreau @ 2011-04-02 20:37 UTC (permalink / raw)
To: Eric Dumazet
Cc: Cyril Bonté, netdev, Daniel Baluta, Gaspar Chilingarov,
Charles Duffy
On Sat, Apr 02, 2011 at 09:44:55PM +0200, Eric Dumazet wrote:
> Le samedi 02 avril 2011 à 21:15 +0200, Willy Tarreau a écrit :
> > Hi Eric,
> >
> > On Sat, Apr 02, 2011 at 08:46:11PM +0200, Cyril Bonté wrote:
> > > Le samedi 2 avril 2011 20:10:48, Eric Dumazet a écrit :
> > > > Le samedi 02 avril 2011 à 20:01 +0200, Cyril Bonté a écrit :
> > > > (...)
> > > > > > if (shutdown(listenfd, SHUT_WR) == 0 &&
> > > > >
> > > > > listen(listenfd, 1024) == 0 &&
> > > > > shutdown(listenfd, SHUT_RD) == 0) {
> > > > >
> > > > > printf("shutdown OK\n");
> > > > >
> > > > > }
> > > > >
> > > > > }
> > > > > exit(0);
> > > > >
> > > > > }
> > > >
> > > > Wow, not clear what this is doing....
> > > >
> > > > for sure the listen() call is not needed ?
> > > >
> > > > And the shutdown(listenfd, SHUT_WR) is clearly useless too.
> > >
> > > Well, I'm not the best one to explain that part but from what i read in the
> > > comments of this part of code, both listen and SHUT_WR are used to detect
> > > errors on various OS (OpenBSD, Solaris, ...).
> > >
> > > > I feel you only needed the shutdown(listenfd, SHUT_RD) call.
> > > >
> > > > Why haproxy needs to setup a second listening socket on same port ?
> > >
> > > I simplified the test case, which is far from what haproxy do (just forgot to
> > > explain the real behaviour).
> > > To reload the configuration, a new haproxy process is launched, sending a
> > > signal to the previous one and asking it to free the ports for a while (the
> > > shutdown part in the test). The new process then tries to bind the ports,
> > > which worked until 2.6.38 (if an error occurs, a new signal is sent to the
> > > previous process to listen to its sockets again).
> >
> > Indeed, here's what normally happens when haproxy reloads.
> >
> > New process is loaded with a new config. Once the config correctly parses,
> > it sends a signal to the previous process asking it to temporarily release
> > its listening ports so that the new one can bind, hence the shutdown(SHUT_RD)
> > performed in the old process.
> >
> > Then the new process can grab the ports and listen to them. Once that's OK,
> > it sends another signal to the old process telling it it can go away. But
> > if the new process failed to completely start (eg: could not grab one port),
> > then it sends a third signal to the old process asking it to rebind the port
> > and serve them again, and the new one dies with an error.
> >
> > That way, the service is never interrupted even if the new config fails
> > late, because the old process has the ability to rebind to the port it
> > temporarily released.
> >
> > Now with 2.6.38, as Cyril diagnosed it, the new bind() fails when the
> > old process has just performed its shutdown(SHUT_RD), preventing the
> > new process from binding to the ports until the old process has
> > definitely closed them.
> >
> > The behaviour is very useful, because the old process might have lost
> > its privileges, it will not have to rebind to the socket, just listen
> > on it again since it is never closed.
> >
> > This is quite embarrassing, because this code used to work for the
> > last 10 years, at least since kernel 2.2, and maybe even 2.0, I don't
> > remember.
> >
> > I'm not sure what the original intent of the patch was, not what was
> > the reported issue, but maybe we could find a way to both fix the
> > reported issue (if any) and restore the old behaviour in order not
> > to break existing programs.
> >
> > Best regards,
> > Willy
> >
>
> I wish it was that simple....
>
> http://www.spinics.net/lists/netdev/msg151551.html
What a mess :-(
I'm used to explicitly calling bind() on source ip:ports when dealing with that
number of connections, because I noticed long ago that the port auto-selection
does not work once all source ports are used on at least one IP address.
Managing a source port list in user space is no big deal when you have to
support hundreds of thousands of connections, as there are harder issues
to deal with :-/
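As an illustration of binding an outgoing socket to an explicit source ip:port before connect(), here is a minimal sketch. The helper name bind_source() is hypothetical, and setting SO_REUSEADDR on the outgoing socket is an assumption made for this example, not something stated above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: bind an outgoing TCP socket to a source address
 * and port chosen by the application, instead of relying on the kernel's
 * automatic port selection at connect() time.
 * Returns 0 on success, -1 on error.
 */
int bind_source(int fd, const char *ip, unsigned short port)
{
    struct sockaddr_in src;
    int one = 1;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &src.sin_addr) != 1)
        return -1; /* not a valid dotted-quad IPv4 address */

    /* Assumption: the caller wants to reuse a source port it manages. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) != 0)
        return -1;

    return bind(fd, (struct sockaddr *)&src, sizeof(src));
}
```

The application would pick the port from its own list, call bind_source(), and then call connect() on the same descriptor.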
> Is Cyril program running OK on FreeBsd ?
I don't think so, as from memory, both FreeBSD and OpenBSD fail
on listen() after a shutdown(SHUT_RD), hence the strange looking
shut+listen+shut sequence you noticed (in order to detect whether
listen will work again or not).
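The shut+listen+shut sequence can be read as a probe followed by the real pause. A minimal sketch, assuming a hypothetical helper name pause_listener_checked() (haproxy's real code may differ):

```c
#include <sys/socket.h>

/*
 * Hypothetical helper illustrating the sequence from the test case.
 *  1. shutdown(SHUT_WR): first half of the probe.
 *  2. listen():          according to the explanation above, this is the
 *                        call that fails on systems that cannot listen
 *                        again after a shutdown, so the caller finds out
 *                        before the port is actually released.
 *  3. shutdown(SHUT_RD): the real pause; the socket stays open.
 * Returns 0 if all three calls succeeded, -1 at the first failure.
 */
int pause_listener_checked(int fd, int backlog)
{
    if (shutdown(fd, SHUT_WR) != 0)
        return -1;
    if (listen(fd, backlog) != 0)
        return -1;
    if (shutdown(fd, SHUT_RD) != 0)
        return -1;
    return 0;
}
```

A caller that gets -1 here would know it cannot rely on resuming this listener later and could handle the reload differently on that platform.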
I'm just wondering about the relation between the SHUT_RD listen sockets
that we catch by accident and the issue that was the initial goal
of the patch regarding outgoing sockets. All this is not very clear
to me yet.
Regards,
Willy
* Re: tcp: disallow bind() to reuse addr/port regression in 2.6.38
2011-04-02 20:37 ` Willy Tarreau
@ 2011-04-02 21:00 ` Cyril Bonté
2011-04-02 21:18 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Cyril Bonté @ 2011-04-02 21:00 UTC (permalink / raw)
To: Eric Dumazet
Cc: Willy Tarreau, netdev, Daniel Baluta, Gaspar Chilingarov,
Charles Duffy
On Saturday 2 April 2011 22:37:27, Willy Tarreau wrote:
> On Sat, Apr 02, 2011 at 09:44:55PM +0200, Eric Dumazet wrote:
> > Is Cyril program running OK on FreeBsd ?
>
> I don't think so, as from memories, both FreeBSD and OpenBSD fail
> on isten() after a shutdown(SHUT_RD), hence the strange looking
> shut+listen+shut sequence you noticed (in order to detect whether
> listen will work again or not).
Well, I've just tested it on FreeBSD 8.1.
As Willy said, the listen() fails but what I observe is that as soon as
shutdown(SHUT_RW) is called, it is possible to bind a new socket on the same
port. A modified version of the program to sleep after the shutdown shows that
launching 3 processes in parallel (delayed to let them bind then shutdown)
will give 3 connections in CLOSE state.
--
Cyril Bonté
* Re: tcp: disallow bind() to reuse addr/port regression in 2.6.38
2011-04-02 21:00 ` Cyril Bonté
@ 2011-04-02 21:18 ` Eric Dumazet
0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2011-04-02 21:18 UTC (permalink / raw)
To: Cyril Bonté
Cc: Willy Tarreau, netdev, Daniel Baluta, Gaspar Chilingarov,
Charles Duffy
On Saturday 2 April 2011 at 23:00 +0200, Cyril Bonté wrote:
> On Saturday 2 April 2011 22:37:27, Willy Tarreau wrote:
> > On Sat, Apr 02, 2011 at 09:44:55PM +0200, Eric Dumazet wrote:
> > > Is Cyril program running OK on FreeBsd ?
> >
> > I don't think so, as from memories, both FreeBSD and OpenBSD fail
> > on isten() after a shutdown(SHUT_RD), hence the strange looking
> > shut+listen+shut sequence you noticed (in order to detect whether
> > listen will work again or not).
>
> Well, I've just tested it on FreeBSD 8.1.
> As Willy said, the listen() fails but what I observe is that as soon as
> shutdown(SHUT_RW) is called, it is possible to bind a new socket on the same
> port. A modified version of the program to sleep after the shutdown shows that
> launching 3 processes in parallel (delayed to let them bind then shutdown)
> will give 3 connections in CLOSE state.
>
Yes, but as soon as shutdown(SHUT_RDWR) is called on socket fd1, is this
same socket reusable ?
Maybe the only possible action is a close(fd1), and socket not any more
bound.
Man page on shutdown() is a bit silent, and makes sense for non
listening sockets.