* listen(2) backlog changes in or around Linux 3.1?
From: enh @ 2012-10-12 23:40 UTC
To: netdev

i used to use the following hack to unit test connect timeouts: i'd
call listen(2) on a socket and then deliberately connect (backlog + 3)
sockets without accept(2)ing any of the connections. (why 3? because
Stevens told me so, and experiment backed him up. see figure 4.10 in
his UNIX Network Programming.)

with "old" kernels, 2.6.35-ish to 3.0-ish, this worked great. my next
connect(2) to the same loopback port would hang indefinitely. i could
even unblock the connect by calling accept(2) in another thread. this
was awesome for testing.

in 3.1 on ARM, 3.2 on x86 (Ubuntu desktop), and 3.4 on ARM, this no
longer works. it doesn't seem to be as simple as "the constant is no
longer 3". my tests are now flaky. sometimes they work like they used
to, and sometimes an extra connect(2) will succeed. (or, if i'm in
non-blocking mode, my poll(2) will return with the non-blocking socket
that's trying to connect now ready.)

i'm guessing if this changed in 3.1 and is still changed in 3.4,
whatever's changed wasn't an accident. but i haven't been able to find
the right search terms to RTFM. i also finally got around to grepping
the kernel for the "+ 3", but wasn't able to find that. (so i'd be
interested to know where the old behavior came from too.)

my least worst workaround at the moment is to use one of RFC5737's
test networks, but that requires that the device have a network
connection, otherwise my connect(2)s fail immediately with
ENETUNREACH, which is no use to me. also, unlike my old trick, i've
got no way to suddenly "unblock" a slow connect(2) (this is useful for
unit testing the code that does the poll(2) part of the usual
connect-with-timeout implementation).
https://android-review.googlesource.com/#/c/44563/

hopefully someone here can shed some light on this? ideally someone
will have a workaround as good as my old trick. i realize i was
relying on undocumented behavior, and i'm happy to have to check
/proc/version and behave appropriately, but i'd really like a way to
keep my unit tests!

thanks,
elliott
* Re: listen(2) backlog changes in or around Linux 3.1?
From: Venkat Venkatsubra @ 2012-10-15 17:12 UTC
To: enh; +Cc: netdev

Hi Elliott,

In BSD I think the backlog used to be reset to 3/2 times that passed by the
user. So, 2 becomes 3.
Probably the 1/2 times increase was to accommodate the ones in
partial/incomplete queue.
In Linux is it possible you were getting the same behavior before the below
commit?
Since the check used to be "backlog+1" a 2 will behave as 3?

commit 8488df894d05d6fa41c2bd298c335f944bb0e401
Author: Wei Dong <weid@np.css.fujitsu.com>
Date:   Fri Mar 2 12:37:26 2007 -0800

    [NET]: Fix bugs in "Whether sock accept queue is full" checking

    when I use linux TCP socket, and find there is a bug in function
    sk_acceptq_is_full().

    When a new SYN comes, TCP module first checks its validation. If valid,
    send SYN,ACK to the client and add the sock to the syn hash table. Next
    time if received the valid ACK for SYN,ACK from the client. server will
    accept this connection and increase the sk->sk_ack_backlog -- which is
    done in function tcp_check_req(). We check wether acceptq is full in
    function tcp_v4_syn_recv_sock().

    Consider an example:

    After listen(sockfd, 1) system call, sk->sk_max_ack_backlog is set to
    1. As we know, sk->sk_ack_backlog is initialized to 0. Assuming accept()
    system call is not invoked now.

    1. 1st connection comes. invoke sk_acceptq_is_full().
       sk->sk_ack_backlog=0 sk->sk_max_ack_backlog=1, function return 0
       accept this connection.
       Increase the sk->sk_ack_backlog
    2. 2nd connection comes. invoke sk_acceptq_is_full().
       sk->sk_ack_backlog=1 sk->sk_max_ack_backlog=1, function return 0
       accept this connection.
       Increase the sk->sk_ack_backlog
    3. 3rd connection comes. invoke sk_acceptq_is_full().
       sk->sk_ack_backlog=2 sk->sk_max_ack_backlog=1, function return 1.
       Refuse this connection.

    I think it has bugs. after listen system call. sk->sk_max_ack_backlog=1
    but now it can accept 2 connections.

    Signed-off-by: Wei Dong <weid@np.css.fujitsu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Venkat
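The three-step example in the commit message above can be replayed with a toy model. This is a sketch, not kernel code: AcceptQueue, is_full_gt, is_full_ge and connections_admitted are hypothetical names, and the two predicates assume the check in question changes from `>` to `>=`, which is how the thread describes it.

```cpp
// Simplified model of a listener's accept queue. The field names echo the
// kernel's sk_ack_backlog and sk_max_ack_backlog, but nothing here is kernel code.
struct AcceptQueue {
    unsigned ack_backlog = 0;      // completed connections waiting for accept(2)
    unsigned max_ack_backlog = 0;  // the backlog value passed to listen(2)
};

// "Full" only once the queue length exceeds the backlog.
bool is_full_gt(const AcceptQueue& q) { return q.ack_backlog > q.max_ack_backlog; }

// "Full" as soon as the queue length reaches the backlog.
bool is_full_ge(const AcceptQueue& q) { return q.ack_backlog >= q.max_ack_backlog; }

// Counts how many connections get queued before `is_full` first reports true,
// assuming nobody ever calls accept(2).
template <typename Pred>
unsigned connections_admitted(unsigned backlog, Pred is_full) {
    AcceptQueue q;
    q.max_ack_backlog = backlog;
    while (!is_full(q)) {
        ++q.ack_backlog;  // one more completed connection is queued
    }
    return q.ack_backlog;
}
```

With a backlog of 1, connections_admitted(1, is_full_gt) counts two queued connections, which is the situation the commit message calls a bug, while connections_admitted(1, is_full_ge) counts one. With a backlog of 0 the `>` form still admits one connection and the `>=` form admits none.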
* Re: listen(2) backlog changes in or around Linux 3.1?
From: enh @ 2012-10-15 17:26 UTC
To: Venkat Venkatsubra; +Cc: netdev

On Mon, Oct 15, 2012 at 10:12 AM, Venkat Venkatsubra
<venkat.x.venkatsubra@oracle.com> wrote:
> Hi Elliott,
>
> In BSD I think the backlog used to be reset to 3/2 times that passed by the
> user. So, 2 becomes 3.
> Probably the 1/2 times increase was to accommodate the ones in
> partial/incomplete queue.
> In Linux is it possible you were getting the same behavior before the below
> commit ?
> Since the check used to be "backlog+1" a 2 will behave as 3 ?

i don't think so, because with <= 3.0 kernels i used to have a backlog
of 1 and be able to make _4_ connections before my next connect would
hang. but this > to >= change is at least something for me to
investigate...

> commit 8488df894d05d6fa41c2bd298c335f944bb0e401
> Author: Wei Dong <weid@np.css.fujitsu.com>
> Date: Fri Mar 2 12:37:26 2007 -0800
>
> [NET]: Fix bugs in "Whether sock accept queue is full" checking
* Re: listen(2) backlog changes in or around Linux 3.1?
From: Venkat Venkatsubra @ 2012-10-15 21:30 UTC
To: enh; +Cc: netdev

Ignore my sk_acceptq_is_full() > and >= changes.
That commit was reverted back by this one:

commit 64a146513f8f12ba204b7bf5cb7e9505594ead42
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Tue Mar 6 11:21:05 2007 -0800

    [NET]: Revert incorrect accept queue backlog changes.

    This reverts two changes:

    8488df894d05d6fa41c2bd298c335f944bb0e401
    248f06726e866942b3d8ca8f411f9067713b7ff8

    A backlog value of N really does mean allow "N + 1" connections
    to queue to a listening socket. This allows one to specify
    "0" as the backlog and still get 1 connection.

    Noticed by Gerrit Renker and Rick Jones.

    Signed-off-by: David S. Miller <davem@davemloft.net>

Venkat
* Re: listen(2) backlog changes in or around Linux 3.1?
From: enh @ 2012-10-16 23:31 UTC
To: netdev

boiling things down to a short C++ program, i see that i can reproduce
the behavior even on 2.6 kernels. if i run this, i see 4 connections
immediately (3 + 1, as i'd expect)... but then about 10s later i see
another 2. and every few seconds after that, i see another 2. i've let
this run until i have hundreds of connect(2) calls that have returned,
despite my small listen(2) backlog and the fact that i'm not
accept(2)ing.

so i guess the only thing that's changed with newer kernels is timing
(hell, since i only see newer kernels on newer hardware, it might just
be a hardware thing).

and clearly i don't understand what the listen(2) backlog means any more.

#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <sys/types.h>
#include <sys/socket.h>

#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

// on a listening socket, TCP_INFO reports the current accept queue length
// in tcpi_unacked and the queue limit in tcpi_sacked.
void dump_ti(int fd) {
  tcp_info ti;
  socklen_t tcp_info_length = sizeof(tcp_info);
  // TCP_INFO is a TCP-level option, so the level is IPPROTO_TCP.
  int rc = getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &tcp_info_length);
  if (rc == -1) {
    std::cout << "getsockopt rc " << rc << ": " << strerror(errno) << "\n";
    return;
  }

  std::cout << "ti.tcpi_unacked=" << ti.tcpi_unacked << "\n";
  std::cout << "ti.tcpi_sacked=" << ti.tcpi_sacked << "\n";
}

// the client socket is deliberately never closed, so every connection that
// succeeds stays open.
void connect_to(sockaddr_in& sa) {
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s == -1) {
    abort();
  }

  int rc = connect(s, (sockaddr*) &sa, sizeof(sockaddr_in));
  std::cout << "connect = " << rc << "\n";
}

int main() {
  int ss = socket(AF_INET, SOCK_STREAM, 0);
  std::cout << "socket fd " << ss << "\n";

  sockaddr_in sa;
  memset(&sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = htonl(INADDR_ANY);
  sa.sin_port = htons(9877);
  int rc = bind(ss, (sockaddr*) &sa, sizeof(sa));
  std::cout << "bind rc " << rc << ": " << strerror(errno) << "\n";
  // sin_port is in network byte order; convert it back for printing.
  std::cout << "bind port " << ntohs(sa.sin_port) << "\n";

  rc = listen(ss, 1);
  std::cout << "listen rc " << rc << ": " << strerror(errno) << "\n";
  dump_ti(ss);

  // keep connecting without ever calling accept(2).
  while (true) {
    connect_to(sa);
    dump_ti(ss);
  }

  return 0;
}
* Re: listen(2) backlog changes in or around Linux 3.1?
From: Venkat Venkatsubra @ 2012-10-18 16:00 UTC
To: enh; +Cc: netdev

Hi Elliott,

I see the same behavior with your test program.
The connect() keeps succeeding even though accept() is not performed.
It pauses after 4 connections for a while and then periodically keeps
adding few (2 I think).

But the server side end points are terminated too. You will see only
the first 2 sessions on the server side.
If you modify your test program to say read or poll the sockets you
should get a termination notification on them I think.

The behavior overall looks fine in my opinion. But it could be a
change of behavior for your test program.

Venkat
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-18 16:00 ` Venkat Venkatsubra
@ 2012-10-18 16:53   ` Venkat Venkatsubra
  2012-10-18 17:20     ` enh
  0 siblings, 1 reply; 20+ messages in thread
From: Venkat Venkatsubra @ 2012-10-18 16:53 UTC (permalink / raw)
To: enh; +Cc: netdev

Correction. I don't see the client side receiving any abort/termination
notification.
They all remain on ESTABLISHED state on the client side.
In tcpdump I don't see a FIN or RST coming from the server for the aborted
connections.

Venkat

On 10/18/2012 11:00 AM, Venkat Venkatsubra wrote:
> Hi Elliott,
>
> I see the same behavior with your test program.
> The connect() keeps succeeding even though accept() is not performed.
> It pauses after 4 connections for a while and then periodically keeps
> adding few (2 I think).
>
> But the server side end points are terminated too. You will see only
> the first 2 sessions on the server side.
> If you modify your test program to say read or poll the sockets you
> should get a termination notification on them I think .
>
> The behavior overall looks fine in my opinion. But it could be a
> change of behavior for your test program.
>
> Venkat
>
> On 10/16/2012 6:31 PM, enh wrote:
>> boiling things down to a short C++ program, i see that i can reproduce
>> the behavior even on 2.6 kernels. if i run this, i see 4 connections
>> immediately (3 + 1, as i'd expect)... but then about 10s later i see
>> another 2. and every few seconds after that, i see another 2. i've let
>> this run until i have hundreds of connect(2) calls that have returned,
>> despite my small listen(2) backlog and the fact that i'm not
>> accept(2)ing.
>>
>> so i guess the only thing that's changed with newer kernels is timing
>> (hell, since i only see newer kernels on newer hardware, it might just
>> be a hardware thing).
>>
>> and clearly i don't understand what the listen(2) backlog means any
>> more.
>>
>> #include <netinet/ip.h>
>> #include <netinet/tcp.h>
>> #include <sys/types.h>
>> #include <sys/socket.h>
>> #include <iostream>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <errno.h>
>>
>> void dump_ti(int fd) {
>>   tcp_info ti;
>>   socklen_t tcp_info_length = sizeof(tcp_info);
>>   // TCP_INFO is a TCP-level option, so the level is IPPROTO_TCP (not SOL_IP).
>>   int rc = getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &tcp_info_length);
>>   if (rc == -1) {
>>     std::cout << "getsockopt rc " << rc << ": " << strerror(errno) << "\n";
>>     return;
>>   }
>>
>>   std::cout << "ti.tcpi_unacked=" << ti.tcpi_unacked << "\n";
>>   std::cout << "ti.tcpi_sacked=" << ti.tcpi_sacked << "\n";
>> }
>>
>> void connect_to(sockaddr_in& sa) {
>>   int s = socket(AF_INET, SOCK_STREAM, 0);
>>   if (s == -1) {
>>     abort();
>>   }
>>
>>   // the client socket is deliberately left open so the connection stays up.
>>   int rc = connect(s, (sockaddr*)&sa, sizeof(sockaddr_in));
>>   std::cout << "connect = " << rc << "\n";
>> }
>>
>> int main() {
>>   int ss = socket(AF_INET, SOCK_STREAM, 0);
>>   std::cout << "socket fd " << ss << "\n";
>>
>>   sockaddr_in sa;
>>   memset(&sa, 0, sizeof(sa));
>>   sa.sin_family = AF_INET;
>>   sa.sin_addr.s_addr = htonl(INADDR_ANY);
>>   sa.sin_port = htons(9877);
>>   int rc = bind(ss, (sockaddr*)&sa, sizeof(sa));
>>   std::cout << "bind rc " << rc << ": " << strerror(errno) << "\n";
>>   std::cout << "bind port " << sa.sin_port << "\n";
>>
>>   rc = listen(ss, 1);
>>   std::cout << "listen rc " << rc << ": " << strerror(errno) << "\n";
>>   dump_ti(ss);
>>
>>   // never accept(2): just keep connecting and dumping the listener's state.
>>   while (true) {
>>     connect_to(sa);
>>     dump_ti(ss);
>>   }
>>
>>   return 0;
>> }

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-18 16:53 ` Venkat Venkatsubra
@ 2012-10-18 17:20   ` enh
  2012-10-19  6:02     ` Vijay Subramanian
  0 siblings, 1 reply; 20+ messages in thread
From: enh @ 2012-10-18 17:20 UTC (permalink / raw)
To: Venkat Venkatsubra; +Cc: netdev

On Thu, Oct 18, 2012 at 9:53 AM, Venkat Venkatsubra
<venkat.x.venkatsubra@oracle.com> wrote:
> Correction. I don't see the client side receiving any abort/termination
> notification.
> They all remain on ESTABLISHED state on the client side.

yeah, that's what i see with netstat -t too.

in the meantime i'm working around this by connecting to one of
RFC5737's test networks
(https://android-review.googlesource.com/#/c/44563/), but i'd love to
at least understand what's going on here, even if it's just that i
have a fundamental misunderstanding of what the listen backlog is
supposed to mean.

> In tcpdump I don't see a FIN or RST coming from the server for the aborted
> connections.
>
> Venkat

--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
NIO, JNI, or bionic questions? Mail me/drop by/add me as a reviewer.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-18 17:20 ` enh
@ 2012-10-19  6:02   ` Vijay Subramanian
  2012-10-19  6:50     ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Vijay Subramanian @ 2012-10-19 6:02 UTC (permalink / raw)
To: enh; +Cc: Venkat Venkatsubra, netdev, Eric Dumazet

>> They all remain on ESTABLISHED state on the client side.
>
> yeah, that's what i see with netstat -t too.
>
> (https://android-review.googlesource.com/#/c/44563/), but i'd love to
> at least understand what's going on here, even if it's just that i
> have a fundamental misunderstanding of what the listen backlog is
> supposed to mean.
>

The listen backlog represents the number of received SYNs that have
not been processed i.e. for which a SYN-ACK has not been sent.
Actually, the number of SYNs that can be pending for processing is
actually backlog+1. With a backlog of 1, there can be 2 SYNs that can
be pending for processing.

Once a SYN is processed by the server socket (in LISTEN state) and a
syn-ack is sent back, a request_sock is created to represent it. Once
the client replies with the last step of connect() i.e. with an ack,
a fully established socket is created. The number of queued
request-socks for a LISTEN socket can be much more than the backlog
limit given in listen() (which is 1 in your case). If after a short
period (after SYNACK_RETRIES), the three way handshake is not
completed, request_socks can be silently discarded.

When a SYN is received, it is processed by tcp_v4_conn_request()
where we have..

        if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
                goto drop;

So, for the SYN to be dropped, backlog limit must be exceeded _and_ we
must have recently accepted another SYN request. So, even when backlog
limit is exceeded, SYNs are processed and syn-acks are sent back. It
seems that the listen backlog limit is applied definitively only in
the third step in tcp_v4_syn_recv_sock() and not in the first step.
In tcp_v4_syn_recv_sock(), we have

        if (sk_acceptq_is_full(sk))
                goto exit_overflow;

This prevents the socket from being created fully. On the client side
however, since the three way handshake has finished, the socket goes
into ESTABLISHED state which is what you see with netstat. In your
test case, typically 2 connections are in state where SYN has to be
processed and rest are as request_sock where synacks have been sent.
However, they may not become fully created sockets as they will fail
in step 3 as described above.

man listen() says
" The backlog argument defines the maximum length to which the
queue of pending connections for sockfd may grow. " In your case where
backlog is 1, there can be a max of 2 pending connections (SYNs not
yet processed) and this is what we see. By this interpretation,
behavior seems correct.

Not sure if this behavior is a bug but the processing in
tcp_v4_conn_request() does look suspicious. Should we terminate
earlier without doing three way hand shake?
Perhaps someone who knows this better can clarify.

Hope this helps.
Vijay

^ permalink raw reply [flat|nested] 20+ messages in thread
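The two checkpoints described in the message above can be modelled in a few lines of user-space C++. This is only a sketch for reasoning about the conditions, not kernel code: the `Listener` struct and the function names are invented for illustration, and only the two `if` conditions are taken from the thread.

```cpp
#include <cassert>

// Hypothetical user-space model of a listening socket's counters.
// Field names mirror sk_ack_backlog / sk_max_ack_backlog from the thread;
// `young` stands in for the value of inet_csk_reqsk_queue_young(sk).
struct Listener {
    unsigned ack_backlog;      // established connections waiting for accept()
    unsigned max_ack_backlog;  // the backlog value passed to listen()
    unsigned young;            // request_socks the kernel counts as young
};

// Same comparison as sk_acceptq_is_full(): note '>' rather than '>='.
constexpr bool acceptq_is_full(const Listener& l) {
    return l.ack_backlog > l.max_ack_backlog;
}

// Step 1 (a SYN arrives): the condition quoted from tcp_v4_conn_request().
constexpr bool syn_is_dropped(const Listener& l) {
    return acceptq_is_full(l) && l.young > 1;
}

// Step 3 (the final ACK arrives): the condition quoted from
// tcp_v4_syn_recv_sock().
constexpr bool final_ack_overflows(const Listener& l) {
    return acceptq_is_full(l);
}
```

With `listen(fd, 1)` and two connections already queued, `Listener{2, 1, 0}` is "full", yet `syn_is_dropped` is still false because no more than one request is young. In this model the SYN-ACK goes out anyway and the connection only fails at `final_ack_overflows`, which matches the client-side ESTABLISHED sockets seen with netstat.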
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-19  6:02 ` Vijay Subramanian
@ 2012-10-19  6:50   ` Eric Dumazet
  2012-10-19  8:06     ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2012-10-19 6:50 UTC (permalink / raw)
To: Vijay Subramanian; +Cc: enh, Venkat Venkatsubra, netdev

I came to the same analysis as you.

Current behavior is stupid, because the traffic for such 'sockets' is
insane :

As we sent a SYNACK, client sends the 3rd packet (ACK), and we ignore
it.

Then we keep retransmitting SYNACKS....

Oh well.

21:38:27.459937 IP glaptop.53627 > 172.30.42.23.9877: Flags [S], seq 1124582230, win 14600, options [mss 1460,sackOK,TS val 84038374 ecr 0,nop,wscale 7], length 0
21:38:27.460007 IP 172.30.42.23.9877 > glaptop.53627: Flags [S.], seq 1077519728, ack 1124582231, win 14480, options [mss 1460,sackOK,TS val 4230664 ecr 84038374,nop,wscale 7], length 0
21:38:27.460235 IP glaptop.53627 > 172.30.42.23.9877: Flags [.], ack 1, win 115, options [nop,nop,TS val 84038374 ecr 4230664], length 0
21:38:28.661139 IP 172.30.42.23.9877 > glaptop.53627: Flags [S.], seq 1077519728, ack 1124582231, win 14480, options [mss 1460,sackOK,TS val 4231866 ecr 84038374,nop,wscale 7], length 0
21:38:28.661428 IP glaptop.53627 > 172.30.42.23.9877: Flags [.], ack 1, win 115, options [nop,nop,TS val 84038494 ecr 4231866,nop,nop,sack 1 {0:1}], length 0
21:38:30.661138 IP 172.30.42.23.9877 > glaptop.53627: Flags [S.], seq 1077519728, ack 1124582231, win 14480, options [mss 1460,sackOK,TS val 4233866 ecr 84038494,nop,wscale 7], length 0
21:38:30.661412 IP glaptop.53627 > 172.30.42.23.9877: Flags [.], ack 1, win 115, options [nop,nop,TS val 84038694 ecr 4233866,nop,nop,sack 1 {0:1}], length 0
21:38:35.061135 IP 172.30.42.23.9877 > glaptop.53627: Flags [S.], seq 1077519728, ack 1124582231, win 14480, options [mss 1460,sackOK,TS val 4238266 ecr 84038694,nop,wscale 7], length 0
21:38:35.061413 IP glaptop.53627 > 172.30.42.23.9877: Flags [.], ack 1, win 115, options [nop,nop,TS val 84039134 ecr 4238266,nop,nop,sack 1 {0:1}], length 0
21:38:43.061118 IP 172.30.42.23.9877 > glaptop.53627: Flags [S.], seq 1077519728, ack 1124582231, win 14480, options [mss 1460,sackOK,TS val 4246266 ecr 84039134,nop,wscale 7], length 0
21:38:43.061357 IP glaptop.53627 > 172.30.42.23.9877: Flags [.], ack 1, win 115, options [nop,nop,TS val 84039934 ecr 4246266,nop,nop,sack 1 {0:1}], length 0
21:38:59.061135 IP 172.30.42.23.9877 > glaptop.53627: Flags [S.], seq 1077519728, ack 1124582231, win 14480, options [mss 1460,sackOK,TS val 4262266 ecr 84039934,nop,wscale 7], length 0
21:38:59.061434 IP glaptop.53627 > 172.30.42.23.9877: Flags [.], ack 1, win 115, options [nop,nop,TS val 84041534 ecr 4262266,nop,nop,sack 1 {0:1}], length 0

^ permalink raw reply [flat|nested] 20+ messages in thread
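To see how the server's SYN-ACK retransmissions are spaced in the trace above, the gaps between the six "Flags [S.]" packets can be computed from their timestamps. The sketch below is a standalone helper, not something from the thread: `to_seconds`, `gaps` and `kSynAcks` are hypothetical names, and the timestamps are copied by hand from the tcpdump output.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// Convert a tcpdump-style "HH:MM:SS.ffffff" timestamp to seconds since midnight.
// Returns -1.0 if the string does not match that shape.
double to_seconds(const std::string& ts) {
    int h = 0, m = 0;
    double s = 0.0;
    if (std::sscanf(ts.c_str(), "%d:%d:%lf", &h, &m, &s) != 3) return -1.0;
    return h * 3600.0 + m * 60.0 + s;
}

// Gaps between consecutive timestamps, in seconds.
std::vector<double> gaps(const std::vector<std::string>& stamps) {
    std::vector<double> out;
    for (std::size_t i = 1; i < stamps.size(); ++i)
        out.push_back(to_seconds(stamps[i]) - to_seconds(stamps[i - 1]));
    return out;
}

// Timestamps of the six SYN-ACKs ("Flags [S.]") in the trace, copied by hand.
const std::vector<std::string> kSynAcks = {
    "21:38:27.460007", "21:38:28.661139", "21:38:30.661138",
    "21:38:35.061135", "21:38:43.061118", "21:38:59.061135",
};
```

Calling `gaps(kSynAcks)` gives intervals of about 1.2, 2.0, 4.4, 8.0 and 16.0 seconds. Each retransmission waits longer than the previous one, which fits the "exponentially increasing timeout" wording in the kernel comment, while the client answers every one of them with the same ACK.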
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-19  6:50 ` Eric Dumazet
@ 2012-10-19  8:06   ` Eric Dumazet
  2012-10-19  9:14     ` Vijay Subramanian
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2012-10-19 8:06 UTC (permalink / raw)
To: Vijay Subramanian; +Cc: enh, Venkat Venkatsubra, netdev

On Fri, 2012-10-19 at 08:50 +0200, Eric Dumazet wrote:
> I came to the same analysis as you.
>
> Current behavior is stupid, because the traffic for such 'sockets' is
> insane :
>
> As we sent a SYNACK, client sends the 3rd packet (ACK), and we ignore
> it.
>
> Then we keep retransmitting SYNACKS....
>
> Oh well.

What about the following patch ?

 include/net/sock.h        |    7 ++++++-
 include/uapi/linux/snmp.h |    1 +
 net/ipv4/proc.c           |    1 +
 net/ipv4/tcp_ipv4.c       |    4 +++-
 net/ipv6/tcp_ipv6.c       |    3 ++-
 5 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 0baccb6..d2ecfbe 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -698,9 +698,14 @@ static inline void sk_acceptq_added(struct sock *sk)
 	sk->sk_ack_backlog++;
 }
 
+static inline bool __sk_acceptq_is_full(const struct sock *sk, unsigned int young)
+{
+	return (sk->sk_ack_backlog + young) > sk->sk_max_ack_backlog;
+}
+
 static inline bool sk_acceptq_is_full(const struct sock *sk)
 {
-	return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
+	return __sk_acceptq_is_full(sk, 0);
 }
 
 /*
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index fdfba23..5ff2daf 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -245,6 +245,7 @@ enum
 	LINUX_MIB_TCPFASTOPENPASSIVEFAIL,	/* TCPFastOpenPassiveFail */
 	LINUX_MIB_TCPFASTOPENLISTENOVERFLOW,	/* TCPFastOpenListenOverflow */
 	LINUX_MIB_TCPFASTOPENCOOKIEREQD,	/* TCPFastOpenCookieReqd */
+	LINUX_MIB_TCPSYNDROP,			/* TCPSynDrop */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 8de53e1..a5f59ab 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -267,6 +267,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPFastOpenPassiveFail", LINUX_MIB_TCPFASTOPENPASSIVEFAIL),
 	SNMP_MIB_ITEM("TCPFastOpenListenOverflow", LINUX_MIB_TCPFASTOPENLISTENOVERFLOW),
 	SNMP_MIB_ITEM("TCPFastOpenCookieReqd", LINUX_MIB_TCPFASTOPENCOOKIEREQD),
+	SNMP_MIB_ITEM("TCPSynDrop", LINUX_MIB_TCPSYNDROP),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ef998b0..0404926 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1507,7 +1507,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	 * clogging syn queue with openreqs with exponentially increasing
 	 * timeout.
 	 */
-	if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
+	if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
 		goto drop;
 
 	req = inet_reqsk_alloc(&tcp_request_sock_ops);
@@ -1673,6 +1673,7 @@ drop_and_release:
 drop_and_free:
 	reqsk_free(req);
 drop:
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPSYNDROP);
 	return 0;
 }
 EXPORT_SYMBOL(tcp_v4_conn_request);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 26175bf..39ffc54 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1054,7 +1054,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 			goto drop;
 	}
 
-	if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
+	if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
 		goto drop;
 
 	req = inet6_reqsk_alloc(&tcp6_request_sock_ops);
@@ -1204,6 +1204,7 @@ drop_and_release:
 drop_and_free:
 	reqsk_free(req);
 drop:
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPSYNDROP);
 	return 0; /* don't send reset */
 }
 

^ permalink raw reply related [flat|nested] 20+ messages in thread
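The effect of the proposed change to the SYN-drop condition is easiest to see by evaluating the removed and added conditions side by side. The following is a user-space sketch under stated assumptions: `QueueState`, `drop_syn_before` and `drop_syn_after` are hypothetical names, and the two expressions are transcribed from the `-` and `+` lines of the patch above.

```cpp
#include <cassert>

// Hypothetical snapshot of the counters the two conditions look at.
struct QueueState {
    unsigned ack_backlog;      // sk->sk_ack_backlog
    unsigned max_ack_backlog;  // sk->sk_max_ack_backlog
    unsigned young;            // value of inet_csk_reqsk_queue_young(sk)
};

// Condition removed by the patch: accept queue full AND more than one
// young request.
constexpr bool drop_syn_before(const QueueState& q) {
    return q.ack_backlog > q.max_ack_backlog && q.young > 1;
}

// Condition added by the patch: young requests are counted against the
// backlog, as in __sk_acceptq_is_full(sk, young).
constexpr bool drop_syn_after(const QueueState& q) {
    return q.ack_backlog + q.young > q.max_ack_backlog;
}
```

For a listener with backlog 1, the state `QueueState{2, 1, 0}` (two connections queued, nothing young) is not dropped by the old condition but is dropped by the new one, and `QueueState{1, 1, 1}` shows the same difference. In this model the patched check refuses new SYNs earlier instead of sending SYN-ACKs for connections that would later fail the accept-queue check.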
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-19 8:06 ` Eric Dumazet
@ 2012-10-19 9:14 ` Vijay Subramanian
  2012-10-19 10:29 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Vijay Subramanian @ 2012-10-19 9:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: enh, Venkat Venkatsubra, netdev

> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index ef998b0..0404926 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1507,7 +1507,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>          * clogging syn queue with openreqs with exponentially increasing
>          * timeout.
>          */
> -       if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
> +       if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
>                 goto drop;
>

For what it's worth, I think the changes make sense. But is there any
reason to exclude old request_socks in the call to
__sk_acceptq_is_full()? As in:

        if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_len(sk)))
                goto drop;

I am not sure why the current code looks only at young request_socks.

Thanks,
Vijay

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-19 9:14 ` Vijay Subramanian
@ 2012-10-19 10:29 ` Eric Dumazet
  2012-10-19 11:39 ` Eric Dumazet
  2012-10-22 20:00 ` Vijay Subramanian
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Dumazet @ 2012-10-19 10:29 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: enh, Venkat Venkatsubra, netdev

On Fri, 2012-10-19 at 02:14 -0700, Vijay Subramanian wrote:
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index ef998b0..0404926 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1507,7 +1507,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> >          * clogging syn queue with openreqs with exponentially increasing
> >          * timeout.
> >          */
> > -       if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
> > +       if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
> >                 goto drop;
> >
>
> For what it's worth, I think the changes make sense. But is there any
> reason to exclude old request_socks in the call to
> __sk_acceptq_is_full()? As in:
>
>         if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_len(sk)))
>                 goto drop;
>
> I am not sure why the current code looks only at young request_socks.
> Thanks,
> Vijay

Old requests are assumed to be unlikely to complete (SYN attack).

Young requests are assumed to have a reasonable chance to complete.

Note that we drop the SYN packet, so it's not a 'final' decision.

Some other OSes send a RST when the listener queue is full
(I tested FreeBSD 9.0 this morning.)

Note also that we probably have a bug elsewhere:

If we send a SYNACK, then receive the ACK from the client, and the acceptq
is full, we should reset the connection. Right now we have a kind of stupid
situation, where we drop the ACK and leave the REQ in the SYN_RECV
state, so we retransmit SYNACKs.

I am working on this part as well.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-19 10:29 ` Eric Dumazet
@ 2012-10-19 11:39 ` Eric Dumazet
  2012-10-22 20:00 ` Vijay Subramanian
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2012-10-19 11:39 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: enh, Venkat Venkatsubra, netdev

On Fri, 2012-10-19 at 12:29 +0200, Eric Dumazet wrote:
> On Fri, 2012-10-19 at 02:14 -0700, Vijay Subramanian wrote:
> > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > index ef998b0..0404926 100644
> > > --- a/net/ipv4/tcp_ipv4.c
> > > +++ b/net/ipv4/tcp_ipv4.c
> > > @@ -1507,7 +1507,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > >          * clogging syn queue with openreqs with exponentially increasing
> > >          * timeout.
> > >          */
> > > -       if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
> > > +       if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_young(sk)))
> > >                 goto drop;
> > >
> >
> > For what it's worth, I think the changes make sense. But is there any
> > reason to exclude old request_socks in the call to
> > __sk_acceptq_is_full()? As in:
> >
> >         if (__sk_acceptq_is_full(sk, inet_csk_reqsk_queue_len(sk)))
> >                 goto drop;
> >
> > I am not sure why the current code looks only at young request_socks.
> > Thanks,
> > Vijay
>
> Old requests are assumed to be unlikely to complete (SYN attack).
>
> Young requests are assumed to have a reasonable chance to complete.
>
> Note that we drop the SYN packet, so it's not a 'final' decision.
>
> Some other OSes send a RST when the listener queue is full
> (I tested FreeBSD 9.0 this morning.)
>
> Note also that we probably have a bug elsewhere:
>
> If we send a SYNACK, then receive the ACK from the client, and the acceptq
> is full, we should reset the connection. Right now we have a kind of stupid
> situation, where we drop the ACK and leave the REQ in the SYN_RECV
> state, so we retransmit SYNACKs.
>
> I am working on this part as well.
>

Well, it seems to be a documented feature:

tcp_abort_on_overflow - BOOLEAN
	If listening service is too slow to accept new connections,
	reset them. Default state is FALSE. It means that if overflow
	occurred due to a burst, connection will recover. Enable this
	option _only_ if you are really sure that listening daemon
	cannot be tuned to accept connections faster. Enabling this
	option can harm clients of your server.

^ permalink raw reply [flat|nested] 20+ messages in thread
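A test harness that depends on this behavior may want to check how tcp_abort_on_overflow is set before running. The following is a minimal sketch, assuming the sysctl is exposed at /proc/sys/net/ipv4/tcp_abort_on_overflow as usual on Linux. `read_int_sysctl()` is a hypothetical helper name, not an existing API.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysctl-style file, e.g.
 * /proc/sys/net/ipv4/tcp_abort_on_overflow.
 * Returns 0 and stores the value in *value on success, -1 on any error.
 */
static inline int read_int_sysctl(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;
	parsed = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return parsed ? 0 : -1;
}
```

A caller could pass the path above and treat a nonzero value as "overflowing connections are reset rather than left to recover". On a system without that file the helper returns -1 and the caller has to fall back to some default.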
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-19 10:29 ` Eric Dumazet
  2012-10-19 11:39 ` Eric Dumazet
@ 2012-10-22 20:00 ` Vijay Subramanian
  2012-10-22 20:08 ` Eric Dumazet
  1 sibling, 1 reply; 20+ messages in thread
From: Vijay Subramanian @ 2012-10-22 20:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Vijay Subramanian, enh, Venkat Venkatsubra, netdev

>
> If we send a SYNACK, then receive the ACK from the client, and the acceptq
> is full, we should reset the connection. Right now we have a kind of stupid
> situation, where we drop the ACK and leave the REQ in the SYN_RECV
> state, so we retransmit SYNACKs.
>

It seems the third ACK is remembered in inet_rsk(req)->acked in
tcp_check_req(). However, because of the order in which the tests are
performed, the server still retransmits the SYNACK needlessly. The following
patch (for review) prevents this SYNACK retransmission if the third ACK has
been received.

The request_sock will expire in around 30 seconds and will be dropped if it
does not move into the accept_queue by then. Maybe we should also call
req->rsk_ops->send_reset(sk,skb);
when the request_sock expires and is dropped?

 net/ipv4/inet_connection_sock.c | 5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d34ce29..4e8e52e 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 					       &expire, &resend);
 				req->rsk_ops->syn_ack_timeout(parent, req);
 				if (!expire &&
-				    (!resend ||
-				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL) ||
-				     inet_rsk(req)->acked)) {
+				    (!resend || inet_rsk(req)->acked ||
+				     !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) {
 					unsigned long timeo;
 
 					if (req->retrans++ == 0)

Thanks,
Vijay

^ permalink raw reply related [flat|nested] 20+ messages in thread
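The patch above relies only on the short-circuit evaluation of `||` in C: once `inet_rsk(req)->acked` is tested before the `rtx_syn_ack()` call, the retransmission is skipped whenever the third ACK has already been seen. A small userspace model can make that visible. `struct req_model`, `rtx_syn_ack_model()` and the two `keep_req_*()` helpers are hypothetical names for illustration, and the model assumes the retransmit op returns 0 on success.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a request_sock; only what this sketch needs. */
struct req_model {
	bool acked;                /* models inet_rsk(req)->acked */
	unsigned int synacks_sent; /* counts simulated SYNACK retransmissions */
};

/* Simulated retransmit op: has a side effect, returns 0 for success (assumed). */
static inline int rtx_syn_ack_model(struct req_model *req)
{
	req->synacks_sent++;
	return 0;
}

/* Condition order before the patch: retransmit first, then look at acked. */
static inline bool keep_req_old(struct req_model *req, bool expire, bool resend)
{
	return !expire &&
	       (!resend || !rtx_syn_ack_model(req) || req->acked);
}

/* Condition order after the patch: acked short-circuits the retransmit. */
static inline bool keep_req_new(struct req_model *req, bool expire, bool resend)
{
	return !expire &&
	       (!resend || req->acked || !rtx_syn_ack_model(req));
}
```

For a request with `acked` set and `resend` true, both orderings keep the request, but only the old ordering increments `synacks_sent`. For a request that has not been acked, the two orderings behave the same in this model.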
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-22 20:00 ` Vijay Subramanian
@ 2012-10-22 20:08 ` Eric Dumazet
  2012-10-22 22:11 ` Vijay Subramanian
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2012-10-22 20:08 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: enh, Venkat Venkatsubra, netdev

On Mon, 2012-10-22 at 13:00 -0700, Vijay Subramanian wrote:
> >
> > If we send a SYNACK, then receive the ACK from the client, and the acceptq
> > is full, we should reset the connection. Right now we have a kind of stupid
> > situation, where we drop the ACK and leave the REQ in the SYN_RECV
> > state, so we retransmit SYNACKs.
> >
>
>
> It seems the third ACK is remembered in inet_rsk(req)->acked in
> tcp_check_req(). However, because of the order in which the tests are
> performed, the server still retransmits the SYNACK needlessly. The following
> patch (for review) prevents this SYNACK retransmission if the third ACK has
> been received.
>
> The request_sock will expire in around 30 seconds and will be dropped if it
> does not move into the accept_queue by then. Maybe we should also call
> req->rsk_ops->send_reset(sk,skb);
> when the request_sock expires and is dropped?
>

Not sure it's needed, and we are under stress.

>
>  net/ipv4/inet_connection_sock.c | 5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index d34ce29..4e8e52e 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -598,9 +598,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
>                                                &expire, &resend);
>                                 req->rsk_ops->syn_ack_timeout(parent, req);
>                                 if (!expire &&
> -                                   (!resend ||
> -                                    !req->rsk_ops->rtx_syn_ack(parent, req, NULL) ||
> -                                    inet_rsk(req)->acked)) {
> +                                   (!resend || inet_rsk(req)->acked ||
> +                                    !req->rsk_ops->rtx_syn_ack(parent, req, NULL))) {
>                                         unsigned long timeo;
>
>                                         if (req->retrans++ == 0)

I wonder then whether we need to retransmit the SYNACK when the req moves
into the accept_queue?

Or else how can the client 'know' it can send data to the server?

All these facilities sound very complex and not really usable by clients
(i.e. users not willing to wait more than a few seconds anyway)

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-22 20:08 ` Eric Dumazet
@ 2012-10-22 22:11 ` Vijay Subramanian
  2012-10-25 22:50 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Vijay Subramanian @ 2012-10-22 22:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: enh, Venkat Venkatsubra, netdev

>
> I wonder then whether we need to retransmit the SYNACK when the req moves
> into the accept_queue?

If I understood the code correctly, the socket moves into the accept_queue
only when the third ACK (with or without data) comes in. So there
should be no need to resend the SYNACK. The issue is that there is currently
no mechanism to promote req sockets which have finished the three-way
handshake (TWHS) to the accept_queue. A socket can move into the accept_queue
only when the third ACK is processed. If we stop resending SYNACKs, then the
socket will move into the accept_queue when the client sends data.

>
> Or else how can the client 'know' it can send data to the server?

From the client's point of view, the TWHS is finished. The client is already
in the established state and can send data even now. Currently, such packets
with data will be dropped if the accept_queue is full. If the accept_queue is
not full, the socket moves into the accept_queue and the established state,
and processes the data.

I think the only thing my patch does is reorder the tests so that
needless SYNACK retransmissions are stopped.

>
> All these facilities sound very complex and not really usable by clients
> (i.e. users not willing to wait more than a few seconds anyway)
>

Fair enough. We can drop this if it is not worth the trouble or if I
have missed any other scenario.

Thanks for your review and time!
Vijay

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-22 22:11 ` Vijay Subramanian
@ 2012-10-25 22:50 ` Eric Dumazet
  2012-10-25 23:16 ` Vijay Subramanian
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2012-10-25 22:50 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: enh, Venkat Venkatsubra, netdev

On Mon, 2012-10-22 at 15:11 -0700, Vijay Subramanian wrote:

> >
> > All these facilities sound very complex and not really usable by clients
> > (i.e. users not willing to wait more than a few seconds anyway)
> >
>
> Fair enough. We can drop this if it is not worth the trouble or if I
> have missed any other scenario.
>

Sorry, my comment was not related to your patch, but to the existing logic.

It seems there is no value in resending the SYNACK, as we received the client
ACK.

Could you please send an official patch?

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-25 22:50 ` Eric Dumazet
@ 2012-10-25 23:16 ` Vijay Subramanian
  0 siblings, 0 replies; 20+ messages in thread
From: Vijay Subramanian @ 2012-10-25 23:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: enh, Venkat Venkatsubra, netdev

On 25 October 2012 15:50, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2012-10-22 at 15:11 -0700, Vijay Subramanian wrote:
>
>> >
>> > All these facilities sound very complex and not really usable by clients
>> > (ie users not willing to wait more than few seconds anyway)
>> >
>>
>> Fair enough. We can drop this if it is not worth the trouble or if I
>> have missed any other scenario.
>>
>
> Sorry my comment was not related to your patch, but existing logic.
>
> It seems there is no value resending SYNACK, as we received the client
> ACK.
>
> Please send an official patch ?
>
>
>

Eric,
I will send a patch shortly.

Thanks,
Vijay

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: listen(2) backlog changes in or around Linux 3.1?
  2012-10-16 23:31 ` enh
  2012-10-18 16:00 ` Venkat Venkatsubra
@ 2012-10-18 16:54 ` Eric Dumazet
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2012-10-18 16:54 UTC (permalink / raw)
  To: enh; +Cc: netdev

On Tue, 2012-10-16 at 16:31 -0700, enh wrote:

> boiling things down to a short C++ program, i see that i can reproduce
> the behavior even on 2.6 kernels. if i run this, i see 4 connections
> immediately (3 + 1, as i'd expect)... but then about 10s later i see
> another 2. and every few seconds after that, i see another 2. i've let
> this run until i have hundreds of connect(2) calls that have returned,
> despite my small listen(2) backlog and the fact that i'm not
> accept(2)ing.
>
> so i guess the only thing that's changed with newer kernels is timing
> (hell, since i only see newer kernels on newer hardware, it might just
> be a hardware thing).
>
> and clearly i don't understand what the listen(2) backlog means any more.

Hi Elliott

I would say there is a bug (or several!), and this needs a fix.

I am investigating.

Thanks

^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2012-10-25 23:16 UTC | newest]

Thread overview: 20+ messages
2012-10-12 23:40 listen(2) backlog changes in or around Linux 3.1? enh
2012-10-15 17:12 ` Venkat Venkatsubra
2012-10-15 17:26 ` enh
2012-10-15 21:30 ` Venkat Venkatsubra
2012-10-16 23:31 ` enh
2012-10-18 16:00 ` Venkat Venkatsubra
2012-10-18 16:53 ` Venkat Venkatsubra
2012-10-18 17:20 ` enh
2012-10-19 6:02 ` Vijay Subramanian
2012-10-19 6:50 ` Eric Dumazet
2012-10-19 8:06 ` Eric Dumazet
2012-10-19 9:14 ` Vijay Subramanian
2012-10-19 10:29 ` Eric Dumazet
2012-10-19 11:39 ` Eric Dumazet
2012-10-22 20:00 ` Vijay Subramanian
2012-10-22 20:08 ` Eric Dumazet
2012-10-22 22:11 ` Vijay Subramanian
2012-10-25 22:50 ` Eric Dumazet
2012-10-25 23:16 ` Vijay Subramanian
2012-10-18 16:54 ` Eric Dumazet