* SO_LINGER dead: I get an immediate RST on 2.6.24?
From: bert hubert @ 2009-01-11 21:23 UTC
To: netdev

Hi everybody,

I have an application where I need to send data from A to B, and beforehand, I don't know how much data this will be.

B is 'stupid', and consists solely of a TCP/IP port accepting data, and I have no way to chunk this data. So what I do is issue blocking calls to write(), then shutdown(fd, SHUT_WR), and wait for the fd to become readable, which tells me that the remote has packed up, and I'm good to go.

Before this, I've tried SO_LINGER with various timeouts, but nothing helped.

When I tcpdump, I find that my close() is immediately turned into an RST packet.

Is SO_LINGER a NOOP? Does it still do anything?

I'm about to blog this up; the 'shutdown() and read()' technique is something I had to purloin from the Apache source.

So I'd love to know the words of the wise on this one.

Thanks.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
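A minimal C sketch of the 'shutdown() and read()' pattern described above, assuming a blocking, connected stream socket on Linux. The function name send_then_lingering_close and the idle_timeout_ms parameter are hypothetical; this is not Apache's code. It writes everything, half-closes with shutdown(fd, SHUT_WR) so the peer sees a FIN, then reads and discards input until read() returns 0 (the peer has closed) or poll() goes idle for too long, and only then calls close().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf, half-close, then wait for the
 * peer to close its side before releasing the descriptor.
 * idle_timeout_ms bounds each wait for input, not the total time.
 * Always closes fd. Returns 0 if the peer closed, -1 on error/timeout. */
int send_then_lingering_close(int fd, const void *buf, size_t len,
                              int idle_timeout_ms)
{
    const char *p = buf;
    while (len > 0) {                  /* blocking writes until done */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Send our FIN but keep the receive side open. */
    if (shutdown(fd, SHUT_WR) < 0) {
        close(fd);
        return -1;
    }

    /* Read and discard whatever the peer sends until it closes. */
    char scratch[4096];
    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int r = poll(&pfd, 1, idle_timeout_ms);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0) {                  /* poll error or idle timeout */
            close(fd);
            return -1;
        }
        ssize_t n = read(fd, scratch, sizeof scratch);
        if (n == 0)                    /* orderly close by the peer */
            break;
        if (n < 0 && errno != EINTR) {
            close(fd);
            return -1;
        }
    }
    close(fd);
    return 0;
}
```

Because the input is drained before close(), no unread received data is left in the socket when the descriptor is released.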
* Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: H. Willstrand @ 2009-01-11 22:08 UTC
To: bert hubert, netdev

On Sun, Jan 11, 2009 at 10:23 PM, bert hubert <bert.hubert@netherlabs.nl> wrote:
> When I tcpdump, I find that my close() is immediately turned into an RST
> packet.
>
> Is SO_LINGER a NOOP? Does it still do anything?

This is the correct behavior according to RFC 2525; see section 2.17 (there is an example).

//H.W.
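RFC 2525 section 2.17 ("Failure to RST on close with data pending") covers a TCP that closes a connection while received data is still unread: per RFC 1122 it should send a RST to show that data was lost. If that is what happens here, the RST is triggered by unread data from B, not by SO_LINGER, which only governs how close() treats data that has not yet been sent. The sketch below (the helper names set_linger and get_linger are hypothetical) shows how SO_LINGER is set through struct linger. On Linux, l_onoff = 1 with l_linger = 0 deliberately makes close() abort the connection with a RST.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Enable (on != 0) or disable SO_LINGER on fd.
 * on != 0, seconds > 0: a blocking close() may wait up to `seconds`
 *                       while queued outgoing data is still being sent.
 * on != 0, seconds == 0: close() aborts the connection with a RST.
 * Neither setting suppresses the RST caused by unread *received* data. */
int set_linger(int fd, int on, int seconds)
{
    struct linger lg;
    memset(&lg, 0, sizeof lg);
    lg.l_onoff = on ? 1 : 0;
    lg.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* Read back the current SO_LINGER setting; returns 0 on success. */
int get_linger(int fd, int *on, int *seconds)
{
    struct linger lg;
    socklen_t len = sizeof lg;
    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) < 0)
        return -1;
    *on = lg.l_onoff;
    *seconds = lg.l_linger;
    return 0;
}
```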
* sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: bert hubert @ 2009-01-11 22:45 UTC
To: H. Willstrand
Cc: netdev

On Sun, Jan 11, 2009 at 11:08:16PM +0100, H. Willstrand wrote:
> > Is SO_LINGER a NOOP? Does it still do anything?
> This is the correct behavior according to RFC 2525; see section 2.17
> (there is an example).

Ah, very good, thank you. I'm trying to gather as much information as I can before writing this all up. This should save netdev & the Linux kernel community a lot of email!

Is there any way to make sure there is no pending output data, so one can safely call close() and not get an RST situation?

Let me put it more succinctly. What I would very much like to have is what Linux sendfile() offers in practice.

It appears that if one asks sendfile() to transmit a million bytes, it will only return when the ACK for the millionth byte is in.

I know that TCP will never be fully reliable, but I would love to have a way to know that the millionth byte was ACKed, or alternatively, that an error prevented that.

From what I've read so far, I think the POSIX functions don't offer this. But does Linux? sendfile appears to get it right.

Thanks.
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: Evgeniy Polyakov @ 2009-01-11 22:54 UTC
To: bert hubert, H. Willstrand, netdev

Hi Bert.

On Sun, Jan 11, 2009 at 11:45:43PM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> Is there any way to make sure there is no pending output data, so one can
> safely call close() and not get an RST situation?

You can try to work with the TCP cork options: uncorking the socket means that the stack has sent the data to the hardware; there are no other guarantees.

> Let me put it more succinctly. What I would very much like to have is what
> Linux sendfile() offers in practice.
>
> It appears that if one asks sendfile() to transmit a million bytes, it will
> only return when the ACK for the millionth byte is in.

No, it does not. It returns when it believes it has sent all the requested data, but in practice the data may not even have been sent yet and may still be waiting in some hardware queue.

> I know that TCP will never be fully reliable, but I would love to have
> a way to know that the millionth byte was ACKed, or alternatively, that an
> error prevented that.

There is no way to get a notification when data is acked by the remote side. Generally you should invent some kind of explicit acks of your own.

--
Evgeniy Polyakov
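Evgeniy's cork suggestion, sketched with the Linux-specific TCP_CORK option from <netinet/tcp.h>. The helper names set_cork and get_cork are hypothetical. As the message says, clearing the cork only releases what the stack was holding back; it tells you nothing about delivery to, or acknowledgment by, the peer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set (on = 1) or clear (on = 0) TCP_CORK on a TCP socket.
 * While corked, Linux holds back partial segments; clearing the cork
 * lets any held-back partial segment be transmitted. It does not
 * report whether the peer has received or acknowledged anything. */
int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}

/* Returns the current TCP_CORK value (0 or 1), or -1 on error. */
int get_cork(int fd)
{
    int on = 0;
    socklen_t len = sizeof on;
    if (getsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, &len) < 0)
        return -1;
    return on;
}
```

Typical use: call set_cork(fd, 1) before writing a header and body, then set_cork(fd, 0) to flush the tail.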
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: bert hubert @ 2009-01-11 23:08 UTC
To: Evgeniy Polyakov
Cc: H. Willstrand, netdev

On Mon, Jan 12, 2009 at 01:54:27AM +0300, Evgeniy Polyakov wrote:
> You can try to work with the TCP cork options: uncorking the socket means
> that the stack has sent the data to the hardware; there are no other guarantees.

Ah, smart.

> > It appears that if one asks sendfile() to transmit a million bytes, it will
> > only return when the ACK for the millionth byte is in.
>
> No, it does not. It returns when it believes it has sent all the requested
> data, but in practice the data may not even have been sent yet and may still
> be waiting in some hardware queue.

Ah, OK.

> > I know that TCP will never be fully reliable, but I would love to have
> > a way to know that the millionth byte was ACKed, or alternatively, that an
> > error prevented that.
>
> There is no way to get a notification when data is acked by the remote
> side. Generally you should invent some kind of explicit acks of your own.

I fully understand. Sometimes I have to talk to stupid devices though. What I do find is the TCP_INFO ioctl, which offers this field in struct tcp_info:

    __u32 tcpi_unacked;

Which comes from:

    struct tcp_sock {
        ...
        u32 packets_out; /* Packets which are "in flight" */
        ...
    }

If this becomes 0, perhaps this might tell me everything I sent was acked?

Bert
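Reading tcpi_unacked from user space, assuming Linux and glibc's struct tcp_info in <netinet/tcp.h>. Note that TCP_INFO is a getsockopt() option at level IPPROTO_TCP rather than an ioctl, and that tcpi_unacked counts segments, not bytes. The helper name tcp_unacked_segments is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Query TCP_INFO and return tcpi_unacked (segments sent but not yet
 * acknowledged), or -1 on error. Linux-specific. */
long tcp_unacked_segments(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof ti;
    memset(&ti, 0, sizeof ti);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
        return -1;
    return (long)ti.tcpi_unacked;
}
```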
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: Evgeniy Polyakov @ 2009-01-11 23:18 UTC
To: bert hubert, H. Willstrand, netdev

On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> I fully understand. Sometimes I have to talk to stupid devices though. What
> I do find is the TCP_INFO ioctl, which offers this field in struct tcp_info:
>
>     __u32 tcpi_unacked;
>
> Which comes from:
>
>     struct tcp_sock {
>         ...
>         u32 packets_out; /* Packets which are "in flight" */
>         ...
>     }
>
> If this becomes 0, perhaps this might tell me everything I sent was acked?

0 means that there are no in-flight packets; the field is effectively the number of unacked packets. So if your application waits for this field to become zero, it will wait for all sent packets to be acked.

--
Evgeniy Polyakov
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: Bill Fink @ 2009-01-12 4:50 UTC
To: Evgeniy Polyakov
Cc: bert hubert, H. Willstrand, netdev

On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
> On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > If this becomes 0, perhaps this might tell me everything I sent was acked?
>
> 0 means that there are no in-flight packets; the field is effectively the
> number of unacked packets. So if your application waits for this field to
> become zero, it will wait for all sent packets to be acked.

I use this type of strategy in nuttcp, and it seems to work fine. I have a loop with a small delay and a check of tcpi_unacked, and I break out of the loop if tcpi_unacked becomes 0 or a defined timeout period has passed.

-Bill
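A sketch of the polling loop Bill describes: sleep briefly, check tcpi_unacked, and stop when it reads 0 or a timeout expires. This is not nuttcp's actual code; the function name wait_unacked_zero and the interval_ms/timeout_ms parameters are illustrative assumptions, and it assumes Linux with glibc's struct tcp_info. As Ilpo points out in this thread, a single zero reading is not conclusive on its own.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll tcpi_unacked every interval_ms until it reads 0 or timeout_ms
 * has passed. Returns 0 if it reached 0, 1 on timeout, -1 on error. */
int wait_unacked_zero(int fd, long interval_ms, long timeout_ms)
{
    struct timespec start, delay;
    clock_gettime(CLOCK_MONOTONIC, &start);
    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (;;) {
        struct tcp_info ti;
        socklen_t len = sizeof ti;
        memset(&ti, 0, sizeof ti);
        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
            return -1;
        if (ti.tcpi_unacked == 0)
            return 0;                  /* nothing in flight right now */
        if (elapsed_ms(&start) >= timeout_ms)
            return 1;                  /* give up */
        nanosleep(&delay, NULL);
    }
}
```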
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: Ilpo Järvinen @ 2009-01-12 9:18 UTC
To: Bill Fink
Cc: Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Sun, 11 Jan 2009, Bill Fink wrote:
> I use this type of strategy in nuttcp, and it seems to work fine.
> I have a loop with a small delay and a check of tcpi_unacked, and I
> break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> period has passed.

Checking tcpi_unacked alone won't be reliable. The peer might be slow enough to advertize a zero window for a short period of time, and during that period you would have packets_out zero...

--
 i.
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: Bill Fink @ 2009-01-13 5:31 UTC
To: Ilpo Järvinen
Cc: Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Mon, 12 Jan 2009, Ilpo Järvinen wrote:
> On Sun, 11 Jan 2009, Bill Fink wrote:
> > I use this type of strategy in nuttcp, and it seems to work fine.
> > I have a loop with a small delay and a check of tcpi_unacked, and I
> > break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> > period has passed.
>
> Checking tcpi_unacked alone won't be reliable. The peer might be slow
> enough to advertize a zero window for a short period of time, and during
> that period you would have packets_out zero...

I'll keep this in mind for the future, although it doesn't seem to be a significant issue in practice. I use this scheme to try to account for the tcpi_total_retrans for the data stream, so if this corner case were hit, it would mean under-reporting of the total TCP retransmissions for the nuttcp test.

If I understand you correctly, to hit this corner case, just after the final TCP write there would have to be no packets in flight together with a zero TCP window. To make it more bullet-proof, I guess after seeing a zero tcpi_unacked, an additional small delay should be performed, followed by rechecking for a zero tcpi_unacked. I don't see anything else obvious (to me anyway) in the tcp_info that would be particularly helpful in handling this.

-Thanks

-Bill
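Bill's proposed refinement, sketched: after tcpi_unacked first reads 0, wait a little longer and require a second zero reading. The function name wait_unacked_zero_confirmed and its parameters are hypothetical, and Linux with glibc's struct tcp_info is assumed. This narrows Ilpo's zero-window corner case but cannot eliminate it: a zero-window period longer than settle_ms would still yield two zero readings.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Read tcpi_unacked; returns -1 on error. */
static long unacked(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof ti;
    memset(&ti, 0, sizeof ti);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
        return -1;
    return (long)ti.tcpi_unacked;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Wait until tcpi_unacked reads 0, then sleep settle_ms and require it
 * to read 0 again. Gives up after max_polls polls of interval_ms each.
 * Returns 0 on success, 1 on timeout, -1 on error. */
int wait_unacked_zero_confirmed(int fd, long interval_ms, long settle_ms,
                                int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        long n = unacked(fd);
        if (n < 0)
            return -1;
        if (n == 0) {
            sleep_ms(settle_ms);
            n = unacked(fd);
            if (n < 0)
                return -1;
            if (n == 0)
                return 0;              /* zero on two checks in a row */
            continue;                  /* more went out; keep waiting */
        }
        sleep_ms(interval_ms);
    }
    return 1;
}
```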
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: Jeremy Jackson @ 2009-02-13 17:02 UTC
To: Bill Fink
Cc: ilpo.jarvinen, Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Tue, 2009-01-13 at 00:31 -0500, Bill Fink wrote:
> On Mon, 12 Jan 2009, Ilpo Järvinen wrote:
> > On Sun, 11 Jan 2009, Bill Fink wrote:
> > > On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
> > > > On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > > > > I fully understand. Sometimes I have to talk to stupid devices though.

An excellent article on this subject:

http://ds9a.nl/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable.txt

    "Luckily, it turns out that Linux keeps track of the amount of
    unacknowledged data, which can be queried using the SIOCOUTQ ioctl().
    Once we see this number hit 0, we can be reasonably sure our data
    reached at least the remote operating system."

Is this the same as the TCP_INFO getsockopt()?

If you follow the progression from write(socket_fd, ...), the data sits in the socket buffer, and SIOCOUTQ is initially zero. If the connection started with a zero window, it could sit like that for a while (sometimes called a "tarpit"?). But you should still see the data in your socket buffer, yes?

So I think you want to make sure your socket write buffer is empty (converted to unacked data), *then* make sure your unacked data is 0.

    write(sock, buffer, 1000000);   // returns 1000000
    shutdown(sock, SHUT_WR);
    // now wait for SIOCOUTQ to hit 0

If the window is 0, shutdown() would wait until the slow device sets the window > 0 again, or forever on a tarpitted connection. Either way, if/when it finishes, you know all data was transmitted; now wait for all of it to be ACKed with SIOCOUTQ.

--
Jeremy Jackson
Coplanar Networks
(519)489-4903
http://www.coplanar.net
jerj@coplanar.net
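Jeremy's write / shutdown / SIOCOUTQ sequence as a C sketch, assuming Linux, where SIOCOUTQ comes from <linux/sockios.h> and, for a TCP socket, reports bytes written but not yet acknowledged. The helper names send_queue_bytes and write_shutdown_wait_outq are hypothetical. One caveat on the reasoning above: on Linux, shutdown(fd, SHUT_WR) queues a FIN and returns without waiting for the peer to open its window, so all of the waiting happens in the SIOCOUTQ polling loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/sockios.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Bytes still held in fd's send queue (Linux SIOCOUTQ), or -1 on error. */
long send_queue_bytes(int fd)
{
    int n = 0;
    if (ioctl(fd, SIOCOUTQ, &n) < 0)
        return -1;
    return n;
}

/* Write all of buf, half-close, then poll SIOCOUTQ until it reads 0.
 * Returns 0 once it reads 0, 1 if max_polls polls of interval_ms pass
 * first, -1 on error. */
int write_shutdown_wait_outq(int fd, const void *buf, size_t len,
                             long interval_ms, int max_polls)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    if (shutdown(fd, SHUT_WR) < 0)     /* queues a FIN; does not block */
        return -1;

    struct timespec delay = { interval_ms / 1000,
                              (interval_ms % 1000) * 1000000L };
    for (int i = 0; i < max_polls; i++) {
        long q = send_queue_bytes(fd);
        if (q < 0)
            return -1;
        if (q == 0)
            return 0;                  /* everything written was ACKed */
        nanosleep(&delay, NULL);
    }
    return 1;
}
```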
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
From: Bill Fink @ 2009-02-20 18:10 UTC
To: Jeremy Jackson
Cc: ilpo.jarvinen, Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Fri, 13 Feb 2009, Jeremy Jackson wrote:
> An excellent article on this subject:
>
> http://ds9a.nl/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable.txt
>
>     "Luckily, it turns out that Linux keeps track of the amount of
>     unacknowledged data, which can be queried using the SIOCOUTQ ioctl().
>     Once we see this number hit 0, we can be reasonably sure our data
>     reached at least the remote operating system."
>
> Is this the same as the TCP_INFO getsockopt()?

If you mean the tcpi_unacked variable, then no, it is not the same as the SIOCOUTQ info.

> If you follow the progression from write(socket_fd, ...), the data sits in
> the socket buffer, and SIOCOUTQ is initially zero. If the connection
> started with a zero window, it could sit like that for a while (sometimes
> called a "tarpit"?). But you should still see the data in your socket
> buffer, yes?
>
> So I think you want to make sure your socket write buffer is empty
> (converted to unacked data), *then* make sure your unacked data is 0.
>
>     write(sock, buffer, 1000000);   // returns 1000000
>     shutdown(sock, SHUT_WR);
>     // now wait for SIOCOUTQ to hit 0
>
> If the window is 0, shutdown() would wait until the slow device sets the
> window > 0 again, or forever on a tarpitted connection. Either way, if/when
> it finishes, you know all data was transmitted; now wait for all of it
> to be ACKed with SIOCOUTQ.

While the "shutdown(sock, SHUT_WR)" might be useful, it isn't actually necessary, since the SIOCOUTQ info includes both unACKed data (reported by the tcpi_unacked variable) and never-sent data (written by the app but outside of the receiver's allowed window).

-Bill
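To see the difference Bill describes, this sketch reads both counters side by side on a Linux TCP socket. SIOCOUTQ reports bytes (everything written but not yet acknowledged, including data never sent), while tcpi_unacked reports segments currently in flight. In Ilpo's zero-window case you would expect tcpi_unacked to read 0 while SIOCOUTQ is still positive, which is why waiting on SIOCOUTQ alone is enough. The struct send_backlog and the function read_send_backlog are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/sockios.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Two different views of "not yet acknowledged" on a Linux TCP socket. */
struct send_backlog {
    long outq_bytes;    /* SIOCOUTQ: bytes written but not yet ACKed,
                           including bytes not sent at all yet */
    long unacked_segs;  /* TCP_INFO tcpi_unacked: segments in flight */
};

/* Fill *out with both counters; returns 0 on success, -1 on error. */
int read_send_backlog(int fd, struct send_backlog *out)
{
    int q = 0;
    if (ioctl(fd, SIOCOUTQ, &q) < 0)
        return -1;

    struct tcp_info ti;
    socklen_t len = sizeof ti;
    memset(&ti, 0, sizeof ti);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
        return -1;

    out->outq_bytes = q;
    out->unacked_segs = (long)ti.tcpi_unacked;
    return 0;
}
```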