* SO_LINGER dead: I get an immediate RST on 2.6.24?
@ 2009-01-11 21:23 bert hubert
  2009-01-11 22:08 ` H. Willstrand
  0 siblings, 1 reply; 23+ messages in thread
From: bert hubert @ 2009-01-11 21:23 UTC (permalink / raw)
To: netdev

Hi everybody,

I have an application where I need to send data from A to B, and beforehand,
I don't know how much data this will be.

B is 'stupid', and consists solely of a TCP/IP port accepting data, and I
have no way to chunk this data. So what I do is issue blocking calls to
write(), shutdown(fd, SHUT_WR), and wait for the fd to become readable which
tells me that the remote has packed up, and I'm good to go.

Before this, I've tried SO_LINGER with various timeouts but nothing helped.

When I tcpdump, I find that my close() is immediately turned into an RST
packet.

Is SO_LINGER a NOOP? Does it still do anything?

I'm about to blog this up - the 'shutdown() and read()' technique is
something I had to purloin from the Apache source.

So I'd love to know the words of the wise on this one.

Thanks.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
^ permalink raw reply [flat|nested] 23+ messages in thread
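For readers following along, here is a minimal sketch of the 'shutdown() and read()' technique described in the message above. It assumes a connected, blocking stream socket on which all data has already been passed to write(). The function name shutdown_and_wait_for_eof is hypothetical; it is not taken from Apache or from the kernel.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: half-close the sending side, then block until the
 * peer closes its side too. Returns 0 on a clean end-of-file, -1 on error
 * (errno is left as set by shutdown() or read()).
 */
int shutdown_and_wait_for_eof(int fd)
{
    char scratch[4096];

    /* Send FIN after any queued data; we can still read afterwards. */
    if (shutdown(fd, SHUT_WR) < 0)
        return -1;

    for (;;) {
        ssize_t n = read(fd, scratch, sizeof scratch);
        if (n == 0)
            return 0;            /* peer closed: end-of-file reached */
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, retry */
            return -1;           /* e.g. ECONNRESET */
        }
        /* n > 0: the peer sent something; discard it and keep waiting. */
    }
}
```

Only after this returns 0 would the caller go on to close(fd). A real program would probably also want a timeout (for example via poll()) so that a peer that never closes cannot block it forever.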
* Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 21:23 SO_LINGER dead: I get an immediate RST on 2.6.24? bert hubert
@ 2009-01-11 22:08 ` H. Willstrand
  2009-01-11 22:45   ` sendfile()? " bert hubert
  0 siblings, 1 reply; 23+ messages in thread
From: H. Willstrand @ 2009-01-11 22:08 UTC (permalink / raw)
To: bert hubert, netdev

On Sun, Jan 11, 2009 at 10:23 PM, bert hubert <bert.hubert@netherlabs.nl> wrote:
> Hi everybody,
>
> I have an application where I need to send data from A to B, and beforehand,
> I don't know how much data this will be.
>
> B is 'stupid', and consists solely of a TCP/IP port accepting data, and I
> have no way to chunk this data. So what I do is issue blocking calls to
> write(), shutdown(fd, SHUT_WR), and wait for the fd to become readable which
> tells me that the remote has packed up, and I'm good to go.
>
> Before this, I've tried SO_LINGER with various timeouts but nothing helped.
>
> When I tcpdump, I find that my close() is immediately turned into an RST
> packet.
>
> Is SO_LINGER a NOOP? Does it still do anything?
>
> I'm about to blog this up - the 'shutdown() and read()' technique is
> something I had to purloin from the Apache source.
>
> So I'd love to know the words of the wise on this one.
>
> Thanks.

This is the correct behavior according to RFC 2525, see section 2.17
(there is an example).

//H.W.
* sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 22:08 ` H. Willstrand
@ 2009-01-11 22:45   ` bert hubert
  2009-01-11 22:54     ` Evgeniy Polyakov
  0 siblings, 1 reply; 23+ messages in thread
From: bert hubert @ 2009-01-11 22:45 UTC (permalink / raw)
To: H. Willstrand; +Cc: netdev

On Sun, Jan 11, 2009 at 11:08:16PM +0100, H. Willstrand wrote:
> > Is SO_LINGER a NOOP? Does it still do anything?

> This is the correct behavior according to RFC 2525, see section 2.17
> (there is an example).

Ah - very good, thank you. I'm trying to gather as much information as I can
before writing this all up. This should save netdev & the linux kernel
community a lot of email!

Is there any way to make sure there is no pending output data, so one can
safely call close(), and not get an RST-situation?

Let me put it more succinctly. What I would very much like to have is what
Linux sendfile() offers in practice.

It appears that if one asks sendfile() to transmit a million bytes, it will
only return when the ACK for the millionth byte is in.

I know that TCP will never be fully fully reliable, but I would love to have
a way to know that the millionth byte was ACKed, or alternatively, that an
error prevented that.

From what I've read so far, I think the POSIX functions don't offer this.
But does Linux? sendfile appears to get it right..

Thanks.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 22:45 ` sendfile()? " bert hubert
@ 2009-01-11 22:54   ` Evgeniy Polyakov
  2009-01-11 23:08     ` bert hubert
  0 siblings, 1 reply; 23+ messages in thread
From: Evgeniy Polyakov @ 2009-01-11 22:54 UTC (permalink / raw)
To: bert hubert, H. Willstrand, netdev

Hi Bert.

On Sun, Jan 11, 2009 at 11:45:43PM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> Is there any way to make sure there is no pending output data, so one can
> safely call close(), and not get an RST-situation?

You can try to work with tcp cork options, uncorking the socket means
that stack has sent data to the hardware, there are no other guarantees.

> Let me put it more succinctly. What I would very much like to have is what
> Linux sendfile() offers in practice.
>
> It appears that if one asks sendfile() to transmit a million bytes, it will
> only return when the ACK for the millionth byte is in.

No it is not, it returns when it believes it has sent all the requested
data, but in practice it can be even not sent but waiting in some
hardware queue.

> I know that TCP will never be fully fully reliable, but I would love to have
> a way to know that the millionth byte was ACKed, or alternatively, that an
> error prevented that.

There is no way to get a notification when data is acked by the remote
side. Generally you should invent some kind of own explicit acks.

--
	Evgeniy Polyakov
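As a rough illustration of the "tcp cork options" mentioned above, the sketch below toggles the Linux TCP_CORK socket option with setsockopt(). The helper name set_cork is hypothetical. As the message says, clearing the cork only asks the stack to push out whatever is queued; it says nothing about the peer having acknowledged the data.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: set (on = 1) or clear (on = 0) TCP_CORK on a TCP
 * socket. Returns 0 on success, -1 on error with errno set.
 */
int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}
```

A typical pattern would be set_cork(fd, 1), several write() calls, then set_cork(fd, 0) to flush the partial segment that was being held back.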
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 22:54 ` Evgeniy Polyakov
@ 2009-01-11 23:08   ` bert hubert
  2009-01-11 23:18     ` Evgeniy Polyakov
  0 siblings, 1 reply; 23+ messages in thread
From: bert hubert @ 2009-01-11 23:08 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: H. Willstrand, netdev

On Mon, Jan 12, 2009 at 01:54:27AM +0300, Evgeniy Polyakov wrote:
> You can try to work with tcp cork options, uncorking the socket means
> that stack has sent data to the hardware, there are no other guarantees.

Ah, smart.

> > It appears that if one asks sendfile() to transmit a million bytes, it will
> > only return when the ACK for the millionth byte is in.
>
> No it is not, it returns when it believes it has sent all the requested
> data, but in practice it can be even not sent but waiting in some
> hardware queue.

Ah ok.

> > I know that TCP will never be fully fully reliable, but I would love to have
> > a way to know that the millionth byte was ACKed, or alternatively, that an
> > error prevented that.
>
> There is no way to get a notification when data is acked by the remote
> side. Generally you should invent some kind of own explicit acks.

I fully understand. Sometimes I have to talk to stupid devices though. What
I do find is the TCP_INFO socket option, which offers this field in
struct tcp_info:

	__u32	tcpi_unacked;

Which comes from:

struct tcp_sock {
...
	u32	packets_out;	/* Packets which are "in flight" */
...
}

If this becomes 0, perhaps this might tell me everything I sent was acked?

	Bert

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 23:08 ` bert hubert
@ 2009-01-11 23:18   ` Evgeniy Polyakov
  2009-01-12 4:50     ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Evgeniy Polyakov @ 2009-01-11 23:18 UTC (permalink / raw)
To: bert hubert, H. Willstrand, netdev

On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> I fully understand. Sometimes I have to talk to stupid devices though. What
> I do find is the TCP_INFO socket option, which offers this field in
> struct tcp_info:
>
>	__u32	tcpi_unacked;
>
> Which comes from:
>
> struct tcp_sock {
> ...
>	u32	packets_out;	/* Packets which are "in flight" */
> ...
> }
>
> If this becomes 0, perhaps this might tell me everything I sent was acked?

0 means that there are no in-flight packets, which is effectively number
of unacked packets. So if your application waits for this field to
become zero, it will wait for all sent packets to be acked.

--
	Evgeniy Polyakov
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 23:18 ` Evgeniy Polyakov
@ 2009-01-12 4:50   ` Bill Fink
  2009-01-12 9:18     ` Ilpo Järvinen
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-01-12 4:50 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: bert hubert, H. Willstrand, netdev

On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:

> On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > I fully understand. Sometimes I have to talk to stupid devices though. What
> > I do find is the TCP_INFO socket option, which offers this field in
> > struct tcp_info:
> >
> >	__u32	tcpi_unacked;
> >
> > Which comes from:
> >
> > struct tcp_sock {
> > ...
> >	u32	packets_out;	/* Packets which are "in flight" */
> > ...
> > }
> >
> > If this becomes 0, perhaps this might tell me everything I sent was acked?
>
> 0 means that there are no in-flight packets, which is effectively number
> of unacked packets. So if your application waits for this field to
> become zero, it will wait for all sent packets to be acked.

I use this type of strategy in nuttcp, and it seems to work fine.
I have a loop with a small delay and a check of tcpi_unacked, and
break out of the loop if tcpi_unacked becomes 0 or a defined timeout
period has passed.

						-Bill
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-12 4:50 ` Bill Fink
@ 2009-01-12 9:18   ` Ilpo Järvinen
  2009-01-13 5:31     ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Ilpo Järvinen @ 2009-01-12 9:18 UTC (permalink / raw)
To: Bill Fink; +Cc: Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Sun, 11 Jan 2009, Bill Fink wrote:

> On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
>
> > On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > > I fully understand. Sometimes I have to talk to stupid devices though. What
> > > I do find is the TCP_INFO socket option, which offers this field in
> > > struct tcp_info:
> > >
> > >	__u32	tcpi_unacked;
> > >
> > > Which comes from:
> > >
> > > struct tcp_sock {
> > > ...
> > >	u32	packets_out;	/* Packets which are "in flight" */
> > > ...
> > > }
> > >
> > > If this becomes 0, perhaps this might tell me everything I sent was acked?
> >
> > 0 means that there are no in-flight packets, which is effectively number
> > of unacked packets. So if your application waits for this field to
> > become zero, it will wait for all sent packets to be acked.
>
> I use this type of strategy in nuttcp, and it seems to work fine.
> I have a loop with a small delay and a check of tcpi_unacked, and
> break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> period has passed.

Checking tcpi_unacked alone won't be reliable. The peer might be slow
enough to advertise zero window for a short period of time and during
that period you would have packets_out zero...

--
 i.
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-12 9:18 ` Ilpo Järvinen
@ 2009-01-13 5:31   ` Bill Fink
  2009-02-13 17:02     ` Jeremy Jackson
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-01-13 5:31 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Mon, 12 Jan 2009, Ilpo Järvinen wrote:

> On Sun, 11 Jan 2009, Bill Fink wrote:
>
> > On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
> >
> > > On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > > > I fully understand. Sometimes I have to talk to stupid devices though. What
> > > > I do find is the TCP_INFO socket option, which offers this field in
> > > > struct tcp_info:
> > > >
> > > >	__u32	tcpi_unacked;
> > > >
> > > > Which comes from:
> > > >
> > > > struct tcp_sock {
> > > > ...
> > > >	u32	packets_out;	/* Packets which are "in flight" */
> > > > ...
> > > > }
> > > >
> > > > If this becomes 0, perhaps this might tell me everything I sent was acked?
> > >
> > > 0 means that there are no in-flight packets, which is effectively number
> > > of unacked packets. So if your application waits for this field to
> > > become zero, it will wait for all sent packets to be acked.
> >
> > I use this type of strategy in nuttcp, and it seems to work fine.
> > I have a loop with a small delay and a check of tcpi_unacked, and
> > break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> > period has passed.
>
> Checking tcpi_unacked alone won't be reliable. The peer might be slow
> enough to advertise zero window for a short period of time and during
> that period you would have packets_out zero...

I'll keep this in mind for the future, although it doesn't seem to
be a significant issue in practice. I use this scheme to try and
account for the tcpi_total_retrans for the data stream, so if this
corner case was hit, it would mean an under reporting of the total
TCP retransmissions for the nuttcp test.

If I understand you correctly, to hit this corner case, just after
the final TCP write, there would have to be no packets in flight
together with a zero TCP window. To make it more bullet-proof, I
guess after seeing a zero tcpi_unacked, an additional small delay
should be performed, and then rechecking for a zero tcpi_unacked.
I don't see anything else obvious (to me anyway) in the tcp_info
that would be particularly helpful in handling this.

						-Thanks

						-Bill
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13 5:31 ` Bill Fink
@ 2009-02-13 17:02   ` Jeremy Jackson
  2009-02-20 18:10     ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Jackson @ 2009-02-13 17:02 UTC (permalink / raw)
To: Bill Fink
Cc: ilpo.jarvinen, Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Tue, 2009-01-13 at 00:31 -0500, Bill Fink wrote:
> On Mon, 12 Jan 2009, Ilpo Järvinen wrote:
>
> > On Sun, 11 Jan 2009, Bill Fink wrote:
> >
> > > On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
> > >
> > > > On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > > > > I fully understand. Sometimes I have to talk to stupid devices though. What

An excellent article on this subject:

http://ds9a.nl/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable.txt

"Luckily, it turns out that Linux keeps track of the amount of
unacknowledged data, which can be queried using the SIOCOUTQ ioctl().
Once we see this number hit 0, we can be reasonably sure our data
reached at least the remote operating system."

is this the same as the TCP_INFO getsockopt() ?

if you follow the progression from write(socket_fd, ) ... the data sits
in the socket buffer, and SIOCOUTQ is initially zero. If the connection
started with a zero window, it could sit like that for a while
(sometimes called a "tarpit"?). But, you should still see the data in
your socket buffer, yes?

So, I think you want to make sure your socket write buffer is empty
(converted to unacked data), *then* make sure your unacked data is 0.

write(sock, buffer, 1000000); // returns 1000000
shutdown(sock, SHUT_WR);
now wait for SIOCOUTQ to hit 0.

if window is 0, shutdown() would wait until slow device sets window > 0
again, or forever on a tarpitted connection. Either way, if/when
it finishes, you know all data was transmitted, now wait for all of it
to be ACKed with SIOCOUTQ.

> > > > > I do find is the TCP_INFO socket option, which offers this field in
> > > > > struct tcp_info:
> > > > >
> > > > >	__u32	tcpi_unacked;
> > > > >
> > > > > Which comes from:
> > > > >
> > > > > struct tcp_sock {
> > > > > ...
> > > > >	u32	packets_out;	/* Packets which are "in flight" */
> > > > > ...
> > > > > }
> > > > >
> > > > > If this becomes 0, perhaps this might tell me everything I sent was acked?
> > > >
> > > > 0 means that there are no in-flight packets, which is effectively number
> > > > of unacked packets. So if your application waits for this field to
> > > > become zero, it will wait for all sent packets to be acked.
> > >
> > > I use this type of strategy in nuttcp, and it seems to work fine.
> > > I have a loop with a small delay and a check of tcpi_unacked, and
> > > break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> > > period has passed.
> >
> > Checking tcpi_unacked alone won't be reliable. The peer might be slow
> > enough to advertise zero window for a short period of time and during
> > that period you would have packets_out zero...
>
> I'll keep this in mind for the future, although it doesn't seem to
> be a significant issue in practice. I use this scheme to try and
> account for the tcpi_total_retrans for the data stream, so if this
> corner case was hit, it would mean an under reporting of the total
> TCP retransmissions for the nuttcp test.
>
> If I understand you correctly, to hit this corner case, just after
> the final TCP write, there would have to be no packets in flight
> together with a zero TCP window. To make it more bullet-proof, I
> guess after seeing a zero tcpi_unacked, an additional small delay
> should be performed, and then rechecking for a zero tcpi_unacked.
> I don't see anything else obvious (to me anyway) in the tcp_info
> that would be particularly helpful in handling this.

--
Jeremy Jackson
Coplanar Networks
(519)489-4903
http://www.coplanar.net
jerj@coplanar.net
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-02-13 17:02 ` Jeremy Jackson
@ 2009-02-20 18:10   ` Bill Fink
  0 siblings, 0 replies; 23+ messages in thread
From: Bill Fink @ 2009-02-20 18:10 UTC (permalink / raw)
To: Jeremy Jackson
Cc: ilpo.jarvinen, Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Fri, 13 Feb 2009, Jeremy Jackson wrote:

> On Tue, 2009-01-13 at 00:31 -0500, Bill Fink wrote:
> > On Mon, 12 Jan 2009, Ilpo Järvinen wrote:
> >
> > > On Sun, 11 Jan 2009, Bill Fink wrote:
> > >
> > > > On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
> > > >
> > > > > On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > > > > > I fully understand. Sometimes I have to talk to stupid devices though. What
>
> An excellent article on this subject:
>
> http://ds9a.nl/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable.txt
>
> "Luckily, it turns out that Linux keeps track of the amount of
> unacknowledged data, which can be queried using the SIOCOUTQ ioctl().
> Once we see this number hit 0, we can be reasonably sure our data
> reached at least the remote operating system."
>
> is this the same as the TCP_INFO getsockopt() ?

If you mean the tcpi_unacked variable, then no it is not the same
as the SIOCOUTQ info.

> if you follow the progression from write(socket_fd, ) ... the data sits
> in the socket buffer, and SIOCOUTQ is initially zero. If the connection
> started with a zero window, it could sit like that for a while
> (sometimes called a "tarpit"?). But, you should still see the data in
> your socket buffer, yes?
>
> So, I think you want to make sure your socket write buffer is empty
> (converted to unacked data), *then* make sure your unacked data is 0.
>
> write(sock, buffer, 1000000); // returns 1000000
> shutdown(sock, SHUT_WR);
> now wait for SIOCOUTQ to hit 0.
>
> if window is 0, shutdown() would wait until slow device sets window > 0
> again, or forever on a tarpitted connection. Either way, if/when
> it finishes, you know all data was transmitted, now wait for all of it
> to be ACKed with SIOCOUTQ.

While the "shutdown(sock, SHUT_WR)" might be useful, it isn't actually
necessary, since the SIOCOUTQ info includes both unACKed data (reported
by tcpi_unacked variable) and never sent data (written by app but
outside of receiver's allowed window).

						-Bill

> > > > > > I do find is the TCP_INFO socket option, which offers this field in
> > > > > > struct tcp_info:
> > > > > >
> > > > > >	__u32	tcpi_unacked;
> > > > > >
> > > > > > Which comes from:
> > > > > >
> > > > > > struct tcp_sock {
> > > > > > ...
> > > > > >	u32	packets_out;	/* Packets which are "in flight" */
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > If this becomes 0, perhaps this might tell me everything I sent was acked?
> > > > >
> > > > > 0 means that there are no in-flight packets, which is effectively number
> > > > > of unacked packets. So if your application waits for this field to
> > > > > become zero, it will wait for all sent packets to be acked.
> > > >
> > > > I use this type of strategy in nuttcp, and it seems to work fine.
> > > > I have a loop with a small delay and a check of tcpi_unacked, and
> > > > break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> > > > period has passed.
> > >
> > > Checking tcpi_unacked alone won't be reliable. The peer might be slow
> > > enough to advertise zero window for a short period of time and during
> > > that period you would have packets_out zero...
> >
> > I'll keep this in mind for the future, although it doesn't seem to
> > be a significant issue in practice. I use this scheme to try and
> > account for the tcpi_total_retrans for the data stream, so if this
> > corner case was hit, it would mean an under reporting of the total
> > TCP retransmissions for the nuttcp test.
> >
> > If I understand you correctly, to hit this corner case, just after
> > the final TCP write, there would have to be no packets in flight
> > together with a zero TCP window. To make it more bullet-proof, I
> > guess after seeing a zero tcpi_unacked, an additional small delay
> > should be performed, and then rechecking for a zero tcpi_unacked.
> > I don't see anything else obvious (to me anyway) in the tcp_info
> > that would be particularly helpful in handling this.
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
@ 2009-01-13 6:32 Herbert Xu
2009-01-13 6:56 ` Bill Fink
0 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2009-01-13 6:32 UTC (permalink / raw)
To: billfink; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev
Bill Fink <billfink@mindspring.com> wrote:
>
> If I understand you correctly, to hit this corner case, just after
> the final TCP write, there would have to be no packets in flight
> together with a zero TCP window. To make it more bullet-proof, I
> guess after seeing a zero tcpi_unacked, an additional small delay
> should be performed, and then rechecking for a zero tcpi_unacked.
> I don't see anything else obvious (to me anyway) in the tcp_info
> that would be particularly helpful in handling this.
What's wrong with idiag_wqueue? Isn't that a much more direct
way to get this?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13 6:32 Herbert Xu
@ 2009-01-13 6:56 ` Bill Fink
  2009-01-13 7:01   ` Herbert Xu
  2009-01-13 7:06   ` Rick Jones
  0 siblings, 2 replies; 23+ messages in thread
From: Bill Fink @ 2009-01-13 6:56 UTC (permalink / raw)
To: Herbert Xu; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Tue, 13 Jan 2009, Herbert Xu wrote:

> Bill Fink <billfink@mindspring.com> wrote:
> >
> > If I understand you correctly, to hit this corner case, just after
> > the final TCP write, there would have to be no packets in flight
> > together with a zero TCP window. To make it more bullet-proof, I
> > guess after seeing a zero tcpi_unacked, an additional small delay
> > should be performed, and then rechecking for a zero tcpi_unacked.
> > I don't see anything else obvious (to me anyway) in the tcp_info
> > that would be particularly helpful in handling this.
>
> What's wrong with idiag_wqueue? Isn't that a much more direct
> way to get this?

I'm not familiar with idiag_wqueue, but it sounds like it has something
to do with INET_DIAG/INET_TCP_DIAG. It was a long time ago, but I seem
to recall that using INET_DIAG had a negative impact on performance,
and since the main point of nuttcp is to measure TCP/UDP performance,
that would be contrary to its primary purpose. Also, I don't want to
rely on something that's not guaranteed to be part of the running kernel.

						-Bill
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13 6:56 ` Bill Fink
@ 2009-01-13 7:01   ` Herbert Xu
  2009-01-14 7:43     ` Bill Fink
  2009-01-13 7:06   ` Rick Jones
  1 sibling, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2009-01-13 7:01 UTC (permalink / raw)
To: Bill Fink; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Tue, Jan 13, 2009 at 01:56:14AM -0500, Bill Fink wrote:
>
> I'm not familiar with idiag_wqueue, but it sounds like it has something
> to do with INET_DIAG/INET_TCP_DIAG. It was a long time ago, but I seem
> to recall that using INET_DIAG had a negative impact on performance,
> and since the main point of nuttcp is to measure TCP/UDP performance,
> that would be contrary to its primary purpose. Also, I don't want to
> rely on something that's not guaranteed to be part of the running kernel.

Well SIOCOUTQ also returns the same information.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13 7:01 ` Herbert Xu
@ 2009-01-14 7:43   ` Bill Fink
  2009-01-14 8:29     ` Herbert Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-01-14 7:43 UTC (permalink / raw)
To: Herbert Xu; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Tue, 13 Jan 2009, Herbert Xu wrote:

> On Tue, Jan 13, 2009 at 01:56:14AM -0500, Bill Fink wrote:
> >
> > I'm not familiar with idiag_wqueue, but it sounds like it has something
> > to do with INET_DIAG/INET_TCP_DIAG. It was a long time ago, but I seem
> > to recall that using INET_DIAG had a negative impact on performance,
> > and since the main point of nuttcp is to measure TCP/UDP performance,
> > that would be contrary to its primary purpose. Also, I don't want to
> > rely on something that's not guaranteed to be part of the running kernel.
>
> Well SIOCOUTQ also returns the same information.

I like that. If both tcpi_unacked and SIOCOUTQ are zero, that should
ensure all data has been sent and ACKed. I'll add that to the nuttcp
TODO list, although it's not an urgent matter in general usage.

The performance argument I gave against INET_DIAG appears to have been
bogus. At least just loading the inet_diag and tcp_diag modules didn't
have a significant impact on 10-GigE performance with either 1500-byte
packets or 9000-byte jumbo frame packets (CPU usage may have increased
slightly but even that's not definite).

						-Thanks

						-Bill
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-14 7:43 ` Bill Fink
@ 2009-01-14 8:29   ` Herbert Xu
  2009-01-14 9:05     ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2009-01-14 8:29 UTC (permalink / raw)
To: Bill Fink; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Wed, Jan 14, 2009 at 02:43:41AM -0500, Bill Fink wrote:
>
> I like that. If both tcpi_unacked and SIOCOUTQ are zero, that should

Why do you still need tcpi_unacked? SIOCOUTQ returns the amount
of all outstanding data so that alone should be good enough.

> The performance argument I gave against INET_DIAG appears to have been
> bogus. At least just loading the inet_diag and tcp_diag modules didn't
> have a significant impact on 10-GigE performance with either 1500-byte
> packets or 9000-byte jumbo frame packets (CPU usage may have increased
> slightly but even that's not definite).

Well if you don't make diag requests diag has zero impact on
the system. How much of an impact diag has if you do make requests
is dependent on the number of open sockets.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-14 8:29 ` Herbert Xu
@ 2009-01-14 9:05   ` Bill Fink
  2009-01-14 11:30     ` Herbert Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-01-14 9:05 UTC (permalink / raw)
To: Herbert Xu; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Wed, 14 Jan 2009, Herbert Xu wrote:

> On Wed, Jan 14, 2009 at 02:43:41AM -0500, Bill Fink wrote:
> >
> > I like that. If both tcpi_unacked and SIOCOUTQ are zero, that should
>
> Why do you still need tcpi_unacked? SIOCOUTQ returns the amount
> of all outstanding data so that alone should be good enough.

Well, my man tcp(7) just says:

	SIOCOUTQ
		Returns the amount of unsent data in the socket send queue. The
		socket must not be in LISTEN state, otherwise an error (EINVAL)
		is returned.

It's not clear from that that it also includes sent but unacked data.
Perhaps it should be changed to say "the amount of unsent or unacked
data" instead. On reflection, it makes sense that the send queue
includes sent but unacked as well as never sent data, since the unacked
data may need to be retransmitted. This all assumes that I'm now
correctly understanding what you're saying/implying (I'm getting tired).

> > The performance argument I gave against INET_DIAG appears to have been
> > bogus. At least just loading the inet_diag and tcp_diag modules didn't
> > have a significant impact on 10-GigE performance with either 1500-byte
> > packets or 9000-byte jumbo frame packets (CPU usage may have increased
> > slightly but even that's not definite).
>
> Well if you don't make diag requests diag has zero impact on
> the system. How much of an impact diag has if you do make requests
> is dependent on the number of open sockets.

Good to know.

						-Bill
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-14 9:05 ` Bill Fink
@ 2009-01-14 11:30   ` Herbert Xu
  2009-01-15 6:33     ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2009-01-14 11:30 UTC (permalink / raw)
To: Bill Fink; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Wed, Jan 14, 2009 at 04:05:50AM -0500, Bill Fink wrote:
>
> Well, my man tcp(7) just says:
>
>	SIOCOUTQ
>		Returns the amount of unsent data in the socket send queue. The
>		socket must not be in LISTEN state, otherwise an error (EINVAL)
>		is returned.

You still read man pages? :)

If you look at the actual code, SIOCOUTQ returns

	tp->write_seq - tp->snd_una

Really you can't get any more precise than that for what you want.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
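A small worked example may help with the expression quoted above. In the kernel, write_seq is the sequence number just past the last byte the application has queued, and snd_una is the first byte not yet acknowledged, so their difference is the number of bytes that are either unsent or sent but unacked. The stand-alone function below is a hypothetical userspace mirror of that subtraction, not kernel code. It uses 32-bit unsigned arithmetic so the result stays correct when the sequence space wraps around.

```c
#include <stdint.h>

/*
 * Hypothetical userspace mirror of "tp->write_seq - tp->snd_una".
 * Unsigned 32-bit subtraction wraps modulo 2^32, which matches how TCP
 * sequence numbers behave when they roll over.
 */
uint32_t outstanding_bytes(uint32_t write_seq, uint32_t snd_una)
{
    return write_seq - snd_una;
}
```

For instance, with snd_una = 0 and write_seq = 1000000 the result is 1000000 bytes outstanding. With snd_una = 0xFFFFFF00 and write_seq = 100 (the sequence number has wrapped) the result is 256 + 100 = 356 bytes, and when both values are equal nothing is outstanding.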
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24? 2009-01-14 11:30 ` Herbert Xu @ 2009-01-15 6:33 ` Bill Fink 0 siblings, 0 replies; 23+ messages in thread From: Bill Fink @ 2009-01-15 6:33 UTC (permalink / raw) To: Herbert Xu; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev On Wed, 14 Jan 2009, Herbert Xu wrote: > On Wed, Jan 14, 2009 at 04:05:50AM -0500, Bill Fink wrote: > > > > Well, my man tcp(7) just says: > > > > SIOCOUTQ > > Returns the amount of unsent data in the socket send queue. The > > socket must not be in LISTEN state, otherwise an error (EINVAL) > > is returned. > > You still read man pages? :) > > If you look at the actual code, SIOCOUTQ returns > > tp->write_seq - tp->snd_una > > Really you can't get any more precise than that for what you want. Thanks for the clarification! That does make it crystal clear. -Bill ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24? 2009-01-13 6:56 ` Bill Fink 2009-01-13 7:01 ` Herbert Xu @ 2009-01-13 7:06 ` Rick Jones 2009-01-14 8:05 ` Bill Fink 1 sibling, 1 reply; 23+ messages in thread From: Rick Jones @ 2009-01-13 7:06 UTC (permalink / raw) To: Bill Fink Cc: Herbert Xu, ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev Bill Fink wrote: > On Tue, 13 Jan 2009, Herbert Xu wrote: > > >>Bill Fink <billfink@mindspring.com> wrote: >> >>>If I understand you correctly, to hit this corner case, just after >>>the final TCP write, there would have to be no packets in flight >>>together with a zero TCP window. To make it more bullet-proof, I >>>guess after seeing a zero tcpi_unacked, an additional small delay >>>should be performed, and then rechecking for a zero tcpi_unacked. >>>I don't see anything else obvious (to me anyway) in the tcp_info >>>that would be particularly helpful in handling this. >> >>What's wrong with idiag_wqueue? Isn't that a much more direct >>way to get this? > > > I'm not familiar with idiag_wqueue, but it sounds like it has something > to do with INET_DIAG/INET_TCP_DIAG. It was a long time ago, but I seem > to recall that using INET_DIAG had a negative impact on performance, > and since the main point of nuttcp is to measure TCP/UDP performance, > that would be contrary to its primary purpose. Also, I don't want to > rely on something that's not guaranteed to be part of the running kernel. How likely is it that the "additional small delay" above would be much less than waiting for a read return of zero after a shutdown(SHUT_WR) call? rick jones ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24? 2009-01-13 7:06 ` Rick Jones @ 2009-01-14 8:05 ` Bill Fink 2009-01-14 8:08 ` Rick Jones 0 siblings, 1 reply; 23+ messages in thread From: Bill Fink @ 2009-01-14 8:05 UTC (permalink / raw) To: Rick Jones Cc: Herbert Xu, ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev On Mon, 12 Jan 2009, Rick Jones wrote: > Bill Fink wrote: > > On Tue, 13 Jan 2009, Herbert Xu wrote: > > > > > >>Bill Fink <billfink@mindspring.com> wrote: > >> > >>>If I understand you correctly, to hit this corner case, just after > >>>the final TCP write, there would have to be no packets in flight > >>>together with a zero TCP window. To make it more bullet-proof, I > >>>guess after seeing a zero tcpi_unacked, an additional small delay > >>>should be performed, and then rechecking for a zero tcpi_unacked. > >>>I don't see anything else obvious (to me anyway) in the tcp_info > >>>that would be particularly helpful in handling this. > >> > >>What's wrong with idiag_wqueue? Isn't that a much more direct > >>way to get this? > > > > > > I'm not familiar with idiag_wqueue, but it sounds like it has something > > to do with INET_DIAG/INET_TCP_DIAG. It was a long time ago, but I seem > > to recall that using INET_DIAG had a negative impact on performance, > > and since the main point of nuttcp is to measure TCP/UDP performance, > > that would be contrary to its primary purpose. Also, I don't want to > > rely on something that's not guaranteed to be part of the running kernel. > > How likely is it that the "additional small delay" above would be much > less than waiting for a read return of zero after a shutdown(SHUT_WR) call? I'm not sure I understand what you're getting at. I did consider doing something like what you suggested, but in the end decided it was simpler to deal with a fully ESTABLISHED connection, than worrying about possible races with a socket being (partially or fully) closed. 
-Bill ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24? 2009-01-14 8:05 ` Bill Fink @ 2009-01-14 8:08 ` Rick Jones 2009-01-14 8:32 ` Bill Fink 0 siblings, 1 reply; 23+ messages in thread From: Rick Jones @ 2009-01-14 8:08 UTC (permalink / raw) To: Bill Fink Cc: Herbert Xu, ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev >>How likely is it that the "additional small delay" above would be much >>less than waiting for a read return of zero after a shutdown(SHUT_WR) call? > > > I'm not sure I understand what you're getting at. I did consider doing > something like what you suggested, but in the end decided it was simpler > to deal with a fully ESTABLISHED connection, than worrying about possible > races with a socket being (partially or fully) closed. Ostensibly, using a shutdown(SHUT_WR) and then a wait for a recv return of zero would take about the same length of time as polling local connection stats to see that there were no ostensibly unacked data - both will take one RTT right? and shutdown/read has the added property that it will deal with zero windows automagically. rick ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24? 2009-01-14 8:08 ` Rick Jones @ 2009-01-14 8:32 ` Bill Fink 0 siblings, 0 replies; 23+ messages in thread From: Bill Fink @ 2009-01-14 8:32 UTC (permalink / raw) To: Rick Jones Cc: Herbert Xu, ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev On Wed, 14 Jan 2009, Rick Jones wrote: > >>How likely is it that the "additional small delay" above would be much > >>less than waiting for a read return of zero after a shutdown(SHUT_WR) call? > > > > > > I'm not sure I understand what you're getting at. I did consider doing > > something like what you suggested, but in the end decided it was simpler > > to deal with a fully ESTABLISHED connection, than worrying about possible > > races with a socket being (partially or fully) closed. > > Ostensibly, using a shutdown(SHUT_WR) and then a wait for a recv return > of zero would take about the same length of time as polling local > connection stats to see that there were no ostensibly unacked data - > both will take one RTT right? and shutdown/read has the added property > that it will deal with zero windows automagically. With the shutdown(SHUT_WR)/read() approach, I would have had to set a timeout on the read, to handle the case where the peer just went away, whereas currently I just check elapsed time (I strive to make nuttcp robust in such cases to allow it to be used reliably within scripts run for example from cron). Also, I was (perhaps unnecessarily) worried that after the zero read(), the socket would effectively be closed, and I wasn't sure then about the reliability of using tcp_info to get the tcpi_total_retrans at that point. As with most things, there's more than one way to skin a cat. -Bill ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads: [~2009-02-20 18:11 UTC]

Thread overview: 23+ messages
2009-01-11 21:23 SO_LINGER dead: I get an immediate RST on 2.6.24? bert hubert
2009-01-11 22:08 ` H. Willstrand
2009-01-11 22:45 ` sendfile()? " bert hubert
2009-01-11 22:54 ` Evgeniy Polyakov
2009-01-11 23:08 ` bert hubert
2009-01-11 23:18 ` Evgeniy Polyakov
2009-01-12 4:50 ` Bill Fink
2009-01-12 9:18 ` Ilpo Järvinen
2009-01-13 5:31 ` Bill Fink
2009-02-13 17:02 ` Jeremy Jackson
2009-02-20 18:10 ` Bill Fink
-- strict thread matches above, loose matches on Subject: below --
2009-01-13 6:32 Herbert Xu
2009-01-13 6:56 ` Bill Fink
2009-01-13 7:01 ` Herbert Xu
2009-01-14 7:43 ` Bill Fink
2009-01-14 8:29 ` Herbert Xu
2009-01-14 9:05 ` Bill Fink
2009-01-14 11:30 ` Herbert Xu
2009-01-15 6:33 ` Bill Fink
2009-01-13 7:06 ` Rick Jones
2009-01-14 8:05 ` Bill Fink
2009-01-14 8:08 ` Rick Jones
2009-01-14 8:32 ` Bill Fink