SO_LINGER dead: I get an immediate RST on 2.6.24?

netdev.vger.kernel.org archive mirror

* SO_LINGER dead: I get an immediate RST on 2.6.24?
@ 2009-01-11 21:23 bert hubert
  2009-01-11 22:08 ` H. Willstrand
  0 siblings, 1 reply; 23+ messages in thread
From: bert hubert @ 2009-01-11 21:23 UTC (permalink / raw)
  To: netdev

Hi everybody,

I have an application where I need to send data from A to B, and beforehand,
I don't know how much data this will be. 

B is 'stupid', and consists solely of a TCP/IP port accepting data, and I
have no way to chunk this data. So what I do is issue blocking calls to
write(), shutdown(fd, SHUT_WR), and wait for the fd to become readable which
tells me that the remote has packed up, and I'm good to go.
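
A minimal sketch of the write()/shutdown()/wait-for-readable sequence described above, assuming a connected stream socket. The helper name finish_and_wait_for_eof and the timeout parameter are hypothetical, and the timeout restarts whenever stray data arrives:

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: half-close the sending side, then wait until the
 * peer closes its side as well.  Returns 0 on a clean EOF, -1 on error
 * or timeout. */
int finish_and_wait_for_eof(int fd, int timeout_ms)
{
    char buf[4096];

    /* Queue a FIN behind any pending data; fd stays readable. */
    if (shutdown(fd, SHUT_WR) < 0)
        return -1;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (ready == 0)
            return -1;          /* peer did not close in time */

        ssize_t n = read(fd, buf, sizeof(buf));
        if (n == 0)
            return 0;           /* EOF: the peer has closed */
        if (n < 0 && errno != EINTR)
            return -1;          /* for example ECONNRESET */
        /* n > 0: discard unexpected data and keep waiting */
    }
}
```

The caller would issue its blocking write() calls first, then call this helper, and only close() the descriptor once it returns.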

Before this, I've tried SO_LINGER with various timeouts but nothing helped.

When I tcpdump, I find that my close() is immediately turned into an RST
packet.

Is SO_LINGER a NOOP? Does it still do anything?

I'm about to blog this up - the 'shutdown() and read()' technique is
something I had to purloin from the Apache source.

So I'd love to know the words of the wise on this one.

Thanks.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 21:23 SO_LINGER dead: I get an immediate RST on 2.6.24? bert hubert
@ 2009-01-11 22:08 ` H. Willstrand
  2009-01-11 22:45   ` sendfile()? " bert hubert
  0 siblings, 1 reply; 23+ messages in thread
From: H. Willstrand @ 2009-01-11 22:08 UTC (permalink / raw)
  To: bert hubert, netdev

On Sun, Jan 11, 2009 at 10:23 PM, bert hubert <bert.hubert@netherlabs.nl> wrote:
> Hi everybody,
>
> I have an application where I need to send data from A to B, and beforehand,
> I don't know how much data this will be.
>
> B is 'stupid', and consists solely of a TCP/IP port accepting data, and I
> have no way to chunk this data. So what I do is issue blocking calls to
> write(), shutdown(fd, SHUT_WR), and wait for the fd to become readable which
> tells me that the remote has packed up, and I'm good to go.
>
> Before this, I've tried SO_LINGER with various timeouts but nothing helped.
>
> When I tcpdump, I find that my close() is immediately turned into an RST
> packet.
>
> Is SO_LINGER a NOOP? Does it still do anything?
>
> I'm about to blog this up - the 'shutdown() and read()' technique is
> something I had to purloin from the Apache source.
>
> So I'd love to know the words of the wise on this one.
>
> Thanks.
>
> --
> http://www.PowerDNS.com      Open source, database driven DNS Software
> http://netherlabs.nl              Open and Closed source services

This is the correct behavior according to RFC 2525; see section 2.17
(there is an example).

//H.W.

* sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 22:08 ` H. Willstrand
@ 2009-01-11 22:45   ` bert hubert
  2009-01-11 22:54     ` Evgeniy Polyakov
  0 siblings, 1 reply; 23+ messages in thread
From: bert hubert @ 2009-01-11 22:45 UTC (permalink / raw)
  To: H. Willstrand; +Cc: netdev

On Sun, Jan 11, 2009 at 11:08:16PM +0100, H. Willstrand wrote:
> > Is SO_LINGER a NOOP? Does it still do anything?
> This is the correct behavior according to RFC 2525; see section 2.17
> (there is an example).

Ah - very good, thank you. I'm trying to gather as much information as I can
before writing this all up. This should save netdev & the linux kernel
community a lot of email!

Is there any way to make sure there is no pending output data, so one can
safely call close(), and not get an RST-situation? 

Let me put it more succinctly. What I would very much like to have is what
Linux sendfile() offers in practice.

It appears that if one asks sendfile() to transmit a million bytes, it will
only return when the ACK for the millionth byte is in.

I know that TCP will never be fully fully reliable, but I would love to have
a way to know that the millionth byte was ACKed, or alternatively, that an
error prevented that.

From what I've read so far, I think the POSIX functions don't offer this.
But does Linux? sendfile appears to get it right..

Thanks.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 22:45   ` sendfile()? " bert hubert
@ 2009-01-11 22:54     ` Evgeniy Polyakov
  2009-01-11 23:08       ` bert hubert
  0 siblings, 1 reply; 23+ messages in thread
From: Evgeniy Polyakov @ 2009-01-11 22:54 UTC (permalink / raw)
  To: bert hubert, H. Willstrand, netdev

Hi Bert.

On Sun, Jan 11, 2009 at 11:45:43PM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> Is there any way to make sure there is no pending output data, so one can
> safely call close(), and not get an RST-situation? 

You can try working with the TCP cork options: uncorking the socket means
that the stack has sent the data to the hardware; there are no other
guarantees.

> Let me put it more succinctly. What I would very much like to have is what
> Linux sendfile() offers in practice.
> 
> It appears that if one asks sendfile() to transmit a million bytes, it will
> only return when the ACK for the millionth byte is in.

No, it does not. It returns when it believes it has sent all the requested
data, but in practice the data may not even have been sent yet and may still
be waiting in some hardware queue.

> I know that TCP will never be fully fully reliable, but I would love to have
> a way to know that the millionth byte was ACKed, or alternatively, that an
> error prevented that.

There is no way to get a notification when data is acked by the remote
side. Generally you should implement some kind of explicit acks of your own.

-- 
	Evgeniy Polyakov

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 22:54     ` Evgeniy Polyakov
@ 2009-01-11 23:08       ` bert hubert
  2009-01-11 23:18         ` Evgeniy Polyakov
  0 siblings, 1 reply; 23+ messages in thread
From: bert hubert @ 2009-01-11 23:08 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: H. Willstrand, netdev

On Mon, Jan 12, 2009 at 01:54:27AM +0300, Evgeniy Polyakov wrote:
> You can try to work with tcp cork options, uncorking the socket means
> that stack has sent data to the hardware, there are no other guarantees.

Ah, smart.

> > It appears that if one asks sendfile() to transmit a million bytes, it will
> > only return when the ACK for the millionth byte is in.
> 
> No it is not, it returns when it believes it has sent all the requested
> data, but in practice it can be even not sent but waiting in some
> hardware queue.

Ah ok.

> > I know that TCP will never be fully fully reliable, but I would love to have
> > a way to know that the millionth byte was ACKed, or alternatively, that an
> > error prevented that.
> 
> There is no way to get a notification when data is acked by the remote
> side. Generally you should invent some kind of own explicit acks.

I fully understand. Sometimes I have to talk to stupid devices though. What
I did find is the TCP_INFO socket option (read with getsockopt()), which
offers this field in struct tcp_info:

        __u32   tcpi_unacked;

Which comes from:

struct tcp_sock {
...
        u32     packets_out;    /* Packets which are "in flight"        */
...
}

If this becomes 0, perhaps this might tell me everything I sent was acked?

	Bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 23:08       ` bert hubert
@ 2009-01-11 23:18         ` Evgeniy Polyakov
  2009-01-12  4:50           ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Evgeniy Polyakov @ 2009-01-11 23:18 UTC (permalink / raw)
  To: bert hubert, H. Willstrand, netdev

On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> I fully understand. Sometimes I have to talk to stupid devices though. What
> I do find is the TCP_INFO ioctl, which offers this field in struct tcp_info:
> 
>         __u32   tcpi_unacked;
> 
> Which comes from:
> 
> struct tcp_sock {
> ...
>         u32     packets_out;    /* Packets which are "in flight"        */
> ...
> }
> 
> If this becomes 0, perhaps this might tell me everything I sent was acked?

0 means that there are no in-flight packets, which is effectively the number
of unacked packets. So if your application waits for this field to
become zero, it will wait for all sent packets to be acked.

-- 
	Evgeniy Polyakov

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-11 23:18         ` Evgeniy Polyakov
@ 2009-01-12  4:50           ` Bill Fink
  2009-01-12  9:18             ` Ilpo Järvinen
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-01-12  4:50 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: bert hubert, H. Willstrand, netdev

On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:

> On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > I fully understand. Sometimes I have to talk to stupid devices though. What
> > I do find is the TCP_INFO ioctl, which offers this field in struct tcp_info:
> > 
> >         __u32   tcpi_unacked;
> > 
> > Which comes from:
> > 
> > struct tcp_sock {
> > ...
> >         u32     packets_out;    /* Packets which are "in flight"        */
> > ...
> > }
> > 
> > If this becomes 0, perhaps this might tell me everything I sent was acked?
> 
> 0 means that there are no in-flight packets, which is effectively number
> of unacked packets. So if your application waits for this field to
> become zero, it will wait for all sent packets to be acked.

I use this type of strategy in nuttcp, and it seems to work fine.
I have a loop with a small delay and a check of tcpi_unacked, and
break out of the loop if tcpi_unacked becomes 0 or a defined timeout
period has passed.

						-Bill

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-12  4:50           ` Bill Fink
@ 2009-01-12  9:18             ` Ilpo Järvinen
  2009-01-13  5:31               ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Ilpo Järvinen @ 2009-01-12  9:18 UTC (permalink / raw)
  To: Bill Fink; +Cc: Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Sun, 11 Jan 2009, Bill Fink wrote:

> On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
> 
> > On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > > I fully understand. Sometimes I have to talk to stupid devices though. What
> > > I do find is the TCP_INFO ioctl, which offers this field in struct tcp_info:
> > > 
> > >         __u32   tcpi_unacked;
> > > 
> > > Which comes from:
> > > 
> > > struct tcp_sock {
> > > ...
> > >         u32     packets_out;    /* Packets which are "in flight"        */
> > > ...
> > > }
> > > 
> > > If this becomes 0, perhaps this might tell me everything I sent was acked?
> > 
> > 0 means that there are no in-flight packets, which is effectively number
> > of unacked packets. So if your application waits for this field to
> > become zero, it will wait for all sent packets to be acked.
> 
> I use this type of strategy in nuttcp, and it seems to work fine.
> I have a loop with a small delay and a check of tcpi_unacked, and
> break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> period has passed.

Checking tcpi_unacked alone won't be reliable. The peer might be slow
enough to advertise a zero window for a short period of time, and during
that period you would have packets_out at zero...


-- 
 i.

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-12  9:18             ` Ilpo Järvinen
@ 2009-01-13  5:31               ` Bill Fink
  2009-02-13 17:02                 ` Jeremy Jackson
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-01-13  5:31 UTC (permalink / raw)
  To:  Ilpo Järvinen ; +Cc: Evgeniy Polyakov, bert hubert, H. Willstrand, Netdev

On Mon, 12 Jan 2009, Ilpo Järvinen wrote:

> On Sun, 11 Jan 2009, Bill Fink wrote:
> 
> > On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
> > 
> > > On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > > > I fully understand. Sometimes I have to talk to stupid devices though. What
> > > > I do find is the TCP_INFO ioctl, which offers this field in struct tcp_info:
> > > > 
> > > >         __u32   tcpi_unacked;
> > > > 
> > > > Which comes from:
> > > > 
> > > > struct tcp_sock {
> > > > ...
> > > >         u32     packets_out;    /* Packets which are "in flight"        */
> > > > ...
> > > > }
> > > > 
> > > > If this becomes 0, perhaps this might tell me everything I sent was acked?
> > > 
> > > 0 means that there are no in-flight packets, which is effectively number
> > > of unacked packets. So if your application waits for this field to
> > > become zero, it will wait for all sent packets to be acked.
> > 
> > I use this type of strategy in nuttcp, and it seems to work fine.
> > I have a loop with a small delay and a check of tcpi_unacked, and
> > break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> > period has passed.
> 
> Checking tcpi_unacked alone won't be reliable. The peer might be slow 
> enough to advertize zero window for a short period of time and during 
> that period you would have packets_out zero...

I'll keep this in mind for the future, although it doesn't seem to
be a significant issue in practice.  I use this scheme to try and
account for the tcpi_total_retrans for the data stream, so if this
corner case was hit, it would mean an under reporting of the total
TCP retransmissions for the nuttcp test.

If I understand you correctly, to hit this corner case, just after
the final TCP write, there would have to be no packets in flight
together with a zero TCP window.  To make it more bullet-proof, I
guess that after seeing a zero tcpi_unacked, an additional small delay
should be performed, followed by a recheck for a zero tcpi_unacked.
I don't see anything else obvious (to me anyway) in the tcp_info
that would be particularly helpful in handling this.

						-Thanks

						-Bill

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
@ 2009-01-13  6:32 Herbert Xu
  2009-01-13  6:56 ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2009-01-13  6:32 UTC (permalink / raw)
  To: billfink; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

Bill Fink <billfink@mindspring.com> wrote:
>
> If I understand you correctly, to hit this corner case, just after
> the final TCP write, there would have to be no packets in flight
> together with a zero TCP window.  To make it more bullet-proof, I
> guess after seeing a zero tcpi_unacked, an additional small delay
> should be performed, and then rechecking for a zero tcpi_unacked.
> I don't see anything else obvious (to me anyway) in the tcp_info
> that would be particularly helpful in handling this.

What's wrong with idiag_wqueue? Isn't that a much more direct
way to get this?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13  6:32 Herbert Xu
@ 2009-01-13  6:56 ` Bill Fink
  2009-01-13  7:01   ` Herbert Xu
  2009-01-13  7:06   ` Rick Jones
  0 siblings, 2 replies; 23+ messages in thread
From: Bill Fink @ 2009-01-13  6:56 UTC (permalink / raw)
  To: Herbert Xu; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Tue, 13 Jan 2009, Herbert Xu wrote:

> Bill Fink <billfink@mindspring.com> wrote:
> >
> > If I understand you correctly, to hit this corner case, just after
> > the final TCP write, there would have to be no packets in flight
> > together with a zero TCP window.  To make it more bullet-proof, I
> > guess after seeing a zero tcpi_unacked, an additional small delay
> > should be performed, and then rechecking for a zero tcpi_unacked.
> > I don't see anything else obvious (to me anyway) in the tcp_info
> > that would be particularly helpful in handling this.
> 
> What's wrong with idiag_wqueue? Isn't that a much more direct
> way to get this?

I'm not familiar with idiag_wqueue, but it sounds like it has something
to do with INET_DIAG/INET_TCP_DIAG.  It was a long time ago, but I seem
to recall that using INET_DIAG had a negative impact on performance,
and since the main point of nuttcp is to measure TCP/UDP performance,
that would be contrary to its primary purpose.  Also, I don't want to
rely on something that's not guaranteed to be part of the running kernel.

						-Bill

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13  6:56 ` Bill Fink
@ 2009-01-13  7:01   ` Herbert Xu
  2009-01-14  7:43     ` Bill Fink
  2009-01-13  7:06   ` Rick Jones
  1 sibling, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2009-01-13  7:01 UTC (permalink / raw)
  To: Bill Fink; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Tue, Jan 13, 2009 at 01:56:14AM -0500, Bill Fink wrote:
>
> I'm not familiar with idiag_wqueue, but it sounds like it has something
> to do with INET_DIAG/INET_TCP_DIAG.  It was a long time ago, but I seem
> to recall that using INET_DIAG had a negative impact on performance,
> and since the main point of nuttcp is to measure TCP/UDP performance,
> that would be contrary to its primary purpose.  Also, I don't want to
> rely on something that's not guaranteed to be part of the running kernel.

Well SIOCOUTQ also returns the same information.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13  6:56 ` Bill Fink
  2009-01-13  7:01   ` Herbert Xu
@ 2009-01-13  7:06   ` Rick Jones
  2009-01-14  8:05     ` Bill Fink
  1 sibling, 1 reply; 23+ messages in thread
From: Rick Jones @ 2009-01-13  7:06 UTC (permalink / raw)
  To: Bill Fink
  Cc: Herbert Xu, ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

Bill Fink wrote:
> On Tue, 13 Jan 2009, Herbert Xu wrote:
> 
> 
>>Bill Fink <billfink@mindspring.com> wrote:
>>
>>>If I understand you correctly, to hit this corner case, just after
>>>the final TCP write, there would have to be no packets in flight
>>>together with a zero TCP window.  To make it more bullet-proof, I
>>>guess after seeing a zero tcpi_unacked, an additional small delay
>>>should be performed, and then rechecking for a zero tcpi_unacked.
>>>I don't see anything else obvious (to me anyway) in the tcp_info
>>>that would be particularly helpful in handling this.
>>
>>What's wrong with idiag_wqueue? Isn't that a much more direct
>>way to get this?
> 
> 
> I'm not familiar with idiag_wqueue, but it sounds like it has something
> to do with INET_DIAG/INET_TCP_DIAG.  It was a long time ago, but I seem
> to recall that using INET_DIAG had a negative impact on performance,
> and since the main point of nuttcp is to measure TCP/UDP performance,
> that would be contrary to its primary purpose.  Also, I don't want to
> rely on something that's not guaranteed to be part of the running kernel.

How likely is it that the "additional small delay" above would be much 
less than waiting for a read return of zero after a shutdown(SHUT_WR) call?

rick jones

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13  7:01   ` Herbert Xu
@ 2009-01-14  7:43     ` Bill Fink
  2009-01-14  8:29       ` Herbert Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-01-14  7:43 UTC (permalink / raw)
  To: Herbert Xu; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Tue, 13 Jan 2009, Herbert Xu wrote:

> On Tue, Jan 13, 2009 at 01:56:14AM -0500, Bill Fink wrote:
> >
> > I'm not familiar with idiag_wqueue, but it sounds like it has something
> > to do with INET_DIAG/INET_TCP_DIAG.  It was a long time ago, but I seem
> > to recall that using INET_DIAG had a negative impact on performance,
> > and since the main point of nuttcp is to measure TCP/UDP performance,
> > that would be contrary to its primary purpose.  Also, I don't want to
> > rely on something that's not guaranteed to be part of the running kernel.
> 
> Well SIOCOUTQ also returns the same information.

I like that.  If both tcpi_unacked and SIOCOUTQ are zero, that should
ensure all data has been sent and ACKed.  I'll add that to the nuttcp
TODO list, although it's not an urgent matter in general usage.

The performance argument I gave against INET_DIAG appears to have been
bogus.  At least just loading the inet_diag and tcp_diag modules didn't
have a significant impact on 10-GigE performance with either 1500-byte
packets or 9000-byte jumbo frame packets (CPU usage may have increased
slightly but even that's not definite).

						-Thanks

						-Bill

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13  7:06   ` Rick Jones
@ 2009-01-14  8:05     ` Bill Fink
  2009-01-14  8:08       ` Rick Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-01-14  8:05 UTC (permalink / raw)
  To: Rick Jones
  Cc: Herbert Xu, ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Mon, 12 Jan 2009, Rick Jones wrote:

> Bill Fink wrote:
> > On Tue, 13 Jan 2009, Herbert Xu wrote:
> > 
> > 
> >>Bill Fink <billfink@mindspring.com> wrote:
> >>
> >>>If I understand you correctly, to hit this corner case, just after
> >>>the final TCP write, there would have to be no packets in flight
> >>>together with a zero TCP window.  To make it more bullet-proof, I
> >>>guess after seeing a zero tcpi_unacked, an additional small delay
> >>>should be performed, and then rechecking for a zero tcpi_unacked.
> >>>I don't see anything else obvious (to me anyway) in the tcp_info
> >>>that would be particularly helpful in handling this.
> >>
> >>What's wrong with idiag_wqueue? Isn't that a much more direct
> >>way to get this?
> > 
> > 
> > I'm not familiar with idiag_wqueue, but it sounds like it has something
> > to do with INET_DIAG/INET_TCP_DIAG.  It was a long time ago, but I seem
> > to recall that using INET_DIAG had a negative impact on performance,
> > and since the main point of nuttcp is to measure TCP/UDP performance,
> > that would be contrary to its primary purpose.  Also, I don't want to
> > rely on something that's not guaranteed to be part of the running kernel.
> 
> How likely is it that the "additional small delay" above would be much 
> less than waiting for a read return of zero after a shutdown(SHUT_WR) call?

I'm not sure I understand what you're getting at.  I did consider doing
something like what you suggested, but in the end decided it was simpler
to deal with a fully ESTABLISHED connection, than worrying about possible
races with a socket being (partially or fully) closed.

						-Bill

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-14  8:05     ` Bill Fink
@ 2009-01-14  8:08       ` Rick Jones
  2009-01-14  8:32         ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Rick Jones @ 2009-01-14  8:08 UTC (permalink / raw)
  To: Bill Fink
  Cc: Herbert Xu, ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

>>How likely is it that the "additional small delay" above would be much 
>>less than waiting for a read return of zero after a shutdown(SHUT_WR) call?
> 
> 
> I'm not sure I understand what you're getting at.  I did consider doing
> something like what you suggested, but in the end decided it was simpler
> to deal with a fully ESTABLISHED connection, than worrying about possible
> races with a socket being (partially or fully) closed.

Ostensibly, using a shutdown(SHUT_WR) and then a wait for a recv return
of zero would take about the same length of time as polling local
connection stats to see that there was no ostensibly unacked data;
both will take one RTT, right?  And shutdown/read has the added property
that it will deal with zero windows automagically.

rick

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-14  7:43     ` Bill Fink
@ 2009-01-14  8:29       ` Herbert Xu
  2009-01-14  9:05         ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2009-01-14  8:29 UTC (permalink / raw)
  To: Bill Fink; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Wed, Jan 14, 2009 at 02:43:41AM -0500, Bill Fink wrote:
> 
> I like that.  If both tcpi_unacked and SIOCOUTQ are zero, that should

Why do you still need tcpi_unacked? SIOCOUTQ returns the amount
of all outstanding data so that alone should be good enough.
 
> The performance argument I gave against INET_DIAG appears to have been
> bogus.  At least just loading the inet_diag and tcp_diag modules didn't
> have a significant impact on 10-GigE performance with either 1500-byte
> packets or 9000-byte jumbo frame packets (CPU usage may have increased
> slightly but even that's not definite).

Well if you don't make diag requests diag has zero impact on
the system.  How much of an impact diag has if you do make requests
is dependent on the number of open sockets.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-14  8:08       ` Rick Jones
@ 2009-01-14  8:32         ` Bill Fink
  0 siblings, 0 replies; 23+ messages in thread
From: Bill Fink @ 2009-01-14  8:32 UTC (permalink / raw)
  To: Rick Jones
  Cc: Herbert Xu, ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Wed, 14 Jan 2009, Rick Jones wrote:

> >>How likely is it that the "additional small delay" above would be much 
> >>less than waiting for a read return of zero after a shutdown(SHUT_WR) call?
> > 
> > 
> > I'm not sure I understand what you're getting at.  I did consider doing
> > something like what you suggested, but in the end decided it was simpler
> > to deal with a fully ESTABLISHED connection, than worrying about possible
> > races with a socket being (partially or fully) closed.
> 
> Ostensibly, using a shutdown(SHUT_WR) and then a wait for a recv return 
> of zero would take about the same length of time as polling local 
> connection stats to see that there were no ostensibly unacked data - 
> both will take one RTT right?  and shutdown/read has the added property 
> that it will deal with zero windows automagically.

With the shutdown(SHUT_WR)/read() approach, I would have had to set
a timeout on the read, to handle the case where the peer just went
away, whereas currently I just check elapsed time (I strive to make
nuttcp robust in such cases to allow it to be used reliably within
scripts run for example from cron).

Also, I was (perhaps unnecessarily) worried that after the zero read(),
the socket would effectively be closed, and I wasn't sure then about
the reliability of using tcp_info to get the tcpi_total_retrans at that
point.

As with most things, there's more than one way to skin a cat.

						-Bill

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-14  8:29       ` Herbert Xu
@ 2009-01-14  9:05         ` Bill Fink
  2009-01-14 11:30           ` Herbert Xu
  0 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-01-14  9:05 UTC (permalink / raw)
  To: Herbert Xu; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Wed, 14 Jan 2009, Herbert Xu wrote:

> On Wed, Jan 14, 2009 at 02:43:41AM -0500, Bill Fink wrote:
> > 
> > I like that.  If both tcpi_unacked and SIOCOUTQ are zero, that should
> 
> Why do you still need tcpi_unacked? SIOCOUTQ returns the amount
> of all outstanding data so that alone should be good enough.

Well, my man tcp(7) just says:

   SIOCOUTQ
          Returns the amount of unsent data in the socket send queue.  The
          socket must not be in LISTEN state, otherwise an error  (EINVAL)
          is returned.

It's not clear from that that it also includes sent but unacked data.
Perhaps it should be changed to say "the amount of unsent or unacked
data" instead.  On reflection, it makes sense that the send queue
includes sent but unacked as well as never sent data, since the unacked
data may need to be retransmitted.  This all assumes that I'm now
correctly understanding what you're saying/implying (I'm getting tired).

> > The performance argument I gave against INET_DIAG appears to have been
> > bogus.  At least just loading the inet_diag and tcp_diag modules didn't
> > have a significant impact on 10-GigE performance with either 1500-byte
> > packets or 9000-byte jumbo frame packets (CPU usage may have increased
> > slightly but even that's not definite).
> 
> Well if you don't make diag requests diag has zero impact on
> the system.  How much of an impact diag has if you do make requests
> is dependent on the number of open sockets.

Good to know.

						-Bill

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-14  9:05         ` Bill Fink
@ 2009-01-14 11:30           ` Herbert Xu
  2009-01-15  6:33             ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Herbert Xu @ 2009-01-14 11:30 UTC (permalink / raw)
  To: Bill Fink; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Wed, Jan 14, 2009 at 04:05:50AM -0500, Bill Fink wrote:
>
> Well, my man tcp(7) just says:
> 
>    SIOCOUTQ
>           Returns the amount of unsent data in the socket send queue.  The
>           socket must not be in LISTEN state, otherwise an error  (EINVAL)
>           is returned.

You still read man pages? :)

If you look at the actual code, SIOCOUTQ returns

	tp->write_seq - tp->snd_una

Really you can't get any more precise than that for what you want.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-14 11:30           ` Herbert Xu
@ 2009-01-15  6:33             ` Bill Fink
  0 siblings, 0 replies; 23+ messages in thread
From: Bill Fink @ 2009-01-15  6:33 UTC (permalink / raw)
  To: Herbert Xu; +Cc: ilpo.jarvinen, zbr, bert.hubert, h.willstrand, netdev

On Wed, 14 Jan 2009, Herbert Xu wrote:

> On Wed, Jan 14, 2009 at 04:05:50AM -0500, Bill Fink wrote:
> >
> > Well, my man tcp(7) just says:
> > 
> >    SIOCOUTQ
> >           Returns the amount of unsent data in the socket send queue.  The
> >           socket must not be in LISTEN state, otherwise an error  (EINVAL)
> >           is returned.
> 
> You still read man pages? :)
> 
> If you look at the actual code, SIOCOUTQ returns
> 
> 	tp->write_seq - tp->snd_una
> 
> Really you can't get any more precise than that for what you want.

Thanks for the clarification!  That does make it crystal clear.

						-Bill

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-01-13  5:31               ` Bill Fink
@ 2009-02-13 17:02                 ` Jeremy Jackson
  2009-02-20 18:10                   ` Bill Fink
  0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Jackson @ 2009-02-13 17:02 UTC (permalink / raw)
  To: Bill Fink
  Cc: ilpo.jarvinen, Evgeniy Polyakov, bert hubert, H. Willstrand,
	Netdev

On Tue, 2009-01-13 at 00:31 -0500, Bill Fink wrote:
> On Mon, 12 Jan 2009, Ilpo Järvinen wrote:
> 
> > On Sun, 11 Jan 2009, Bill Fink wrote:
> > 
> > > On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
> > > 
> > > > On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > > > > I fully understand. Sometimes I have to talk to stupid devices though. What

An excellent article on this subject:

http://ds9a.nl/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable.txt

"Luckily, it turns out that Linux keeps track of the amount of
unacknowledged data, which can be queried using the SIOCOUTQ ioctl().
Once we see this number hit 0, we can be reasonably sure our data
reached at least the remote operating system."

Is this the same as the TCP_INFO getsockopt()?

If you follow the progression from write(socket_fd, ) ... the data sits
in the socket buffer, and SIOCOUTQ is initially zero.  If the connection
started with a zero window, it could sit like that for a while
(sometimes called a "tarpit"?).  But you should still see the data in
your socket buffer, yes?

So, I think you want to make sure your socket write buffer is empty
(converted to unacked data), *then* make sure your unacked data is 0.

	write(sock, buffer, 1000000);             // returns 1000000
	shutdown(sock, SHUT_WR);
	// now wait for SIOCOUTQ to hit 0

If the window is 0, shutdown() would wait until the slow device sets
window > 0 again, or forever on a tarpitted connection.  Either way,
if/when it finishes, you know all data was transmitted; now wait for all
of it to be ACKed with SIOCOUTQ.


> > > > > I do find is the TCP_INFO ioctl, which offers this field in struct tcp_info:
> > > > > 
> > > > >         __u32   tcpi_unacked;
> > > > > 
> > > > > Which comes from:
> > > > > 
> > > > > struct tcp_sock {
> > > > > ...
> > > > >         u32     packets_out;    /* Packets which are "in flight"        */
> > > > > ...
> > > > > }
> > > > > 
> > > > > If this becomes 0, perhaps this might tell me everything I sent was acked?
> > > > 
> > > > 0 means that there are noin-flight packets, which is effectively number
> > > > of unacked packets. So if your application waits for this field to
> > > > become zero, it will wait for all sent packets to be acked.
> > > 
> > > I use this type of strategy in nuttcp, and it seems to work fine.
> > > I have a loop with a small delay and a check of tcpi_unacked, and
> > > break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> > > period has passed.
> > 
> > Checking tcpi_unacked alone won't be reliable. The peer might be slow 
> > enough to advertize zero window for a short period of time and during 
> > that period you would have packets_out zero...
> 
> I'll keep this in mind for the future, although it doesn't seem to
> be a significant issue in practice.  I use this scheme to try and
> account for the tcpi_total_retrans for the data stream, so if this
> corner case was hit, it would mean an under reporting of the total
> TCP retransmissions for the nuttcp test.
> 
> If I understand you correctly, to hit this corner case, just after
> the final TCP write, there would have to be no packets in flight
> together with a zero TCP window.  To make it more bullet-proof, I
> guess after seeing a zero tcpi_unacked, an additional small delay
> should be performed, and then rechecking for a zero tcpi_unacked.
> I don't see anything else obvious (to me anyway) in the tcp_info
> that would be particularly helpful in handling this.

-- 
Jeremy Jackson
Coplanar Networks
(519)489-4903
http://www.coplanar.net
jerj@coplanar.net


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: sendfile()? Re: SO_LINGER dead: I get an immediate RST on 2.6.24?
  2009-02-13 17:02                 ` Jeremy Jackson
@ 2009-02-20 18:10                   ` Bill Fink
  0 siblings, 0 replies; 23+ messages in thread
From: Bill Fink @ 2009-02-20 18:10 UTC (permalink / raw)
  To: Jeremy Jackson
  Cc: ilpo.jarvinen, Evgeniy Polyakov, bert hubert, H. Willstrand,
	Netdev

On Fri, 13 Feb 2009, Jeremy Jackson wrote:

> On Tue, 2009-01-13 at 00:31 -0500, Bill Fink wrote:
> > On Mon, 12 Jan 2009, Ilpo Järvinen wrote:
> > 
> > > On Sun, 11 Jan 2009, Bill Fink wrote:
> > > 
> > > > On Mon, 12 Jan 2009, Evgeniy Polyakov wrote:
> > > > 
> > > > > On Mon, Jan 12, 2009 at 12:08:24AM +0100, bert hubert (bert.hubert@netherlabs.nl) wrote:
> > > > > > I fully understand. Sometimes I have to talk to stupid devices though. What
> 
> An excellent article on this subject:
> 
> http://ds9a.nl/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable.txt
> 
> "Luckily, it turns out that Linux keeps track of the amount of
> unacknowledged
> data, which can be queried using the SIOCOUTQ ioctl(). Once we see this
> number hit 0, we can be reasonably sure our data reached at least the
> remote
> operating system."
> 
> is this the same as the TCP_INFO getsockopt() ?

If you mean the tcpi_unacked field, then no, it is not the same
as the SIOCOUTQ info.

> if you follow the progression from write(socket_fd, ) ... the data sits
> in
> the socket buffer, and SIOCOUTQ is initially zero.  If the connection
> started with a zero window,
> it could sit like that for a while (sometimes called a "tarpit ?).  But,
> you should still see the data in your socket buffer, yes?
> 
> So, I think you want to make sure your socket write buffer is empty
> (converted to unacked data), *then* make sure your unacked data is 0.
> 
> 	write(sock, buffer, 1000000);             // returns 1000000
> 	shutdown(sock, SHUT_WR);
> 	now wait for SIOCOUTQ to hit 0.
> 
> if window is 0, shutdown() would wait until show device sets window > 0
> again, or forever on a tarpitted connection.  Either way, when if/when
> it finishes, you know all data was transmitted, now wait for all of it
> to be ACKed with SIOCOUTQ.

While the "shutdown(sock, SHUT_WR)" might be useful, it isn't actually
necessary, since the SIOCOUTQ info includes both unACKed data (reported
by the tcpi_unacked field) and never-sent data (written by the app but
outside of the receiver's allowed window).

						-Bill



> > > > > > I do find is the TCP_INFO ioctl, which offers this field in struct tcp_info:
> > > > > > 
> > > > > >         __u32   tcpi_unacked;
> > > > > > 
> > > > > > Which comes from:
> > > > > > 
> > > > > > struct tcp_sock {
> > > > > > ...
> > > > > >         u32     packets_out;    /* Packets which are "in flight"        */
> > > > > > ...
> > > > > > }
> > > > > > 
> > > > > > If this becomes 0, perhaps this might tell me everything I sent was acked?
> > > > > 
> > > > > 0 means that there are noin-flight packets, which is effectively number
> > > > > of unacked packets. So if your application waits for this field to
> > > > > become zero, it will wait for all sent packets to be acked.
> > > > 
> > > > I use this type of strategy in nuttcp, and it seems to work fine.
> > > > I have a loop with a small delay and a check of tcpi_unacked, and
> > > > break out of the loop if tcpi_unacked becomes 0 or a defined timeout
> > > > period has passed.
> > > 
> > > Checking tcpi_unacked alone won't be reliable. The peer might be slow 
> > > enough to advertize zero window for a short period of time and during 
> > > that period you would have packets_out zero...
> > 
> > I'll keep this in mind for the future, although it doesn't seem to
> > be a significant issue in practice.  I use this scheme to try and
> > account for the tcpi_total_retrans for the data stream, so if this
> > corner case was hit, it would mean an under reporting of the total
> > TCP retransmissions for the nuttcp test.
> > 
> > If I understand you correctly, to hit this corner case, just after
> > the final TCP write, there would have to be no packets in flight
> > together with a zero TCP window.  To make it more bullet-proof, I
> > guess after seeing a zero tcpi_unacked, an additional small delay
> > should be performed, and then rechecking for a zero tcpi_unacked.
> > I don't see anything else obvious (to me anyway) in the tcp_info
> > that would be particularly helpful in handling this.
> 
> -- 
> Jeremy Jackson
> Coplanar Networks
> (519)489-4903
> http://www.coplanar.net
> jerj@coplanar.net

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-02-20 18:11 UTC | newest]

Thread overview: 23+ messages
2009-01-11 21:23 SO_LINGER dead: I get an immediate RST on 2.6.24? bert hubert
2009-01-11 22:08 ` H. Willstrand
2009-01-11 22:45   ` sendfile()? " bert hubert
2009-01-11 22:54     ` Evgeniy Polyakov
2009-01-11 23:08       ` bert hubert
2009-01-11 23:18         ` Evgeniy Polyakov
2009-01-12  4:50           ` Bill Fink
2009-01-12  9:18             ` Ilpo Järvinen
2009-01-13  5:31               ` Bill Fink
2009-02-13 17:02                 ` Jeremy Jackson
2009-02-20 18:10                   ` Bill Fink
  -- strict thread matches above, loose matches on Subject: below --
2009-01-13  6:32 Herbert Xu
2009-01-13  6:56 ` Bill Fink
2009-01-13  7:01   ` Herbert Xu
2009-01-14  7:43     ` Bill Fink
2009-01-14  8:29       ` Herbert Xu
2009-01-14  9:05         ` Bill Fink
2009-01-14 11:30           ` Herbert Xu
2009-01-15  6:33             ` Bill Fink
2009-01-13  7:06   ` Rick Jones
2009-01-14  8:05     ` Bill Fink
2009-01-14  8:08       ` Rick Jones
2009-01-14  8:32         ` Bill Fink
