* Detecting TCP loss on the receiving side?
@ 2008-07-10 19:44 Dan Noé
2008-07-10 20:11 ` Rick Jones
2008-07-13 18:52 ` Andi Kleen
0 siblings, 2 replies; 9+ messages in thread
From: Dan Noé @ 2008-07-10 19:44 UTC
To: netdev
We're trying to troubleshoot some problems and we'd like to use the
struct tcp_info method to gain some information and attempt to detect
TCP loss events on the receiving side of a TCP stream. The problem is
struct tcp_info is not well documented and my attempts to trace it seem
to reveal that most of the statistics are relevant to the sending side
of a TCP connection.
Are there any fields (or any other way) to easily detect loss events?
We're mostly concerned with detecting when TCP loss or reordering delays
things resulting in additional latency.
Thanks much,
Dan
--
Dan Noé
Software Engineer
Lime Brokerage LLC
781-370-2518
* Re: Detecting TCP loss on the receiving side?
From: Rick Jones @ 2008-07-10 20:11 UTC
To: Dan Noé; +Cc: netdev
Dan Noé wrote:
> We're trying to troubleshoot some problems and we'd like to use the
> struct tcp_info method to gain some information and attempt to detect
> TCP loss events on the receiving side of a TCP stream. The problem is
> struct tcp_info is not well documented and my attempts to trace it seem
> to reveal that most of the statistics are relevant to the sending side
> of a TCP connection.
>
> Are there any fields (or any other way) to easily detect loss events?
> We're mostly concerned with detecting when TCP loss or reordering delays
> things resulting in additional latency.
If this is just for troubleshooting, why not just take a tcpdump trace?
rick jones
* Re: Detecting TCP loss on the receiving side?
From: Dan Noé @ 2008-07-10 20:19 UTC
To: Rick Jones; +Cc: Dan Noé, netdev
Rick Jones wrote:
> If this is just for troubleshooting, why not just take a tcpdump trace?
We're pushing a lot of data, several Mbps pretty much all day long,
and the (suspected) loss occurs sporadically. Ideally we'd like to be
able to easily correlate it with latency seen in our app.
Cheers,
Dan
* Re: Detecting TCP loss on the receiving side?
From: John Heffner @ 2008-07-10 21:05 UTC
To: Dan Noé; +Cc: Rick Jones, netdev
On Thu, Jul 10, 2008 at 1:19 PM, Dan Noé <dnoe@limebrokerage.com> wrote:
> Rick Jones wrote:
>>
>> If this is just for troubleshooting, why not just take a tcpdump trace?
>
> We're pushing a lot of data.. several Mbps pretty much all day long.. and
> the (suspected) loss occurs sporadically. Ideally we'd like to be able to
> easily correlate it with latency seen in our app.
Looking for loss at the receiver is a bit tricky. It doesn't look
like struct tcp_info has enough information to do this easily. If you
are able to install a custom kernel on this machine, the Web100 patch
would be able to gather enough information to figure it out. The
basic idea would be to look for a difference between RcvNxt and
RcvMax.
On the other hand, several Mbps is not that much. It's probably not
that hard to take tcpdumps split up every N minutes and analyze
them. One thing to look for would be SACK blocks coming from the
receiver (assuming SACK is enabled).
-John
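The RcvNxt/RcvMax idea above can be illustrated with a small user-space sketch. This is not the Web100 implementation; it is a minimal model, assuming you already have the (sequence number, payload length) of each arriving segment (for example from a parsed trace). The class name HoleTracker and its method names are hypothetical, and sequence-number wraparound is ignored for brevity.

```python
class HoleTracker:
    """Toy model of a receiver's view of one TCP stream (hypothetical helper).

    rcv_nxt: next in-order byte the receiver expects.
    rcv_max: one past the highest byte seen so far.
    A nonzero (rcv_max - rcv_nxt) means a hole: loss or reordering upstream.
    Assumption: sequence numbers do not wrap during the observation.
    """

    def __init__(self, initial_seq):
        self.rcv_nxt = initial_seq
        self.rcv_max = initial_seq
        self._ooo = {}  # start -> end of segments buffered beyond a hole

    def on_segment(self, seq, length):
        """Record one arriving segment and return the current hole size in bytes."""
        end = seq + length
        self.rcv_max = max(self.rcv_max, end)
        if seq > self.rcv_nxt:
            # Segment lies beyond the expected byte: remember it, hole stays open.
            self._ooo[seq] = max(end, self._ooo.get(seq, end))
        elif end > self.rcv_nxt:
            # Segment extends the in-order data; absorb anything now contiguous.
            self.rcv_nxt = end
            for start in sorted(self._ooo):
                if start > self.rcv_nxt:
                    break
                self.rcv_nxt = max(self.rcv_nxt, self._ooo.pop(start))
        return self.rcv_max - self.rcv_nxt


tracker = HoleTracker(initial_seq=1000)
for seq, length in [(1000, 100), (1200, 100), (1300, 100), (1100, 100)]:
    gap = tracker.on_segment(seq, length)
    print(f"seq={seq} len={length} hole_bytes={gap}")
```

Logging a timestamp whenever the returned value goes from zero to nonzero, and again when it returns to zero, would give the duration of each stall to compare against application latency logs.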
* Re: Detecting TCP loss on the receiving side?
From: Rick Jones @ 2008-07-10 21:14 UTC
To: Dan Noé; +Cc: netdev
Dan Noé wrote:
> Rick Jones wrote:
>
>> If this is just for troubleshooting, why not just take a tcpdump trace?
>
>
> We're pushing a lot of data.. several Mbps pretty much all day long..
> and the (suspected) loss occurs sporadically. Ideally we'd like to be
> able to easily correlate it with latency seen in our app.
If your app can already log a timestamped "high latency" warning, then
it would simply be a matter of a big disk and comparing timestamps in
the tcpdump trace :)
Also, it appears that for what you want, you only need to capture up
through the TCP header, so while it may be many Mbit/s on the wire, it
will be fewer Mbit/s in the trace.
rick jones
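To put rough numbers on the header-only capture, here is a small sketch. The wire rate, frame sizes, interface name, and port are made-up illustrations, not measurements from this thread. A snap length of 96 bytes is assumed to cover Ethernet (14) + IPv4 without options (20) + a TCP header with full options (up to 60); VLAN tags or IP options would need more. The tcpdump options used (-i, -s, -G, -w) are real, but check your version's man page, since -G rotation is not present in older releases.

```python
PCAP_RECORD_HEADER_BYTES = 16  # per-packet record header in the classic pcap file format


def trace_rate_mbps(wire_mbps, avg_frame_bytes, snaplen):
    """Estimate trace growth (Mbit/s) when only `snaplen` bytes per frame are kept."""
    captured = min(snaplen, avg_frame_bytes) + PCAP_RECORD_HEADER_BYTES
    return wire_mbps * captured / avg_frame_bytes


def tcpdump_argv(iface, port, snaplen=96, rotate_seconds=300):
    """Build a tcpdump command line: truncated capture, one file per rotation period."""
    return [
        "tcpdump", "-i", iface,
        "-s", str(snaplen),                  # keep only the first `snaplen` bytes
        "-G", str(rotate_seconds),           # start a new file every N seconds
        "-w", "trace-%Y%m%d-%H%M%S.pcap",    # strftime pattern used with -G
        "tcp", "and", "port", str(port),
    ]


# Hypothetical traffic profiles at 8 Mbit/s on the wire.
for frame in (200, 1000):
    print(frame, "byte frames ->", trace_rate_mbps(8.0, frame, 96), "Mbit/s of trace")
print(" ".join(tcpdump_argv("eth0", 9000)))
```

Note that with small messages the saving is modest, because the headers are already a large fraction of each frame; the benefit grows with frame size.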
* Re: Detecting TCP loss on the receiving side?
From: Rick Jones @ 2008-07-10 21:20 UTC
To: John Heffner; +Cc: Dan Noé, netdev
John Heffner wrote:
> Looking for loss at the receiver is a bit tricky. It doesn't look
> like struct tcp_info has enough information to do this easily. If you
> are able to install a custom kernel on this machine, the Web100 patch
> would be able to gather enough information to figure it out. The
> basic idea would be to look for a difference between RcvNxt and
> RcvMax.
And even then it depends on the connections having multiple segments in
flight at one time. Although I suppose that cuts both ways and affects
the tracing too, but perhaps not to the same extent.
Dan - seeing "brokerage" in your email and worries about latency makes
me think that your app(s) are pushing around lots of small messages -
are those spread-out across lots of connections, or are they
consolidated into a rather smaller number of connections? Also, what is
the magnitude of the latency in these latency events?
rick jones
* Re: Detecting TCP loss on the receiving side?
From: John Heffner @ 2008-07-10 21:34 UTC
To: Rick Jones; +Cc: Dan Noé, netdev
On Thu, Jul 10, 2008 at 2:20 PM, Rick Jones <rick.jones2@hp.com> wrote:
> John Heffner wrote:
>>
>> Looking for loss at the receiver is a bit tricky. It doesn't look
>> like struct tcp_info has enough information to do this easily. If you
>> are able to install a custom kernel on this machine, the Web100 patch
>> would be able to gather enough information to figure it out. The
>> basic idea would be to look for a difference between RcvNxt and
>> RcvMax.
>
> And even then it depends on the connections having multiple segments in
> flight at one time. Although I suppose that cuts both ways and affects the
> tracing too, but perhaps not to the same extent.
>
> Dan - seeing "brokerage" in your email and worries about latency makes me
> think that your app(s) are pushing around lots of small messages - are those
> spread-out across lots of connections, or are they consolidated into a
> rather smaller number of connections? Also, what is the magnitude of the
> latency in these latency events?
Yes, a very slow rate makes things trickier, but transmitting less
than one segment per minRTO (200 ms) seems pretty unlikely. At any
higher rate, later segments arrive before the retransmission, so the
receiver should observe reordering.
-John
* Re: Detecting TCP loss on the receiving side?
From: Dan Noe @ 2008-07-10 22:15 UTC
To: Rick Jones; +Cc: John Heffner, netdev
On 07/10/2008 05:20 PM, Rick Jones wrote:
> John Heffner wrote:
>> Looking for loss at the receiver is a bit tricky. It doesn't look
>> like struct tcp_info has enough information to do this easily. If you
>> are able to install a custom kernel on this machine, the Web100 patch
>> would be able to gather enough information to figure it out. The
>> basic idea would be to look for a difference between RcvNxt and
>> RcvMax.
>
> And even then it depends on the connections having multiple segments
> in flight at one time. Although I suppose that cuts both ways and
> affects the tracing too, but perhaps not to the same extent.
>
> Dan - seeing "brokerage" in your email and worries about latency makes
> me think that your app(s) are pushing around lots of small messages -
> are those spread-out across lots of connections, or are they
> consolidated into a rather smaller number of connections? Also, what
> is the magnitude of the latency in these latency events?
Yeah, without going into too much detail: We are getting data from a
market via a TCP connection. Usually this is one or two connections.
It is dominated by receiving traffic (from peer). The message rates can
be very bursty, and the size is small. The links are usually short and
fairly fat - metro fast Ethernet, for example, and the geographic
latency is low. We believe loss is occurring but we're trying to
determine how often and where. The data comes on fairly fast and a loss
event (even with fast retransmit) means things piling up in queues and a
sudden flood of data coming into userspace (which is a pretty serious
latency hit on this scale).
More and more places are moving to multicast (which has its own quirks)
but some are still using TCP :)
Cheers,
Dan
* Re: Detecting TCP loss on the receiving side?
From: Andi Kleen @ 2008-07-13 18:52 UTC
To: Dan Noé; +Cc: netdev
Dan Noé <dnoe@limebrokerage.com> writes:
> We're trying to troubleshoot some problems and we'd like to use the
> struct tcp_info method to gain some information and attempt to detect
> TCP loss events on the receiving side of a TCP stream.
Receiver loss can at least be diagnosed using netstat -s, especially if
you can rule out reordering. But those counters are only system-global
currently, although that might change with containers.
-Andi
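One way to use the system-wide counters is to snapshot them periodically and log which ones moved. The sketch below parses text in the /proc/net/netstat layout (a line of names followed by a line of values, both with the same prefix), which is one of the files netstat -s reads on Linux. The function names are hypothetical, and the counter names in the sample text are only illustrative: which counters exist, and which of them indicate receive-side loss, depends on the kernel version.

```python
def parse_proc_netstat(text):
    """Parse /proc/net/netstat-style text into {"Prefix.Name": int}."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: "TcpExt: name1 name2 ..." then "TcpExt: val1 val2 ..."
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, nums = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched line pair: {prefix!r} vs {value_prefix!r}")
        for name, num in zip(names.split(), nums.split()):
            counters[f"{prefix}.{name}"] = int(num)
    return counters


def changed_counters(before, after):
    """Return only the counters whose value differs between two snapshots."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] != before.get(key, 0)
    }


# Illustrative snapshots; on a live system, read open("/proc/net/netstat").read().
SAMPLE_T0 = "TcpExt: TCPDSACKOldSent TCPDSACKOfoSent\nTcpExt: 3 0\n"
SAMPLE_T1 = "TcpExt: TCPDSACKOldSent TCPDSACKOfoSent\nTcpExt: 5 0\n"

delta = changed_counters(parse_proc_netstat(SAMPLE_T0), parse_proc_netstat(SAMPLE_T1))
print(delta)
```

Because the counters are global, this only helps correlate with one connection's latency when that connection dominates the host's TCP traffic.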