[LARTC] [OT]: rtt measurement using tcp timestamps from a MITM position

From: Patrick McHardy <kaber@trash.net>
To: lartc@vger.kernel.org
Subject: [LARTC] [OT]: rtt measurement using tcp timestamps from a MITM position
Date: Fri, 28 Jun 2002 02:08:10 +0000
Message-ID: <marc-lartc-102523024305421@msgid-missing>

Hi everyone,

I know this is not the right place to discuss this, but I assume some
people here might have some good ideas which could help me.
Also, I don't really know where else to turn.

I'm writing a TCP rate control implementation for Linux at the moment.
For people not familiar with rate control, it basically works by
manipulating the TCP window size to force the sender not to exceed the
bandwidth you would like a particular connection to have. The nice thing
about it is that it works without throwing packets away; you just "tell"
the sender how fast you would like it to go.
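The post does not give the window formula itself. As a minimal sketch, assuming the common bandwidth-delay-product rule (window = rate * rtt), the window to advertise could be computed like this. `rc_window_bytes`, its parameters and the clamping policy are hypothetical, not part of any Linux API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (assumption, not from the post): the window in bytes
 * that caps a sender at rate_Bps bytes per second when the round-trip time
 * is rtt_us microseconds, i.e. the bandwidth-delay product. */
static uint32_t rc_window_bytes(uint64_t rate_Bps, uint32_t rtt_us,
                                uint32_t mss, unsigned wscale)
{
    uint64_t win = rate_Bps * rtt_us / 1000000u;  /* bytes per round trip */
    uint64_t max = (uint64_t)65535 << wscale;     /* largest advertisable window */

    assert(mss > 0 && wscale <= 14);              /* RFC 1323 caps the shift at 14 */
    if (win < mss)                                /* never go below one segment */
        win = mss;
    if (win > max)
        win = max;
    return (uint32_t)win;
}
```

The value actually placed in the 16-bit window field would then be `win >> wscale`, where `wscale` is the shift negotiated with the window scale option.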

For the window size calculation the round-trip time needs to be known.
One approach would be to remember the time a segment passed and to
calculate the difference when it is acknowledged. In order not to have
to remember many sequence numbers/times, usually only one rtt sample per
window is taken. This works fine for low packet rates (small windows);
at high rates the estimated rtt may be seriously wrong.
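A minimal sketch of the one-sample-per-window approach, assuming the box sees both directions and has a millisecond clock; the struct and function names are hypothetical. Only one segment is timed at a time, which is why samples become sparse relative to the packet rate as the window grows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state for timing one segment per window. */
struct rtt_probe {
    bool     active;   /* a segment is currently being timed */
    uint32_t end_seq;  /* sequence number just past the timed segment */
    uint32_t sent_ms;  /* time the timed segment passed the box */
};

/* Called for every data segment in the sender -> receiver direction. */
static void probe_on_data(struct rtt_probe *p, uint32_t seq, uint32_t len,
                          uint32_t now_ms)
{
    if (!p->active && len > 0) {   /* only one measurement outstanding */
        p->active  = true;
        p->end_seq = seq + len;
        p->sent_ms = now_ms;
    }
}

/* Called for every ACK in the reverse direction. Returns true and stores a
 * sample in *rtt_ms once the ACK covers the timed segment. */
static bool probe_on_ack(struct rtt_probe *p, uint32_t ack, uint32_t now_ms,
                         uint32_t *rtt_ms)
{
    /* (int32_t)(a - b) >= 0 compares sequence numbers modulo 2^32 */
    if (p->active && (int32_t)(ack - p->end_seq) >= 0) {
        *rtt_ms   = now_ms - p->sent_ms;
        p->active = false;
        return true;
    }
    return false;
}
```

A real implementation would also discard the sample if the timed segment was retransmitted (Karn's algorithm), since the ACK then cannot be matched to a particular transmission.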

RFC1323 comes up with a solution for this, the TCP timestamp option:
the sender puts a 32-bit timestamp (TSval) in the TCP header, and the
receiver echoes this field (TSecr) in its acknowledgment. The sender just
has to subtract the echoed value from its current timestamp clock to get
the rtt. This can be done with every segment sent without storing
additional data.
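For reference, a sketch of how a box could locate the timestamp option (kind 8, length 10 per RFC 1323) in the TCP option bytes, and how the original sender turns the echo into an rtt. The function names are hypothetical; the parser assumes `opt` points at the options following the 20-byte fixed header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL        0
#define TCPOPT_NOP        1
#define TCPOPT_TIMESTAMP  8    /* kind 8 ... */
#define TCPOLEN_TIMESTAMP 10   /* ... length 10 */

static uint32_t get_be32(const uint8_t *p)   /* network byte order */
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Scan the option bytes; return 1 and fill tsval/tsecr if found. */
static int tcp_find_timestamp(const uint8_t *opt, size_t len,
                              uint32_t *tsval, uint32_t *tsecr)
{
    size_t i = 0;

    while (i < len) {
        if (opt[i] == TCPOPT_EOL)
            break;
        if (opt[i] == TCPOPT_NOP) {
            i++;
            continue;
        }
        if (i + 1 >= len || opt[i + 1] < 2 || i + opt[i + 1] > len)
            break;                           /* malformed option list */
        if (opt[i] == TCPOPT_TIMESTAMP && opt[i + 1] == TCPOLEN_TIMESTAMP) {
            *tsval = get_be32(opt + i + 2);
            *tsecr = get_be32(opt + i + 6);
            return 1;
        }
        i += opt[i + 1];
    }
    return 0;
}

/* At the original sender: rtt in clock ticks is "now" minus the echo. */
static uint32_t sender_rtt_ticks(uint32_t now_ticks, uint32_t tsecr)
{
    return now_ticks - tsecr;   /* unsigned arithmetic survives wraparound */
}
```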

The problem arises if you want to calculate the rtt using timestamps from a
man-in-the-middle position. The timestamps themselves are meaningless there,
because you can't know how the sender chose them. One could remember all
timestamps and when they passed, and calculate the difference when each is
echoed back by the receiver, but again this would mean storing probably many
timestamp/receive-time pairs.
Another way would be to replace them with your own timestamps, but this
would prevent the real sender from performing accurate rtt estimation.
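A sketch of the "remember every timestamp" variant, assuming a small fixed-size table per direction indexed by TSval; `TS_SLOTS` and the evict-on-collision policy are hypothetical choices. It makes the storage cost visible: with a 1 ms sender clock and a 100 ms rtt, around a hundred distinct TSvals can be outstanding, and collisions silently lose samples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_SLOTS 64   /* hypothetical table size per direction */

struct ts_entry { uint32_t tsval; uint32_t seen_ms; bool used; };
struct ts_table { struct ts_entry slot[TS_SLOTS]; };

/* Remember when a TSval first passed the box. Several segments often
 * carry the same TSval; keep the earliest sighting. */
static void ts_remember(struct ts_table *t, uint32_t tsval, uint32_t now_ms)
{
    struct ts_entry *e = &t->slot[tsval % TS_SLOTS];

    if (e->used && e->tsval == tsval)
        return;
    e->tsval   = tsval;     /* overwrites an older value on collision */
    e->seen_ms = now_ms;
    e->used    = true;
}

/* When the receiver echoes TSecr, look it up; false if it was evicted. */
static bool ts_lookup_rtt(const struct ts_table *t, uint32_t tsecr,
                          uint32_t now_ms, uint32_t *rtt_ms)
{
    const struct ts_entry *e = &t->slot[tsecr % TS_SLOTS];

    if (!e->used || e->tsval != tsecr)
        return false;
    *rtt_ms = now_ms - e->seen_ms;
    return true;
}
```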

A solution could work like this:
RFC1323 specifies that the sender's timestamp clock should tick once every
1 ms to 1 s. This means the low 16 bits will wrap every ~65 seconds
(1 ms ticks) to ~18 hours (1 s ticks). We could just remember the high 16
bits and replace them with a 16-bit timestamp of our own.
On reception of an echoed timestamp, we calculate the difference and put
the original high 16 bits back in before passing it on.
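The bit manipulation for this idea is small. A sketch, assuming host-order 32-bit values (already converted from the wire with `ntohl`) and a hypothetical 16-bit middlebox clock `our_clock`; the macro and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define TS_UH(ts) ((uint16_t)((ts) >> 16))      /* upper half */
#define TS_LH(ts) ((uint16_t)((ts) & 0xFFFFu))  /* lower half */

/* Replace the sender's upper 16 bits with our own 16-bit clock. */
static uint32_t ts_insert_ours(uint32_t tsval, uint16_t our_clock)
{
    return (uint32_t)our_clock << 16 | TS_LH(tsval);
}

/* Put the remembered upper half back into an echoed TSecr. */
static uint32_t ts_restore(uint32_t tsecr, uint16_t saved_uh)
{
    return (uint32_t)saved_uh << 16 | TS_LH(tsecr);
}

/* rtt in ticks of our clock; 16-bit arithmetic survives one clock wrap. */
static uint16_t ts_rtt_ticks(uint16_t our_clock_now, uint32_t tsecr)
{
    return (uint16_t)(our_clock_now - TS_UH(tsecr));
}
```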

The problem with this is that timestamps are not only used by the sender 
to calculate the rtt but also by the receiver for
PAWS (protect against wrapped sequence numbers). From RFC1323: "PAWS 
uses the same TCP Timestamps option as the RTTM mechanism described 
earlier, and assumes that every received TCP segment (including data and 
ACK segments) contains a timestamp SEG.TSval whose values are monotone 
non-decreasing in time. The basic idea is that a segment can be 
discarded as an old duplicate if it is received with a timestamp 
SEG.TSval less than some timestamp recently received on this connection."

This means we have to make sure the resulting timestamp (our 16-bit
timestamp in the upper half, the original low 16 bits in the lower half)
still has the property of being monotone non-decreasing in time; otherwise
PAWS will reject valid segments (including retransmissions) as old duplicates.
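To see why monotonicity matters, here is a sketch of the core PAWS test, assuming the usual modulo-2^32 comparison of SEG.TSval against TS.Recent (the RFC 1323 rules for updating TS.Recent and for long idle periods are left out). The test values show that if the lower half wraps while the upper half stays the same, the receiver sees the timestamp go backwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified PAWS check: reject if SEG.TSval is "less than" TS.Recent,
 * where "less than" is taken modulo 2^32 via a signed difference. */
static bool paws_reject(uint32_t seg_tsval, uint32_t ts_recent)
{
    return (int32_t)(seg_tsval - ts_recent) < 0;
}
```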

The solution I came up with breaks PAWS itself: the protection against
wrapped sequence numbers will be gone. This is not really a problem
(remember, I need it for rate control), since rate control is usually not
done on gigabit backbone routers but on corporate border routers, where
sequence numbers do not wrap fast enough for PAWS to matter.

It works like this:

Timestamp Option:

       31    16 15      0          31     16 15       0
tsval: [  UH   |   LH   ]   tsecr: [   UH   |    LH   ]

UH means upper half, LH lower half; tsval is the sender's timestamp,
tsecr the echoed value.

For each direction, three variables need to be kept:
ts.UH           Upper half of timestamps currently transmitted by sender
ts.UH.last      ts.UH before LH wraparound
ts.wrap         time (on our clock) the wraparound occurred

On reception of a timestamp the following is done (in pseudo C code):

/* tsval handling (segments from the sender) */

if (! ts.UH)
    ts.UH = tsval.UH;           /* remember upper half */

if (tsval.UH != ts.UH) {        /* low 16 bits wrapped */
    ts.wrap = now;
    ts.UH.last = ts.UH;
    ts.UH = tsval.UH;
}

tsval.UH = now;                 /* put in our timestamp */

if (now == ts.wrap)
    tsval.UH++;                 /* increment UH to reflect LH wraparound */

/* tsecr handling (echoes from the receiver) */

rtt = now - tsecr.UH;

if (tsecr.UH <= ts.wrap)
    tsecr.UH = ts.UH.last;      /* timestamp was generated before the
                                   LH wraparound: put back the last UH */
else
    tsecr.UH = ts.UH;           /* current UH otherwise */

This seems to keep the timestamp values seen by the receiver non-decreasing.
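A runnable sketch of the scheme in the pseudocode above, under stated assumptions: no reordering, one state struct per direction, and a hypothetical `MAX_ECHO_AGE` after which echoes of pre-wrap timestamps are no longer expected. It compares against `ts.wrap` with `<=`, because a segment stamped in the wrap tick before the wrap carries UH == ts.wrap, while post-wrap segments in that tick were bumped to ts.wrap + 1. An explicit `init` flag stands in for the `!ts.UH` test so a sender whose upper half is 0 still works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption, not in the post: pre-wrap echoes arrive within this many
 * ticks of our 16-bit clock after the wrap. */
#define MAX_ECHO_AGE 8192u

struct ts_state {             /* kept once per direction */
    bool     init;            /* replaces the "if (!ts.UH)" test */
    bool     wrap_valid;      /* pre-wrap echoes may still arrive */
    uint16_t uh;              /* ts.UH */
    uint16_t uh_last;         /* ts.UH.last */
    uint16_t wrap;            /* ts.wrap, in ticks of our clock */
};

/* Signed distance a - b on the 16-bit clock circle. */
static int16_t d16(uint16_t a, uint16_t b)
{
    return (int16_t)(uint16_t)(a - b);
}

static void expire_wrap(struct ts_state *s, uint16_t now)
{
    if (s->wrap_valid && (uint16_t)(now - s->wrap) > MAX_ECHO_AGE)
        s->wrap_valid = false;
}

/* Sender -> receiver: swap the sender's upper half for our clock. */
static uint32_t rewrite_tsval(struct ts_state *s, uint32_t tsval, uint16_t now)
{
    uint16_t uh = (uint16_t)(tsval >> 16);
    uint16_t out = now;

    expire_wrap(s, now);
    if (!s->init) {
        s->init = true;
        s->uh = uh;
    }
    if (uh != s->uh) {            /* sender's low 16 bits wrapped */
        s->wrap = now;
        s->wrap_valid = true;
        s->uh_last = s->uh;
        s->uh = uh;
    }
    if (s->wrap_valid && now == s->wrap)
        out++;                    /* LH restarted within this tick */
    return (uint32_t)out << 16 | (tsval & 0xFFFFu);
}

/* Receiver -> sender: measure the rtt and put the original UH back. */
static uint32_t restore_tsecr(struct ts_state *s, uint32_t tsecr, uint16_t now,
                              uint16_t *rtt)
{
    uint16_t uh = (uint16_t)(tsecr >> 16);
    int16_t age = d16(now, uh);

    expire_wrap(s, now);
    *rtt = age > 0 ? (uint16_t)age : 0;  /* bumped stamps may be 1 tick ahead */
    if (s->wrap_valid && d16(uh, s->wrap) <= 0)
        uh = s->uh_last;          /* stamped before the LH wraparound */
    else
        uh = s->uh;
    return (uint32_t)uh << 16 | (tsecr & 0xFFFFu);
}
```

In the test, the sender's lower half wraps while our clock reads 100; the rewritten values stay non-decreasing and every echo is restored to the sender's original timestamp.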

The remaining problem is "Outdated Timestamps".

From RFC1323:
"If a connection remains idle long enough for the timestamp clock of the 
other TCP to wrap its sign bit, then the value saved in TS.Recent will 
become too old; as a result, the PAWS mechanism will cause all 
subsequent segments to be rejected, freezing the connection (until the 
timestamp clock wraps its sign bit again).
With the chosen range of timestamp clock frequencies (1 sec to 1 ms), 
the time to wrap the sign bit will be between 24.8 days and 24800 days."

A TCP usually takes care of this (the wraparound takes at least 24.8 days),
but this will not be true anymore: if we choose our timestamp clock to
increase once every 1 ms, the sign bit of the rewritten timestamp (bit 15
of our 16-bit clock) will wrap after 2^15 ms, about 33 seconds. I'm not
sure what to do about this (this is why I'm writing); does anyone here
have good ideas? I would also be happy about a completely different
approach; something totally passive would be nice .. :)
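The wrap times discussed here follow from simple arithmetic. A sketch with a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds until a clock ticking every tick_ms milliseconds has
 * advanced 2^bits ticks. */
static uint64_t wrap_ms(unsigned bits, uint64_t tick_ms)
{
    return ((uint64_t)1 << bits) * tick_ms;
}
```

With a 1 ms tick, the 16-bit lower half wraps after 65536 ms and the sign bit of a rewritten timestamp (bit 15 of our clock) after 32768 ms, about 33 seconds; a 10 ms tick would stretch that to 327680 ms, about 5.5 minutes. A full 2^31-tick sign-bit wrap of an unmodified sender clock takes about 24.8 days at 1 ms and about 24855 days at 1 s.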

Thanks (for the time you spent reading until down here :)
Patrick

_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/