netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Miller <davem@davemloft.net>
To: eric.dumazet@gmail.com
Cc: netdev@vger.kernel.org, ilpo.jarvinen@helsinki.fi
Subject: Re: [RFC] tcp: delack_timer expiration changes for every frame
Date: Wed, 02 Jun 2010 04:11:57 -0700 (PDT)	[thread overview]
Message-ID: <20100602.041157.24595101.davem@davemloft.net> (raw)
In-Reply-To: <1274388439.2508.27.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 20 May 2010 22:47:19 +0200

> So we change the delack_timer by a positive delta (~ HZ/10) and a
>  negative delta (~HZ/10), on the typical netperf TCP_RR workload.

Yes, this is silly.

> Here, the incoming frame is handled by netperf, doing a recvmsg().
> tcp_send_delayed_ack() sets the delack_timer to jiffies + HZ/25
> 
> [  392.207721] timer->expires=23045, new expires=23019(10) diff=26 timer=e5ecb754
 ...
> [  392.209221]  [<c1279ce7>] ? tcp_send_delayed_ack+0xb5/0xc1
> [  392.209282]  [<c1276d26>] ? tcp_rcv_established+0x39f/0x4f7

HZ/25 is TCP_DELACK_MIN and TCP_ATO_MIN, but the actual value we use
here involves incorporation of various measurements made on the
connection (RTO, etc.)

 ...
> tcp_v4_rcv() sets the delack timer to 37 ticks, so mod_timer() optimizations is not
> working at all.
> 
> [  392.217454] timer->expires=23019, new expires=23049(37) diff=-30 timer=e5ecb754
 ...
> [  392.218765]  [<c127cf56>] ? tcp_v4_rcv+0x41c/0x6b7
> [  392.218826]  [<c1265832>] ? ip_local_deliver_finish+0xe9/0x178

There must be a tail-call here at tcp_v4_rcv() or something missed in
the backtrace stack scanning logic, because tcp_v4_rcv() and it's main
inline tcp_v4_do_rcv() do not modify established state socket timers,
and in particular do not modify the delack timer, that I can see.

It must be in via tcp_rcv_established() or similar.
...

Nevermind, it's the inlined prequeue stuff.

It uses a seperate calculation of the delack timer offset, independant
of the one made by tcp_send_delayed_ack(), it's timer offset formula is:

	(3 * tcp_rto_min(sk)) / 4

with a MAX of:

	TCP_RTO_MAX

So every time we go in and out of recvmsg() we'll hop between these
two different delayed ACK settings.

The prequeue logic is trying to stretch the delayed ACK to 3/4 of a
window of data.  It's set a bit high, intentionally, in the hopes that
we'll get the process into recvmsg() and have it emit it's response
packet from a subsequent sendmsg() (that the ACK can ride on) before
this timer fires.

But when we drop the socket lock to sleep or return to userspace, one
of the next packets is just going to reset this timer differently.

While the intentions of the prequeue code look legit, the use of two
different delayed ACK timeout schemes has bad implications elsewhere.

For example, if the delack timer does actually fire, there is this
ATO fixup code here:

		if (!icsk->icsk_ack.pingpong) {
			/* Delayed ACK missed: inflate ATO. */
			icsk->icsk_ack.ato = min(icsk->icsk_ack.ato << 1, icsk->icsk_rto);
		} else {
			/* Delayed ACK missed: leave pingpong mode and
			 * deflate ATO.
			 */
			icsk->icsk_ack.pingpong = 0;
			icsk->icsk_ack.ato      = TCP_ATO_MIN;
		}

which is totally wrong if the delack timer offset is the one
calculated by the prequeue code.  Doubling the ATO in that case
is completely the wrong thing to do.

So yes we have all kinds of inconsistencies here and we should
probably unify things so that the timer gets kicked less often.

      reply	other threads:[~2010-06-02 11:11 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-20 20:47 [RFC] tcp: delack_timer expiration changes for every frame Eric Dumazet
2010-06-02 11:11 ` David Miller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100602.041157.24595101.davem@davemloft.net \
    --to=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=ilpo.jarvinen@helsinki.fi \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).