Re: [PATCH] net/ipv4, linux-2.6.30.4

From: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
To: David Miller <davem@davemloft.net>
Cc: "slot.daniel@gmail.com" <slot.daniel@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [PATCH] net/ipv4, linux-2.6.30.4
Date: Thu, 13 Aug 2009 14:40:42 +0200
Message-ID: <4A8409CA.70200@nets.rwth-aachen.de>
In-Reply-To: <20090812.145549.228391386.davem@davemloft.net>

David Miller wrote:
> From: Daniel Slot <slot.daniel@gmail.com>
> Date: Wed, 12 Aug 2009 20:47:44 +0200
> 
>> RFC 4653 specifies Non-Congestion Robustness (NCR) for TCP.
>> In the absence of explicit congestion notification from the network, TCP
>> uses loss as an indication of congestion.
>> One of the ways TCP detects loss is using the arrival of three duplicate
>> acknowledgments.
>> However, this heuristic is not always correct,
>> notably in the case when network paths reorder segments (for whatever
>> reason), resulting in degraded performance.
> 
> Linux's TCP stack already has sophisticated reordering detection.

Hmm, sophisticated? Sorry, it seemed pretty rudimentary/random to me.

Firstly, tp->reordering never shrinks for a given connection
unless an RTO occurs. If that happens, tp->reordering is reset to
sysctl_tcp_reordering (even though it was initialized with a potentially
different value from the destination cache).
Why?

Secondly, it simply disables FACK? Disabling FACK completely may (or may not)
be the correct solution if reordering is present. But why not re-enable FACK
once no more reordering is detected? It won't even get re-enabled if an RTO
occurs. It seems even stranger that tp->reordering is used in FACK paths, too.
So if one sets a high sysctl_tcp_reordering, because one expects reordering,
tcp_update_reordering will probably NOT disable FACK, but instead FACK will
be used with a high tp->reordering value.

Thirdly, in most cases it will only trigger if spurious retransmits have
already happened. If it triggers in advance (due to SACK logic), the updated
reordering metric will IMO be one too small, leading again to a spurious
retransmit if a reordering event of the same length happens again.
IOW, it will mostly only reduce the damage to congestion control, but will
send out spurious packets nevertheless.

In my view, one should at least build some EWMA or histogram, or whatever
statistics, to measure detected reordering and, based on this measurement,
adjust the dupthresh (or max_burst, or whatever). Of course, there is always
the question of how much better such a sophisticated statistic would work
than the current very pragmatic solution...
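
A minimal sketch of the EWMA idea above, in plain C. This is not kernel
code: struct reorder_est, the reorder_est_* helpers, the 1/8 gain, the cap
of 127 and the "+1" safety margin (motivated by the "one too small" remark
above) are all assumptions made for illustration. The average is kept in
fixed point (scaled by 8) so that no floating point is needed.

```c
#include <stdint.h>

/* Hypothetical per-connection state; none of these names exist in the kernel. */
struct reorder_est {
	uint32_t avg_x8;	/* EWMA of observed reordering length, scaled by 8 */
};

#define REORDER_MIN_DUPTHRESH	3u	/* classic three-dupack threshold */
#define REORDER_MAX_DUPTHRESH	127u	/* arbitrary cap chosen for this sketch */

/* Start the average at an initial reordering length (in segments). */
static void reorder_est_init(struct reorder_est *e, uint32_t initial)
{
	e->avg_x8 = initial << 3;
}

/*
 * Fold one measured reordering length (in segments) into the average with
 * gain 1/8: avg = avg - avg/8 + len/8, i.e. avg_x8 = avg_x8 - avg_x8/8 + len.
 * Samples smaller than the average pull it down, so the estimate can shrink.
 */
static void reorder_est_sample(struct reorder_est *e, uint32_t len)
{
	e->avg_x8 = e->avg_x8 - (e->avg_x8 >> 3) + len;
}

/* Derive a dupthresh: average rounded up, plus one segment, then clamped. */
static uint32_t reorder_est_dupthresh(const struct reorder_est *e)
{
	uint32_t thresh = ((e->avg_x8 + 7) >> 3) + 1;

	if (thresh < REORDER_MIN_DUPTHRESH)
		thresh = REORDER_MIN_DUPTHRESH;
	if (thresh > REORDER_MAX_DUPTHRESH)
		thresh = REORDER_MAX_DUPTHRESH;
	return thresh;
}
```

Unlike a value that only ever grows, this estimate decays again when later
samples are small. Whether it would actually beat the current pragmatic
approach is exactly the open question raised above.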

Please correct me if I'm wrong or just too stupid to understand this stuff.
(very likely;-)

> 
>> TCP-NCR is designed to mitigate this degraded performance by increasing the
>> number of duplicate acknowledgments required to trigger loss recovery,
>> based on the current state of the connection, in an effort to better
>> disambiguate true segment loss from segment reordering.
> 
> We already have code in the stack which tries to detect packet
> reordering with a high level of sophistication.

On the contrary, RFC 4653 does not even try to detect reordering. It simply
delays the congestion response in a way which seems very straightforward.
Of course there is the negative impact of increased latency (loss recovery
takes longer). However, for large FTP/HTTP transfers, who cares about latency?
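
To make "delays the congestion response" concrete, here is a small C sketch
of the duplicate-ACK threshold as I read RFC 4653:
DupThresh = max(LT_F * (FlightSize / SMSS), 3), with LT_F = 2/3 for the
careful variant and LT_F = 1/2 for the aggressive one. The function and enum
names are hypothetical, and rounding down in integer arithmetic is an
assumption of this sketch. It is not taken from Daniel's patch.

```c
#include <stdint.h>

/* The two Extended Limited Transmit variants described in RFC 4653. */
enum ncr_variant { NCR_CAREFUL, NCR_AGGRESSIVE };

/*
 * Hypothetical helper: compute the NCR duplicate-ACK threshold from the
 * amount of outstanding data (bytes) and the sender MSS (bytes).
 * Integer division rounds down; that rounding is a choice of this sketch.
 */
static uint32_t ncr_dupthresh(uint32_t flight_bytes, uint32_t smss,
			      enum ncr_variant variant)
{
	uint32_t segs, thresh;

	if (smss == 0)
		return 3;	/* avoid dividing by zero; fall back to the classic value */

	segs = flight_bytes / smss;
	thresh = (variant == NCR_CAREFUL) ? (2 * segs) / 3 : segs / 2;

	return thresh < 3 ? 3 : thresh;	/* never below three dupacks */
}
```

With 30 full-sized segments in flight this yields 20 (careful) or 15
(aggressive) duplicate ACKs before loss recovery starts, instead of 3, which
is where the extra latency comes from. No reordering measurement is involved
at all.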

There must already be some logic in the kernel, for the buffer autotuning,
to detect applications which are doing bulk transfers. What about enabling
RFC 4653 when such an application is detected?

Daniel, I would assume RFC 4653 would simply work with FACK, at least if
there is no reordering present?

Best regards,
Arnd

-- 
Dipl.-Inform. Arnd Hannemann
RWTH Aachen University
Dept. of Computer Science, Informatik 4
Ahornstr. 55, D-52074 Aachen, Germany
Phone: (+49 241) 80-21423 Fax: (+49 241) 80-22220
