Re: issue with new TCP TSO stuff

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "David S. Miller" <davem@davemloft.net>
To: herbert@gondor.apana.org.au
Cc: netdev@oss.sgi.com
Subject: Re: issue with new TCP TSO stuff
Date: Thu, 12 May 2005 21:36:18 -0700 (PDT)	[thread overview]
Message-ID: <20050512.213618.14358320.davem@davemloft.net> (raw)
In-Reply-To: <20050512235236.GA22658@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Subject: Re: issue with new TCP TSO stuff
Date: Fri, 13 May 2005 09:52:36 +1000

> > #2 could be handled by down-sizing TSO frames when packet loss occurs.
> > Ie. tcp_retransmit_skb() or whatever will segmentize a TSO packet
> > which is within the sequence it is trying to retransmit.  Implementing
> > this is non- trivial mostly due to the fact that it has to work handle
> > GFP_ATOMIC memory failures and also get all the pcount crap correct.
> 
> I think most of the code to do that might already be there.  Pity
> we'll have to keep track of the pcount though.

Yes, when we walk the queue to retransmit, we chop up the
TSO frame via tcp_trim_head() and tcp_fragment().  It is
very loose and would need refinements though.

I think we'd be OK, because even if we get a GFP_ATOMIC allocation
failure, it's just like any other failure.  We're doing retransmits
so a timer will be the worst case deadlock breaker should we get an
allocation failure while chopping up the TSO frame.

This actually makes me worried about sk->sk_send_head advancement.
I think we have a deadlock possible here.  If sk->sk_send_head is
the only packet in the output queue, and the remote system has no
reason to send us zero-window probes or any other packets, we could
wedge if the skb_clone() fails for the tcp_transmit_skb() call.
We won't set any timers if we fail to send that single sk->sk_send_head
write queue packet, and we have no pending frames waiting to be ACK'd,
so nothing will wake us up to retry the transmit.

> > and size appropriately.  I think the tcp_snd_test() simplifications
> > made by my TSO Reloaded patch would help a lot here.  The send test is
> > logically now split to it's two tests 1) whether to send anything at
> > all, and 2) once #1 passes, how many such packets.
> 
> That was another one of my comments to your patch :) I was going to
> suggest that we move the cwnd/in_flight loop from tcp_snd_test to
> emit_send_queue.

One elegant part of the tcp_snd_test() in the TCP Reloaded patch is
that tcp_write_queue() calls it exactly once.

That loop in there is bogus and unnecessary.  In the changelog for
the TCP Reloaded patch I mention that this can be simplified into:

	new_packets = skb_queue_len(&sk->sk_write_queue) - in_flight;
	cwnd -= in_flight;
	return min(new_packets, cwnd);

It is a BUG() for this calculation to be larger than the number of
packets from sk->sk_send_head to the end of the write queue (which is
the invariant we must guarentee for our caller).  We could even check
for this while debugging the implementation early on.

> > This would be a sort of super-TSO that would do less segmenting work
> > than even a "perfect" TSO segmenter would.
> > 
> > I'm still not sure which approach is best, just throwing around some
> > ideas.
> 
> On paper your new super-TSO approach sounds the best.  However, I
> suppose we won't know for sure until we try :)

Yeah, TCP Reloaded sounded really nice on paper too :-)

To handle these super-TSO frames properly, we would need to add
sk->sk_send_offset to go along with sk->sk_send_head.  So then
update_send_head() would advance sk->sk_send_offset, and if that
completely consumed sk->sk_send_head then the pointer would be
advanced.

Tests of sk->sk_send_head would need to be fixed up similarly.

next prev parent reply	other threads:[~2005-05-13  4:36 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-12  5:30 issue with new TCP TSO stuff David S. Miller
2005-05-12 10:05 ` Herbert Xu
2005-05-12 20:13   ` David S. Miller
2005-05-12 21:47     ` Herbert Xu
2005-05-12 22:10       ` Herbert Xu
2005-05-12 22:52         ` David S. Miller
2005-05-12 23:10           ` Herbert Xu
2005-05-12 23:24             ` David S. Miller
2005-05-12 23:52               ` Herbert Xu
2005-05-13  4:36                 ` David S. Miller [this message]
2005-05-13 13:25                   ` Herbert Xu
2005-05-12 22:46       ` David S. Miller
2005-05-12 14:13 ` Andi Kleen
2005-05-12 19:26   ` David S. Miller
     [not found]     ` <20050512200251.GA72662@muc.de>
2005-05-12 20:03       ` David S. Miller
2005-05-12 20:26         ` Andi Kleen
2005-05-12 22:34           ` David S. Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050512.213618.14358320.davem@davemloft.net \
    --to=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=netdev@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).