netdev.vger.kernel.org archive mirror
* TSO prevents cwnd growth on 2.6 kernels
@ 2005-03-25 18:18 Scott M. Ferris
  2005-03-25 18:40 ` David S. Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Scott M. Ferris @ 2005-03-25 18:18 UTC (permalink / raw)
  To: netdev

TSO still has problems in 2.6.12-rc1. 

tcp_ack() will only increase the congestion window if
(tcp_packets_in_flight() >= tp->snd_cwnd), which prevents cwnd from
growing when cwnd is larger than the flight size.

When TSO is in use, tcp_write_xmit() often won't be willing to send
enough to increase packets_in_flight all the way to tp->snd_cwnd,
because tcp_snd_test() does an all-or-nothing check to see if all of
tcp_skb_pcount(skb) can be sent without overflowing snd_cwnd.  Unless
the pcount happens to perfectly fill the cwnd, returning acks won't be
able to increase cwnd, because tcp_packets_in_flight() < tp->snd_cwnd.
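
To make the interaction concrete, here is a minimal sketch of the two
checks described above.  It is a simplified model, not the kernel code:
the function names are hypothetical, and the exact comparisons are
assumed from the description in this message.

```python
def fits_all_or_nothing(in_flight, pcount, cwnd):
    # Assumed model of the tcp_snd_test() check: the skb is sent only
    # if ALL of its packets fit under cwnd.
    return in_flight + pcount <= cwnd


def cwnd_may_grow(in_flight, cwnd):
    # Assumed model of the tcp_ack() condition: cwnd can only grow
    # when the flight has reached cwnd.
    return in_flight >= cwnd


def fill_window(pcounts, cwnd):
    # Send skbs from the head of the queue until one does not fit.
    in_flight = 0
    for pcount in pcounts:
        if not fits_all_or_nothing(in_flight, pcount, cwnd):
            break
        in_flight += pcount
    return in_flight


flight = fill_window([4, 4, 4], cwnd=10)
print(flight, cwnd_may_grow(flight, 10))  # prints "8 False"
```

With a cwnd of 10 and skbs of 4 packets each, the flight stalls at 8,
so an arriving ack would not be allowed to grow cwnd.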

In the particular case I was looking at, it resulted in some really
pathological behavior following packet loss and retransmission.  After
the last lost packet was retransmitted, both cwnd and packets_out were
1.  When the ack for the final retransmission arrived, cwnd was
incremented to 2, and packets_in_flight dropped to 0.

Because TSO was on, the head skb on the send queue had a pcount of 45.
This prevented tcp_write_xmit() from sending anything at all, because
tcp_snd_test() can't fit 45 packets into a cwnd of 2.  Because
tcp_write_xmit() failed, a timer was scheduled for a zero-window
probe.

The probe timer eventually went off, and tcp_write_wakeup() noticed
that in fact the window was not closed, and sent 1 data packet.  The
ack for that 1 packet can't increment cwnd, because the flight size of
1 is still less than cwnd of 2.  This began a sequence of 43 window
probes being sent by the timer, each of which peeled one packet off of
the head skb, slowly dropping its tcp_skb_pcount().  Eventually the
pcount reached 2, and tcp_write_xmit() was finally willing to send the
skb and increase tcp_packets_in_flight() all the way to cwnd.  This
allowed an ack to finally increment cwnd to 3.

The problem then repeated, because the next skb in the send queue also
had a pcount of 45, which wouldn't fit in a cwnd of 3, beginning another
long sequence of window probes which slowly drained the head skb's
pcount to 3, at which point tcp_write_xmit() was finally willing to
send some packets, which got acked, bringing cwnd to 4.

This just kept happening over and over again, giving pathetic
throughput until the cwnd finally grew larger than the typical
tcp_skb_pcount() values in the send queue.
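
The cost of this cycle can be estimated with a rough simulation.  This
is a sketch under stated assumptions, not a trace of the kernel: it
assumes every skb in the send queue has the same pcount, that each
window probe peels exactly one packet off the head skb, and that cwnd
grows by exactly one each time an skb finally fills it.  The function
name is hypothetical.

```python
def probes_per_skb(pcount, cwnd):
    # Returns the number of window probes spent on each successive skb
    # until cwnd is large enough to send a whole skb at once.
    probes = []
    while cwnd < pcount:
        remaining = pcount
        count = 0
        while remaining > cwnd:  # all-or-nothing check fails
            remaining -= 1       # probe timer sends one packet
            count += 1
        probes.append(count)
        cwnd += 1                # skb now fills cwnd; the ack grows it
    return probes


result = probes_per_skb(pcount=45, cwnd=2)
print(result[0], result[1], len(result), sum(result))
```

Under these assumptions the first skb needs 43 probes and the second
42, matching the sequence above, and it takes 43 skbs and 946 probes in
total before cwnd reaches 45.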

tcp_write_xmit() really needs to be willing to increase
tcp_packets_in_flight() high enough that tcp_ack() will allow the
congestion window to grow, even when TSO is being used, otherwise
there may be a dramatic reduction in the cwnd growth rate.

I can easily test a patch if someone has time to create one.

-- 
Scott M. Ferris,
sferris@acm.org 

* Re: TSO prevents cwnd growth on 2.6 kernels
  2005-03-25 18:18 TSO prevents cwnd growth on 2.6 kernels Scott M. Ferris
@ 2005-03-25 18:40 ` David S. Miller
  2005-03-25 19:34   ` Scott M. Ferris
  0 siblings, 1 reply; 4+ messages in thread
From: David S. Miller @ 2005-03-25 18:40 UTC (permalink / raw)
  To: Scott M. Ferris; +Cc: netdev

On Fri, 25 Mar 2005 12:18:04 -0600
"Scott M. Ferris" <sferris@acm.org> wrote:

> TSO still has problems in 2.6.12-rc1. 

We know it's busted.  I haven't gotten to fixing this stuff
up yet.

> tcp_write_xmit() really needs to be willing to increase
> tcp_packets_in_flight() high enough that tcp_ack() will allow the
> congestion window to grow, even when TSO is being used, otherwise
> there may be a dramatic reduction in the cwnd growth rate.

Are you suggesting to let it go past tp->snd_cwnd?  We can't
ever do that.  tp->snd_cwnd is a hard limit on the number
of frames we may have outstanding on the network at one time,
TSO or not.

* Re: TSO prevents cwnd growth on 2.6 kernels
  2005-03-25 18:40 ` David S. Miller
@ 2005-03-25 19:34   ` Scott M. Ferris
  2005-03-25 19:40     ` David S. Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Scott M. Ferris @ 2005-03-25 19:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: Scott M. Ferris, netdev

On Fri, Mar 25, 2005 at 10:40:00AM -0800, David S. Miller wrote:
>
> We know it's busted.  I haven't gotten to fixing this stuff
> up yet.

Perhaps TSO should default to off for all drivers then?  I hadn't even
noticed it was enabled until after I was debugging the problem.

> Are you suggesting to let it go past tp->snd_cwnd?  We can't
> ever do that.  tp->snd_cwnd is a hard limit on the number
> of frames we may have outstanding on the network at one time,
> TSO or not.

No, I think we all agree that exceeding cwnd is a bad idea.  I'm just
saying that failing to reach cwnd is also broken, especially if it
results in tcp_write_xmit() sending nothing at all when cwnd is small.

-- 
Scott M. Ferris,
sferris@acm.org 

* Re: TSO prevents cwnd growth on 2.6 kernels
  2005-03-25 19:34   ` Scott M. Ferris
@ 2005-03-25 19:40     ` David S. Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David S. Miller @ 2005-03-25 19:40 UTC (permalink / raw)
  To: Scott M. Ferris; +Cc: sferris, netdev

On Fri, 25 Mar 2005 13:34:59 -0600
"Scott M. Ferris" <sferris@acm.org> wrote:

> > Are you suggesting to let it go past tp->snd_cwnd?  We can't
> > ever do that.  tp->snd_cwnd is a hard limit on the number
> > of frames we may have outstanding on the network at one time,
> > TSO or not.
> 
> No, I think we all agree that exceeding cwnd is a bad idea.  I'm just
> saying that failing to reach cwnd is also broken, especially if it
> results in tcp_write_xmit() sending nothing at all when cwnd is small.

But if we only have a TSO frame at the head, which would make us
exceed tp->snd_cwnd, the only option is to chop up the TSO frame.
So I guess that's your idea?

It's similar to the real fix for all of this, which I posted a detailed
description of about 1 or 2 months ago.  Check the netdev archives.
Basically, we don't build the TSO frames until transmit time, thus:

1) We always fill the CWND
2) We size the TSO based upon the CWND at send time, not at
   the time we are sucking in the data from userspace
3) No more multi-packet TSO frames in the send queue, thus no
   more disabling of TSO during packet loss and no more weird
   packet counting stuff
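
Points 1 and 2 can be illustrated with a small sketch of sizing a
frame at send time.  This is only an illustration of the idea, not the
posted design: the function name and the max_segs limit are
hypothetical.

```python
def segments_to_send(cwnd, in_flight, queued, max_segs):
    # Size the TSO frame from the room left under cwnd at send time,
    # so the flight never exceeds cwnd but can always reach it.
    room = max(cwnd - in_flight, 0)
    return min(room, queued, max_segs)


# cwnd of 2 with 45 packets queued: send 2 now instead of nothing.
print(segments_to_send(cwnd=2, in_flight=0, queued=45, max_segs=64))  # prints 2
```

Because the frame is cut to fit, the flight reaches cwnd whenever
enough data is queued, and the next ack is allowed to grow cwnd.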
