* Re: Socket buffer sizes with autotuning
From: Jerry Chu @ 2008-04-23 23:29 UTC (permalink / raw)
To: netdev
I've been seeing the same problem here and am trying to fix it.
My fix is to not count packets still sitting in the host queue as
"prior_in_flight" when feeding the latter to tcp_cong_avoid(). This should
make the tcp_is_cwnd_limited() test fail when the in_flight build-up is due
entirely to the large host queue, and stop cwnd from growing beyond what's
really necessary.

sysctl_tcp_tso_win_divisor also causes cwnd to be inflated quite a bit,
unnecessarily, when TSO is enabled.
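A minimal sketch of the idea (the tp->host_queued counter here is a
hypothetical name for illustration, not part of any posted patch):

        /* prior_in_flight normally comes straight from tcp_packets_in_flight();
         * discount segments still queued inside the host (qdisc/driver ring)
         * so that tcp_is_cwnd_limited() fails when the build-up is local only.
         */
        u32 prior_in_flight = tcp_packets_in_flight(tp) - tp->host_queued;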
Jerry
> From: David Miller <davem@davemloft.net>
> Date: Tue, Apr 22, 2008 at 8:59 PM
> Subject: Re: Socket buffer sizes with autotuning
> To: johnwheffner@gmail.com
> Cc: rick.jones2@hp.com, netdev@vger.kernel.org
>
> From: "John Heffner" <johnwheffner@gmail.com>
> Date: Tue, 22 Apr 2008 19:17:39 -0700
>
> > On Tue, Apr 22, 2008 at 5:38 PM, Rick Jones <rick.jones2@hp.com> wrote:
> > > oslowest:~# netstat -an | grep ESTAB
> > > ...
> > > tcp 0 2760560 10.208.0.1:40500 10.208.0.45:42049 ESTABLISHED
> > > ...
> > >
> > > Is this expected behaviour?
> >
>
> > What is your interface txqueuelen and mtu? If you have a very large
> > interface queue, TCP will happily fill it up unless you are using a
> > delay-based congestion controller.
>
>
> Yes, that's the fundamental problem with loss based congestion
> control. If there are any queues in the path, TCP will fill them up.
>
> Vegas and other similar techniques are able to avoid this, but come
> with the fundamental flaw that it's easy to get them into situations
> where they do not respond to increases in pipe space adequately, and
> thus underperform compared to loss based algorithms.
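For a sense of scale (illustrative arithmetic, assuming a typical default
txqueuelen of 1000 and a 1500-byte MTU): the interface queue alone can hold
roughly 1000 * 1500 bytes = 1.5 MB, which at 1 Gbit/s amounts to about 12 ms
of standing queueing delay that a loss-based sender will keep full.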
* Re: Socket buffer sizes with autotuning

From: John Heffner @ 2008-04-24 16:32 UTC (permalink / raw)
To: Jerry Chu; +Cc: netdev

On Wed, Apr 23, 2008 at 4:29 PM, Jerry Chu <hkchu@google.com> wrote:
>
> I've been seeing the same problem here and am trying to fix it.
> My fix is to not count packets still sitting in the host queue as
> "prior_in_flight" when feeding the latter to tcp_cong_avoid(). This should
> make the tcp_is_cwnd_limited() test fail when the in_flight build-up is due
> entirely to the large host queue, and stop cwnd from growing beyond what's
> really necessary.

Sounds like a useful optimization.  Do you have a patch?

  -John
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-04-25  0:49 UTC (permalink / raw)
To: John Heffner; +Cc: netdev, rick.jones2, davem

On Thu, Apr 24, 2008 at 9:32 AM, John Heffner <johnwheffner@gmail.com> wrote:
>
> On Wed, Apr 23, 2008 at 4:29 PM, Jerry Chu <hkchu@google.com> wrote:
> >
> > I've been seeing the same problem here and am trying to fix it.
> > My fix is to not count packets still sitting in the host queue as
> > "prior_in_flight" when feeding the latter to tcp_cong_avoid(). This should
> > make the tcp_is_cwnd_limited() test fail when the in_flight build-up is due
> > entirely to the large host queue, and stop cwnd from growing beyond what's
> > really necessary.
>
> Sounds like a useful optimization.  Do you have a patch?

I'm working on one, but I still need to completely root-cause the problem
first and do a lot more testing. Like Rick Jones, I had thought for a while
that either the autotuning or the Congestion Window Validation (RFC 2861)
code should dampen the cwnd growth, so the bug must be there; that held
until last week, when I decided to get to the bottom of this problem.

One question: I currently use skb_shinfo(skb)->dataref == 1 for skb's on the
sk_write_queue list as the heuristic to determine whether a packet has hit
the wire. This seems a good solution for the normal cases, without requiring
driver changes to notify TCP in the xmit completion path. But I can imagine
cases where another below-IP consumer of the skb, e.g. tcpdump, nullifies
the heuristic. If the below-IP consumer causes the skb ref count to drop to 1
prematurely, the inflated-cwnd problem comes back, but it's no worse than
before. What if the below-IP skb reader causes the ref count to remain > 1
long after the packets have hit the wire? That could keep the fix from
growing cwnd when it should grow, hurting performance. Is there a better
solution than checking dataref to determine whether a packet has hit the
wire?

Also, the code that decides when and how much to defer in the TSO path
seems too aggressive. It's currently based on a percentage
(sysctl_tcp_tso_win_divisor) of min(snd_wnd, snd_cwnd), which can be a lot
when the window is large. E.g., when I disable sysctl_tcp_tso_win_divisor,
the cwnd of my simple netperf run drops by exactly 1/3, from 1037 (segments)
to 695. It seems to me the TSO defer threshold should be based on an
absolute count, e.g. 64KB.

Jerry
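For reference, a sketch of the heuristic being described (illustrative only,
not a submitted patch; the helper name is made up, and it assumes the
dataref/SKB_DATAREF_MASK layout of kernels of that era):

        /* Treat a queued skb as having hit the wire once no lower-layer clone
         * still references its payload, i.e. the data refcount is back to 1.
         */
        static inline int tcp_skb_left_host(const struct sk_buff *skb)
        {
                return (atomic_read(&skb_shinfo(skb)->dataref) &
                        SKB_DATAREF_MASK) == 1;
        }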
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-04-25  6:46 UTC (permalink / raw)
To: hkchu; +Cc: johnwheffner, netdev, rick.jones2

From: "Jerry Chu" <hkchu@google.com>
Date: Thu, 24 Apr 2008 17:49:33 -0700

> One question: I currently use skb_shinfo(skb)->dataref == 1 for skb's on the
> sk_write_queue list as the heuristic to determine whether a packet has hit
> the wire.

This doesn't work for the reasons that you mention in detail next :-)

> Is there a better solution than checking dataref to determine whether a
> packet has hit the wire?

Unfortunately, no there isn't.

Part of the issue is that the driver is only working with a clone, but
if a packet gets resent before the driver gives up its reference,
we'll make a completely new copy.

But even assuming we could say that the driver gets a clone all the
time, the "sent" state would need to be in the shared data area.

> Also, the code that decides when and how much to defer in the TSO path
> seems too aggressive. It's currently based on a percentage
> (sysctl_tcp_tso_win_divisor) of min(snd_wnd, snd_cwnd), which can be a lot
> when the window is large. E.g., when I disable sysctl_tcp_tso_win_divisor,
> the cwnd of my simple netperf run drops by exactly 1/3, from 1037 (segments)
> to 695. It seems to me the TSO defer threshold should be based on an
> absolute count, e.g. 64KB.

This is one of the most difficult knobs to get right in the TSO code.

If the percentage is too low, you'll notice that cpu utilization
increases because you aren't accumulating enough data to send down the
largest possible TSO frames.

But yes, you are absolutely right that we should have a hard limit
of 64K here, since we can't build a larger TSO frame anyways.

In fact I thought we had something like that here already :-/

Wait, in fact we do, it's just hidden behind a variable now:

        /* If a full-sized TSO skb can be sent, do it. */
        if (limit >= sk->sk_gso_max_size)
                goto send_now;

:-)
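For context, the deferral decision being discussed looks roughly like this
(a paraphrased sketch of that era's tcp_tso_should_defer(), not a verbatim
excerpt):

        limit = min(send_win, cong_win);        /* bytes sendable right now */

        /* If a full-sized TSO skb can be sent, do it. */
        if (limit >= sk->sk_gso_max_size)
                goto send_now;

        /* Otherwise, send now only once the chunk reaches a fraction
         * (1/sysctl_tcp_tso_win_divisor) of the current window; with a
         * large cwnd that fraction can itself be hundreds of KB. */
        if (sysctl_tcp_tso_win_divisor) {
                u32 chunk = min(tp->snd_wnd, tp->snd_cwnd * tp->mss_cache);

                chunk /= sysctl_tcp_tso_win_divisor;
                if (limit >= chunk)
                        goto send_now;
        }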
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-04-25 21:29 UTC (permalink / raw)
To: David Miller; +Cc: johnwheffner, netdev, rick.jones2

On Thu, Apr 24, 2008 at 11:46 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Thu, 24 Apr 2008 17:49:33 -0700
>
> > One question: I currently use skb_shinfo(skb)->dataref == 1 for skb's on the
> > sk_write_queue list as the heuristic to determine whether a packet has hit
> > the wire.
>
> This doesn't work for the reasons that you mention in detail next :-)
>
> > Is there a better solution than checking dataref to determine whether a
> > packet has hit the wire?
>
> Unfortunately, no there isn't.
>
> Part of the issue is that the driver is only working with a clone, but
> if a packet gets resent before the driver gives up its reference,
> we'll make a completely new copy.

I think we can ignore this case if it happens rarely.

> But even assuming we could say that the driver gets a clone all the
> time, the "sent" state would need to be in the shared data area.

Ok.

> > Also, the code that decides when and how much to defer in the TSO path
> > seems too aggressive. [...]
>
> This is one of the most difficult knobs to get right in the TSO code.
>
> If the percentage is too low, you'll notice that cpu utilization
> increases because you aren't accumulating enough data to send down the
> largest possible TSO frames.

Well, there is a fine line to walk between CPU efficiency and traffic
burstiness. The TSO defer code causes a few hundred KB of bursts that
quickly blow away our small switch buffers. The matter may get even worse
for 10GE.

> But yes, you are absolutely right that we should have a hard limit
> of 64K here, since we can't build a larger TSO frame anyways.
>
> In fact I thought we had something like that here already :-/
>
> Wait, in fact we do, it's just hidden behind a variable now:
>
>         /* If a full-sized TSO skb can be sent, do it. */
>         if (limit >= sk->sk_gso_max_size)
>                 goto send_now;

Oh, I just realized I've been working on a very "old" (2.6.18 :-) version of
the kernel. I will get the latest 2.6.25 and take a look. I can't find the
skb_release_all() function you pointed at in a later mail either. I guess
the Linux kernel code gets rewritten every few months :-(.

Jerry
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-04-25 21:35 UTC (permalink / raw)
To: hkchu; +Cc: johnwheffner, netdev, rick.jones2

From: "Jerry Chu" <hkchu@google.com>
Date: Fri, 25 Apr 2008 14:29:25 -0700

> I will get the latest 2.6.25 and take a look. I can't find the
> skb_release_all() function you pointed at in a later mail either. I guess
> the Linux kernel code gets rewritten every few months :-(.

You want bugs fixed, like the one you reported here, don't you?
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-04-28 18:30 UTC (permalink / raw)
To: David Miller; +Cc: johnwheffner, netdev, rick.jones2

On Thu, Apr 24, 2008 at 11:46 PM, David Miller <davem@davemloft.net> wrote:
> [...]
>
> But yes, you are absolutely right that we should have a hard limit
> of 64K here, since we can't build a larger TSO frame anyways.
>
> In fact I thought we had something like that here already :-/
>
> Wait, in fact we do, it's just hidden behind a variable now:
>
>         /* If a full-sized TSO skb can be sent, do it. */
>         if (limit >= sk->sk_gso_max_size)
>                 goto send_now;
>
> :-)

Correct, but its counterpart doesn't exist in tcp_is_cwnd_limited(). So
cwnd will continue to grow when left < cwnd/sysctl_tcp_tso_win_divisor,
which can be very large when cwnd is large.

If I change tcp_tso_win_divisor to 0, cwnd maxes out at 695 rather than
1037, exactly off by 1/3. I tried to add the same check to
tcp_is_cwnd_limited():

diff -c /tmp/tcp.h.old tcp.h
*** /tmp/tcp.h.old      Mon Apr 28 11:00:44 2008
--- tcp.h       Mon Apr 28 10:54:10 2008
***************
*** 828,833 ****
--- 828,835 ----
                return 0;
  
        left = tp->snd_cwnd - in_flight;
+       if (left >= 65536)
+               return 0;
        if (sysctl_tcp_tso_win_divisor)
                return left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd;
        else

But it doesn't seem to help (cwnd still grows to 1037).

Jerry
* Re: Socket buffer sizes with autotuning

From: John Heffner @ 2008-04-28 19:21 UTC (permalink / raw)
To: Jerry Chu; +Cc: David Miller, netdev, rick.jones2

On Mon, Apr 28, 2008 at 11:30 AM, Jerry Chu <hkchu@google.com> wrote:
> Correct, but its counterpart doesn't exist in tcp_is_cwnd_limited(). So
> cwnd will continue to grow when left < cwnd/sysctl_tcp_tso_win_divisor,
> which can be very large when cwnd is large.
>
> If I change tcp_tso_win_divisor to 0, cwnd maxes out at 695 rather than
> 1037, exactly off by 1/3. I tried to add the same check to
> tcp_is_cwnd_limited():
>
> [...]
>
> But it doesn't seem to help (cwnd still grows to 1037).

Cwnd is in packets, not bytes.

Try this series of patches, against net-next.

  -John

[-- Attachment #2: 0001-NET-Allow-send-limited-cwnd-to-grow-up-to-max_bur.patch --]

From 57039cfc486ae4deac3b1fecaa238b5483c034df Mon Sep 17 00:00:00 2001
From: John Heffner <johnwheffner@gmail.com>
Date: Mon, 28 Apr 2008 12:17:49 -0700
Subject: [PATCH 1/2] [NET]: Allow send-limited cwnd to grow up to max_burst
 when gso disabled

This changes the logic in tcp_is_cwnd_limited() so that cwnd may grow
up to tcp_max_burst() even when sk_can_gso() is false, or when
sysctl_tcp_tso_win_divisor != 0.

Signed-off-by: John Heffner <johnwheffner@gmail.com>
---
 net/ipv4/tcp_cong.c |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 3a6be23..bfb1996 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -285,14 +285,11 @@ int tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight)
        if (in_flight >= tp->snd_cwnd)
                return 1;
 
-       if (!sk_can_gso(sk))
-               return 0;
-
        left = tp->snd_cwnd - in_flight;
-       if (sysctl_tcp_tso_win_divisor)
-               return left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd;
-       else
-               return left <= tcp_max_burst(tp);
+       if (sk_can_gso(sk) &&
+           left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd)
+               return 1;
+       return left <= tcp_max_burst(tp);
 }
 EXPORT_SYMBOL_GPL(tcp_is_cwnd_limited);
-- 
1.5.2.5

[-- Attachment #3: 0002-NET-Limit-cwnd-growth-when-deferring-for-GSO.patch --]

From fc1d9fe99748f3811a0a5b9d93463cf0e4788f56 Mon Sep 17 00:00:00 2001
From: John Heffner <johnwheffner@gmail.com>
Date: Mon, 28 Apr 2008 12:19:22 -0700
Subject: [PATCH 2/2] [NET]: Limit cwnd growth when deferring for GSO

This fixes inappropriately large cwnd growth on sender-limited flows
when GSO is enabled, limiting cwnd growth to 64k.

Signed-off-by: John Heffner <johnwheffner@gmail.com>
---
 net/ipv4/tcp_cong.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index bfb1996..6a25082 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -287,7 +287,8 @@ int tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight)
 
        left = tp->snd_cwnd - in_flight;
        if (sk_can_gso(sk) &&
-           left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd)
+           left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd &&
+           left * tp->mss_cache < sk->sk_gso_max_size)
                return 1;
        return left <= tcp_max_burst(tp);
 }
-- 
1.5.2.5
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-04-28 20:44 UTC (permalink / raw)
To: John Heffner; +Cc: David Miller, netdev, rick.jones2

On Mon, Apr 28, 2008 at 12:21 PM, John Heffner <johnwheffner@gmail.com> wrote:
>
> On Mon, Apr 28, 2008 at 11:30 AM, Jerry Chu <hkchu@google.com> wrote:
> > Correct, but its counterpart doesn't exist in tcp_is_cwnd_limited(). So
> > cwnd will continue to grow when left < cwnd/sysctl_tcp_tso_win_divisor,
> > which can be very large when cwnd is large.
> >
> > [...]
> >
> > But it doesn't seem to help (cwnd still grows to 1037).
>
> Cwnd is in packets, not bytes.

Yes, of course.

> Try this series of patches, against net-next.

Ah, you already know about this problem. Yes it does the trick.

Jerry

> -John
* [PATCH 1/2] [NET]: Allow send-limited cwnd to grow up to max_burst when gso disabled

From: John Heffner @ 2008-04-28 23:22 UTC (permalink / raw)
To: davem; +Cc: netdev, John Heffner

This changes the logic in tcp_is_cwnd_limited() so that cwnd may grow
up to tcp_max_burst() even when sk_can_gso() is false, or when
sysctl_tcp_tso_win_divisor != 0.

Signed-off-by: John Heffner <johnwheffner@gmail.com>
---
 net/ipv4/tcp_cong.c |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 3a6be23..bfb1996 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -285,14 +285,11 @@ int tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight)
        if (in_flight >= tp->snd_cwnd)
                return 1;
 
-       if (!sk_can_gso(sk))
-               return 0;
-
        left = tp->snd_cwnd - in_flight;
-       if (sysctl_tcp_tso_win_divisor)
-               return left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd;
-       else
-               return left <= tcp_max_burst(tp);
+       if (sk_can_gso(sk) &&
+           left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd)
+               return 1;
+       return left <= tcp_max_burst(tp);
 }
 EXPORT_SYMBOL_GPL(tcp_is_cwnd_limited);
-- 
1.5.2.5
* [PATCH 2/2] [NET]: Limit cwnd growth when deferring for GSO

From: John Heffner @ 2008-04-28 23:22 UTC (permalink / raw)
To: davem; +Cc: netdev, John Heffner

This fixes inappropriately large cwnd growth on sender-limited flows
when GSO is enabled, limiting cwnd growth to 64k.

Signed-off-by: John Heffner <johnwheffner@gmail.com>
---
 net/ipv4/tcp_cong.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index bfb1996..6a25082 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -287,7 +287,8 @@ int tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight)
 
        left = tp->snd_cwnd - in_flight;
        if (sk_can_gso(sk) &&
-           left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd)
+           left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd &&
+           left * tp->mss_cache < sk->sk_gso_max_size)
                return 1;
        return left <= tcp_max_burst(tp);
 }
-- 
1.5.2.5
[parent not found: <d1c2719f0804281338j3984cf2bga31def0c2c1192a1@mail.gmail.com>]
* Re: Socket buffer sizes with autotuning

From: John Heffner @ 2008-04-28 23:28 UTC (permalink / raw)
To: Jerry Chu; +Cc: David Miller, netdev, rick.jones2

On Mon, Apr 28, 2008 at 1:38 PM, Jerry Chu <hkchu@google.com> wrote:
> > Try this series of patches, against net-next.
>
> Ah, you already know about this problem. Yes it does the trick.

I had not actually known about this.  (Just cooked up the patches
after I saw your mail.)  This looks to me like a clear bug.  After
looking carefully at the code there, it seems there was another small
problem, too, though not likely to have much effect.

Dave, can you apply the patches I just sent?

Thanks,
  -John
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-04-28 23:35 UTC (permalink / raw)
To: johnwheffner; +Cc: hkchu, netdev, rick.jones2

From: "John Heffner" <johnwheffner@gmail.com>
Date: Mon, 28 Apr 2008 16:28:16 -0700

> Dave, can you apply the patches I just sent?

Sure thing.
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-04-29  2:20 UTC (permalink / raw)
To: John Heffner; +Cc: David Miller, netdev, rick.jones2

Just for the record, in my netperf RR (1MB request/20B reply) over 1GE/tg3
test on top of 2.6.18, changing tcp_tso_win_divisor to 0 from the default of
3 gets the cwnd down to 695 from 1037. Applying the bound check
(left * tp->mss_cache < 65536) gets cwnd down to 737, not 695. Still a big
improvement, though. (Anyway, this may be moot after I fix the real culprit.)

Jerry

On Mon, Apr 28, 2008 at 4:28 PM, John Heffner <johnwheffner@gmail.com> wrote:
>
> On Mon, Apr 28, 2008 at 1:38 PM, Jerry Chu <hkchu@google.com> wrote:
> > > Try this series of patches, against net-next.
> >
> > Ah, you already know about this problem. Yes it does the trick.
>
> I had not actually known about this.  (Just cooked up the patches
> after I saw your mail.)  This looks to me like a clear bug.  After
> looking carefully at the code there, it seems there was another small
> problem, too, though not likely to have much effect.
>
> Dave, can you apply the patches I just sent?
>
> Thanks,
>   -John
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-04-25  7:05 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Wed, 23 Apr 2008 16:29:58 -0700

> I've been seeing the same problem here and am trying to fix it.
> My fix is to not count packets still sitting in the host queue as
> "prior_in_flight" when feeding the latter to tcp_cong_avoid(). This should
> make the tcp_is_cwnd_limited() test fail when the in_flight build-up is due
> entirely to the large host queue, and stop cwnd from growing beyond what's
> really necessary.

Does something like the following suit your needs?

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 299ec4b..6cdf4be 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -140,6 +140,7 @@ struct skb_frag_struct {
  */
 struct skb_shared_info {
        atomic_t        dataref;
+       atomic_t        *in_flight;
        unsigned short  nr_frags;
        unsigned short  gso_size;
        /* Warning: this field is not always filled in (UFO)! */
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index d96d9b1..62bb58d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -271,6 +271,8 @@ struct tcp_sock {
        u32     rcv_tstamp;     /* timestamp of last received ACK (for keepalives) */
        u32     lsndtime;       /* timestamp of last sent data packet (for restart window) */
 
+       atomic_t host_inflight; /* packets queued in transmit path */
+
        /* Data for direct copy to user */
        struct {
                struct sk_buff_head     prequeue;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4fe605f..a6880c2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -212,6 +212,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
        /* make sure we initialize shinfo sequentially */
        shinfo = skb_shinfo(skb);
        atomic_set(&shinfo->dataref, 1);
+       shinfo->in_flight = NULL;
        shinfo->nr_frags  = 0;
        shinfo->gso_size = 0;
        shinfo->gso_segs = 0;
@@ -403,6 +404,8 @@ static void skb_release_all(struct sk_buff *skb)
 void __kfree_skb(struct sk_buff *skb)
 {
        skb_release_all(skb);
+       if (skb_shinfo(skb)->in_flight)
+               atomic_dec(skb_shinfo(skb)->in_flight);
        kfree_skbmem(skb);
 }
 
@@ -486,6 +489,8 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
        atomic_set(&n->users, 1);
 
        atomic_inc(&(skb_shinfo(skb)->dataref));
+       if (skb_shinfo(skb)->in_flight)
+               atomic_inc(skb_shinfo(skb)->in_flight);
        skb->cloned = 1;
 
        return n;
@@ -743,6 +748,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
        skb->hdr_len  = 0;
        skb->nohdr    = 0;
        atomic_set(&skb_shinfo(skb)->dataref, 1);
+       skb_shinfo(skb)->in_flight = NULL;
        return 0;
 
 nodata:
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f886531..28a71fd 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -479,6 +479,7 @@ static inline void skb_entail(struct sock *sk, struct sk_buff *skb)
        struct tcp_sock *tp = tcp_sk(sk);
        struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
 
+       skb_shinfo(skb)->in_flight = &tp->host_inflight;
        skb->csum    = 0;
        tcb->seq     = tcb->end_seq = tp->write_seq;
        tcb->flags   = TCPCB_FLAG_ACK;
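The patch above only maintains the counter; the consumer side is not shown.
A plausible sketch of that side (assuming host_inflight ends up counting
whole packets, which the follow-ups below show is not yet the case for
clones and GSO):

        /* When computing prior_in_flight for the congestion-avoidance hook,
         * discount segments still queued inside the host rather than on
         * the network. */
        u32 in_flight = tcp_packets_in_flight(tp) -
                        atomic_read(&tp->host_inflight);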
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-07  3:57 UTC (permalink / raw)
To: David Miller; +Cc: netdev

I fail to see how adding shinfo->in_flight to count outstanding clones can
help account for the number of "host_inflight" pkts. Part of the problem, as
you've mentioned before, is that the driver may not always get a clone. It
may be getting a copy (e.g., when GSO is on?), hence losing all its
connection to the original tp and any chance to have the pkt properly
accounted for as host_inflight by TCP. The skb may also be cloned more than
once (e.g., due to tcpdump)...

That said, I also fail to come up with a more bullet-proof solution after
studying much of the TSO/GSO code, without requiring driver changes and more
skb changes. So I'm currently leaning toward my original fix of checking

        (1 == (atomic_read(&skb_shinfo(skb1)->dataref) & SKB_DATAREF_MASK))

My current prototype scans either sk_send_head or sk_write_queue backwards
until the above condition is true. I'm thinking about adding and maintaining
a new "tp->host_queue_head" field to avoid most of the scanning. Also, it
seems much less costly to add a new field to tcp_sock than to
skb/skb_shared_info.

If you have a better idea please let me know.

Jerry

On Fri, Apr 25, 2008 at 12:05 AM, David Miller <davem@davemloft.net> wrote:
>
> Does something like the following suit your needs?
>
> [...]
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-05-07  4:27 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Tue, 6 May 2008 20:57:46 -0700

> I fail to see how adding shinfo->in_flight to count outstanding clones can
> help account for the number of "host_inflight" pkts. Part of the problem,
> as you've mentioned before, is that the driver may not always get a clone.

Sure, but it will get one 99.9999% of the time.

With TCP, as long as that clone is alive the driver has it.  And the
counter only counts the clones.

Anyways, did you even test my patch and try to use it for your needs
or is this analysis purely from your inspection of it? :-/
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-07 18:36 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Tue, May 6, 2008 at 9:27 PM, David Miller <davem@davemloft.net> wrote:
>
> From: "Jerry Chu" <hkchu@google.com>
> Date: Tue, 6 May 2008 20:57:46 -0700
>
> > I fail to see how adding shinfo->in_flight to count outstanding clones can
> > help account for the number of "host_inflight" pkts. Part of the problem,
> > as you've mentioned before, is that the driver may not always get a clone.
>
> Sure, but it will get one 99.9999% of the time.

I couldn't care less about some rare boundary case either. What I'm
concerned about is whether there is some popular below-IP module/feature
that, when turned on, will render my host_inflight check (using dataref==1
as an indication that the pkt has left the host) completely bogus.

Turning on pkt tapping (e.g., tcpdump) seems to be one such case, although
one may argue that belongs to the edge usage conditions that we don't need
to worry about. (OTOH it may be undesirable for tcpdump to have a
significant performance side effect on those who are trying to use tcpdump
to debug a performance problem.) Turning on GSO but not TSO seems to be
another case where the driver gets a copy all the time (kernel version
2.6.18). If there is no other popular below-IP module/feature that will
break the dataref==1 check, then perhaps my initial concern was invalid.

In either case, I don't see your patch being any better than my dataref==1
check. Neither one addresses the below-IP module/feature concern described
above.

> With TCP, as long as that clone is alive the driver has it.  And the
> counter only counts the clones.
>
> Anyways, did you even test my patch and try to use it for your needs
> or is this analysis purely from your inspection of it? :-/

No, I haven't tested your patch. I tried to understand skb better before
applying your patch. After I studied a bunch of code, I came to the
conclusion that your patch won't work for me. First, it tracks the number of
clones, which is not what I need; e.g., tcpdump will cause host_inflight to
be grossly wrong. OK, maybe we can ignore tcpdump, but your patch counts the
number of cloned skbs whereas I need a count of the number of pkts. Perhaps
this can be fixed too, but it then dawned on me: wouldn't it be more
desirable to add the space overhead per tcp_sock than per skb?

Jerry
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-05-07 21:18 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Wed, 7 May 2008 11:36:59 -0700

> No, I haven't tested your patch. I tried to understand skb better before
> applying your patch. After I studied a bunch of code, I came to the
> conclusion that your patch won't work for me. First, it tracks the number
> of clones, which is not what I need; e.g., tcpdump will cause host_inflight
> to be grossly wrong.

We can make sub-clones not count.

Also, we already can distinguish this case, because all SKB clones
made by TCP are fast-clones.  So we could only bump the counter for
fast clones.  If tcpdump clones it again, it won't be a fast clone and
therefore we can avoid bumping the counter in that case.  Similarly
for other features that want to clone.

Please try to get your idea working with my infrastructure.  We
can modify it to behave however you need it to, but at the core
it's the idea that tracks the state most directly and properly.
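A sketch of the "only bump the counter for fast clones" idea (illustrative;
the exact hook and placement were never settled in this thread):

        /* Count only the fast-clone child that TCP hands to the device;
         * ordinary clones made by taps such as tcpdump are not counted. */
        if (n->fclone == SKB_FCLONE_CLONE && skb_shinfo(skb)->in_flight)
                atomic_inc(skb_shinfo(skb)->in_flight);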
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-08  1:37 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Wed, May 7, 2008 at 2:18 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Wed, 7 May 2008 11:36:59 -0700
>
> > No, I haven't tested your patch. [...]
>
> We can make sub-clones not count.
>
> Also, we already can distinguish this case, because all SKB clones
> made by TCP are fast-clones.  So we could only bump the counter for
> fast clones.  If tcpdump clones it again, it won't be a fast clone and
> therefore we can avoid bumping the counter in that case.  Similarly
> for other features that want to clone.

Ok, I just did a Google search for "skb fast clone" and found some postings
from you. Will take a look.

> Please try to get your idea working with my infrastructure.  We
> can modify it to behave however you need it to, but at the core
> it's the idea that tracks the state most directly and properly.

Ok, will give it a try. First I'll fix your patch to atomic_add()/atomic_sub()
by skb_shinfo(skb)->gso_segs rather than always 1, in order for GSO/TSO to
work.

One problem that came to mind: it seems possible for __kfree_skb() to access
a skb_shinfo(skb)->in_flight whose tp has already been freed, since only the
original skb's on TCP's rexmit list have the owner set and the socket held.
One solution is for TCP to zap the skb_shinfo(skb)->in_flight field when it's
ready to free up the skb. I can hack sock_wfree() to do this, but I don't
know how to do it right.

Jerry
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-05-08  1:43 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Wed, 7 May 2008 18:37:01 -0700

> Ok, will give it a try. First I'll fix your patch to atomic_add()/atomic_sub()
> by skb_shinfo(skb)->gso_segs rather than always 1, in order for GSO/TSO to
> work.

That might not work.  gso_segs can change over time as retransmit
packets get split up due to SACKs etc.  It needs to be audited,
at the very least.

> One problem that came to mind: it seems possible for __kfree_skb() to access
> a skb_shinfo(skb)->in_flight whose tp has already been freed, since only the
> original skb's on TCP's rexmit list have the owner set and the socket held.
> One solution is for TCP to zap the skb_shinfo(skb)->in_flight field when it's
> ready to free up the skb. I can hack sock_wfree() to do this, but I don't
> know how to do it right.

There will be references to the socket, so this should be ok.

If it isn't we can adjust the count and zap the pointer in
skb_orphan().
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-08  3:33 UTC (permalink / raw)
To: David Miller; +Cc: netdev

There seems to be quite a bit of complexity, plus one additional pointer
field per skb_shared_info, needed to make skb track more precisely when a
pkt leaves the host. Now I wonder if it's really a better solution than my
original approach of simply checking dataref==1, which, although not bullet
proof, may be "good enough" for all practical purposes?

Jerry

On Wed, May 7, 2008 at 6:43 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Wed, 7 May 2008 18:37:01 -0700
>
> > Ok, will give it a try. First I'll fix your patch to atomic_add()/atomic_sub()
> > by skb_shinfo(skb)->gso_segs rather than always 1, in order for GSO/TSO to
> > work.
>
> That might not work.  gso_segs can change over time as retransmit
> packets get split up due to SACKs etc.  It needs to be audited,
> at the very least.
>
> > [...]
>
> There will be references to the socket, so this should be ok.
>
> If it isn't we can adjust the count and zap the pointer in
> skb_orphan().
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-12 22:22 UTC (permalink / raw)
To: David Miller; +Cc: netdev

I did a quick prototype based on your idea of adding an "in_flight" field to
skb_shared_info to track how many in-flight clones are in the host. I tested
it quickly and it doesn't work. After some thought it was obvious why: what
the TCP stack needs is to track how many in-flight pkts are in the host, but
your proposed patch increments "in_flight" once, on the first __skb_clone()
of the skb sent to the driver, yet decrements "in_flight" TWICE, once for
each of the clones as they are freed. I did a quick hack to make it work for
my limited test case, but I haven't figured out an acceptable (non-hack)
solution.

Continuing the testing, I discovered that the problem I described below,
where "in_flight" may point to a tp that has already been freed, cannot be
addressed by zapping skb_shinfo(skb)->in_flight in sock_wfree(). The reason
is that pkts may be acked and freed by TCP before the driver frees up its
clone copy (e.g., due to the driver's lazy reclaim...). When that happens,
the "host_inflight" accounting will get messed up.

Jerry

On Wed, May 7, 2008 at 8:33 PM, Jerry Chu <hkchu@google.com> wrote:
> There seems to be quite a bit of complexity, plus one additional pointer
> field per skb_shared_info, needed to make skb track more precisely when a
> pkt leaves the host. Now I wonder if it's really a better solution than my
> original approach of simply checking dataref==1, which, although not bullet
> proof, may be "good enough" for all practical purposes?
>
> [...]
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-05-12 22:29 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Mon, 12 May 2008 15:22:55 -0700

> I did a quick prototype based on your idea of adding an "in_flight" field to
> skb_shared_info to track how many in-flight clones are in the host. I tested
> it quickly and it doesn't work. After some thought it was obvious why: what
> the TCP stack needs is to track how many in-flight pkts are in the host, but
> your proposed patch increments "in_flight" once, on the first __skb_clone()
> of the skb sent to the driver, yet decrements "in_flight" TWICE, once for
> each of the clones as they are freed. I did a quick hack to make it work for
> my limited test case, but I haven't figured out an acceptable (non-hack)
> solution.

That's easy to fix, only set the in_flight pointer in the child clone
skb.  Thanks for figuring that out.

> Continuing the testing, I discovered that the problem I described below,
> where "in_flight" may point to a tp that has already been freed, cannot be
> addressed by zapping skb_shinfo(skb)->in_flight in sock_wfree(). The reason
> is that pkts may be acked and freed by TCP before the driver frees up its
> clone copy (e.g., due to the driver's lazy reclaim...). When that happens,
> the "host_inflight" accounting will get messed up.

Simply notice, when we're about to decrement in_flight, that the data
reference is one.  You can take appropriate actions if so.
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-05-12 22:31 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: David Miller <davem@davemloft.net>
Date: Mon, 12 May 2008 15:29:00 -0700 (PDT)

> Simply notice, when we're about to decrement in_flight, that the data
> reference is one.  You can take appropriate actions if so.

Actually, there is an even simpler solution.  Since this is a fast
clone, when the parent SKB is freed we can see if the fast-clone child
is active and has the in_flight atomic_t pointer.  If so, we decrement
in_flight and zap it to NULL.
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-13  3:56 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Mon, May 12, 2008 at 3:31 PM, David Miller <davem@davemloft.net> wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 12 May 2008 15:29:00 -0700 (PDT)
>
> > Simply notice, when we're about to decrement in_flight, that the data
> > reference is one.  You can take appropriate actions if so.
>
> Actually, there is an even simpler solution.  Since this is a fast
> clone, when the parent SKB is freed we can see if the fast-clone child
> is active and has the in_flight atomic_t pointer.  If so, we decrement
> in_flight and zap it to NULL.

I thought about that too. But what happens if the clone is being freed by
the driver at the same time that TCP is freeing the parent? Isn't there an
intrinsic race condition here?

If true, can we use some atomic operation to zap in_flight, but also return
to the caller whether in_flight had been nullified already, so that the
accounting can be handled correctly?

(Although tp->host_inflight doesn't need to be super accurate, if there is
no simple solution above I'll have to reset tp->host_inflight periodically
and convince myself that the pathological case of host_inflight drifting
forever won't ever happen...)

Jerry
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-05-13  3:58 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Mon, 12 May 2008 20:56:16 -0700

> If true, can we use some atomic operation to zap in_flight, but also return
> to the caller whether in_flight had been nullified already, so that the
> accounting can be handled correctly?

xchg() should work for that purpose.
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-13  4:00 UTC (permalink / raw)
To: David Miller; +Cc: netdev

Does xchg() return the old value? (Can't find any comments on it.)

Jerry

On Mon, May 12, 2008 at 8:58 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Mon, 12 May 2008 20:56:16 -0700
>
> > If true, can we use some atomic operation to zap in_flight, but also return
> > to the caller whether in_flight had been nullified already, so that the
> > accounting can be handled correctly?
>
> xchg() should work for that purpose.
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-05-13  4:02 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Mon, 12 May 2008 21:00:08 -0700

> Does xchg() return the old value? (Can't find any comments on it.)

Yes, otherwise it wouldn't be very useful.
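The pattern under discussion, as a sketch (xchg() returns the previous
value, so only one of the two racing paths that clear the pointer ends up
doing the decrement):

        atomic_t *in_flight = xchg(&skb_shinfo(skb)->in_flight, NULL);

        if (in_flight)
                atomic_dec(in_flight);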
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-17  1:13 UTC (permalink / raw)
To: David Miller; +Cc: netdev

Ok, now the code seems to work when TSO is disabled, i.e., cwnd remains low
(40-60 pkts) and only grows slowly over extended periods on a back-to-back
1GbE link. It also works when tcpdump is running, after I fixed the code,
per your suggestion, to only bump the in_flight count for fast clones.

But it doesn't work when TSO is enabled. I've fixed tso_fragment() to
correctly set skb_shinfo(buff)->in_flight after the split, but cwnd still
grows rapidly to a few hundred pkts (although smaller than without the fix).
The host inflight accounting gets screwed up. It looks like
pskb_expand_head(), called by tcp_tso_acked()->tcp_trim_head(), messes up
the accounting, but I don't know how to fix it (still trying to understand
this complex piece of code). There could be other reasons as well.

Jerry

On Mon, May 12, 2008 at 9:02 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Mon, 12 May 2008 21:00:08 -0700
>
> > Does xchg() return the old value? (Can't find any comments on it.)
>
> Yes, otherwise it wouldn't be very useful.
* Re: Socket buffer sizes with autotuning

From: David Miller @ 2008-05-17  1:29 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Fri, 16 May 2008 18:13:20 -0700

> The host inflight accounting gets screwed up. It looks like
> pskb_expand_head(), called by tcp_tso_acked()->tcp_trim_head(), messes up
> the accounting, but I don't know how to fix it (still trying to understand
> this complex piece of code). There could be other reasons as well.

This is just like freeing up a normal SKB, so decrementing the
in_flight value the appropriate number of packets should do the right
thing.

tcp_tso_acked() calculates this adjustment for you, in packets_acked.
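i.e., something along these lines (a sketch; whether the in_flight pointer
is still valid at this point is exactly the race discussed earlier):

        /* tcp_tso_acked() has computed how many segments of this TSO skb
         * were just acked; retire the same number from the host count. */
        if (skb_shinfo(skb)->in_flight)
                atomic_sub(packets_acked, skb_shinfo(skb)->in_flight);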
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-17  1:47 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Fri, May 16, 2008 at 6:29 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Fri, 16 May 2008 18:13:20 -0700
>
>> The host inflight accounting gets screwed up. It looks like
>> pskb_expand_head(), called by tcp_tso_acked()->tcp_trim_head(), messes up
>> the accounting, but I don't know how to fix it (still trying to understand
>> this complex piece of code). There could be other reasons as well.
>
> This is just like freeing up a normal SKB, so decrementing the
> in_flight value the appropriate number of packets should do the right
> thing.
>
> tcp_tso_acked() calculates this adjustment for you, in packets_acked.

The current pskb_expand_head() code sets skb_shinfo(skb)->in_flight to NULL,
leaving the in_flight accounting solely to the other clone to handle. To
calculate in_flight more accurately, it seems to require splitting the
gso_segs and in_flight accounting between the clone and the skb (with the
new header), right? Plus any race conditions to take care of...

Hard to think clearly on a Friday afternoon :-(

Jerry
* Re: Socket buffer sizes with autotuning

From: Jerry Chu @ 2008-05-12 22:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Mon, May 12, 2008 at 3:29 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Mon, 12 May 2008 15:22:55 -0700
>
> > I did a quick prototype based on your idea of adding an "in_flight" field
> > to skb_shared_info to track how many in-flight clones are in the host.
> > [...]
>
> That's easy to fix, only set the in_flight pointer in the child clone
> skb.  Thanks for figuring that out.

I must be missing something. All the clones share the same
"skb_shared_info", so how does one set in_flight in only one clone but not
the other?

> > Continuing the testing, I discovered that the problem I described below,
> > where "in_flight" may point to a tp that has already been freed, cannot be
> > addressed by zapping skb_shinfo(skb)->in_flight in sock_wfree().
> > [...]
>
> Simply notice, when we're about to decrement in_flight, that the data
> reference is one.  You can take appropriate actions if so.

Jerry
* Re: Socket buffer sizes with autotuning
  2008-05-12 22:58 ` Jerry Chu
@ 2008-05-12 23:01 ` David Miller
  0 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2008-05-12 23:01 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Mon, 12 May 2008 15:58:58 -0700

> I must be missing something. All the clones share the same
> "skb_shared_info", how does one only set in_flight in one clone but
> not the other?

These are fast clones, we "know" which is the parent and which is the
child.

^ permalink raw reply	[flat|nested] 38+ messages in thread
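For readers following along, the parent/child distinction lives in the skb
itself rather than in the shared skb_shared_info; the enum below is from
include/linux/skbuff.h of that era, while the helper and its use for
in-flight bookkeeping are only a sketch of how the prototype might key off
it:

/* skb->fclone records what role an skb plays when it comes from the
 * fast-clone cache, where a parent and its companion clone are allocated
 * back to back. */
enum {
        SKB_FCLONE_UNAVAILABLE, /* not allocated from the fclone cache */
        SKB_FCLONE_ORIG,        /* the parent, e.g. the skb kept on the write queue */
        SKB_FCLONE_CLONE,       /* the companion clone, e.g. the one handed down */
};

/* Hypothetical helper: only the child clone performs the in-flight
 * bookkeeping, so each transmitted packet is counted exactly once even
 * though both skbs share the same skb_shared_info. */
static int skb_is_fclone_child(const struct sk_buff *skb)
{
        return skb->fclone == SKB_FCLONE_CLONE;
}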
* Re: Socket buffer sizes with autotuning
  2008-05-07 3:57 ` Jerry Chu
  2008-05-07 4:27 ` David Miller
@ 2008-05-07 4:28 ` David Miller
  2008-05-07 18:54 ` Jerry Chu
  1 sibling, 1 reply; 38+ messages in thread
From: David Miller @ 2008-05-07 4:28 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Tue, 6 May 2008 20:57:46 -0700

> It may be getting a copy (e.g., when GSO is on?) hence losing all
> its connection to the original tp and any chance to have the pkt
> properly accounted for as host_inflight by TCP. The skb may also be
> cloned more than once (e.g., due to tcpdump)...

It only gets a copy if the SKB is already cloned and that clone is
alive somewhere (f.e. stuck in the device, a rare occurrence by the
time we retransmit).

It is a clone 99.999999% of the time.

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: Socket buffer sizes with autotuning
  2008-05-07 4:28 ` David Miller
@ 2008-05-07 18:54 ` Jerry Chu
  2008-05-07 21:20 ` David Miller
  0 siblings, 1 reply; 38+ messages in thread
From: Jerry Chu @ 2008-05-07 18:54 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Tue, May 6, 2008 at 9:28 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Tue, 6 May 2008 20:57:46 -0700
>
> > It may be getting a copy (e.g., when GSO is on?) hence losing all
> > its connection to the original tp and any chance to have the pkt
> > properly accounted for as host_inflight by TCP. The skb may also be
> > cloned more than once (e.g., due to tcpdump)...
>
> It only gets a copy if the SKB is already cloned and that clone is
> alive somewhere (f.e. stuck in the device, a rare occurrence by the
> time we retransmit).

This is one of many things about skb that I still don't completely
understand. Why do we have to pskb_copy() in tcp_transmit_skb() if
skb_cloned()? Can't we clone an skb multiple times? Is it due to some
special optimization from the skb->fclone stuff... that imposes this
restriction?

>
> It is a clone 99.999999% of the time.

When I turned on GSO but not TSO, I believe it's a copy 99.999999% of
the time. (I must confess I don't understand the GSO code yet. I
temporarily ran out of steam after studying a bunch of other code.)

Jerry

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: Socket buffer sizes with autotuning
  2008-05-07 18:54 ` Jerry Chu
@ 2008-05-07 21:20 ` David Miller
  2008-05-08 0:16 ` Jerry Chu
  0 siblings, 1 reply; 38+ messages in thread
From: David Miller @ 2008-05-07 21:20 UTC (permalink / raw)
To: hkchu; +Cc: netdev

From: "Jerry Chu" <hkchu@google.com>
Date: Wed, 7 May 2008 11:54:01 -0700

> This is one of many things about skb that I still don't completely
> understand. Why do we have to pskb_copy() in tcp_transmit_skb() if
> skb_cloned()?

The other clone holder owns the packet header area. All packets on
the retransmit queue of TCP are headerless. The call sites down into
the device add the headers.

Therefore we cannot have two paths modifying the headers at the same
time.

> Can't we clone an skb multiple times? Is it due to some special
> optimization from the skb->fclone stuff... that imposes this
> restriction?

No, it has nothing to do with fclone. It has to do with what instance
owns the packet header area in front of the user's TCP data.

^ permalink raw reply	[flat|nested] 38+ messages in thread
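The behaviour being discussed is visible right at the top of
tcp_transmit_skb() in net/ipv4/tcp_output.c; in the kernels of that era it
reads roughly as below (trimmed, so treat it as a paraphrase rather than the
exact source):

/* The skb sitting on the retransmit queue is normally just cloned for
 * transmission.  But if someone else still holds a clone - and therefore
 * shares the headerless data area whose headroom is about to be filled
 * with TCP/IP headers - take a private copy of the header portion with
 * pskb_copy() so two paths never build headers in the same buffer. */
if (unlikely(skb_cloned(skb)))
        skb = pskb_copy(skb, gfp_mask);
else
        skb = skb_clone(skb, gfp_mask);
if (unlikely(!skb))
        return -ENOBUFS;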
* Re: Socket buffer sizes with autotuning
  2008-05-07 21:20 ` David Miller
@ 2008-05-08 0:16 ` Jerry Chu
  0 siblings, 0 replies; 38+ messages in thread
From: Jerry Chu @ 2008-05-08 0:16 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Wed, May 7, 2008 at 2:20 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Wed, 7 May 2008 11:54:01 -0700
>
> > This is one of many things about skb that I still don't completely
> > understand. Why do we have to pskb_copy() in tcp_transmit_skb() if
> > skb_cloned()?
>
> The other clone holder owns the packet header area. All packets on
> the retransmit queue of TCP are headerless. The call sites down into
> the device add the headers.
>
> Therefore we cannot have two paths modifying the headers at the same
> time.

Oh I see. I think I confused pskb_copy() with skb_copy() and kept
wondering why we would need to copy the whole thing...

> > Can't we clone an skb multiple times? Is it due to some special
> > optimization from the skb->fclone stuff... that imposes this
> > restriction?
>
> No, it has nothing to do with fclone. It has to do with what instance
> owns the packet header area in front of the user's TCP data.

Thanks for explaining.

Jerry

^ permalink raw reply	[flat|nested] 38+ messages in thread
Thread overview: 38+ messages
2008-04-23 23:29 Socket buffer sizes with autotuning Jerry Chu
2008-04-24 16:32 ` John Heffner
2008-04-25 0:49 ` Jerry Chu
2008-04-25 6:46 ` David Miller
2008-04-25 21:29 ` Jerry Chu
2008-04-25 21:35 ` David Miller
2008-04-28 18:30 ` Jerry Chu
2008-04-28 19:21 ` John Heffner
2008-04-28 20:44 ` Jerry Chu
2008-04-28 23:22 ` [PATCH 1/2] [NET]: Allow send-limited cwnd to grow up to max_burst when gso disabled John Heffner
2008-04-28 23:22 ` [PATCH 2/2] [NET]: Limit cwnd growth when deferring for GSO John Heffner
[not found] ` <d1c2719f0804281338j3984cf2bga31def0c2c1192a1@mail.gmail.com>
2008-04-28 23:28 ` Socket buffer sizes with autotuning John Heffner
2008-04-28 23:35 ` David Miller
2008-04-29 2:20 ` Jerry Chu
2008-04-25 7:05 ` David Miller
2008-05-07 3:57 ` Jerry Chu
2008-05-07 4:27 ` David Miller
2008-05-07 18:36 ` Jerry Chu
2008-05-07 21:18 ` David Miller
2008-05-08 1:37 ` Jerry Chu
2008-05-08 1:43 ` David Miller
2008-05-08 3:33 ` Jerry Chu
2008-05-12 22:22 ` Jerry Chu
2008-05-12 22:29 ` David Miller
2008-05-12 22:31 ` David Miller
2008-05-13 3:56 ` Jerry Chu
2008-05-13 3:58 ` David Miller
2008-05-13 4:00 ` Jerry Chu
2008-05-13 4:02 ` David Miller
2008-05-17 1:13 ` Jerry Chu
2008-05-17 1:29 ` David Miller
2008-05-17 1:47 ` Jerry Chu
2008-05-12 22:58 ` Jerry Chu
2008-05-12 23:01 ` David Miller
2008-05-07 4:28 ` David Miller
2008-05-07 18:54 ` Jerry Chu
2008-05-07 21:20 ` David Miller
2008-05-08 0:16 ` Jerry Chu