netdev.vger.kernel.org archive mirror
From: "Jerry Chu" <hkchu@google.com>
To: "David Miller" <davem@davemloft.net>
Cc: johnwheffner@gmail.com, netdev@vger.kernel.org, rick.jones2@hp.com
Subject: Re: Socket buffer sizes with autotuning
Date: Mon, 28 Apr 2008 11:30:51 -0700
Message-ID: <d1c2719f0804281130n7b0aaab8t54b2a585cff53a99@mail.gmail.com>
In-Reply-To: <20080424.234628.170849475.davem@davemloft.net>

On Thu, Apr 24, 2008 at 11:46 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
>  Date: Thu, 24 Apr 2008 17:49:33 -0700
>
>
>  > One question: I currently use skb_shinfo(skb)->dataref == 1 for skb's on the
>  > sk_write_queue list as the heuristic to determine if a packet has hit the wire.
>
>  This doesn't work for the reasons that you mention in detail next :-)
>
>
>  > Is there a better solution than checking against dataref to determine if a pkt
>  > has hit the wire?
>
>  Unfortunately, no there isn't.
>
>  Part of the issue is that the driver is only working with a clone, but
>  if a packet gets resent before the driver gives up its reference,
>  we'll make a completely new copy.
>
>  But even assuming we could say that the driver gets a clone all the
>  time, the "sent" state would need to be in the shared data area.
>
>
>  > Also the code to determine when/how much to defer in the TSO path seems
>  > too aggressive. It's currently based on a percentage (sysctl_tcp_tso_win_divisor)
>  > of min(snd_wnd, snd_cwnd). Would it be too much if the value is large? E.g.,
>  > when I disable sysctl_tcp_tso_win_divisor, the cwnd of my simple netperf run
>  > drops exactly 1/3 from 1037 (segments) to 695. It seems to me the TSO
>  > defer factor should be based on an absolute count, e.g., 64KB.
>
>  This is one of the most difficult knobs to get right in the TSO code.
>
>  If the percentage is too low, you'll notice that cpu utilization
>  increases because you aren't accumulating enough data to send down the
>  largest possible TSO frames.
>
>  But yes you are absolutely right that we should have a hard limit
>  of 64K here, since we can't build a larger TSO frame anyways.
>
>  In fact I thought we had something like that here already :-/
>
>  Wait, in fact we do, it's just hidden behind a variable now:
>
>         /* If a full-sized TSO skb can be sent, do it. */
>         if (limit >= sk->sk_gso_max_size)
>                 goto send_now;
>
>  :-)

Correct, but tcp_is_cwnd_limited() has no counterpart to that check. So
cwnd keeps growing whenever left < cwnd/sysctl_tcp_tso_win_divisor, a
threshold that can be very large when cwnd is large.
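
To make the size of that threshold concrete, here is a minimal userspace
sketch of the divisor branch shown in the diff in this message. It is not the
kernel function: the name cwnd_limited_by_divisor() is hypothetical, the
early return for in_flight >= snd_cwnd is an assumed guard, and the divisor
value 3 used in the note after the code is assumed to be the sysctl setting
in effect. All quantities are counted in segments.

```c
#include <stdint.h>

/*
 * Hypothetical model of the divisor branch of tcp_is_cwnd_limited().
 * snd_cwnd and in_flight are segment counts, not bytes.
 * Returns 1 when the sender is treated as cwnd-limited (cwnd may grow).
 */
static int cwnd_limited_by_divisor(uint32_t snd_cwnd, uint32_t in_flight,
                                   uint32_t tso_win_divisor)
{
    uint32_t left;

    /* Assumed guard: a full window is always cwnd-limited. */
    if (in_flight >= snd_cwnd)
        return 1;

    left = snd_cwnd - in_flight;

    /* Same comparison as in the diff: left * divisor < cwnd. */
    return left * tso_win_divisor < snd_cwnd;
}
```

With snd_cwnd = 1037 and a divisor of 3, the comparison is true for any
left up to 345 segments, so a flow with only 692 segments in flight is
still treated as cwnd-limited.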

If I set tcp_tso_win_divisor to 0, cwnd maxes out at 695 rather than 1037,
a drop of almost exactly 1/3. I tried adding the same check to
tcp_is_cwnd_limited():

diff -c /tmp/tcp.h.old tcp.h
*** /tmp/tcp.h.old      Mon Apr 28 11:00:44 2008
--- tcp.h       Mon Apr 28 10:54:10 2008
***************
*** 828,833 ****
--- 828,835 ----
                return 0;

        left = tp->snd_cwnd - in_flight;
+       if (left >= 65536)
+               return 0;
        if (sysctl_tcp_tso_win_divisor)
                return left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd;
        else

But it doesn't seem to help (cwnd still grows to 1037).

Jerry
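
One possible explanation, offered as an assumption and not confirmed anywhere
in this message: snd_cwnd and in_flight are segment counts, so left in the
patch above is also in segments, while 65536 is a byte limit. The sketch
below contrasts the check as written with a byte-based variant. Both function
names, the mss_bytes parameter, and the 1448-byte MSS used in the note after
the code are hypothetical.

```c
#include <stdint.h>

/* The check as written in the patch: compares a segment count to 65536. */
static int cap_fires_in_segments(uint32_t left_segments)
{
    return left_segments >= 65536;
}

/*
 * Hypothetical byte-based variant: scale the segment count by an assumed
 * MSS before comparing against a byte cap such as 64KB.
 */
static int cap_fires_in_bytes(uint32_t left_segments, uint32_t mss_bytes,
                              uint32_t cap_bytes)
{
    /* Widen to 64 bits so the product cannot overflow. */
    return (uint64_t)left_segments * mss_bytes >= cap_bytes;
}
```

With cwnd at 1037 segments, left can never reach 65536, so the first check
never fires. With an assumed MSS of 1448 bytes, the byte-based variant would
fire once left reaches 46 segments.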
