From: "Jerry Chu" <hkchu@google.com>
To: "David Miller" <davem@davemloft.net>
Cc: johnwheffner@gmail.com, netdev@vger.kernel.org, rick.jones2@hp.com
Subject: Re: Socket buffer sizes with autotuning
Date: Fri, 25 Apr 2008 14:29:25 -0700
Message-ID: <d1c2719f0804251429l26118ef0j8a386103ee41f0ea@mail.gmail.com>
In-Reply-To: <20080424.234628.170849475.davem@davemloft.net>

On Thu, Apr 24, 2008 at 11:46 PM, David Miller <davem@davemloft.net> wrote:
> From: "Jerry Chu" <hkchu@google.com>
> Date: Thu, 24 Apr 2008 17:49:33 -0700
>
>
> > One question: I currently use skb_shinfo(skb)->dataref == 1 for skb's on the
> > sk_write_queue list as the heuristic to determine if a packet has hit the wire.
>
> This doesn't work for the reasons that you mention in detail next :-)
>
>
> > Is there a better solution than checking against dataref to determine if a pkt
> > has hit the wire?
>
> Unfortunately, no there isn't.
>
> Part of the issue is that the driver is only working with a clone, but
> if a packet gets resent before the driver gives up its reference,
> we'll make a completely new copy.
I think we can ignore this case if it happens rarely.
>
> But even assuming we could say that the driver gets a clone all the
> time, the "sent" state would need to be in the shared data area.
Ok.
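
For readers following along, here is a minimal userspace sketch of the dataref heuristic being discussed. It is a toy model, not kernel code: `toy_shinfo`, `toy_skb`, `toy_clone`, `toy_free` and `toy_no_clone_outstanding` are all hypothetical names, and the only thing assumed from the thread is that the driver works on a clone that shares the data area with the skb left on sk_write_queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the shared data area of an skb (not the kernel struct). */
struct toy_shinfo {
    int dataref;              /* number of skbs sharing this data area */
};

/* Toy skb header: just a pointer to the shared data area. */
struct toy_skb {
    struct toy_shinfo *shinfo;
};

/* Clone: a new header that shares the same data area, so dataref goes up. */
static struct toy_skb toy_clone(struct toy_skb *skb)
{
    skb->shinfo->dataref++;
    return (struct toy_skb){ .shinfo = skb->shinfo };
}

/* Drop one reference, as the driver would once transmit has completed. */
static void toy_free(struct toy_skb *skb)
{
    assert(skb->shinfo->dataref > 0);
    skb->shinfo->dataref--;
}

/*
 * The heuristic from the mail, applied to an skb that was already handed
 * to the driver: if no clone is outstanding, assume it has hit the wire.
 */
static bool toy_no_clone_outstanding(const struct toy_skb *skb)
{
    return skb->shinfo->dataref == 1;
}
```

In this model the check is false while the driver still holds its clone and becomes true once the clone is freed. It does not cover the retransmit case described above, where a completely new copy is made and the count on the original no longer tracks what the driver holds.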
>
>
> > Also the code to determine when/how much to defer in the TSO path seems
> > too aggressive. It's currently based on a percentage
> > (sysctl_tcp_tso_win_divisor) of min(snd_wnd, snd_cwnd). Would it be
> > too much if the value is large? E.g.,
> > when I disable sysctl_tcp_tso_win_divisor, the cwnd of my simple netperf run
> > drops exactly 1/3 from 1037 (segments) to 695. It seems to me the TSO
> > defer factor should be based on an absolute count, e.g., 64KB.
>
> This is one of the most difficult knobs to get right in the TSO code.
>
> If the percentage is too low, you'll notice that cpu utilization
> increases because you aren't accumulating enough data to send down the
> largest possible TSO frames.
Well, there is a fine line to walk between CPU efficiency and traffic
burstiness. The TSO defer code causes bursts of a few hundred KB that
quickly blow away our small switch buffers. The matter may get even
worse for 10GE.
>
> But yes you are absolutely right that we should have a hard limit
> of 64K here, since we can't build a larger TSO frame anyways.
>
> In fact I thought we had something like that here already :-/
>
> Wait, in fact we do, it's just hidden behind a variable now:
>
>         /* If a full-sized TSO skb can be sent, do it. */
>         if (limit >= sk->sk_gso_max_size)
>                 goto send_now;
Oh, I just realized I've been working on a very "old" (2.6.18 :-)
version of the kernel.
I will get the latest 2.6.25 and take a look. I can't find the
"skb_release_all()" function you pointed to in a later mail either.
I guess the Linux kernel code is rewritten every few months :-(.
Jerry
>
> :-)
>
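
To make the defer decision discussed in this message concrete, the following is a simplified userspace sketch. It is loosely modeled on the check quoted above and is not the kernel function: `struct tso_state`, its field names, `max_burst_segs` and `tso_should_defer` are assumptions made for illustration, and the branch taken when the divisor is 0 is a guess at a burst-based fallback rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; the names are illustrative, not kernel structs. */
struct tso_state {
    uint32_t snd_wnd;         /* receiver-advertised window, bytes */
    uint32_t send_win;        /* receiver window still unused, bytes */
    uint32_t cwnd_segs;       /* congestion window, segments */
    uint32_t in_flight_segs;  /* segments sent but not yet acked */
    uint32_t mss;             /* bytes per segment */
    uint32_t win_divisor;     /* like sysctl_tcp_tso_win_divisor; 0 = off */
    uint32_t max_burst_segs;  /* assumed fallback threshold when divisor is 0 */
    uint32_t gso_max_size;    /* largest TSO frame, e.g. 65536 */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Returns true if the sender should wait to accumulate a larger TSO frame. */
static bool tso_should_defer(const struct tso_state *s)
{
    assert(s->mss > 0);

    /* Room left under the congestion window, in bytes. */
    uint32_t free_segs = s->cwnd_segs > s->in_flight_segs
                       ? s->cwnd_segs - s->in_flight_segs : 0;
    uint32_t limit = min_u32(s->send_win, free_segs * s->mss);

    /* Hard cap: a full-sized TSO frame can be built, so send now. */
    if (limit >= s->gso_max_size)
        return false;

    if (s->win_divisor) {
        /* Fractional test: send once 1/divisor of the window is available. */
        uint32_t chunk = min_u32(s->snd_wnd, s->cwnd_segs * s->mss);
        chunk /= s->win_divisor;
        if (limit >= chunk)
            return false;
    } else if (limit > s->max_burst_segs * s->mss) {
        /* Assumed fallback: send once more than a small burst is available. */
        return false;
    }
    return true;
}
```

With cwnd around 1000 segments of 1448 bytes and a divisor of 3, the fractional threshold is about 333 KB, so without the hard cap the sender could keep deferring until a burst of that size is ready. With gso_max_size at 64 KB, the same state sends as soon as 64 KB is available.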