netdev.vger.kernel.org archive mirror
From: "Jerry Chu" <hkchu@google.com>
To: "David Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Subject: Re: Socket buffer sizes with autotuning
Date: Mon, 12 May 2008 15:22:55 -0700
Message-ID: <d1c2719f0805121522g4767f585h4b33790318f44264@mail.gmail.com>
In-Reply-To: <d1c2719f0805072033u38dfffaes71aed3efbc8035c0@mail.gmail.com>

I did a quick prototype based on your idea of adding an "in_flight"
field to skb_shared_info to track how many clones are in flight in the
host. I tested it quickly and it doesn't work. After some thought it was
obvious why: what the TCP stack needs is to track how many in-flight
pkts are in the host, but your proposed patch increments "in_flight"
once, on the first __skb_clone() to be sent to the driver, and then
decrements "in_flight" TWICE, once for each of the clones to be freed.
I did a quick hack to make it work for my limited test case, but I haven't
figured out an acceptable (non-hack) solution.
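
As a rough illustration of the imbalance described above, here is a
userspace toy model, not kernel code. The names toy_shinfo,
toy_clone_for_xmit(), toy_free() and toy_run_once() are hypothetical, and
the model assumes that every free of an skb sharing the data decrements the
counter, while only the clone made for transmission increments it.

```c
#include <assert.h>

/* Toy stand-in for skb_shared_info; only the two counters matter here. */
struct toy_shinfo {
    int dataref;    /* how many skbs currently share the data */
    int in_flight;  /* hypothetical counter from the prototype */
};

/* Cloning for transmission: one new reference, ONE increment. */
static void toy_clone_for_xmit(struct toy_shinfo *sh)
{
    sh->dataref++;
    sh->in_flight++;
}

/* Freeing any skb that shares the data: one decrement per free. */
static void toy_free(struct toy_shinfo *sh)
{
    assert(sh->dataref > 0);
    sh->dataref--;
    sh->in_flight--;
}

/* Send one packet, then free both the clone and the original. */
static int toy_run_once(void)
{
    struct toy_shinfo sh = { .dataref = 1, .in_flight = 0 };

    toy_clone_for_xmit(&sh); /* clone handed to the driver */
    toy_free(&sh);           /* driver frees its clone */
    toy_free(&sh);           /* TCP frees the original after the ACK */

    return sh.in_flight;     /* ends at -1 instead of 0 */
}
```

Calling toy_run_once() returns -1: one increment against two decrements
leaves the counter unbalanced after a single packet.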

Continuing to test, I discovered that the problem I described below, where
"in_flight" may point to a tp that has already been freed, cannot be
addressed by zapping skb_shinfo(skb)->in_flight in sock_wfree(). The
reason is that pkts may be acked and freed by TCP before the driver frees
its clone copy (e.g., due to driver lazy reclaim...). When that happens
the "host_inflight" accounting gets messed up.
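
One plausible reading of this ordering problem, sketched as a userspace toy
model. This is an assumption about the mechanism, not the actual patch:
toy_tp, toy_shinfo, toy_xmit(), toy_tcp_free(), toy_driver_free() and
toy_leftover() are all hypothetical names, and the model assumes the zap
happens when TCP frees the original and that the driver's later free of the
clone then has no tp left to decrement.

```c
#include <stddef.h>

struct toy_tp {
    int host_inflight;          /* pkts TCP believes are still in the host */
};

struct toy_shinfo {
    struct toy_tp *in_flight;   /* back-pointer, zapped when TCP frees */
};

/* Packet handed to the driver: remember the tp and count it. */
static void toy_xmit(struct toy_shinfo *sh, struct toy_tp *tp)
{
    sh->in_flight = tp;
    tp->host_inflight++;
}

/* TCP frees the original after the ACK and zaps the pointer. */
static void toy_tcp_free(struct toy_shinfo *sh)
{
    sh->in_flight = NULL;
}

/* Driver frees its clone: can only decrement if the pointer survives. */
static void toy_driver_free(struct toy_shinfo *sh)
{
    if (sh->in_flight)
        sh->in_flight->host_inflight--;
}

/* Returns host_inflight after one packet, for either free ordering. */
static int toy_leftover(int tcp_frees_first)
{
    struct toy_tp tp = { 0 };
    struct toy_shinfo sh = { NULL };

    toy_xmit(&sh, &tp);
    if (tcp_frees_first) {
        toy_tcp_free(&sh);      /* ACK arrives before driver reclaim */
        toy_driver_free(&sh);   /* decrement is lost */
    } else {
        toy_driver_free(&sh);
        toy_tcp_free(&sh);
    }
    return tp.host_inflight;
}
```

In this model toy_leftover(0) returns 0, while toy_leftover(1) returns 1:
with lazy reclaim in the driver the count never drops back to zero.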

Jerry

On Wed, May 7, 2008 at 8:33 PM, Jerry Chu <hkchu@google.com> wrote:
> There seems to be quite a bit of complexity, plus one additional pointer
>  field per skb_shared_info, to make the skb better track when a pkt leaves
>  the host. Now I wonder if it's really a better solution than my original
>  approach of simply checking dataref==1, which, although not bulletproof,
>  may be "good enough" for all practical purposes?
>
>  Jerry
>
>
>
>  On Wed, May 7, 2008 at 6:43 PM, David Miller <davem@davemloft.net> wrote:
>  > From: "Jerry Chu" <hkchu@google.com>
>  >  Date: Wed, 7 May 2008 18:37:01 -0700
>  >
>  >
>  >  > Ok, will give it a try. First I'll fix your patch to
>  >  > atomic_add()/atomic_sub() by skb_shinfo(skb)->gso_segs rather than
>  >  > always 1, in order for GSO/TSO to work.
>  >
>  >  That might not work.  gso_segs can change over time as retransmit
>  >  packets get split up due to SACKs etc.  It needs to be audited,
>  >  at the very least.
>  >
>  >
>  >  > One problem came to mind: it seems possible for __kfree_skb() to
>  >  > access skb_shinfo(skb)->in_flight whose tp has been freed, since only
>  >  > the original skbs on TCP's rexmit list have the owner set and the
>  >  > socket held. One solution is for TCP to zap the
>  >  > skb_shinfo(skb)->in_flight field when it's ready to free the skb.
>  >  > I can hack sock_wfree() to do this, but I don't know how to do it right.
>  >
>  >  There will be references to the socket, so this should be ok.
>  >
>  >  If it isn't we can adjust the count and zap the pointer in
>  >  skb_orphan().
>  >
>

