From: "Jerry Chu" <hkchu@google.com>
To: "David Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Subject: Re: Socket buffer sizes with autotuning
Date: Tue, 6 May 2008 20:57:46 -0700
Message-ID: <d1c2719f0805062057q62decefcg7c3de96271c4b26d@mail.gmail.com>
In-Reply-To: <20080425.000547.152086801.davem@davemloft.net>

I fail to see how adding shinfo->in_flight to count how many outstanding
clones there are can help account for how many "host_inflight" pkts there
are. Part of the problem, as you've mentioned before, is that the driver
may not always get a clone. It may be getting a copy (e.g., when GSO is
on?), hence losing all its connection to the original tp and any chance
to have the pkt properly accounted for as host_inflight by TCP. The skb
may also be cloned more than once (e.g., due to tcpdump)...
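
To make the objection concrete, here is a minimal userspace model (not
kernel code; every model_* name is hypothetical) of the counter the quoted
patch proposes: a clone shares skb_shared_info, so it inherits the
in_flight pointer and bumps the per-socket counter, and freeing a tracked
buffer drops it again. It shows two ways the count can drift from the
number of pkts actually sitting in the host queue.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for sk_buff / skb_shared_info (hypothetical). */
struct model_shinfo {
	int dataref;      /* references to the shared data area */
	int *in_flight;   /* per-socket counter, or NULL if untracked */
};

struct model_skb {
	struct model_shinfo *shinfo;
};

/* Allocate a buffer with its own data area (dataref = 1). */
struct model_skb *model_alloc(int *in_flight)
{
	struct model_skb *skb = malloc(sizeof(*skb));
	assert(skb != NULL);
	skb->shinfo = malloc(sizeof(*skb->shinfo));
	assert(skb->shinfo != NULL);
	skb->shinfo->dataref = 1;
	skb->shinfo->in_flight = in_flight;
	return skb;
}

/* A clone shares the data area, and therefore the counter pointer. */
struct model_skb *model_clone(struct model_skb *skb)
{
	struct model_skb *n = malloc(sizeof(*n));
	assert(n != NULL);
	n->shinfo = skb->shinfo;
	n->shinfo->dataref++;
	if (n->shinfo->in_flight)
		(*n->shinfo->in_flight)++;
	return n;
}

/* A copy gets a private data area and loses the link to the socket. */
struct model_skb *model_copy(struct model_skb *skb)
{
	(void)skb;
	return model_alloc(NULL);
}

/* Drop the socket counter if tracked, then release the data reference. */
void model_free(struct model_skb *skb)
{
	if (skb->shinfo->in_flight)
		(*skb->shinfo->in_flight)--;
	if (--skb->shinfo->dataref == 0)
		free(skb->shinfo);
	free(skb);
}
```

In this model, a second clone taken by a packet tap pushes the counter to
2 while only one pkt is queued, and a driver that swaps its clone for a
private copy drops the counter to 0 while a pkt is still queued.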

That said, after studying much of the TSO/GSO code, I also fail to come
up with a more bullet-proof solution that doesn't require driver changes
and more skb changes. So I'm currently leaning toward my original fix of
checking

    if (1 == (atomic_read(&skb_shinfo(skb1)->dataref) & SKB_DATAREF_MASK))
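
For readers less familiar with this check: dataref packs two counts into
one atomic_t. In include/linux/skbuff.h the lower 16 bits
(SKB_DATAREF_SHIFT) hold references to the entire data area, including
clones, and the upper 16 bits hold payload-only references taken by
skb_header_release(), which TCP applies to skbs on its write queue.
Masking off the upper half and comparing with 1 asks "is TCP's own skb
the only holder of the data?", the same mask test skb_cloned() uses. A
userspace sketch on a plain integer, with hypothetical MODEL_* stand-ins
for the kernel macros:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins mirroring SKB_DATAREF_SHIFT / SKB_DATAREF_MASK in skbuff.h. */
#define MODEL_DATAREF_SHIFT 16
#define MODEL_DATAREF_MASK  ((1u << MODEL_DATAREF_SHIFT) - 1)

/* True when only one full reference remains, i.e. no clone of this
 * buffer is still held further down the transmit path. Payload-only
 * references in the upper half are ignored by the mask. */
bool model_only_tcp_ref(uint32_t dataref)
{
	return (dataref & MODEL_DATAREF_MASK) == 1;
}
```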

My current prototype scans either sk_send_head or sk_write_queue
backwards until the above condition is true. I'm thinking about adding
and maintaining a new "tp->host_queue_head" field to avoid most of the
scanning. It also seems much less costly to add a new field to tcp_sock
than to skb/skb_shared_info. If you have a better idea, please let me
know.
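
A rough userspace model of the scan described above, plus one possible
reading of the proposed tp->host_queue_head hint. Assumptions, all
hypothetical: the write queue is a plain array of dataref values,
send_head is the index of the first unsent pkt, the device releases pkts
in order, and ACK-driven removal from the front of the queue and
retransmissions are ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_DATAREF_MASK 0xffffu  /* stand-in for SKB_DATAREF_MASK */

/* Prototype-style scan: walk backwards from the newest sent pkt until
 * one whose data is held only by TCP is found. Everything newer is
 * counted as still in the host queue. Cost grows with that count. */
size_t model_scan_backward(const uint32_t *dataref, size_t send_head)
{
	size_t i = send_head;

	while (i > 0 && (dataref[i - 1] & MODEL_DATAREF_MASK) != 1)
		i--;
	return send_head - i;
}

/* Hinted variant: *hint marks the oldest pkt that may still be in the
 * host queue. Advance it past pkts released since the last call, so
 * each pkt is examined about once over its lifetime. */
size_t model_scan_hinted(const uint32_t *dataref, size_t send_head,
			 size_t *hint)
{
	while (*hint < send_head &&
	       (dataref[*hint] & MODEL_DATAREF_MASK) == 1)
		(*hint)++;
	return send_head - *hint;
}
```

The backward scan re-walks every host-queued pkt on each call, which is
exactly the expensive case when the host queue is large; the forward hint
trades that for the in-order completion assumption.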
Jerry
On Fri, Apr 25, 2008 at 12:05 AM, David Miller <davem@davemloft.net> wrote:
>
> From: "Jerry Chu" <hkchu@google.com>
> Date: Wed, 23 Apr 2008 16:29:58 -0700
>
>
> > I've been seeing the same problem here and am trying to fix it.
> > My fix is to not count those pkts still in the host queue as "prior_in_flight"
> > when feeding the latter to tcp_cong_avoid(). This should cause the
> > tcp_is_cwnd_limited() test to fail when the previous in_flight build-up
> > is all due to the large host queue, and stop the cwnd from growing beyond
> > what's really necessary.
>
> Does something like the following suit your needs?
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 299ec4b..6cdf4be 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -140,6 +140,7 @@ struct skb_frag_struct {
> */
> struct skb_shared_info {
> atomic_t dataref;
> + atomic_t *in_flight;
> unsigned short nr_frags;
> unsigned short gso_size;
> /* Warning: this field is not always filled in (UFO)! */
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index d96d9b1..62bb58d 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -271,6 +271,8 @@ struct tcp_sock {
> u32 rcv_tstamp; /* timestamp of last received ACK (for keepalives) */
> u32 lsndtime; /* timestamp of last sent data packet (for restart window) */
>
> + atomic_t host_inflight; /* packets queued in transmit path */
> +
> /* Data for direct copy to user */
> struct {
> struct sk_buff_head prequeue;
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 4fe605f..a6880c2 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -212,6 +212,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
> /* make sure we initialize shinfo sequentially */
> shinfo = skb_shinfo(skb);
> atomic_set(&shinfo->dataref, 1);
> + shinfo->in_flight = NULL;
> shinfo->nr_frags = 0;
> shinfo->gso_size = 0;
> shinfo->gso_segs = 0;
> @@ -403,6 +404,8 @@ static void skb_release_all(struct sk_buff *skb)
> void __kfree_skb(struct sk_buff *skb)
> {
> skb_release_all(skb);
> + if (skb_shinfo(skb)->in_flight)
> + atomic_dec(skb_shinfo(skb)->in_flight);
> kfree_skbmem(skb);
> }
>
> @@ -486,6 +489,8 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
> atomic_set(&n->users, 1);
>
> atomic_inc(&(skb_shinfo(skb)->dataref));
> + if (skb_shinfo(skb)->in_flight)
> + atomic_inc(skb_shinfo(skb)->in_flight);
> skb->cloned = 1;
>
> return n;
> @@ -743,6 +748,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> skb->hdr_len = 0;
> skb->nohdr = 0;
> atomic_set(&skb_shinfo(skb)->dataref, 1);
> + skb_shinfo(skb)->in_flight = NULL;
> return 0;
>
> nodata:
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index f886531..28a71fd 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -479,6 +479,7 @@ static inline void skb_entail(struct sock *sk, struct sk_buff *skb)
> struct tcp_sock *tp = tcp_sk(sk);
> struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
>
> + skb_shinfo(skb)->in_flight = &tp->host_inflight;
> skb->csum = 0;
> tcb->seq = tcb->end_seq = tp->write_seq;
> tcb->flags = TCPCB_FLAG_ACK;
>
>
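
The fix described in the quoted 23 Apr message can be sketched
numerically. In kernels of this vintage, tcp_cong_avoid() hands
prior_in_flight to the congestion control module, and Reno-style
cong_avoid hooks return without growing cwnd when tcp_is_cwnd_limited()
reports the flow is not cwnd-limited. The userspace model below keeps
only the first test of tcp_is_cwnd_limited() (in_flight >= snd_cwnd) and
deliberately drops its GSO/TSO allowance; the function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified from tcp_is_cwnd_limited(): GSO/TSO slack omitted. */
bool model_cwnd_limited(uint32_t in_flight, uint32_t snd_cwnd)
{
	return in_flight >= snd_cwnd;
}

/* Discount pkts that have not yet left the host before the check,
 * clamping at zero if the host count exceeds prior_in_flight. */
uint32_t model_network_in_flight(uint32_t prior_in_flight,
				 uint32_t host_inflight)
{
	return prior_in_flight > host_inflight ?
	       prior_in_flight - host_inflight : 0;
}
```

With snd_cwnd = 100 and prior_in_flight = 100, the raw check says
"cwnd-limited" and cwnd keeps growing; if 40 of those pkts are still in
the host queue, the adjusted in_flight of 60 fails the check and growth
stops.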