* [PATCH net-next] tcp: introduce tcp_try_coalesce
@ 2012-04-23 17:11 Eric Dumazet
2012-04-24 1:13 ` Neal Cardwell
2012-04-24 2:39 ` Neal Cardwell
0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2012-04-23 17:11 UTC (permalink / raw)
To: David Miller
Cc: netdev, Neal Cardwell, Tom Herbert, Maciej Żenczykowski,
Ilpo Järvinen
From: Eric Dumazet <edumazet@google.com>
commit c8628155ece3 (tcp: reduce out_of_order memory use) took care of
coalescing tcp segments provided by legacy devices (linear skbs)
We extend this idea to fragged skbs, as their truesize can be heavy.
ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment.
Use this coalescing strategy for receive queue too.
This helps reduce the number of tcp collapses, at minimal cost, and
reduces memory overhead and packet drops.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 79 ++++++++++++++++++++++++++++++++---------
1 file changed, 62 insertions(+), 17 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 37e1c5c..bd7aef5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4449,6 +4449,58 @@ static inline int tcp_try_rmem_schedule(struct sock *sk, unsigned int size)
 	return 0;
 }
 
+/**
+ * tcp_try_coalesce - try to merge skb to prior one
+ * @sk: socket
+ * @to: prior buffer
+ * @from: buffer to add in queue
+ *
+ * Before queueing skb @from after @to, try to merge them
+ * to reduce overall memory use and queue lengths, if cost is small.
+ * Packets in ofo or receive queues can stay a long time.
+ * Better try to coalesce them right now to avoid future collapses.
+ * Returns > 0 value if caller should free @from instead of queueing it
+ */
+static int tcp_try_coalesce(struct sock *sk,
+			    struct sk_buff *to,
+			    struct sk_buff *from)
+{
+	int len = from->len;
+
+	if (tcp_hdr(from)->fin)
+		return 0;
+	if (len <= skb_tailroom(to)) {
+		BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
+merge:
+		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
+		TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
+		TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
+		return 1;
+	}
+	if (skb_headlen(from) == 0 &&
+	    !skb_has_frag_list(to) &&
+	    !skb_has_frag_list(from) &&
+	    (skb_shinfo(to)->nr_frags +
+	     skb_shinfo(from)->nr_frags <= MAX_SKB_FRAGS)) {
+		int delta = from->truesize - ksize(from->head) -
+			    SKB_DATA_ALIGN(sizeof(struct sk_buff));
+
+		WARN_ON_ONCE(delta < len);
+		memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
+		       skb_shinfo(from)->frags,
+		       skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
+		skb_shinfo(to)->nr_frags += skb_shinfo(from)->nr_frags;
+		skb_shinfo(from)->nr_frags = 0;
+		to->truesize += delta;
+		atomic_add(delta, &sk->sk_rmem_alloc);
+		sk_mem_charge(sk, delta);
+		to->len += len;
+		to->data_len += len;
+		goto merge;
+	}
+	return 0;
+}
+
 static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -4487,23 +4539,11 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		end_seq = TCP_SKB_CB(skb)->end_seq;
 
 		if (seq == TCP_SKB_CB(skb1)->end_seq) {
-			/* Packets in ofo can stay in queue a long time.
-			 * Better try to coalesce them right now
-			 * to avoid future tcp_collapse_ofo_queue(),
-			 * probably the most expensive function in tcp stack.
-			 */
-			if (skb->len <= skb_tailroom(skb1) && !tcp_hdr(skb)->fin) {
-				NET_INC_STATS_BH(sock_net(sk),
-						 LINUX_MIB_TCPRCVCOALESCE);
-				BUG_ON(skb_copy_bits(skb, 0,
-						     skb_put(skb1, skb->len),
-						     skb->len));
-				TCP_SKB_CB(skb1)->end_seq = end_seq;
-				TCP_SKB_CB(skb1)->ack_seq = TCP_SKB_CB(skb)->ack_seq;
+			if (tcp_try_coalesce(sk, skb1, skb) <= 0) {
+				__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
+			} else {
 				__kfree_skb(skb);
 				skb = NULL;
-			} else {
-				__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
 			}
 
 			if (!tp->rx_opt.num_sacks ||
@@ -4624,13 +4664,18 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	}
 
 	if (eaten <= 0) {
+		struct sk_buff *tail;
+
 queue_and_out:
 		if (eaten < 0 &&
 		    tcp_try_rmem_schedule(sk, skb->truesize))
 			goto drop;
-		skb_set_owner_r(skb, sk);
-		__skb_queue_tail(&sk->sk_receive_queue, skb);
+
+		tail = skb_peek_tail(&sk->sk_receive_queue);
+		eaten = tail ? tcp_try_coalesce(sk, tail, skb) : -1;
+		if (eaten <= 0) {
+			skb_set_owner_r(skb, sk);
+			__skb_queue_tail(&sk->sk_receive_queue, skb);
+		}
 	}
 	tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
 	if (skb->len)
* Re: [PATCH net-next] tcp: introduce tcp_try_coalesce
2012-04-23 17:11 [PATCH net-next] tcp: introduce tcp_try_coalesce Eric Dumazet
@ 2012-04-24 1:13 ` Neal Cardwell
2012-04-24 2:13 ` Eric Dumazet
2012-04-24 2:39 ` Neal Cardwell
1 sibling, 1 reply; 9+ messages in thread
From: Neal Cardwell @ 2012-04-24 1:13 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, netdev, Tom Herbert, Maciej Żenczykowski,
Ilpo Järvinen
On Mon, Apr 23, 2012 at 1:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> commit c8628155ece3 (tcp: reduce out_of_order memory use) took care of
> coalescing tcp segments provided by legacy devices (linear skbs)
>
> We extend this idea to fragged skbs, as their truesize can be heavy.
>
> ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment.
>
> Use this coalescing strategy for receive queue too.
>
> This helps reduce the number of tcp collapses, at minimal cost, and
> reduces memory overhead and packet drops.
The mechanics look solid, but I'm a little concerned about the
potential added overhead for the new case where tcp_try_coalesce()
does a skb_copy_bits() for in-order data that it is coalescing at the
end of the sk_receive_queue. Do you have any performance numbers for
this case to help suggest whether this added copy is a concern?
neal
* Re: [PATCH net-next] tcp: introduce tcp_try_coalesce
2012-04-24 1:13 ` Neal Cardwell
@ 2012-04-24 2:13 ` Eric Dumazet
2012-04-24 2:29 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2012-04-24 2:13 UTC (permalink / raw)
To: Neal Cardwell
Cc: David Miller, netdev, Tom Herbert, Maciej Żenczykowski,
Ilpo Järvinen
On Mon, 2012-04-23 at 21:13 -0400, Neal Cardwell wrote:
> On Mon, Apr 23, 2012 at 1:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > commit c8628155ece3 (tcp: reduce out_of_order memory use) took care of
> > coalescing tcp segments provided by legacy devices (linear skbs)
> >
> > We extend this idea to fragged skbs, as their truesize can be heavy.
> >
> > ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment.
> >
> > Use this coalescing strategy for receive queue too.
> >
> > This helps reduce the number of tcp collapses, at minimal cost, and
> > reduces memory overhead and packet drops.
>
> The mechanics look solid, but I'm a little concerned about the
> potential added overhead for the new case where tcp_try_coalesce()
> does a skb_copy_bits() for in-order data that it is coalescing at the
> end of the sk_receive_queue. Do you have any performance numbers for
> this case to help suggest whether this added copy is a concern?
>
> neal
This never happens on connections where performance matters: the skb head
can only contain one full MSS segment.
This path is only used with wifi devices, where the skb head is really fat.
* Re: [PATCH net-next] tcp: introduce tcp_try_coalesce
2012-04-24 2:13 ` Eric Dumazet
@ 2012-04-24 2:29 ` Eric Dumazet
0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2012-04-24 2:29 UTC (permalink / raw)
To: Neal Cardwell
Cc: David Miller, netdev, Tom Herbert, Maciej Żenczykowski,
Ilpo Järvinen
On Tue, 2012-04-24 at 04:13 +0200, Eric Dumazet wrote:
> This never happens on connections where performance matters: the skb head
> can only contain one full MSS segment.
By the way, top-end hardware uses fragged skbs.
Incidentally, this gives better splice() performance (it avoids copying the
skb head to pages).
* Re: [PATCH net-next] tcp: introduce tcp_try_coalesce
2012-04-23 17:11 [PATCH net-next] tcp: introduce tcp_try_coalesce Eric Dumazet
2012-04-24 1:13 ` Neal Cardwell
@ 2012-04-24 2:39 ` Neal Cardwell
2012-04-24 2:46 ` David Miller
1 sibling, 1 reply; 9+ messages in thread
From: Neal Cardwell @ 2012-04-24 2:39 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, netdev, Tom Herbert, Maciej Żenczykowski,
Ilpo Järvinen
On Mon, Apr 23, 2012 at 1:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> commit c8628155ece3 (tcp: reduce out_of_order memory use) took care of
> coalescing tcp segments provided by legacy devices (linear skbs)
>
> We extend this idea to fragged skbs, as their truesize can be heavy.
>
> ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment.
>
> Use this coalescing strategy for receive queue too.
>
> This helps reduce the number of tcp collapses, at minimal cost, and
> reduces memory overhead and packet drops.
Acked-by: Neal Cardwell <ncardwell@google.com>
Thanks for the background info, Eric.
neal
* Re: [PATCH net-next] tcp: introduce tcp_try_coalesce
2012-04-24 2:39 ` Neal Cardwell
@ 2012-04-24 2:46 ` David Miller
2012-04-24 2:59 ` Eric Dumazet
2012-04-24 3:34 ` [PATCH net-next] tcp: tcp_try_coalesce returns a boolean Eric Dumazet
0 siblings, 2 replies; 9+ messages in thread
From: David Miller @ 2012-04-24 2:46 UTC (permalink / raw)
To: ncardwell; +Cc: eric.dumazet, netdev, therbert, maze, ilpo.jarvinen
From: Neal Cardwell <ncardwell@google.com>
Date: Mon, 23 Apr 2012 22:39:10 -0400
> On Mon, Apr 23, 2012 at 1:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> commit c8628155ece3 (tcp: reduce out_of_order memory use) took care of
>> coalescing tcp segments provided by legacy devices (linear skbs)
>>
>> We extend this idea to fragged skbs, as their truesize can be heavy.
>>
>> ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment.
>>
>> Use this coalescing strategy for receive queue too.
>>
>> This helps reduce the number of tcp collapses, at minimal cost, and
>> reduces memory overhead and packet drops.
>
> Acked-by: Neal Cardwell <ncardwell@google.com>
>
> Thanks for the background info, Eric.
Applied, thanks Eric.
Although I'd like to ask you to clean up tcp_try_coalesce() a bit.
It effectively returns a boolean, but you've clouded this up by
returning an int and defining it in the comment to return "> 0" or
not.
Just make it return a real bool.
I know why you did this: it makes the "eaten" code somewhat simpler in
tcp_data_queue(), but overall it's more confusing as it stands now.
People look at how the tcp_try_coalesce() return value is interpreted
and say "in what cases can it return a negative value?" We both know
it can't, but you have to read the entire function to figure that out.
And that's by definition not intuitive.
Thanks.
* Re: [PATCH net-next] tcp: introduce tcp_try_coalesce
2012-04-24 2:46 ` David Miller
@ 2012-04-24 2:59 ` Eric Dumazet
2012-04-24 3:34 ` [PATCH net-next] tcp: tcp_try_coalesce returns a boolean Eric Dumazet
1 sibling, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2012-04-24 2:59 UTC (permalink / raw)
To: David Miller; +Cc: ncardwell, netdev, therbert, maze, ilpo.jarvinen
On Mon, 2012-04-23 at 22:46 -0400, David Miller wrote:
> Applied, thanks Eric.
>
> Although I'd like to ask you to clean up tcp_try_coalesce() a bit.
>
> It effectively returns a boolean, but you've clouded this up by
> returning an int and defining it in the comment to return "> 0" or
> not.
>
> Just make it return a real bool.
>
> I know why you did this, it makes the "eaten" code somewhat simpler in
> tcp_data_queue(), but overall it's more confusing how it is now.
>
> People look at how the tcp_try_coalesce() return value is interpreted
> and say "in what cases can it return a negative value?" We both know
> it can't, but you have to read the entire function to figure that out.
>
> And that's by definition not intuitive.
>
> Thanks.
Sure, I'll do the cleanup. You guessed correctly why I did that ;)
In the beginning I did a "return len;" instead of "return 1;" and felt a
bit uncomfortable in case we merged a zero-length message.
Then I added the !th->fin test inside tcp_try_coalesce()
(my initial patch allowed the FIN to be set in the tcp_data_queue()
case, since tcp_fin() was called anyway).
Thanks
* [PATCH net-next] tcp: tcp_try_coalesce returns a boolean
2012-04-24 2:46 ` David Miller
2012-04-24 2:59 ` Eric Dumazet
@ 2012-04-24 3:34 ` Eric Dumazet
2012-04-24 3:37 ` David Miller
1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2012-04-24 3:34 UTC (permalink / raw)
To: David Miller; +Cc: ncardwell, netdev, therbert, maze, ilpo.jarvinen
From: Eric Dumazet <edumazet@google.com>
This clarifies code intention, as suggested by David.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp_input.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bd7aef5..67f352a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4459,23 +4459,23 @@ static inline int tcp_try_rmem_schedule(struct sock *sk, unsigned int size)
  * to reduce overall memory use and queue lengths, if cost is small.
  * Packets in ofo or receive queues can stay a long time.
  * Better try to coalesce them right now to avoid future collapses.
- * Returns > 0 value if caller should free @from instead of queueing it
+ * Returns true if caller should free @from instead of queueing it
  */
-static int tcp_try_coalesce(struct sock *sk,
-			    struct sk_buff *to,
-			    struct sk_buff *from)
+static bool tcp_try_coalesce(struct sock *sk,
+			     struct sk_buff *to,
+			     struct sk_buff *from)
 {
 	int len = from->len;
 
 	if (tcp_hdr(from)->fin)
-		return 0;
+		return false;
 	if (len <= skb_tailroom(to)) {
 		BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
 merge:
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
 		TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
 		TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
-		return 1;
+		return true;
 	}
 	if (skb_headlen(from) == 0 &&
 	    !skb_has_frag_list(to) &&
@@ -4498,7 +4498,7 @@ merge:
 		to->data_len += len;
 		goto merge;
 	}
-	return 0;
+	return false;
 }
 
 static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
@@ -4539,7 +4539,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		end_seq = TCP_SKB_CB(skb)->end_seq;
 
 		if (seq == TCP_SKB_CB(skb1)->end_seq) {
-			if (tcp_try_coalesce(sk, skb1, skb) <= 0) {
+			if (!tcp_try_coalesce(sk, skb1, skb)) {
 				__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
 			} else {
 				__kfree_skb(skb);
@@ -4671,7 +4671,7 @@ queue_and_out:
 			goto drop;
 
 		tail = skb_peek_tail(&sk->sk_receive_queue);
-		eaten = tail ? tcp_try_coalesce(sk, tail, skb) : -1;
+		eaten = (tail && tcp_try_coalesce(sk, tail, skb)) ? 1 : 0;
 		if (eaten <= 0) {
 			skb_set_owner_r(skb, sk);
 			__skb_queue_tail(&sk->sk_receive_queue, skb);
* Re: [PATCH net-next] tcp: tcp_try_coalesce returns a boolean
2012-04-24 3:34 ` [PATCH net-next] tcp: tcp_try_coalesce returns a boolean Eric Dumazet
@ 2012-04-24 3:37 ` David Miller
0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2012-04-24 3:37 UTC (permalink / raw)
To: eric.dumazet; +Cc: ncardwell, netdev, therbert, maze, ilpo.jarvinen
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 24 Apr 2012 05:34:36 +0200
> From: Eric Dumazet <edumazet@google.com>
>
> This clarifies code intention, as suggested by David.
>
> Suggested-by: David Miller <davem@davemloft.net>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks.