* [PATCH] tcp: avoid wakeups for pure ACK
@ 2013-02-27 17:05 Eric Dumazet
2013-02-27 18:04 ` David Miller
2013-02-28 20:38 ` David Miller
0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2013-02-27 17:05 UTC
To: David Miller
Cc: netdev, Neal Cardwell, Tom Herbert, Yuchung Cheng, Andi Kleen
From: Eric Dumazet <edumazet@google.com>
The purpose of the TCP prequeue mechanism is to let incoming packets
be processed by the thread currently blocked in tcp_recvmsg(),
instead of on behalf of the softirq handler, so that flow control
better adapts to the receiver host's capacity to schedule the consumer.

But in typical request/answer workloads, we send a request, then
block to receive the answer. And before the actual answer, the TCP
stack receives the ACK packets acknowledging the request.

Processing a pure ACK on behalf of the thread blocked in tcp_recvmsg()
is a waste of resources, as the thread has to immediately sleep again
because it got no payload.

This patch avoids the extra context switches and scheduler overhead.
Before patch:

a:~# echo 0 >/proc/sys/net/ipv4/tcp_low_latency
a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k
231676

 Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k':

     116251.501765 task-clock                #   11.369 CPUs utilized
         5,025,463 context-switches          #    0.043 M/sec
         1,074,511 CPU-migrations            #    0.009 M/sec
           216,923 page-faults               #    0.002 M/sec
   311,636,972,396 cycles                    #    2.681 GHz
   260,507,138,069 stalled-cycles-frontend   #   83.59% frontend cycles idle
   155,590,092,840 stalled-cycles-backend    #   49.93% backend cycles idle
   100,101,255,411 instructions              #    0.32  insns per cycle
                                             #    2.60  stalled cycles per insn
    16,535,930,999 branches                  #  142.243 M/sec
       646,483,591 branch-misses             #    3.91% of all branches

      10.225482774 seconds time elapsed
After patch:

a:~# echo 0 >/proc/sys/net/ipv4/tcp_low_latency
a:~# perf stat ./super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k
233297

 Performance counter stats for './super_netperf 300 -t TCP_RR -l 10 -H 7.7.7.84 -- -r 8k,8k':

      91084.870855 task-clock                #    8.887 CPUs utilized
         2,485,916 context-switches          #    0.027 M/sec
           815,520 CPU-migrations            #    0.009 M/sec
           216,932 page-faults               #    0.002 M/sec
   245,195,022,629 cycles                    #    2.692 GHz
   202,635,777,041 stalled-cycles-frontend   #   82.64% frontend cycles idle
   124,280,372,407 stalled-cycles-backend    #   50.69% backend cycles idle
    83,457,289,618 instructions              #    0.34  insns per cycle
                                             #    2.43  stalled cycles per insn
    13,431,472,361 branches                  #  147.461 M/sec
       504,470,665 branch-misses             #    3.76% of all branches

      10.249594448 seconds time elapsed
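
The relative changes between the two runs can be worked out with a few lines of C. This is only a sketch and not part of the patch: pct_change() and print_deltas() are hypothetical helper names, and the bare numbers 231676 and 233297 are assumed to be the aggregate TCP_RR transaction rates printed by super_netperf.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent.
 * A negative result means the counter went down. */
double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* Prints the before/after deltas for some counters quoted in the two runs.
 * The first line assumes the bare super_netperf number is a transaction rate. */
void print_deltas(void)
{
	printf("super_netperf result: %+.1f%%\n", pct_change(231676.0, 233297.0));
	printf("context-switches:     %+.1f%%\n", pct_change(5025463.0, 2485916.0));
	printf("CPU-migrations:       %+.1f%%\n", pct_change(1074511.0, 815520.0));
	printf("task-clock:           %+.1f%%\n", pct_change(116251.501765, 91084.870855));
	printf("cycles:               %+.1f%%\n", pct_change(311636972396.0, 245195022629.0));
}
```

Called from a main(), print_deltas() shows context switches roughly halved and cycles down by about a fifth, for an essentially unchanged super_netperf result.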
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
David: Feel free to postpone this to 3.10.
I'll send a patch to move tcp_prequeue() out of line when net-next opens.
include/net/tcp.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 23f2e98..cf0694d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1045,6 +1045,10 @@ static inline bool tcp_prequeue(struct sock *sk, struct sk_buff *skb)
 	if (sysctl_tcp_low_latency || !tp->ucopy.task)
 		return false;
 
+	if (skb->len <= tcp_hdrlen(skb) &&
+	    skb_queue_len(&tp->ucopy.prequeue) == 0)
+		return false;
+
 	__skb_queue_tail(&tp->ucopy.prequeue, skb);
 	tp->ucopy.memory += skb->truesize;
 	if (tp->ucopy.memory > sk->sk_rcvbuf) {
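
For illustration only, the decision added by this hunk can be modeled in user space. The sketch below is not kernel code: struct seg, struct prequeue_state, seg_is_pure_ack() and should_prequeue() are hypothetical names standing in for the skb, the tp->ucopy state and the early exits of tcp_prequeue(). It assumes, as the comparison in the patch implies, that the segment length still includes the TCP header at this point, and it leaves out the memory accounting that follows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the skb fields the new check reads. */
struct seg {
	size_t len;      /* models skb->len: bytes in the segment, TCP header included */
	size_t hdr_len;  /* models tcp_hdrlen(skb): TCP header length with options */
};

/* Hypothetical stand-in for the sysctl and the tp->ucopy state. */
struct prequeue_state {
	bool low_latency;     /* models sysctl_tcp_low_latency */
	bool reader_blocked;  /* models tp->ucopy.task != NULL */
	size_t queued;        /* models skb_queue_len(&tp->ucopy.prequeue) */
};

/* A segment with nothing beyond its TCP header carries no payload. */
bool seg_is_pure_ack(const struct seg *s)
{
	return s->len <= s->hdr_len;
}

/* Returns true if the segment would be handed to the blocked reader,
 * false if it would be processed directly in softirq context. */
bool should_prequeue(const struct prequeue_state *st, const struct seg *s)
{
	if (st->low_latency || !st->reader_blocked)
		return false;

	/* Mirrors the added test: never start a prequeue with a pure ACK. */
	if (seg_is_pure_ack(s) && st->queued == 0)
		return false;

	return true;
}
```

In this model a pure ACK still joins a prequeue that already holds segments, so packet order within the prequeue is preserved; only an ACK that would arrive on an empty prequeue is left to the softirq path.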
* Re: [PATCH] tcp: avoid wakeups for pure ACK
2013-02-27 17:05 [PATCH] tcp: avoid wakeups for pure ACK Eric Dumazet
@ 2013-02-27 18:04 ` David Miller
2013-02-27 18:17 ` Eric Dumazet
2013-02-28 20:38 ` David Miller
1 sibling, 1 reply; 5+ messages in thread
From: David Miller @ 2013-02-27 18:04 UTC
To: eric.dumazet; +Cc: netdev, ncardwell, therbert, ycheng, ak
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Feb 2013 09:05:03 -0800
> Processing pure ACK on behalf of the thread blocked in tcp_recvmsg()
> is a waste of resources, as thread has to immediately sleep again
> because it got no payload.
More than one thread can be operating on the socket, the other one
could be waiting for the window to open up in order to do a send. Are
you absolutely sure that we won't have a problem in that situation?
In fact I wonder if that does the right thing right now.
* Re: [PATCH] tcp: avoid wakeups for pure ACK
2013-02-27 18:04 ` David Miller
@ 2013-02-27 18:17 ` Eric Dumazet
2013-02-27 21:15 ` David Miller
0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2013-02-27 18:17 UTC
To: David Miller; +Cc: netdev, ncardwell, therbert, ycheng, ak
On Wed, 2013-02-27 at 13:04 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 27 Feb 2013 09:05:03 -0800
>
> > Processing pure ACK on behalf of the thread blocked in tcp_recvmsg()
> > is a waste of resources, as thread has to immediately sleep again
> > because it got no payload.
>
> More than one thread can be operating on the socket, the other one
> could be waiting for the window to open up in order to do a send. Are
> you absolutely sure that we won't have a problem in that situation?
Yes, more than one thread can be operating, but the prequeue wakes up
only the one blocked in tcp_recvmsg(), because of:
	wake_up_interruptible_sync_poll(sk_sleep(sk),
					POLLIN | POLLRDNORM | POLLRDBAND);
Then the ACK processing might/should wake up the other thread blocked in
tcp_sendmsg().
So this patch will also help in this (not very usual) situation, as we will
wake up the tcp_sendmsg() thread only when the ACK is processed from the
softirq handler, and leave the thread blocked in tcp_recvmsg() sleeping.
> In fact I wonder if that does the right thing right now.
Right now it is working, because at least one thread will process the
prequeue at the exit of tcp_recvmsg().
* Re: [PATCH] tcp: avoid wakeups for pure ACK
2013-02-27 18:17 ` Eric Dumazet
@ 2013-02-27 21:15 ` David Miller
0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2013-02-27 21:15 UTC
To: eric.dumazet; +Cc: netdev, ncardwell, therbert, ycheng, ak
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Feb 2013 10:17:39 -0800
> So this patch will also help this (not very usual) situation, as we will
> only wakeup the tcp_sendmsg() thread when ACK is processed from softirq
> handler, and let the thread blocked in tcp_recvmsg() sleeping.
Ok, thanks for the analysis.
* Re: [PATCH] tcp: avoid wakeups for pure ACK
2013-02-27 17:05 [PATCH] tcp: avoid wakeups for pure ACK Eric Dumazet
2013-02-27 18:04 ` David Miller
@ 2013-02-28 20:38 ` David Miller
1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2013-02-28 20:38 UTC
To: eric.dumazet; +Cc: netdev, ncardwell, therbert, ycheng, ak
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Feb 2013 09:05:03 -0800
> TCP prequeue mechanism purpose is to let incoming packets
> being processed by the thread currently blocked in tcp_recvmsg(),
> instead of behalf of the softirq handler, to better adapt flow
> control on receiver host capacity to schedule the consumer.
>
> But in typical request/answer workloads, we send request, then
> block to receive the answer. And before the actual answer, TCP
> stack receives the ACK packets acknowledging the request.
>
> Processing pure ACK on behalf of the thread blocked in tcp_recvmsg()
> is a waste of resources, as thread has to immediately sleep again
> because it got no payload.
>
> This patch avoids the extra context switches and scheduler overhead.
...
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied.