netdev.vger.kernel.org archive mirror
* [PATCH v2 net-next 0/8] tcp: tsq: performance series
@ 2016-12-03 19:14 Eric Dumazet
From: Eric Dumazet @ 2016-12-03 19:14 UTC
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Yuchung Cheng, Eric Dumazet

Under very high TX stress, the CPU handling NIC TX completions can spend
a considerable amount of cycles handling TSQ (TCP Small Queues) logic.

This patch series avoids some atomic operations, but the most notable
patch is the 3rd one. It allows other CPUs that process ACK packets and
call tcp_write_xmit() to grab TCP_TSQ_DEFERRED, so that
tcp_tasklet_func() can skip sockets that were already processed.

This avoids lots of lock acquisitions and cache line accesses,
particularly under load.
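
The flag hand-off described above can be illustrated with a small
user-space sketch. This is not the kernel code: struct demo_sock,
DEMO_DEFERRED and the demo_*() helpers are hypothetical names, and C11
atomics stand in for the kernel's bit operations. It only shows the
idea that whichever path clears the flag first owns the work, so the
deferred path can skip the socket.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit, standing in for TCP_TSQ_DEFERRED. */
#define DEMO_DEFERRED 1u

/* Hypothetical, heavily reduced stand-in for a socket. */
struct demo_sock {
    atomic_uint flags;  /* stand-in for tsq_flags */
    int xmit_calls;     /* counts how often the send path ran */
};

/* Atomically clear DEMO_DEFERRED; true only for the caller that cleared it. */
static bool demo_claim_deferred(struct demo_sock *sk)
{
    assert(sk != NULL);
    unsigned int old = atomic_fetch_and(&sk->flags, ~DEMO_DEFERRED);
    return (old & DEMO_DEFERRED) != 0;
}

/* TX completion side: mark the socket as needing a later transmit. */
static void demo_defer(struct demo_sock *sk)
{
    atomic_fetch_or(&sk->flags, DEMO_DEFERRED);
}

/* ACK side: it transmits anyway, so it takes over any pending work. */
static void demo_ack_path(struct demo_sock *sk)
{
    demo_claim_deferred(sk);
    sk->xmit_calls++;
}

/* Deferred side: skip the socket when the flag is already gone. */
static bool demo_tasklet(struct demo_sock *sk)
{
    if (!demo_claim_deferred(sk))
        return false;   /* already handled by the ACK path */
    sk->xmit_calls++;
    return true;
}
```

In this sketch, calling demo_defer() and then demo_ack_path() leaves
nothing for demo_tasklet() to do, which is the case the shortcut is
meant to make cheap.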

In v2, I added:

- A tcp_small_queue_check() change to allow the 1st and 2nd packets
  in the write queue to be sent, even when TX completion of
  already acknowledged packets has not happened yet.
  This helps when TX completion coalescing parameters are set
  even to insane values, and/or busy polling is used.

- A reorganization of struct sock fields to
  lower false sharing and increase data locality.

- A move of tsq_flags from tcp_sock to struct sock, also
  to reduce cache line misses during TX completions.
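
The first v2 item can be sketched as a pure function. This is an
illustration under stated assumptions, not the patch itself: the
function name and its three parameters are hypothetical, with
unfreed_bytes playing the role of sk_wmem_alloc and queue_pos being the
packet's 0-based position in the write queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when the sender should hold the packet back.
 *
 * unfreed_bytes: bytes handed to the device whose TX completion has not
 *                run yet (hypothetical stand-in for sk_wmem_alloc).
 * limit:         byte budget for this socket (hypothetical parameter).
 * queue_pos:     0-based position of the packet in the write queue.
 */
static bool demo_small_queue_throttled(size_t unfreed_bytes, size_t limit,
                                       size_t queue_pos)
{
    assert(limit > 0);  /* a zero budget would make the check meaningless */

    /* Under budget: nothing to throttle. */
    if (unfreed_bytes <= limit)
        return false;

    /*
     * Over budget, but the 1st and 2nd packets are always allowed out,
     * so late TX completions cannot stall the flow entirely.
     */
    if (queue_pos < 2)
        return false;

    return true;
}
```

With a budget of 4096 bytes and 100000 unfreed bytes, positions 0 and 1
are still sent in this sketch, while position 2 is held back.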

I measured an overall throughput gain of 22% for heavy TCP use
over a single TX queue.

Eric Dumazet (8):
  tcp: tsq: add tsq_flags / tsq_enum
  tcp: tsq: remove one locked operation in tcp_wfree()
  tcp: tsq: add shortcut in tcp_tasklet_func()
  tcp: tsq: avoid one atomic in tcp_wfree()
  tcp: tsq: add a shortcut in tcp_small_queue_check()
  tcp: tcp_mtu_probe() is likely to exit early
  net: reorganize struct sock for better data locality
  tcp: tsq: move tsq_flags close to sk_wmem_alloc

 include/linux/tcp.h   | 12 +++++--
 include/net/sock.h    | 51 +++++++++++++++--------------
 net/ipv4/tcp.c        |  4 +--
 net/ipv4/tcp_ipv4.c   |  2 +-
 net/ipv4/tcp_output.c | 91 +++++++++++++++++++++++++++++++--------------------
 net/ipv4/tcp_timer.c  |  4 +--
 net/ipv6/tcp_ipv6.c   |  2 +-
 7 files changed, 98 insertions(+), 68 deletions(-)

-- 
2.8.0.rc3.226.g39d4020


Thread overview: 16+ messages
2016-12-03 19:14 [PATCH v2 net-next 0/8] tcp: tsq: performance series Eric Dumazet
2016-12-03 19:14 ` [PATCH v2 net-next 1/8] tcp: tsq: add tsq_flags / tsq_enum Eric Dumazet
2016-12-03 19:14 ` [PATCH v2 net-next 2/8] tcp: tsq: remove one locked operation in tcp_wfree() Eric Dumazet
2016-12-03 19:14 ` [PATCH v2 net-next 3/8] tcp: tsq: add shortcut in tcp_tasklet_func() Eric Dumazet
2016-12-03 19:14 ` [PATCH v2 net-next 4/8] tcp: tsq: avoid one atomic in tcp_wfree() Eric Dumazet
2016-12-03 19:14 ` [PATCH v2 net-next 5/8] tcp: tsq: add a shortcut in tcp_small_queue_check() Eric Dumazet
2016-12-03 19:14 ` [PATCH v2 net-next 6/8] tcp: tcp_mtu_probe() is likely to exit early Eric Dumazet
2016-12-03 19:14 ` [PATCH v2 net-next 7/8] net: reorganize struct sock for better data locality Eric Dumazet
2016-12-05 12:36   ` Paolo Abeni
2016-12-05 14:30     ` Eric Dumazet
2016-12-03 19:14 ` [PATCH v2 net-next 8/8] tcp: tsq: move tsq_flags close to sk_wmem_alloc Eric Dumazet
2016-12-04  0:16   ` David Miller
2016-12-04  1:13     ` Eric Dumazet
2016-12-04  1:37       ` David Miller
2016-12-05  2:45         ` Eric Dumazet
2016-12-05 19:06 ` [PATCH v2 net-next 0/8] tcp: tsq: performance series David Miller
