[PATCH net-next 0/3] tcp: add rx/tx cache to reduce lock contention

netdev.vger.kernel.org archive mirror

* [PATCH net-next 0/3] tcp: add rx/tx cache to reduce lock contention
@ 2019-03-21 22:17 Eric Dumazet
  2019-03-21 22:17 ` [PATCH net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Eric Dumazet @ 2019-03-21 22:17 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
	Florian Westphal, Tom Herbert, Eric Dumazet

On hosts with many cpus we can observe very serious contention
on spinlocks used in the mm slab layer.

The following can happen quite often :

1) TX path
  sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
  ACK is received on CPU B, and consumes the skb that was in the retransmit
  queue.

2) RX path
  network driver allocates skb on CPU C
  recvmsg() happens on CPU D, freeing the skb after it has been delivered
  to user space.

In both cases, we are hitting the asymmetric alloc/free pattern
for which slab has to drain alien caches. At 8 Mpps, each packet costs
one alloc and one free, so this represents 16 M alloc/free operations
per second and has a huge penalty.

In an interesting experiment, I tried to use a single kmem_cache for all the skbs
(in skb_init() : skbuff_fclone_cache = skbuff_head_cache =
                  kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones),);
and most of the contention disappeared, since cpus could better use
their local slab per-cpu cache.

But we can actually do better, in the following patches.

TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
     so that next sendmsg() can reuse it immediately.

RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
   so that it can be freed by the cpu feeding the incoming packets in BH.

This increased the performance of a small RPC benchmark by about 10 % on a host
with 112 hyperthreads.
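
A minimal user-space sketch of the RX idea above, under stated assumptions:
the skb is allocated on CPU C, consumed by recvmsg() on CPU D, and the BH
that feeds the socket also runs on CPU C. Every toy_* name is hypothetical
and malloc()/free() stand in for the slab allocator. The model only counts
how many frees happen on a cpu other than the allocating one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for illustration; none of these are kernel types. */
struct toy_skb {
	int alloc_cpu;			/* cpu that allocated this buffer */
};

struct toy_sock {
	struct toy_skb *rx_cache;	/* one-slot per-socket cache */
};

struct toy_stats {
	unsigned long local_frees;	/* freed on the allocating cpu */
	unsigned long remote_frees;	/* freed on another cpu */
};

enum { TOY_CPU_C = 2, TOY_CPU_D = 3 };

static struct toy_skb *toy_alloc(int cpu)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	if (skb)
		skb->alloc_cpu = cpu;
	return skb;
}

static void toy_free(struct toy_skb *skb, int cpu, struct toy_stats *st)
{
	if (skb->alloc_cpu == cpu)
		st->local_frees++;
	else
		st->remote_frees++;
	free(skb);
}

/* recvmsg() side: park the skb in the socket if the slot is empty. */
static void toy_recv_done(struct toy_sock *sk, struct toy_skb *skb,
			  int cpu, struct toy_stats *st)
{
	if (!sk->rx_cache) {
		sk->rx_cache = skb;
		return;
	}
	toy_free(skb, cpu, st);
}

/* BH side: the cpu feeding packets frees whatever recvmsg() parked. */
static void toy_bh_flush(struct toy_sock *sk, int cpu, struct toy_stats *st)
{
	if (sk->rx_cache) {
		toy_free(sk->rx_cache, cpu, st);
		sk->rx_cache = NULL;
	}
}

/* Push npackets through one socket, with or without the cache. */
struct toy_stats toy_rx_run(int npackets, int use_cache)
{
	struct toy_sock sk = { NULL };
	struct toy_stats st = { 0, 0 };
	int i;

	for (i = 0; i < npackets; i++) {
		struct toy_skb *skb;

		if (use_cache)
			toy_bh_flush(&sk, TOY_CPU_C, &st);
		skb = toy_alloc(TOY_CPU_C);
		if (!skb)
			break;
		if (use_cache)
			toy_recv_done(&sk, skb, TOY_CPU_D, &st);
		else
			toy_free(skb, TOY_CPU_D, &st);
	}
	free(sk.rx_cache);	/* teardown; not counted in either bucket */
	return st;
}
```

In this model, toy_rx_run(n, 0) frees every skb on the remote cpu, while
toy_rx_run(n, 1) frees none remotely. It says nothing about the real cost
of draining alien caches; it only shows where the frees land.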

Eric Dumazet (3):
  net: convert rps_needed and rfs_needed to new static branch api
  tcp: add one skb cache for tx
  tcp: add one skb cache for rx

 include/linux/netdevice.h  |  4 ++--
 include/net/sock.h         | 13 +++++++++-
 net/core/dev.c             | 10 ++++----
 net/core/net-sysfs.c       |  4 ++--
 net/core/sysctl_net_core.c |  8 +++----
 net/ipv4/af_inet.c         |  4 ++++
 net/ipv4/tcp.c             | 49 +++++++++++++++++---------------------
 net/ipv4/tcp_ipv4.c        | 11 +++++++--
 net/ipv6/tcp_ipv6.c        | 12 +++++++---
 9 files changed, 69 insertions(+), 46 deletions(-)

-- 
2.21.0.225.g810b269d1ac-goog


* [PATCH net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api
  2019-03-21 22:17 [PATCH net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
@ 2019-03-21 22:17 ` Eric Dumazet
  2019-03-22 16:01   ` kbuild test robot
  2019-03-21 22:17 ` [PATCH net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2019-03-21 22:17 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
	Florian Westphal, Tom Herbert, Eric Dumazet

We prefer static_branch_unlikely() over static_key_false() these days.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h  |  4 ++--
 include/net/sock.h         |  2 +-
 net/core/dev.c             | 10 +++++-----
 net/core/net-sysfs.c       |  4 ++--
 net/core/sysctl_net_core.c |  8 ++++----
 5 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 823762291ebf59d2a8a0502f71d6591b5cd7839f..166fdc0a78b49c9df984b767169c3babce24462e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -194,8 +194,8 @@ struct net_device_stats {
 
 #ifdef CONFIG_RPS
 #include <linux/static_key.h>
-extern struct static_key rps_needed;
-extern struct static_key rfs_needed;
+extern struct static_key_false rps_needed;
+extern struct static_key_false rfs_needed;
 #endif
 
 struct neighbour;
diff --git a/include/net/sock.h b/include/net/sock.h
index 8de5ee258b93a50b2fdcde796bae3a5b53ce4d6a..fecdf639225c2d4995ee2e2cd9be57f3d4f22777 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -966,7 +966,7 @@ static inline void sock_rps_record_flow_hash(__u32 hash)
 static inline void sock_rps_record_flow(const struct sock *sk)
 {
 #ifdef CONFIG_RPS
-	if (static_key_false(&rfs_needed)) {
+	if (static_branch_unlikely(&rfs_needed)) {
 		/* Reading sk->sk_rxhash might incur an expensive cache line
 		 * miss.
 		 *
diff --git a/net/core/dev.c b/net/core/dev.c
index 357111431ec9a6a5873830b89dd137d5eba6f2f0..c71b0998fa3ac8ae9d28aa1131852032a5cd0008 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3973,9 +3973,9 @@ EXPORT_SYMBOL(rps_sock_flow_table);
 u32 rps_cpu_mask __read_mostly;
 EXPORT_SYMBOL(rps_cpu_mask);
 
-struct static_key rps_needed __read_mostly;
+struct static_key_false rps_needed __read_mostly;
 EXPORT_SYMBOL(rps_needed);
-struct static_key rfs_needed __read_mostly;
+struct static_key_false rfs_needed __read_mostly;
 EXPORT_SYMBOL(rfs_needed);
 
 static struct rps_dev_flow *
@@ -4501,7 +4501,7 @@ static int netif_rx_internal(struct sk_buff *skb)
 	}
 
 #ifdef CONFIG_RPS
-	if (static_key_false(&rps_needed)) {
+	if (static_branch_unlikely(&rps_needed)) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu;
 
@@ -5170,7 +5170,7 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
 
 	rcu_read_lock();
 #ifdef CONFIG_RPS
-	if (static_key_false(&rps_needed)) {
+	if (static_branch_unlikely(&rps_needed)) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
 
@@ -5218,7 +5218,7 @@ static void netif_receive_skb_list_internal(struct list_head *head)
 
 	rcu_read_lock();
 #ifdef CONFIG_RPS
-	if (static_key_false(&rps_needed)) {
+	if (static_branch_unlikely(&rps_needed)) {
 		list_for_each_entry_safe(skb, next, head, list) {
 			struct rps_dev_flow voidflow, *rflow = &voidflow;
 			int cpu = get_rps_cpu(skb->dev, skb, &rflow);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 4ff661f6f989ae10ca49a1e81c825be56683d026..851cabb90bce66f30a5868d6b7499f240202d1eb 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -754,9 +754,9 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	rcu_assign_pointer(queue->rps_map, map);
 
 	if (map)
-		static_key_slow_inc(&rps_needed);
+		static_branch_inc(&rps_needed);
 	if (old_map)
-		static_key_slow_dec(&rps_needed);
+		static_branch_dec(&rps_needed);
 
 	mutex_unlock(&rps_map_mutex);
 
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 84bf2861f45f76f162d661298991f13ac0e8b592..1a2685694abd537d7ae304754b84b237928fd298 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -95,12 +95,12 @@ static int rps_sock_flow_sysctl(struct ctl_table *table, int write,
 		if (sock_table != orig_sock_table) {
 			rcu_assign_pointer(rps_sock_flow_table, sock_table);
 			if (sock_table) {
-				static_key_slow_inc(&rps_needed);
-				static_key_slow_inc(&rfs_needed);
+				static_branch_inc(&rps_needed);
+				static_branch_inc(&rfs_needed);
 			}
 			if (orig_sock_table) {
-				static_key_slow_dec(&rps_needed);
-				static_key_slow_dec(&rfs_needed);
+				static_branch_dec(&rps_needed);
+				static_branch_dec(&rfs_needed);
 				synchronize_rcu();
 				vfree(orig_sock_table);
 			}
-- 
2.21.0.225.g810b269d1ac-goog



* Re: [PATCH net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api
  2019-03-21 22:17 ` [PATCH net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
@ 2019-03-22 16:01   ` kbuild test robot
  0 siblings, 0 replies; 8+ messages in thread
From: kbuild test robot @ 2019-03-22 16:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kbuild-all, David S . Miller, netdev, Eric Dumazet,
	Soheil Hassas Yeganeh, Willem de Bruijn, Florian Westphal,
	Tom Herbert, Eric Dumazet

[-- Attachment #1: Type: text/plain, Size: 3182 bytes --]

Hi Eric,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/net-convert-rps_needed-and-rfs_needed-to-new-static-branch-api/20190322-211954
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   drivers//net/tun.c: In function 'tun_automq_xmit':
>> drivers//net/tun.c:1045:46: error: passing argument 1 of 'static_key_false' from incompatible pointer type [-Werror=incompatible-pointer-types]
     if (tun->numqueues == 1 && static_key_false(&rps_needed)) {
                                                 ^~~~~~~~~~~
   In file included from include/linux/module.h:19,
                    from drivers//net/tun.c:44:
   include/linux/jump_label.h:259:65: note: expected 'struct static_key *' but argument is of type 'struct static_key_false *'
    static __always_inline bool static_key_false(struct static_key *key)
                                                 ~~~~~~~~~~~~~~~~~~~^~~
   cc1: some warnings being treated as errors

vim +/static_key_false +1045 drivers//net/tun.c

^1da177e Linus Torvalds 2005-04-16  1040  
^1da177e Linus Torvalds 2005-04-16  1041  /* Net device start xmit */
96f84061 Jason Wang     2017-12-04  1042  static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
^1da177e Linus Torvalds 2005-04-16  1043  {
3df97ba8 Jason Wang     2016-04-25  1044  #ifdef CONFIG_RPS
96f84061 Jason Wang     2017-12-04 @1045  	if (tun->numqueues == 1 && static_key_false(&rps_needed)) {
9bc88939 Tom Herbert    2013-12-22  1046  		/* Select queue was not called for the skbuff, so we extract the
9bc88939 Tom Herbert    2013-12-22  1047  		 * RPS hash and save it into the flow_table here.
9bc88939 Tom Herbert    2013-12-22  1048  		 */
4b035271 Wang Li        2018-10-09  1049  		struct tun_flow_entry *e;
9bc88939 Tom Herbert    2013-12-22  1050  		__u32 rxhash;
9bc88939 Tom Herbert    2013-12-22  1051  
feec084a Jason Wang     2017-06-06  1052  		rxhash = __skb_get_hash_symmetric(skb);
4b035271 Wang Li        2018-10-09  1053  		e = tun_flow_find(&tun->flows[tun_hashfn(rxhash)], rxhash);
9bc88939 Tom Herbert    2013-12-22  1054  		if (e)
9bc88939 Tom Herbert    2013-12-22  1055  			tun_flow_save_rps_rxhash(e, rxhash);
9bc88939 Tom Herbert    2013-12-22  1056  	}
3df97ba8 Jason Wang     2016-04-25  1057  #endif
96f84061 Jason Wang     2017-12-04  1058  }
96f84061 Jason Wang     2017-12-04  1059  

:::::: The code at line 1045 was first introduced by commit
:::::: 96f84061620c6325a2ca9a9a05b410e6461d03c3 tun: add eBPF based queue selection method

:::::: TO: Jason Wang <jasowang@redhat.com>
:::::: CC: David S. Miller <davem@davemloft.net>
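
The error points at a call site that patch 1/3 leaves unconverted:
drivers/net/tun.c still passes &rps_needed to static_key_false(), which
expects a 'struct static_key *', while the patch changes the variable to
'struct static_key_false'. The sketch below is a user-space model of that
type relationship, not the kernel implementation. The wrapper struct with
a 'key' member follows include/linux/jump_label.h; the macros only model
an enable count instead of run-time code patching, and every toy_* name
is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-ins for the kernel types (model only). */
struct static_key {
	int enabled;
};

struct static_key_false {
	struct static_key key;
};

/* Old-style test: takes the bare key, as in the compiler note above. */
static inline bool static_key_false(struct static_key *key)
{
	return key->enabled > 0;
}

/* New-style helpers, modelled as macros that reach through the wrapper. */
#define static_branch_unlikely(x)	((x)->key.enabled > 0)
#define static_branch_inc(x)		((void)((x)->key.enabled++))
#define static_branch_dec(x)		((void)((x)->key.enabled--))

static struct static_key_false rps_needed;	/* type after patch 1/3 */

/* Hypothetical helper mirroring the test in tun_automq_xmit(). */
bool toy_tun_wants_rps(int numqueues)
{
	/* static_key_false(&rps_needed) is rejected here: the argument is
	 * a 'struct static_key_false *', not a 'struct static_key *'.
	 */
	return numqueues == 1 && static_branch_unlikely(&rps_needed);
}

/* Type-correct with the old helper, but bypasses the new API. */
bool toy_old_style_test(void)
{
	return static_key_false(&rps_needed.key);
}

void toy_rps_enable(void)
{
	static_branch_inc(&rps_needed);
}

void toy_rps_disable(void)
{
	static_branch_dec(&rps_needed);
}
```

One plausible fix is therefore to switch the tun.c test to
static_branch_unlikely(&rps_needed) in the same patch, so that all users
of rps_needed move to the new API together.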

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 53125 bytes --]


* [PATCH net-next 2/3] tcp: add one skb cache for tx
  2019-03-21 22:17 [PATCH net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
  2019-03-21 22:17 ` [PATCH net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
@ 2019-03-21 22:17 ` Eric Dumazet
  2019-03-22 19:07   ` kbuild test robot
  2019-03-21 22:17 ` [PATCH net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
  2019-03-21 22:25 ` [PATCH net-next 0/3] tcp: add rx/tx cache to reduce lock contention Soheil Hassas Yeganeh
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2019-03-21 22:17 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
	Florian Westphal, Tom Herbert, Eric Dumazet

On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks.

    20.69%  [kernel]       [k] queued_spin_lock_slowpath
     5.64%  [kernel]       [k] _raw_spin_lock
     3.83%  [kernel]       [k] syscall_return_via_sysret
     3.48%  [kernel]       [k] __entry_text_start
     1.76%  [kernel]       [k] __netif_receive_skb_core
     1.64%  [kernel]       [k] __fget

For each sendmsg(), we allocate one skb, and free it when the ACK packet arrives.

In many cases, ACK packets are handled by another cpu, and this unfortunately
incurs heavy costs for the slab layer.

This patch uses an extra pointer in socket structure, so that we try to reuse
the same skb and avoid these expensive costs.

We cache at most one skb per socket so this should be safe as far as
memory pressure is concerned.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h |  5 +++++
 net/ipv4/tcp.c     | 45 ++++++++++++++++++---------------------------
 2 files changed, 23 insertions(+), 27 deletions(-)
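
A user-space sketch of the one-skb TX cache described in the changelog,
for readers who want the accounting spelled out. It is a model under
assumptions, not the patch: toy_* names are hypothetical, wmem_queued
stands in for sk_wmem_queued, every skb has the same truesize, and
charging is folded into the alloc helper for brevity. As in the diff
below, a parked skb stays charged to the socket until it is reused or
purged, and reuse only happens for zero-size requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { TOY_TRUESIZE = 256 };		/* arbitrary constant for the model */

struct toy_skb {
	int truesize;
};

struct toy_sock {
	struct toy_skb *tx_cache;	/* models sk_tx_skb_cache */
	int wmem_queued;		/* models sk_wmem_queued */
	unsigned long allocs;		/* trips to the allocator */
	unsigned long frees;
};

/* sendmsg() side: reuse the parked skb for zero-size requests. */
static struct toy_skb *toy_tx_alloc(struct toy_sock *sk, int size)
{
	struct toy_skb *skb = sk->tx_cache;

	if (skb && !size) {
		/* Drop the charge held while parked; recharged below. */
		sk->wmem_queued -= skb->truesize;
		sk->tx_cache = NULL;
	} else {
		skb = malloc(sizeof(*skb));
		if (!skb)
			return NULL;
		skb->truesize = TOY_TRUESIZE + size;
		sk->allocs++;
	}
	sk->wmem_queued += skb->truesize;	/* charged once queued */
	return skb;
}

/* ACK side: park the skb if the slot is empty, else really free it. */
static void toy_tx_acked(struct toy_sock *sk, struct toy_skb *skb)
{
	if (!sk->tx_cache) {
		sk->tx_cache = skb;	/* stays charged to the socket */
		return;
	}
	sk->wmem_queued -= skb->truesize;
	free(skb);
	sk->frees++;
}

/* Purge path: release the parked skb and its charge. */
static void toy_tx_purge(struct toy_sock *sk)
{
	struct toy_skb *skb = sk->tx_cache;

	if (skb) {
		sk->wmem_queued -= skb->truesize;
		free(skb);
		sk->frees++;
		sk->tx_cache = NULL;
	}
}

/* nsends request/ack round trips on one socket, then a purge. */
struct toy_sock toy_tx_run(int nsends)
{
	struct toy_sock sk = { NULL, 0, 0, 0 };
	int i;

	for (i = 0; i < nsends; i++) {
		struct toy_skb *skb = toy_tx_alloc(&sk, 0);

		if (!skb)
			break;
		toy_tx_acked(&sk, skb);
	}
	toy_tx_purge(&sk);
	return sk;
}
```

For a strict send/ACK ping-pong the model reaches the allocator once and
frees once, however many round trips are made, and the charge returns to
zero after the purge. With several skbs in flight only one of them can
be parked, which is the "at most one skb per socket" bound above.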

diff --git a/include/net/sock.h b/include/net/sock.h
index fecdf639225c2d4995ee2e2cd9be57f3d4f22777..314c47a8f5d19918393aa854a95e6e0f7ec6b604 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -414,6 +414,7 @@ struct sock {
 		struct sk_buff	*sk_send_head;
 		struct rb_root	tcp_rtx_queue;
 	};
+	struct sk_buff		*sk_tx_skb_cache;
 	struct sk_buff_head	sk_write_queue;
 	__s32			sk_peek_off;
 	int			sk_write_pending;
@@ -1463,6 +1464,10 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
 
 static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 {
+	if (!sk->sk_tx_skb_cache) {
+		sk->sk_tx_skb_cache = skb;
+		return;
+	}
 	sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
 	sk->sk_wmem_queued -= skb->truesize;
 	sk_mem_uncharge(sk, skb->truesize);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6baa6dc1b13b0b94b1da238668b93e167cf444fe..0e48912351616adf95c8618b851f5066d25c8aca 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -865,6 +865,16 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp,
 {
 	struct sk_buff *skb;
 
+	skb = sk->sk_tx_skb_cache;
+	if (skb && !size) {
+		sk->sk_wmem_queued -= skb->truesize;
+		sk_mem_uncharge(sk, skb->truesize);
+		skb->truesize -= skb->data_len;
+		sk->sk_tx_skb_cache = NULL;
+		pskb_trim(skb, 0);
+		INIT_LIST_HEAD(&skb->tcp_tsorted_anchor);
+		return skb;
+	}
 	/* The TCP header must be at least 32-bit aligned.  */
 	size = ALIGN(size, 4);
 
@@ -1098,30 +1108,6 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 }
 EXPORT_SYMBOL(tcp_sendpage);
 
-/* Do not bother using a page frag for very small frames.
- * But use this heuristic only for the first skb in write queue.
- *
- * Having no payload in skb->head allows better SACK shifting
- * in tcp_shift_skb_data(), reducing sack/rack overhead, because
- * write queue has less skbs.
- * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
- * This also speeds up tso_fragment(), since it wont fallback
- * to tcp_fragment().
- */
-static int linear_payload_sz(bool first_skb)
-{
-	if (first_skb)
-		return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
-	return 0;
-}
-
-static int select_size(bool first_skb, bool zc)
-{
-	if (zc)
-		return 0;
-	return linear_payload_sz(first_skb);
-}
-
 void tcp_free_fastopen_req(struct tcp_sock *tp)
 {
 	if (tp->fastopen_req) {
@@ -1272,7 +1258,6 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 		if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
 			bool first_skb;
-			int linear;
 
 new_segment:
 			if (!sk_stream_memory_free(sk))
@@ -1283,8 +1268,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 				goto restart;
 			}
 			first_skb = tcp_rtx_and_write_queues_empty(sk);
-			linear = select_size(first_skb, zc);
-			skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
+			skb = sk_stream_alloc_skb(sk, 0, sk->sk_allocation,
 						  first_skb);
 			if (!skb)
 				goto wait_for_memory;
@@ -2552,6 +2536,13 @@ void tcp_write_queue_purge(struct sock *sk)
 		sk_wmem_free_skb(sk, skb);
 	}
 	tcp_rtx_queue_purge(sk);
+	skb = sk->sk_tx_skb_cache;
+	if (skb) {
+		sk->sk_wmem_queued -= skb->truesize;
+		sk_mem_uncharge(sk, skb->truesize);
+		__kfree_skb(skb);
+		sk->sk_tx_skb_cache = NULL;
+	}
 	INIT_LIST_HEAD(&tcp_sk(sk)->tsorted_sent_queue);
 	sk_mem_reclaim(sk);
 	tcp_clear_all_retrans_hints(tcp_sk(sk));
-- 
2.21.0.225.g810b269d1ac-goog



* Re: [PATCH net-next 2/3] tcp: add one skb cache for tx
  2019-03-21 22:17 ` [PATCH net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
@ 2019-03-22 19:07   ` kbuild test robot
  0 siblings, 0 replies; 8+ messages in thread
From: kbuild test robot @ 2019-03-22 19:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kbuild-all, David S . Miller, netdev, Eric Dumazet,
	Soheil Hassas Yeganeh, Willem de Bruijn, Florian Westphal,
	Tom Herbert, Eric Dumazet

[-- Attachment #1: Type: text/plain, Size: 20962 bytes --]

Hi Eric,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/net-convert-rps_needed-and-rfs_needed-to-new-static-branch-api/20190322-211954
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1514: warning: Function parameter or member 'params' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1514: warning: Function parameter or member 'bo' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1514: warning: Function parameter or member 'level' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1514: warning: Function parameter or member 'pe' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1514: warning: Function parameter or member 'addr' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1514: warning: Function parameter or member 'count' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1514: warning: Function parameter or member 'incr' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1514: warning: Function parameter or member 'flags' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3104: warning: Function parameter or member 'pasid' not described in 'amdgpu_vm_make_compute'
   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:375: warning: Excess function parameter 'entry' description in 'amdgpu_irq_dispatch'
   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:376: warning: Function parameter or member 'ih' not described in 'amdgpu_irq_dispatch'
   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:376: warning: Excess function parameter 'entry' description in 'amdgpu_irq_dispatch'
   drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c:1: warning: no structured comments found
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:128: warning: Incorrect use of kernel-doc format: Documentation Makefile include scripts source @atomic_obj
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:203: warning: Function parameter or member 'atomic_obj' not described in 'amdgpu_display_manager'
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:203: warning: Function parameter or member 'atomic_obj_lock' not described in 'amdgpu_display_manager'
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:203: warning: Function parameter or member 'backlight_link' not described in 'amdgpu_display_manager'
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:203: warning: Function parameter or member 'backlight_caps' not described in 'amdgpu_display_manager'
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:203: warning: Function parameter or member 'freesync_module' not described in 'amdgpu_display_manager'
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:203: warning: Function parameter or member 'fw_dmcu' not described in 'amdgpu_display_manager'
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:203: warning: Function parameter or member 'dmcu_fw_version' not described in 'amdgpu_display_manager'
   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:1: warning: no structured comments found
   include/drm/drm_drv.h:715: warning: Function parameter or member 'gem_prime_pin' not described in 'drm_driver'
   include/drm/drm_drv.h:715: warning: Function parameter or member 'gem_prime_unpin' not described in 'drm_driver'
   include/drm/drm_drv.h:715: warning: Function parameter or member 'gem_prime_res_obj' not described in 'drm_driver'
   include/drm/drm_drv.h:715: warning: Function parameter or member 'gem_prime_get_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:715: warning: Function parameter or member 'gem_prime_import_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:715: warning: Function parameter or member 'gem_prime_vmap' not described in 'drm_driver'
   include/drm/drm_drv.h:715: warning: Function parameter or member 'gem_prime_vunmap' not described in 'drm_driver'
   include/drm/drm_drv.h:715: warning: Function parameter or member 'gem_prime_mmap' not described in 'drm_driver'
   include/drm/drm_atomic_state_helper.h:1: warning: no structured comments found
   drivers/gpu/drm/scheduler/sched_main.c:376: warning: Excess function parameter 'bad' description in 'drm_sched_stop'
   drivers/gpu/drm/scheduler/sched_main.c:377: warning: Excess function parameter 'bad' description in 'drm_sched_stop'
   drivers/gpu/drm/scheduler/sched_main.c:420: warning: Function parameter or member 'full_recovery' not described in 'drm_sched_start'
   drivers/gpu/drm/i915/i915_vma.h:50: warning: cannot understand function prototype: 'struct i915_vma '
   drivers/gpu/drm/i915/i915_vma.h:1: warning: no structured comments found
   drivers/gpu/drm/i915/intel_guc_fwif.h:536: warning: cannot understand function prototype: 'struct guc_log_buffer_state '
   drivers/gpu/drm/i915/i915_trace.h:1: warning: no structured comments found
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:126: warning: Function parameter or member 'hw_id' not described in 'komeda_component'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:126: warning: Function parameter or member 'max_active_outputs' not described in 'komeda_component'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:126: warning: Function parameter or member 'supported_outputs' not described in 'komeda_component'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:142: warning: Function parameter or member 'output_port' not described in 'komeda_component_output'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:196: warning: Function parameter or member 'component' not described in 'komeda_component_state'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:196: warning: Function parameter or member 'crtc' not described in 'komeda_component_state'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:196: warning: Function parameter or member 'plane' not described in 'komeda_component_state'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:196: warning: Function parameter or member 'wb_conn' not described in 'komeda_component_state'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:196: warning: Function parameter or member 'changed_active_inputs' not described in 'komeda_component_state'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:196: warning: Function parameter or member 'affected_inputs' not described in 'komeda_component_state'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:300: warning: Function parameter or member 'n_layers' not described in 'komeda_pipeline'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:300: warning: Function parameter or member 'layers' not described in 'komeda_pipeline'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:300: warning: Function parameter or member 'n_scalers' not described in 'komeda_pipeline'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:300: warning: Function parameter or member 'scalers' not described in 'komeda_pipeline'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:300: warning: Function parameter or member 'compiz' not described in 'komeda_pipeline'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:300: warning: Function parameter or member 'wb_layer' not described in 'komeda_pipeline'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:300: warning: Function parameter or member 'improc' not described in 'komeda_pipeline'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:300: warning: Function parameter or member 'ctrlr' not described in 'komeda_pipeline'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:300: warning: Function parameter or member 'funcs' not described in 'komeda_pipeline'
   drivers/gpu/drm/arm/display/komeda/komeda_pipeline.h:321: warning: Function parameter or member 'pipe' not described in 'komeda_pipeline_state'
   drivers/gpu/drm/arm/display/komeda/komeda_dev.h:97: warning: Function parameter or member 'dev' not described in 'komeda_dev'
   drivers/gpu/drm/arm/display/komeda/komeda_dev.h:97: warning: Function parameter or member 'reg_base' not described in 'komeda_dev'
   drivers/gpu/drm/arm/display/komeda/komeda_dev.h:97: warning: Function parameter or member 'chip' not described in 'komeda_dev'
   drivers/gpu/drm/arm/display/komeda/komeda_dev.h:97: warning: Function parameter or member 'mclk' not described in 'komeda_dev'
   drivers/gpu/drm/arm/display/komeda/komeda_dev.h:97: warning: Function parameter or member 'n_pipelines' not described in 'komeda_dev'
   drivers/gpu/drm/arm/display/komeda/komeda_dev.h:97: warning: Function parameter or member 'pipelines' not described in 'komeda_dev'
   drivers/gpu/drm/arm/display/komeda/komeda_framebuffer.h:1: warning: no structured comments found
   drivers/gpu/drm/arm/display/komeda/komeda_crtc.c:1: warning: no structured comments found
   drivers/gpu/drm/arm/display/komeda/komeda_plane.c:1: warning: no structured comments found
   include/linux/interconnect.h:1: warning: no structured comments found
   include/linux/skbuff.h:899: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'list' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'skb_mstamp_ns' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member '__pkt_vlan_present_offset' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'vlan_present' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
   include/linux/skbuff.h:899: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
   include/net/sock.h:238: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_portpair' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_cookie' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_listener' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common'
   include/net/sock.h:238: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common'
   include/net/sock.h:514: warning: Function parameter or member 'sk_wq_raw' not described in 'sock'
   include/net/sock.h:514: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock'
>> include/net/sock.h:514: warning: Function parameter or member 'sk_tx_skb_cache' not described in 'sock'
   include/net/sock.h:514: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock'
   include/net/sock.h:514: warning: Function parameter or member 'sk_txtime_report_errors' not described in 'sock'
   include/net/sock.h:514: warning: Function parameter or member 'sk_validate_xmit_skb' not described in 'sock'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'gso_partial_features' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'l3mdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'xfrmdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'tlsdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'name_assign_type' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'ieee802154_ptr' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'mpls_ptr' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'xdp_prog' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'gro_flush_timeout' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'nf_hooks_ingress' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member '____cacheline_aligned_in_smp' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'qdisc_hash' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'xps_cpus_map' not described in 'net_device'
   include/linux/netdevice.h:2062: warning: Function parameter or member 'xps_rxqs_map' not described in 'net_device'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising' not described in 'phylink_link_state'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising' not described in 'phylink_link_state'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'quotactl' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'quota_on' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'sb_free_mnt_opts' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'sb_eat_lsm_opts' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'sb_kern_mount' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'sb_show_options' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'sb_add_mnt_opt' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'task_setioprio' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'task_getioprio' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'task_movememory' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'd_instantiate' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'getprocattr' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'setprocattr' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'secmark_refcount_inc' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1802: warning: Function parameter or member 'secmark_refcount_dec' not described in 'security_list_options'
   Documentation/core-api/index.rst:11: WARNING: toctree contains reference to nonexisting document u'core-api/flexible-arrays'
   include/linux/list.h:211: WARNING: Inline strong start-string without end-string.
   include/linux/xarray.h:232: ERROR: Unexpected indentation.
   include/linux/wait.h:110: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:113: ERROR: Unexpected indentation.
   include/linux/wait.h:115: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/time/hrtimer.c:1120: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/signal.c:344: WARNING: Inline literal start-string without end-string.
   include/uapi/linux/firewire-cdev.h:312: WARNING: Inline literal start-string without end-string.
   drivers/ata/libata-core.c:5960: ERROR: Unknown target name: "hw".
   drivers/message/fusion/mptbase.c:5057: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/tty/serial/serial_core.c:1958: WARNING: Definition list ends without a blank line; unexpected unindent.
   include/linux/mtd/rawnand.h:1184: WARNING: Inline strong start-string without end-string.
   include/linux/mtd/rawnand.h:1186: WARNING: Inline strong start-string without end-string.
   include/linux/regulator/driver.h:289: ERROR: Unknown target name: "regulator_regmap_x_voltage".
   Documentation/driver-api/soundwire/locking.rst:50: ERROR: Inconsistent literal block quoting.
   Documentation/driver-api/soundwire/locking.rst:51: WARNING: Line block ends without a blank line.
   Documentation/driver-api/soundwire/locking.rst:55: WARNING: Inline substitution_reference start-string without end-string.
   Documentation/driver-api/soundwire/locking.rst:56: WARNING: Line block ends without a blank line.
   include/linux/spi/spi.h:376: ERROR: Unexpected indentation.
   fs/posix_acl.c:635: WARNING: Inline emphasis start-string without end-string.
   fs/debugfs/inode.c:386: WARNING: Inline literal start-string without end-string.
   fs/debugfs/inode.c:465: WARNING: Inline literal start-string without end-string.
   fs/debugfs/inode.c:497: WARNING: Inline literal start-string without end-string.
   fs/debugfs/inode.c:584: WARNING: Inline literal start-string without end-string.
   drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c:1856: WARNING: Inline emphasis start-string without end-string.
   drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c:1858: WARNING: Inline emphasis start-string without end-string.
   Documentation/networking/af_xdp.rst:319: WARNING: Literal block expected; none found.
   Documentation/networking/af_xdp.rst:326: WARNING: Literal block expected; none found.
   Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:43: WARNING: Definition list ends without a blank line; unexpected unindent.
   Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:63: ERROR: Unexpected indentation.
   include/linux/netdevice.h:3485: WARNING: Inline emphasis start-string without end-string.
   include/linux/netdevice.h:3485: WARNING: Inline emphasis start-string without end-string.
   net/core/dev.c:4979: ERROR: Unknown target name: "page_is".
   Documentation/networking/netdev-FAQ.rst:135: WARNING: Title underline too short.

vim +514 include/net/sock.h

^1da177e Linus Torvalds 2005-04-16 @514  

:::::: The code at line 514 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6712 bytes --]


* [PATCH net-next 3/3] tcp: add one skb cache for rx
  2019-03-21 22:17 [PATCH net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
  2019-03-21 22:17 ` [PATCH net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
  2019-03-21 22:17 ` [PATCH net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
@ 2019-03-21 22:17 ` Eric Dumazet
  2019-03-21 22:52   ` Eric Dumazet
  2019-03-21 22:25 ` [PATCH net-next 0/3] tcp: add rx/tx cache to reduce lock contention Soheil Hassas Yeganeh
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2019-03-21 22:17 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn,
	Florian Westphal, Tom Herbert, Eric Dumazet

Often times, recvmsg() system calls and BH handling for a particular
TCP socket are done on different cpus.

This means the incoming skb has to be allocated on one cpu,
but freed on another.

This incurs high spinlock contention in the slab layer for small rpc,
but also a high number of cache line ping pongs for larger packets.

A full size GRO packet might use 45 page fragments, meaning
that up to 45 put_page() can be involved.

Moreover, performing the __kfree_skb() in the recvmsg() context
adds latency for user applications, and increases the probability
of trapping them in backlog processing, since the BH handler
might find the socket owned by the user.

This patch, combined with the prior one, increases the rpc
performance by about 10 % on servers with a large number of cores.

(tcp_rr workload with 10,000 flows and 112 threads reaches 9 Mpps
 instead of 8 Mpps)

This also increases single bulk flow performance on 40Gbit+ links,
since in this case there are often two cpus working in tandem :

 - CPU handling the NIC rx interrupts, feeding the receive queue,
  and (after this patch) freeing the skbs that were consumed.

 - CPU in recvmsg() system call, essentially 100 % busy copying out
  data to user space.

Having at most one skb in a per-socket cache has very little risk
of memory exhaustion, and since it is protected by the socket lock,
its management is essentially free.

Note that if rps/rfs is used, we do not enable this feature, because
there is a high chance that the same cpu is handling both the recvmsg()
system call and the TCP rx path, but that another cpu did the skb
allocations in the device driver right before the RPS/RFS logic.

To properly handle this case, it seems we would need to record
on which cpu the skb was allocated, and use a different channel
to give skbs back to this cpu.
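
For readers who want the hand-off in isolation, here is a minimal
userspace sketch of the one-slot cache described above. It is not the
kernel code: struct buf, struct toy_sock, toy_eat_buf(), toy_rcv(),
toy_alloc() and rps_in_use are hypothetical stand-ins for struct sk_buff,
struct sock, sk_eat_skb(), the tcp_v4_rcv()/tcp_v6_rcv() fast path,
the driver allocation and the rps_needed check. Locking is only
described in comments, and the skb_orphan() step has no equivalent here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct buf {
	int id;
};

/* Hypothetical stand-in for struct sock. */
struct toy_sock {
	struct buf *rx_cache;	/* plays the role of sk_rx_skb_cache */
	int freed_by_reader;	/* frees done in the recvmsg() context */
	int freed_by_bh;	/* frees done in the BH (rx) context */
};

/* Stand-in for the rps_needed check: when set, the cache is bypassed. */
static bool rps_in_use;

/* Stand-in for the driver allocating a buffer on the rx cpu. */
static struct buf *toy_alloc(int id)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		abort();
	b->id = id;
	return b;
}

/*
 * Reader side (recvmsg() context, socket lock assumed held):
 * park the consumed buffer in the empty slot instead of freeing it.
 * If the slot is taken, or RPS is in use, free it right away.
 */
static void toy_eat_buf(struct toy_sock *sk, struct buf *b)
{
	if (!rps_in_use && !sk->rx_cache) {
		sk->rx_cache = b;
		return;
	}
	free(b);
	sk->freed_by_reader++;
}

/*
 * BH side: if the socket is not owned by the user, detach the parked
 * buffer while the lock is held, then free it after the lock would
 * have been released. If the user owns the socket, leave the slot alone.
 */
static void toy_rcv(struct toy_sock *sk, bool owned_by_user)
{
	struct buf *to_free = NULL;

	if (!owned_by_user) {
		to_free = sk->rx_cache;
		sk->rx_cache = NULL;
		/* protocol processing would happen here */
	}
	/* the socket lock would be dropped here */
	if (to_free) {
		free(to_free);
		sk->freed_by_bh++;
	}
}
```

With rps_in_use false, the first toy_eat_buf() call parks its buffer and
the next toy_rcv(sk, false) frees it on the BH side. A second
toy_eat_buf() before that drain falls back to an immediate free, which is
why the slot can never hold more than one buffer.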

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h  |  6 ++++++
 net/ipv4/af_inet.c  |  4 ++++
 net/ipv4/tcp.c      |  4 ++++
 net/ipv4/tcp_ipv4.c | 11 +++++++++--
 net/ipv6/tcp_ipv6.c | 12 +++++++++---
 5 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 314c47a8f5d19918393aa854a95e6e0f7ec6b604..a7e936ce5a5ac935d90c47f6dd68bf9e8e47ba10 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -368,6 +368,7 @@ struct sock {
 	atomic_t		sk_drops;
 	int			sk_rcvlowat;
 	struct sk_buff_head	sk_error_queue;
+	struct sk_buff		*sk_rx_skb_cache;
 	struct sk_buff_head	sk_receive_queue;
 	/*
 	 * The backlog queue is special, it is always used with
@@ -2438,6 +2439,11 @@ static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags)
 static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
 {
 	__skb_unlink(skb, &sk->sk_receive_queue);
+	if (!sk->sk_rx_skb_cache) {
+		sk->sk_rx_skb_cache = skb;
+		skb_orphan(skb);
+		return;
+	}
 	__kfree_skb(skb);
 }
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index eab3ebde981e78a6a0a4852c3b4374c02ede1187..7f3a984ad618580ae28501c3fe3dd3fa915a66a2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -136,6 +136,10 @@ void inet_sock_destruct(struct sock *sk)
 	struct inet_sock *inet = inet_sk(sk);
 
 	__skb_queue_purge(&sk->sk_receive_queue);
+	if (sk->sk_rx_skb_cache) {
+		__kfree_skb(sk->sk_rx_skb_cache);
+		sk->sk_rx_skb_cache = NULL;
+	}
 	__skb_queue_purge(&sk->sk_error_queue);
 
 	sk_mem_reclaim(sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0e48912351616adf95c8618b851f5066d25c8aca..981db346b6f24e1dd2f66ddb4fb7a9bde6a88ead 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2578,6 +2578,10 @@ int tcp_disconnect(struct sock *sk, int flags)
 
 	tcp_clear_xmit_timers(sk);
 	__skb_queue_purge(&sk->sk_receive_queue);
+	if (sk->sk_rx_skb_cache) {
+		__kfree_skb(sk->sk_rx_skb_cache);
+		sk->sk_rx_skb_cache = NULL;
+	}
 	tp->copied_seq = tp->rcv_nxt;
 	tp->urg_data = 0;
 	tcp_write_queue_purge(sk);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 277d71239d755d858be70663320d8de2ab23dfcc..3979939804b70b805655d94c598a6cb397e35947 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1774,6 +1774,7 @@ static void tcp_v4_fill_cb(struct sk_buff *skb, const struct iphdr *iph,
 int tcp_v4_rcv(struct sk_buff *skb)
 {
 	struct net *net = dev_net(skb->dev);
+	struct sk_buff *skb_to_free;
 	int sdif = inet_sdif(skb);
 	const struct iphdr *iph;
 	const struct tcphdr *th;
@@ -1905,11 +1906,17 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	tcp_segs_in(tcp_sk(sk), skb);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
+		skb_to_free = sk->sk_rx_skb_cache;
+		sk->sk_rx_skb_cache = NULL;
 		ret = tcp_v4_do_rcv(sk, skb);
-	} else if (tcp_add_backlog(sk, skb)) {
-		goto discard_and_relse;
+	} else {
+		if (tcp_add_backlog(sk, skb))
+			goto discard_and_relse;
+		skb_to_free = NULL;
 	}
 	bh_unlock_sock(sk);
+	if (skb_to_free)
+		__kfree_skb(skb_to_free);
 
 put_and_return:
 	if (refcounted)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 983ad7a751027cb8fbaee095b90225d71fbaa698..77d723bbe05085881d3d5d4ca0cb4dbcede8d11d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1436,6 +1436,7 @@ static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr,
 
 static int tcp_v6_rcv(struct sk_buff *skb)
 {
+	struct sk_buff *skb_to_free;
 	int sdif = inet6_sdif(skb);
 	const struct tcphdr *th;
 	const struct ipv6hdr *hdr;
@@ -1562,12 +1563,17 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 	tcp_segs_in(tcp_sk(sk), skb);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
+		skb_to_free = sk->sk_rx_skb_cache;
+		sk->sk_rx_skb_cache = NULL;
 		ret = tcp_v6_do_rcv(sk, skb);
-	} else if (tcp_add_backlog(sk, skb)) {
-		goto discard_and_relse;
+	} else {
+		if (tcp_add_backlog(sk, skb))
+			goto discard_and_relse;
+		skb_to_free = NULL;
 	}
 	bh_unlock_sock(sk);
-
+	if (skb_to_free)
+		__kfree_skb(skb_to_free);
 put_and_return:
 	if (refcounted)
 		sock_put(sk);
-- 
2.21.0.225.g810b269d1ac-goog



* Re: [PATCH net-next 3/3] tcp: add one skb cache for rx
  2019-03-21 22:17 ` [PATCH net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
@ 2019-03-21 22:52   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2019-03-21 22:52 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Soheil Hassas Yeganeh, Willem de Bruijn, Florian Westphal,
	Tom Herbert



On 03/21/2019 03:17 PM, Eric Dumazet wrote:
> Often times, recvmsg() system calls and BH handling for a particular
> TCP socket are done on different cpus.

...

> Note that if rps/rfs is used, we do not enable this feature, because
> there is a high chance that the same cpu is handling both the recvmsg()
> system call and the TCP rx path, but that another cpu did the skb
> allocations in the device driver right before the RPS/RFS logic.
> 
> To properly handle this case, it seems we would need to record
> on which cpu the skb was allocated, and use a different channel
> to give skbs back to this cpu.

Oops, a rebase went wrong and I missed the following bit;
this will be added in v2.

diff --git a/include/net/sock.h b/include/net/sock.h
index a7e936ce5a5ac935d90c47f6dd68bf9e8e47ba10..0840f4b27b91eddb205ff42c03f787e5914f755d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2439,7 +2439,7 @@ static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags)
 static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
 {
        __skb_unlink(skb, &sk->sk_receive_queue);
-       if (!sk->sk_rx_skb_cache) {
+       if (!static_branch_unlikely(&rps_needed) && !sk->sk_rx_skb_cache) {
                sk->sk_rx_skb_cache = skb;
                skb_orphan(skb);
                return;


* Re: [PATCH net-next 0/3] tcp: add rx/tx cache to reduce lock contention
  2019-03-21 22:17 [PATCH net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
                   ` (2 preceding siblings ...)
  2019-03-21 22:17 ` [PATCH net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
@ 2019-03-21 22:25 ` Soheil Hassas Yeganeh
  3 siblings, 0 replies; 8+ messages in thread
From: Soheil Hassas Yeganeh @ 2019-03-21 22:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Willem de Bruijn, Florian Westphal,
	Tom Herbert, Eric Dumazet

On Thu, Mar 21, 2019 at 6:17 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On hosts with many cpus we can observe very serious contention
> on spinlocks used in the mm slab layer.
>
> The following can happen quite often :
>
> 1) TX path
>   sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
>   ACK is received on CPU B, and consumes the skb that was in the retransmit
>   queue.
>
> 2) RX path
>   network driver allocates skb on CPU C
>   recvmsg() happens on CPU D, freeing the skb after it has been delivered
>   to user space.
>
> In both cases, we are hitting the asymmetric alloc/free pattern
> for which slab has to drain alien caches. At 8 Mpps, this represents
> 16 million alloc/free operations per second and has a huge penalty.
>
> In an interesting experiment, I tried to use a single kmem_cache for all the skbs
> (in skb_init() : skbuff_fclone_cache = skbuff_head_cache =
>                   kmem_cache_create("skbuff_fclone_cache", sizeof(struct sk_buff_fclones),);
> and most of the contention disappeared, since cpus could better use
> their local slab per-cpu cache.
>
> But we can actually do better, in the following patches.
>
> TX : at ACK time, no longer free the skb but put it back in a tcp socket cache,
>      so that next sendmsg() can reuse it immediately.
>
> RX : at recvmsg() time, do not free the skb but put it in a tcp socket cache
>    so that it can be freed by the cpu feeding the incoming packets in BH.
>
> This increased the performance of small RPC benchmark by about 10 % on a host
> with 112 hyperthreads.
>
> Eric Dumazet (3):
>   net: convert rps_needed and rfs_needed to new static branch api
>   tcp: add one skb cache for tx
>   tcp: add one skb cache for rx

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

This is a really impressive improvement! Thank you, Eric!

>  include/linux/netdevice.h  |  4 ++--
>  include/net/sock.h         | 13 +++++++++-
>  net/core/dev.c             | 10 ++++----
>  net/core/net-sysfs.c       |  4 ++--
>  net/core/sysctl_net_core.c |  8 +++----
>  net/ipv4/af_inet.c         |  4 ++++
>  net/ipv4/tcp.c             | 49 +++++++++++++++++---------------------
>  net/ipv4/tcp_ipv4.c        | 11 +++++++--
>  net/ipv6/tcp_ipv6.c        | 12 +++++++---
>  9 files changed, 69 insertions(+), 46 deletions(-)
>
> --
> 2.21.0.225.g810b269d1ac-goog
>


