From: Cathy Zhang <cathy.zhang@intel.com>
To: edumazet@google.com, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com
Cc: jesse.brandeburg@intel.com, suresh.srinivas@intel.com,
	tim.c.chen@intel.com, lizhen.you@intel.com,
	cathy.zhang@intel.com, eric.dumazet@gmail.com,
	netdev@vger.kernel.org
Subject: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
Date: Sun,  7 May 2023 19:08:00 -0700
Message-ID: <20230508020801.10702-2-cathy.zhang@intel.com>
In-Reply-To: <20230508020801.10702-1-cathy.zhang@intel.com>

Before commit 4890b686f408 ("net: keep sk->sk_forward_alloc as small as
possible"), each TCP socket could forward-allocate up to 2 MB of memory,
so tcp_memory_allocated could hit the tcp memory limit quite quickly. To
reduce that memory pressure, the commit keeps sk->sk_forward_alloc as
small as possible: less than one page unless SO_RESERVE_MEM is
specified.

However, with commit 4890b686f408 ("net: keep sk->sk_forward_alloc as
small as possible") in place, memcg charge hot paths show up while the
system is stressed with a large number of connections. Because
sk->sk_forward_alloc is now so small that it is usually less than
skb->truesize, handlers like tcp_rcv_established() must take the slow
path far more often to grow sk->sk_forward_alloc, and each such
expansion triggers a memcg charge. On the busy system, perf top shows
the following contention paths:

    16.77%  [kernel]            [k] page_counter_try_charge
    16.56%  [kernel]            [k] page_counter_cancel
    15.65%  [kernel]            [k] try_charge_memcg
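
For illustration, here is a minimal userspace sketch of the gate that
picks the fast or slow receive path, assuming the simplified shape of
the check in tcp_rcv_established(); the struct layouts and the
rx_fast_path() helper are illustrative, not the kernel's exact code:

#include <stdio.h>

struct sk_buff { int truesize; };	/* skb data plus overhead, in bytes */
struct sock { int sk_forward_alloc; };	/* bytes already charged */

/* Returns 1 when the pre-charged pool covers the skb (fast path), 0 when
 * the handler must expand sk_forward_alloc, triggering a memcg charge.
 */
static int rx_fast_path(const struct sock *sk, const struct sk_buff *skb)
{
	return skb->truesize <= sk->sk_forward_alloc;
}

int main(void)
{
	struct sk_buff skb = { .truesize = 4096 };
	struct sock drained = { .sk_forward_alloc = 0 };	/* fully reclaimed */
	struct sock cushioned = { .sk_forward_alloc = 1 << 16 };

	printf("drained:   %s path\n",
	       rx_fast_path(&drained, &skb) ? "fast" : "slow");
	printf("cushioned: %s path\n",
	       rx_fast_path(&cushioned, &skb) ? "fast" : "slow");
	return 0;
}

With sk->sk_forward_alloc reclaimed to near zero after every uncharge,
almost every incoming skb fails this check and pays the charge cost
shown above.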

In order to avoid the memcg overhead and performance penalty,
sk->sk_forward_alloc should be kept at a reasonable size instead of as
small as possible. Keep up to 64KB of memory from being reclaimed when
uncharging sk_buff memory; 64KB is close to the maximum truesize of an
sk_buff, and holding it back reduces how often memory must be allocated
over the lifetime of a TCP connection. The original per-socket reclaim
threshold for reserved memory was 2MB, so the extra memory now kept
reserved is about 32 times less than before commit 4890b686f408 ("net:
keep sk->sk_forward_alloc as small as possible").

memcached was run with memtier_benchmark to verify the fix: 8
server-client pairs are created with a bridge network on localhost, and
the server and client of each pair share 28 logical CPUs.

Results (average of 5 runs)
RPS (with patch vs. without)	+2.07x

Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible")
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Lizhen You <lizhen.you@intel.com>
Tested-by: Long Tao <tao.long@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Suresh Srinivas <suresh.srinivas@intel.com>
---
 include/net/sock.h | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8b7ed7167243..6d2960479a80 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1657,12 +1657,33 @@ static inline void sk_mem_charge(struct sock *sk, int size)
 	sk->sk_forward_alloc -= size;
 }
 
+/* The following macro controls memory reclaiming in sk_mem_uncharge().
+ */
+#define SK_RECLAIM_THRESHOLD	(1 << 16)
 static inline void sk_mem_uncharge(struct sock *sk, int size)
 {
+	int reclaimable;
+
 	if (!sk_has_account(sk))
 		return;
 	sk->sk_forward_alloc += size;
-	sk_mem_reclaim(sk);
+
+	reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk);
+
+	/* Reclaim memory to reduce memory pressure when multiple sockets
+	 * run in parallel. However, if we reclaim all pages and keep
+	 * sk->sk_forward_alloc as small as possible, it will cause
+	 * paths like tcp_rcv_established() going to the slow path with
+	 * much higher rate for forwarded memory expansion, which leads
+	 * to contention hot points and performance drop.
+	 *
+	 * In order to avoid the above issue, it's necessary to keep
+	 * sk->sk_forward_alloc with a proper size while doing reclaim.
+	 */
+	if (reclaimable > SK_RECLAIM_THRESHOLD) {
+		reclaimable -= SK_RECLAIM_THRESHOLD;
+		__sk_mem_reclaim(sk, reclaimable);
+	}
 }
 
 /*
-- 
2.34.1

