netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC net] tcp: Fix performance regression for request-response workloads
@ 2022-09-07 12:25 Alexandra Winter
  2022-09-07 16:06 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandra Winter @ 2022-09-07 12:25 UTC (permalink / raw)
  To: Eric Dumazet, David Miller, Jakub Kicinski
  Cc: Yuchung Cheng, Soheil Hassas Yeganeh, Willem de Bruijn,
	Paolo Abeni, Mat Martineau, Saeed Mahameed, Niklas Schnelle,
	Christian Borntraeger, netdev, linux-s390, Heiko Carstens,
	Alexandra Winter

Since linear payload was removed even for single small messages,
an additional page is required and we are measuring performance impact.

3613b3dbd1ad ("tcp: prepare skbs for better sack shifting")
explicitely allowed "payload in skb->head for first skb put in the queue,
to not impact RPC workloads."
472c2e07eef0 ("tcp: add one skb cache for tx")
made that obsolete and removed it.
When 
d8b81175e412 ("tcp: remove sk_{tr}x_skb_cache")
reverted it, this piece was not reverted and not added back in.

When running uperf with a request-response pattern with 1k payload
and 250 connections parallel, we measure 13% difference in throughput
for our PCI based network interfaces since 472c2e07eef0.
(our IO MMU is sensitive to the number of mapped pages)

Could you please consider allowing linear payload for the first
skb in queue again? A patch proposal is appended below.

Kind regards
Alexandra

---------------------------------------------------------------

tcp: allow linear skb payload for first in queue

Allow payload in skb->head for first skb in the queue,
RPC workloads will benefit.

Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
---
 net/ipv4/tcp.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e5011c136fdb..f7cbccd41d85 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1154,6 +1154,30 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 }
 EXPORT_SYMBOL(tcp_sendpage);
 
+/* Do not bother using a page frag for very small frames.
+ * But use this heuristic only for the first skb in write queue.
+ *
+ * Having no payload in skb->head allows better SACK shifting
+ * in tcp_shift_skb_data(), reducing sack/rack overhead, because
+ * write queue has less skbs.
+ * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
+ * This also speeds up tso_fragment(), since it won't fallback
+ * to tcp_fragment().
+ */
+static int linear_payload_sz(bool first_skb)
+{
+		if (first_skb)
+			return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
+		return 0;
+}
+
+static int select_size(bool first_skb, bool zc)
+{
+		if (zc)
+			return 0;
+		return linear_payload_sz(first_skb);
+}
+
 void tcp_free_fastopen_req(struct tcp_sock *tp)
 {
 	if (tp->fastopen_req) {
@@ -1311,6 +1335,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 		if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
 			bool first_skb;
+			int linear;
 
 new_segment:
 			if (!sk_stream_memory_free(sk))
@@ -1322,7 +1347,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 					goto restart;
 			}
 			first_skb = tcp_rtx_and_write_queues_empty(sk);
-			skb = tcp_stream_alloc_skb(sk, 0, sk->sk_allocation,
+			linear = select_size(first_skb, zc);
+			skb = tcp_stream_alloc_skb(sk, linear,
+						   sk->sk_allocation,
 						   first_skb);
 			if (!skb)
 				goto wait_for_space;
@@ -1344,7 +1371,15 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		if (copy > msg_data_left(msg))
 			copy = msg_data_left(msg);
 
-		if (!zc) {
+		/* Where to copy to? */
+		if (skb_availroom(skb) > 0 && !zc) {
+			/* We have some space in skb head. Superb! */
+			copy = min_t(int, copy, skb_availroom(skb));
+			err = skb_add_data_nocache(sk, skb, &msg->msg_iter,
+						   copy);
+			if (err)
+				goto do_error;
+		} else if (!zc) {
 			bool merge = true;
 			int i = skb_shinfo(skb)->nr_frags;
 			struct page_frag *pfrag = sk_page_frag(sk);
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-04-27  9:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-07 12:25 [RFC net] tcp: Fix performance regression for request-response workloads Alexandra Winter
2022-09-07 16:06 ` Eric Dumazet
2022-09-08  9:40   ` Christian Borntraeger
2022-09-08 12:41     ` Eric Dumazet
2022-09-26 10:06       ` [RFC net] net/mlx5: " Alexandra Winter
2022-09-30 23:37         ` Saeed Mahameed
2022-12-29  8:27           ` Alexandra Winter
2023-04-27  9:44             ` Alexandra Winter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).