From: Arjun Roy <arjunroy.kdev@gmail.com>
To: davem@davemloft.net, netdev@vger.kernel.org
Cc: arjunroy@google.com, edumazet@google.com, soheil@google.com
Subject: [net-next 6/8] tcp: Introduce short-circuit small reads for recv zerocopy.
Date: Thu, 12 Nov 2020 11:02:03 -0800 [thread overview]
Message-ID: <20201112190205.633640-7-arjunroy.kdev@gmail.com> (raw)
In-Reply-To: <20201112190205.633640-1-arjunroy.kdev@gmail.com>
From: Arjun Roy <arjunroy@google.com>
Sometimes, we may call tcp receive zerocopy when inq is 0,
or inq < PAGE_SIZE, or inq is generally small enough that
it is cheaper to copy rather than remap pages.
In these cases, we may want to either return early (inq=0) or
attempt to use the provided copy buffer to simply copy
the received data.
This allows us to save both system call overhead and
the latency of acquiring mmap_sem in read mode for cases where
it would be useless to do so.
This patch enables this behaviour by:
1. Returning quickly if inq is 0.
2. Attempting to perform a regular copy if a hybrid copybuffer is
provided and it is large enough to absorb all available bytes.
3. Return quickly if no such buffer was provided and there are less
than PAGE_SIZE bytes available.
For small RPC ping-pong workloads, normally we would have
1 getsockopt(), 1 recvmsg() and 1 sendmsg() call per RPC. With this
change, we remove the recvmsg() call entirely, reducing the syscall
overhead by about 33%. In testing with small (hundreds of bytes)
RPC traffic, this yields a syscall reduction of about 33% and
an efficiency gain of about 3-5% when defined as QPS/CPU Util.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
net/ipv4/tcp.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 38f8e03f1182..ca45a875147e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1785,6 +1785,35 @@ static int find_next_mappable_frag(const skb_frag_t *frag,
return offset;
}
+static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
+ int nonblock, int flags,
+ struct scm_timestamping_internal *tss,
+ int *cmsg_flags);
+static int receive_fallback_to_copy(struct sock *sk,
+ struct tcp_zerocopy_receive *zc, int inq)
+{
+ struct scm_timestamping_internal tss_unused;
+ int err, cmsg_flags_unused;
+ struct msghdr msg = {};
+ struct iovec iov;
+
+ zc->length = 0;
+ zc->recv_skip_hint = 0;
+
+ err = import_single_range(READ, (void __user *)zc->copybuf_address,
+ inq, &iov, &msg.msg_iter);
+ if (err)
+ return err;
+
+ err = tcp_recvmsg_locked(sk, &msg, inq, /*nonblock=*/1, /*flags=*/0,
+ &tss_unused, &cmsg_flags_unused);
+ if (err < 0)
+ return err;
+
+ zc->copybuf_len = err;
+ return 0;
+}
+
static int tcp_copy_straggler_data(struct tcp_zerocopy_receive *zc,
struct sk_buff *skb, u32 copylen,
u32 *offset, u32 *seq)
@@ -1885,6 +1914,9 @@ static int tcp_zerocopy_receive(struct sock *sk,
sock_rps_record_flow(sk);
+ if (inq && inq <= copybuf_len)
+ return receive_fallback_to_copy(sk, zc, inq);
+
if (inq < PAGE_SIZE) {
zc->length = 0;
zc->recv_skip_hint = inq;
--
2.29.2.222.g5d2a92d10f8-goog
next prev parent reply other threads:[~2020-11-12 19:03 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-12 19:01 [net-next 0/8] Perf. optimizations for TCP Recv. Zerocopy Arjun Roy
2020-11-12 19:01 ` [net-next 1/8] tcp: Copy straggler unaligned data for TCP Rx. zerocopy Arjun Roy
2020-11-13 2:36 ` kernel test robot
2020-11-13 2:36 ` kernel test robot
2020-11-12 19:01 ` [net-next 2/8] tcp: Introduce tcp_recvmsg_locked() Arjun Roy
2020-11-12 19:02 ` [net-next 3/8] tcp: Refactor skb frag fast-forward op for recv zerocopy Arjun Roy
2020-11-12 19:02 ` [net-next 4/8] tcp: Refactor frag-is-remappable test " Arjun Roy
2020-11-12 19:02 ` [net-next 5/8] tcp: Fast return if inq < PAGE_SIZE " Arjun Roy
2020-11-12 19:02 ` Arjun Roy [this message]
2020-11-12 19:02 ` [net-next 7/8] tcp: Set zerocopy hint when data is copied Arjun Roy
2020-11-12 19:02 ` [net-next 8/8] tcp: Defer vm zap unless actually needed for recv zerocopy Arjun Roy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201112190205.633640-7-arjunroy.kdev@gmail.com \
--to=arjunroy.kdev@gmail.com \
--cc=arjunroy@google.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=netdev@vger.kernel.org \
--cc=soheil@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.