* [PATCH net-next 1/2] tcp-zerocopy: Return inq along with tcp receive zerocopy.
From: Arjun Roy @ 2020-02-14 23:30 UTC (permalink / raw)
To: davem, netdev; +Cc: arjunroy, soheil, edumazet
From: Arjun Roy <arjunroy@google.com>

This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using edge-triggered epoll, returning inq along with
the result of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking,
since normally we would need to perform a recvmsg() call for every
successful small RPC read via TCP receive zerocopy, returning inq can
reduce the number of system calls performed by approximately half.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 include/uapi/linux/tcp.h |  1 +
 net/ipv4/tcp.c           | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 74af1f759cee..19700101cbba 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -343,5 +343,6 @@ struct tcp_zerocopy_receive {
 	__u64 address;		/* in: address of mapping */
 	__u32 length;		/* in/out: number of bytes to map/mapped */
 	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
+	__u32 inq; /* out: amount of bytes in read queue */
 };
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f09fbc85b108..947be81b35c5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3658,13 +3658,26 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		if (get_user(len, optlen))
 			return -EFAULT;
-		if (len != sizeof(zc))
+		if (len < offsetofend(struct tcp_zerocopy_receive, length))
 			return -EINVAL;
+		if (len > sizeof(zc))
+			len = sizeof(zc);
 		if (copy_from_user(&zc, optval, len))
 			return -EFAULT;
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc);
 		release_sock(sk);
+		switch (len) {
+		case sizeof(zc):
+		case offsetofend(struct tcp_zerocopy_receive, inq):
+			goto zerocopy_rcv_inq;
+		case offsetofend(struct tcp_zerocopy_receive, length):
+		default:
+			goto zerocopy_rcv_out;
+		}
+zerocopy_rcv_inq:
+		zc.inq = tcp_inq_hint(sk);
+zerocopy_rcv_out:
 		if (!err && copy_to_user(optval, &zc, len))
 			err = -EFAULT;
 		return err;
--
2.25.0.265.gbab2e86ba0-goog
* [PATCH net-next 2/2] tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.
From: Arjun Roy @ 2020-02-14 23:30 UTC (permalink / raw)
To: davem, netdev; +Cc: arjunroy, soheil, edumazet
From: Arjun Roy <arjunroy@google.com>

This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using epoll, returning sk_err along with the result
of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a spurious wakeup.

Consider a multi-threaded application using epoll. A thread may awaken
with EPOLLIN but another thread may already be reading. The
spuriously-awoken thread does not necessarily know that another thread
'won'; rather, it may be possible that it was woken up due to the
presence of an error if there is no data. A zerocopy read receiving 0
bytes thus would need to be followed up by recvmsg to be sure.

Instead, we return sk_err directly with zerocopy, so the application
can avoid this extra system call.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 include/uapi/linux/tcp.h | 1 +
 net/ipv4/tcp.c           | 8 +++++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 19700101cbba..e1706a7c9d88 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -344,5 +344,6 @@ struct tcp_zerocopy_receive {
 	__u32 length;		/* in/out: number of bytes to map/mapped */
 	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
 	__u32 inq; /* out: amount of bytes in read queue */
+	__s32 err; /* out: socket error */
 };
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 947be81b35c5..0efac228bbdb 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3667,14 +3667,20 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc);
 		release_sock(sk);
+		if (len == sizeof(zc))
+			goto zerocopy_rcv_sk_err;
 		switch (len) {
-		case sizeof(zc):
+		case offsetofend(struct tcp_zerocopy_receive, err):
+			goto zerocopy_rcv_sk_err;
 		case offsetofend(struct tcp_zerocopy_receive, inq):
 			goto zerocopy_rcv_inq;
 		case offsetofend(struct tcp_zerocopy_receive, length):
 		default:
 			goto zerocopy_rcv_out;
 		}
+zerocopy_rcv_sk_err:
+		if (!err)
+			zc.err = sock_error(sk);
 zerocopy_rcv_inq:
 		zc.inq = tcp_inq_hint(sk);
 zerocopy_rcv_out:
--
2.25.0.265.gbab2e86ba0-goog
* Re: [PATCH net-next 1/2] tcp-zerocopy: Return inq along with tcp receive zerocopy.
From: David Miller @ 2020-02-17 3:25 UTC (permalink / raw)
To: arjunroy.kdev; +Cc: netdev, arjunroy, soheil, edumazet
From: Arjun Roy <arjunroy.kdev@gmail.com>
Date: Fri, 14 Feb 2020 15:30:49 -0800

> From: Arjun Roy <arjunroy@google.com>
>
> This patchset is intended to reduce the number of extra system calls
> imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
> this patchset has demonstrated a system call reduction of about 30%
> when coupled with userspace changes.
>
> For applications using edge-triggered epoll, returning inq along with
> the result of tcp receive zerocopy could remove the need to call
> recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking,
> since normally we would need to perform a recvmsg() call for every
> successful small RPC read via TCP receive zerocopy, returning inq can
> reduce the number of system calls performed by approximately half.
>
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Applied.
* Re: [PATCH net-next 2/2] tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.
From: David Miller @ 2020-02-17 3:25 UTC (permalink / raw)
To: arjunroy.kdev; +Cc: netdev, arjunroy, soheil, edumazet
From: Arjun Roy <arjunroy.kdev@gmail.com>
Date: Fri, 14 Feb 2020 15:30:50 -0800

> From: Arjun Roy <arjunroy@google.com>
>
> This patchset is intended to reduce the number of extra system calls
> imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
> this patchset has demonstrated a system call reduction of about 30%
> when coupled with userspace changes.
>
> For applications using epoll, returning sk_err along with the result
> of tcp receive zerocopy could remove the need to call
> recvmsg()=-EAGAIN after a spurious wakeup.
>
> Consider a multi-threaded application using epoll. A thread may awaken
> with EPOLLIN but another thread may already be reading. The
> spuriously-awoken thread does not necessarily know that another thread
> 'won'; rather, it may be possible that it was woken up due to the
> presence of an error if there is no data. A zerocopy read receiving 0
> bytes thus would need to be followed up by recvmsg to be sure.
>
> Instead, we return sk_err directly with zerocopy, so the application
> can avoid this extra system call.
>
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Applied.