* [PATCH net] tls: Preserve sk_err across recvmsg() when data has been copied
@ 2026-05-13 12:58 Chuck Lever
  2026-05-15  1:30 ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 4+ messages in thread
From: Chuck Lever @ 2026-05-13 12:58 UTC (permalink / raw)
  To: john.fastabend, kuba, sd, davem, edumazet, pabeni
  Cc: horms, netdev, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

The sk_err check in tls_rx_rec_wait() consumes the error via
sock_error(), which clears sk_err atomically. When the caller
(tls_sw_recvmsg, tls_sw_splice_read, or tls_sw_read_sock) already
has bytes copied to userspace, it returns those bytes and discards
the error from this call. sk_err is now zero on the socket, so the
next read syscall observes only RCV_SHUTDOWN and reports a clean
EOF instead of the actual error (typically -ECONNRESET).

The race is reachable when tls_read_flush_backlog()'s periodic
sk_flush_backlog() triggers tcp_reset() in the middle of a
multi-record read.

Pass a has_copied flag to tls_rx_rec_wait(). When has_copied is
false, consume sk_err via sock_error() as before. When has_copied
is true, report the error from READ_ONCE() but leave sk_err set:
the caller returns the byte count and discards the error from this
call, and the next read syscall surfaces the preserved sk_err. This
mirrors the tcp_recvmsg() preserve-and-surface pattern.

The decrypt-abort path is unaffected: tls_err_abort() raises
sk_err to EBADMSG after tls_rx_rec_wait() returns, and nothing
on the caller's return path consumes it, so the EBADMSG surfaces
on the next read.

tls_sw_splice_read() passes has_copied=false: it processes
one record per call, so no bytes have been copied within the
function when tls_rx_rec_wait() runs. A reset that arrives
between iterations of splice_direct_to_actor() (the sendfile()
path) is still consumed by sock_error() in the later call, and the
outer loop returns the prior iterations' byte count and drops the
error. tcp_splice_read() exhibits the same pattern at the iteration
boundary; addressing it belongs at the splice_direct_to_actor()
layer and is out of scope here.

Fixes: c46b01839f7a ("tls: rx: periodically flush socket backlog")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/tls/tls_sw.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2590e855f6a5..c4cc4e357848 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1356,9 +1356,14 @@ void tls_sw_splice_eof(struct socket *sock)
 	mutex_unlock(&tls_ctx->tx_lock);
 }
 
+/* When has_copied is true the caller has already moved bytes to
+ * userspace. Report sk_err but leave it set so the next read
+ * surfaces it instead of a spurious EOF, otherwise sk_err is
+ * consumed via sock_error().
+ */
 static int
 tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
-		bool released)
+		bool released, bool has_copied)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -1376,8 +1381,11 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
 		if (!sk_psock_queue_empty(psock))
 			return 0;
 
-		if (sk->sk_err)
+		if (sk->sk_err) {
+			if (has_copied)
+				return -READ_ONCE(sk->sk_err);
 			return sock_error(sk);
+		}
 
 		if (ret < 0)
 			return ret;
@@ -1413,7 +1421,7 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
 	}
 
 	if (unlikely(!tls_strp_msg_load(&ctx->strp, released)))
-		return tls_rx_rec_wait(sk, psock, nonblock, false);
+		return tls_rx_rec_wait(sk, psock, nonblock, false, has_copied);
 
 	return 1;
 }
@@ -2100,7 +2108,7 @@ int tls_sw_recvmsg(struct sock *sk,
 		int to_decrypt, chunk;
 
 		err = tls_rx_rec_wait(sk, psock, flags & MSG_DONTWAIT,
-				      released);
+				      released, !!(decrypted + copied));
 		if (err <= 0) {
 			if (psock) {
 				chunk = sk_msg_recvmsg(sk, psock, msg, len,
@@ -2287,7 +2295,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 		struct tls_decrypt_arg darg;
 
 		err = tls_rx_rec_wait(sk, NULL, flags & SPLICE_F_NONBLOCK,
-				      true);
+				      true, false);
 		if (err <= 0)
 			goto splice_read_end;
 
@@ -2373,7 +2381,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 		} else {
 			struct tls_decrypt_arg darg;
 
-			err = tls_rx_rec_wait(sk, NULL, true, released);
+			err = tls_rx_rec_wait(sk, NULL, true, released, !!copied);
 			if (err <= 0)
 				goto read_sock_end;
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH net] tls: Preserve sk_err across recvmsg() when data has been copied
  2026-05-14 20:56 [PATCH 0/3] three lockd fixes Chuck Lever
@ 2026-05-14 20:56 ` Chuck Lever
  2026-05-14 20:57   ` Chuck Lever
  0 siblings, 1 reply; 4+ messages in thread
From: Chuck Lever @ 2026-05-14 20:56 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever, Jakub Kicinski

From: Chuck Lever <chuck.lever@oracle.com>

[Commit message and diff identical to the [PATCH net] posting above.]



* Re: [PATCH net] tls: Preserve sk_err across recvmsg() when data has been copied
  2026-05-14 20:56 ` [PATCH net] tls: Preserve sk_err across recvmsg() when data has been copied Chuck Lever
@ 2026-05-14 20:57   ` Chuck Lever
  0 siblings, 0 replies; 4+ messages in thread
From: Chuck Lever @ 2026-05-14 20:57 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo,
	Tom Talpey
  Cc: linux-nfs, Jakub Kicinski

On 5/14/26 4:56 PM, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> [...]
Ignore this email. It looks like something left over from a previous
posting.


-- 
Chuck Lever


* Re: [PATCH net] tls: Preserve sk_err across recvmsg() when data has been copied
  2026-05-13 12:58 [PATCH net] tls: Preserve sk_err across recvmsg() when data has been copied Chuck Lever
@ 2026-05-15  1:30 ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-15  1:30 UTC (permalink / raw)
  To: Chuck Lever
  Cc: john.fastabend, kuba, sd, davem, edumazet, pabeni, horms, netdev,
	chuck.lever

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 13 May 2026 08:58:25 -0400 you wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> The sk_err check in tls_rx_rec_wait() consumes the error via
> sock_error(), which clears sk_err atomically. When the caller
> (tls_sw_recvmsg, tls_sw_splice_read, or tls_sw_read_sock) already
> has bytes copied to userspace, it returns those bytes and discards
> the error from this call. sk_err is now zero on the socket, so the
> next read syscall observes only RCV_SHUTDOWN and reports a clean
> EOF instead of the actual error (typically -ECONNRESET).
> 
> [...]

Here is the summary with links:
  - [net] tls: Preserve sk_err across recvmsg() when data has been copied
    https://git.kernel.org/netdev/net/c/f508262ae9f2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



