From: Chuck Lever <cel@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	Sabrina Dubroca <sd@queasysnail.net>
Cc: Eric Dumazet <edumazet@google.com>,
	Simon Horman <horms@kernel.org>,  Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org,  kernel-tls-handshake@lists.linux.dev,
	Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH net-next v10 7/7] tls: Preserve sk_err across recvmsg() when data has been copied
Date: Mon, 11 May 2026 19:25:58 -0400
Message-ID: <20260511-tls-read-sock-v10-7-279fc5015f0e@oracle.com>
In-Reply-To: <20260511-tls-read-sock-v10-0-279fc5015f0e@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Both sk_err checks in tls_rx_rec_wait() consume the error via
sock_error(), which clears sk_err atomically. When the caller
(tls_sw_recvmsg, tls_sw_splice_read, or tls_sw_read_sock) already
has bytes copied to userspace, it returns those bytes and discards
the error from this call. sk_err is now zero on the socket, so the
next read syscall observes only RCV_SHUTDOWN and reports a clean
EOF instead of the actual error (typically -ECONNRESET).

The race was reachable before this series via tls_read_flush_backlog()
when its periodic sk_flush_backlog() triggered tcp_reset() in the
middle of a multi-record read. The earlier patch in this series that
flushes the backlog inside tls_rx_rec_wait() widens the window: the
flush now runs on every iteration of every wait, not only when the
periodic threshold fires.

Have tls_rx_rec_wait() report sk_err without clearing it, using
READ_ONCE() to keep the read explicit. Each caller's return path
consumes sk_err only when no data is being returned and the err
about to surface matches the pending sk_err. This mirrors the
tcp_recvmsg() preserve-and-surface pattern, and also handles
tls_rx_one_record()'s decrypt-abort path: it raises sk_err to
EBADMSG via tls_err_abort() before returning a different errno
(-EFAULT from tls_setup_from_iter() on zero-copy receive, -ENOMEM
from decrypt setup). The gate returns that errno on this read and
leaves EBADMSG pending to surface on the next, matching pre-series
behavior.

Fixes: c46b01839f7a ("tls: rx: periodically flush socket backlog")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
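Note for reviewers (not part of the commit): the return-path gate
described above can be modelled in userspace roughly as follows. This
is only a sketch under stated assumptions. struct model_sock and every
model_* name are hypothetical stand-ins, not kernel symbols; the model
ignores locking, READ_ONCE() and the xchg() in the real sock_error(),
and it reduces tls_rx_rec_wait() to the sk_err check alone.

```c
#include <errno.h>

/* Hypothetical stand-in for the one struct sock field involved. */
struct model_sock {
	int sk_err;	/* pending error as a positive errno, 0 if none */
};

/* Models sock_error(): return the pending error negated and clear it. */
static int model_sock_error(struct model_sock *sk)
{
	int err = sk->sk_err;

	sk->sk_err = 0;
	return -err;
}

/* Models the patched wait: report sk_err but leave it pending.
 * Returning 0 here stands for the RCV_SHUTDOWN / clean EOF case.
 */
static int model_rec_wait(const struct model_sock *sk)
{
	if (sk->sk_err)
		return -sk->sk_err;
	return 0;
}

/* Models tls_sw_consume_matching_sk_err(): clear sk_err only when it
 * matches the error about to be returned.
 */
static int model_consume_matching(struct model_sock *sk, int err)
{
	if (err < 0 && -err == sk->sk_err)
		return model_sock_error(sk);
	return err;
}

/* Models a receive return path. "copied" is the number of bytes
 * already handed to the caller before the wait reported an error.
 */
static int model_recv(struct model_sock *sk, int copied)
{
	int err = model_rec_wait(sk);

	if (copied)
		return copied;
	return model_consume_matching(sk, err);
}
```

In this model, with sk_err set to ECONNRESET, model_recv(&sk, 100)
returns 100 and leaves sk_err pending, and the following
model_recv(&sk, 0) returns -ECONNRESET and clears it. With sk_err set
to EBADMSG, model_consume_matching(&sk, -EFAULT) returns -EFAULT and
leaves EBADMSG pending for the next read.
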
 net/tls/tls_sw.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2b7093d27eb6..6a0ac2ccde56 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1376,8 +1376,14 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
 		if (!sk_psock_queue_empty(psock))
 			return 0;
 
+		/* Report sk_err without clearing it. The caller may
+		 * discard the error return from this function in favor
+		 * of bytes already copied; leaving sk_err set ensures
+		 * the next read syscall surfaces the error instead of
+		 * a spurious EOF.
+		 */
 		if (sk->sk_err)
-			return sock_error(sk);
+			return -READ_ONCE(sk->sk_err);
 
 		if (ret < 0)
 			return ret;
@@ -1399,7 +1405,7 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
 		 * actual error rather than a clean EOF.
 		 */
 		if (sk->sk_err)
-			return sock_error(sk);
+			return -READ_ONCE(sk->sk_err);
 		if (sk->sk_shutdown & RCV_SHUTDOWN)
 			return 0;
 
@@ -1430,6 +1436,18 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
 	return 1;
 }
 
+/* Clear sk_err only when it matches the err about to be returned.
+ * tls_rx_one_record() can raise sk_err to EBADMSG via tls_err_abort()
+ * while returning a different errno; preserving sk_err in that case
+ * lets the EBADMSG surface on the next read.
+ */
+static int tls_sw_consume_matching_sk_err(struct sock *sk, int err)
+{
+	if (err < 0 && -err == READ_ONCE(sk->sk_err))
+		return sock_error(sk);
+	return err;
+}
+
 static int tls_setup_from_iter(struct iov_iter *from,
 			       int length, int *pages_used,
 			       struct scatterlist *to,
@@ -2285,7 +2303,9 @@ int tls_sw_recvmsg(struct sock *sk,
 	tls_rx_reader_unlock(sk, ctx);
 	if (psock)
 		sk_psock_put(sk, psock);
-	return copied ? : err;
+	if (copied)
+		return copied;
+	return tls_sw_consume_matching_sk_err(sk, err);
 }
 
 ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
@@ -2350,7 +2370,9 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 
 splice_read_end:
 	tls_rx_reader_unlock(sk, ctx);
-	return copied ? : err;
+	if (copied)
+		return copied;
+	return tls_sw_consume_matching_sk_err(sk, err);
 
 splice_requeue:
 	__skb_queue_head(&ctx->rx_list, skb);
@@ -2444,7 +2466,9 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 
 read_sock_end:
 	tls_rx_reader_release(sk, ctx);
-	return copied ? : err;
+	if (copied)
+		return copied;
+	return tls_sw_consume_matching_sk_err(sk, err);
 
 read_sock_requeue:
 	__skb_queue_head(&ctx->rx_list, skb);

-- 
2.54.0


