Linux kernel -stable discussions
* [PATCH net 1/3] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
       [not found] <20260511160753.607296-1-dhowells@redhat.com>
@ 2026-05-11 16:07 ` David Howells
  2026-05-11 16:07 ` [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg David Howells
  1 sibling, 0 replies; 10+ messages in thread
From: David Howells @ 2026-05-11 16:07 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Hyunwoo Kim, Marc Dionne, Jakub Kicinski,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	linux-afs, linux-kernel, stable

From: Hyunwoo Kim <imv4bel@gmail.com>

The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
handler in rxrpc_verify_response() copy the skb to a linear one before
calling into the security ops only when skb_cloned() is true.  An skb
that is not cloned but still carries externally-owned paged fragments
(e.g. SKBFL_SHARED_FRAG set by splice() into a UDP socket via
__ip_append_data, or a chained skb_has_frag_list()) falls through to
the in-place decryption path, which binds the frag pages directly into
the AEAD/skcipher SGL via skb_to_sgvec().

Extend the gate to also unshare when skb_has_frag_list() or
skb_has_shared_frag() is true.  This catches the splice-loopback vector
and other externally-shared frag sources while preserving the
zero-copy fast path for skbs whose frags are kernel-private (e.g. NIC
page_pool RX, GRO).  The OOM/trace handling already in place is reused.
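For illustration only, the widened condition can be modelled in userspace like so (the struct and its fields are invented stand-ins for the real sk_buff helpers named above, not kernel code):

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the widened unshare gate.  The fields
 * mirror what skb_cloned(), skb_has_frag_list() and skb_has_shared_frag()
 * report on a real sk_buff; this struct is purely illustrative.
 */
struct skb_model {
	bool cloned;       /* skb_cloned(): data shared with a clone */
	bool frag_list;    /* skb_has_frag_list(): chained skbs */
	bool shared_frag;  /* skb_has_shared_frag(): SKBFL_SHARED_FRAG set */
};

/* True if in-place decryption could scribble on externally-owned memory. */
bool needs_private_copy(const struct skb_model *skb)
{
	return skb->cloned || skb->frag_list || skb->shared_frag;
}
```

Only when all three predicates are false does the zero-copy in-place path remain safe.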

Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
 net/rxrpc/call_event.c | 4 +++-
 net/rxrpc/conn_event.c | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index fdd683261226..2b19b252225e 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -334,7 +334,9 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
 
 			if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
 			    sp->hdr.securityIndex != 0 &&
-			    skb_cloned(skb)) {
+			    (skb_cloned(skb) ||
+			     skb_has_frag_list(skb) ||
+			     skb_has_shared_frag(skb))) {
 				/* Unshare the packet so that it can be
 				 * modified by in-place decryption.
 				 */
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index a2130d25aaa9..442414d90ba1 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -245,7 +245,8 @@ static int rxrpc_verify_response(struct rxrpc_connection *conn,
 {
 	int ret;
 
-	if (skb_cloned(skb)) {
+	if (skb_cloned(skb) || skb_has_frag_list(skb) ||
+	    skb_has_shared_frag(skb)) {
 		/* Copy the packet if shared so that we can do in-place
 		 * decryption.
 		 */



* [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
       [not found] <20260511160753.607296-1-dhowells@redhat.com>
  2026-05-11 16:07 ` [PATCH net 1/3] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present David Howells
@ 2026-05-11 16:07 ` David Howells
  2026-05-12  7:58   ` Jeffrey Altman
  2026-05-12 13:38   ` David Laight
  1 sibling, 2 replies; 10+ messages in thread
From: David Howells @ 2026-05-11 16:07 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Hyunwoo Kim, Marc Dionne, Jakub Kicinski,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	linux-afs, linux-kernel, Jeffrey Altman, Jiayuan Chen, stable

This improves the fix for CVE-2026-43500.

Fix the pagecache corruption caused by in-place decryption of a DATA packet
transmitted locally by splice().  Get rid of the packet copying in the I/O
thread and instead unconditionally extract the packet content into a bounce
buffer, in which the data is then decrypted.  recvmsg() (or the kernel
equivalent) then copies the data from the bounce buffer to the destination
buffer, leaving the sk_buff unmodified.

This has an additional advantage: the packet data is then arranged in the
buffer with the correct alignment for the crypto algorithms to process
directly.  The crypto does seem to be a little faster and, surprisingly, the
unencrypted performance doesn't seem to change much - possibly due to the
removal of complexity from the I/O thread.

Yet another advantage is that the I/O thread no longer has to copy packets,
which would otherwise slow down packet distribution, ACK generation, etc.

The buffer belongs to the call and is initially allocated at 2K, sufficiently
large to hold a whole jumbo subpacket, but it will be grown if needed.  There
is one downside: if a MSG_PEEK of more than one byte occurs, recvmsg() may
move on to the next packet, replacing the content of the buffer.  In such a
case, it has to go back and re-decrypt the current packet.

Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so
switch to using USHRT_MAX to indicate an invalid offset.
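The sentinel scheme amounts to the following sketch (the macro and function names here are invented for illustration; only the USHRT_MAX choice comes from the patch):

```c
#include <limits.h>

/*
 * 0 is now a legitimate data offset, so "no offset yet" needs a value
 * that can never be a real offset; USHRT_MAX serves for a u16 field.
 * The names below are hypothetical, not the kernel's.
 */
#define RX_PKT_OFFSET_INVALID USHRT_MAX

int rx_pkt_offset_is_set(unsigned short offset)
{
	return offset != RX_PKT_OFFSET_INVALID;
}
```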

Note also that I would generally prefer to replace the buffers of the current
sk_buff with a new kmalloc'd buffer of the right size, ditching the old data
and frags, as this would make the handling of MSG_PEEK easier and remove the
double-decryption issue, but that looks like quite a complicated thing to
achieve.  skb_morph() looks halfway to what I want, but I don't want to have
to allocate a new sk_buff.
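The MSG_PEEK re-decryption behaviour described above can be modelled in userspace roughly like this (illustrative only; the struct and function names are invented, though in the kernel the cached sequence number lives in call->rx_dec_seq):

```c
/*
 * Toy model of the single-packet decryption cache.  A cache hit skips
 * the expensive copy+decrypt; any other packet evicts the buffer.
 */
struct call_model {
	unsigned int rx_dec_seq;  /* seq of the packet held in the bounce buffer */
	int have_buffer;          /* non-zero once the buffer holds valid data */
	int decrypt_count;        /* counts how often the expensive decrypt ran */
};

/* Decrypt packet 'seq' into the buffer unless it is already cached. */
void verify_data(struct call_model *call, unsigned int seq)
{
	if (call->have_buffer && call->rx_dec_seq == seq)
		return;               /* cache hit: no re-decryption needed */
	call->decrypt_count++;        /* model the copy + decrypt */
	call->rx_dec_seq = seq;
	call->have_buffer = 1;
}
```

A MSG_PEEK that walks past packet N evicts it from the buffer, so a later ordinary read of packet N pays for a second decryption, which is the downside noted above.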

Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: Jiayuan Chen <jiayuan.chen@linux.dev>
cc: netdev@vger.kernel.org
cc: linux-afs@lists.infradead.org
cc: stable@vger.kernel.org
---
 net/rxrpc/ar-internal.h |  7 +++-
 net/rxrpc/call_event.c  | 22 +----------
 net/rxrpc/call_object.c |  2 +
 net/rxrpc/insecure.c    |  3 --
 net/rxrpc/recvmsg.c     | 72 +++++++++++++++++++++++++++-------
 net/rxrpc/rxgk.c        | 49 +++++++++++------------
 net/rxrpc/rxgk_common.h | 79 +++++++++++++++++++++++++++++++++++++
 net/rxrpc/rxkad.c       | 86 +++++++++++++++--------------------------
 8 files changed, 200 insertions(+), 120 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 27c2aa2dd023..783367eea798 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -213,8 +213,6 @@ struct rxrpc_skb_priv {
 		struct {
 			u16		offset;		/* Offset of data */
 			u16		len;		/* Length of data */
-			u8		flags;
-#define RXRPC_RX_VERIFIED	0x01
 		};
 		struct {
 			rxrpc_seq_t	first_ack;	/* First packet in acks table */
@@ -774,6 +772,11 @@ struct rxrpc_call {
 	struct sk_buff_head	recvmsg_queue;	/* Queue of packets ready for recvmsg() */
 	struct sk_buff_head	rx_queue;	/* Queue of packets for this call to receive */
 	struct sk_buff_head	rx_oos_queue;	/* Queue of out of sequence packets */
+	void			*rx_dec_buffer;	/* Decryption buffer */
+	unsigned short		rx_dec_bsize;	/* rx_dec_buffer size */
+	unsigned short		rx_dec_offset;	/* Decrypted packet data offset */
+	unsigned short		rx_dec_len;	/* Decrypted packet data len */
+	rxrpc_seq_t		rx_dec_seq;	/* Packet in decryption buffer */
 
 	rxrpc_seq_t		rx_highest_seq;	/* Higest sequence number received */
 	rxrpc_seq_t		rx_consumed;	/* Highest packet consumed */
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 2b19b252225e..fec59d9338b9 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -332,27 +332,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
 
 			saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK;
 
-			if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
-			    sp->hdr.securityIndex != 0 &&
-			    (skb_cloned(skb) ||
-			     skb_has_frag_list(skb) ||
-			     skb_has_shared_frag(skb))) {
-				/* Unshare the packet so that it can be
-				 * modified by in-place decryption.
-				 */
-				struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);
-
-				if (nskb) {
-					rxrpc_new_skb(nskb, rxrpc_skb_new_unshared);
-					rxrpc_input_call_packet(call, nskb);
-					rxrpc_free_skb(nskb, rxrpc_skb_put_call_rx);
-				} else {
-					/* OOM - Drop the packet. */
-					rxrpc_see_skb(skb, rxrpc_skb_see_unshare_nomem);
-				}
-			} else {
-				rxrpc_input_call_packet(call, skb);
-			}
+			rxrpc_input_call_packet(call, skb);
 			rxrpc_free_skb(skb, rxrpc_skb_put_call_rx);
 			did_receive = true;
 		}
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index f035f486c139..fcb9d38bb521 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -152,6 +152,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
 	spin_lock_init(&call->notify_lock);
 	refcount_set(&call->ref, 1);
 	call->debug_id		= debug_id;
+	call->rx_pkt_offset	= USHRT_MAX;
 	call->tx_total_len	= -1;
 	call->tx_jumbo_max	= 1;
 	call->next_rx_timo	= 20 * HZ;
@@ -553,6 +554,7 @@ static void rxrpc_cleanup_rx_buffers(struct rxrpc_call *call)
 	rxrpc_purge_queue(&call->recvmsg_queue);
 	rxrpc_purge_queue(&call->rx_queue);
 	rxrpc_purge_queue(&call->rx_oos_queue);
+	kfree(call->rx_dec_buffer);
 }
 
 /*
diff --git a/net/rxrpc/insecure.c b/net/rxrpc/insecure.c
index 0a260df45d25..7a26c6097d03 100644
--- a/net/rxrpc/insecure.c
+++ b/net/rxrpc/insecure.c
@@ -32,9 +32,6 @@ static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
 
 static int none_verify_packet(struct rxrpc_call *call, struct sk_buff *skb)
 {
-	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
-
-	sp->flags |= RXRPC_RX_VERIFIED;
 	return 0;
 }
 
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index e1f7513a46db..865e368381d5 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -147,15 +147,55 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
 }
 
 /*
- * Decrypt and verify a DATA packet.
+ * Decrypt and verify a DATA packet.  The content of the packet is pulled out
+ * into a flat buffer rather than decrypting in place in the skbuff.  This also
+ * has the advantage of aligning the buffer correctly for the crypto routines.
+ *
+ * We keep track of the sequence number of the packet currently decrypted into
+ * the buffer in ->rx_dec_seq.  Unfortunately, this means that a MSG_PEEK of
+ * more than one byte may cause a later packet to be decrypted into the buffer,
+ * requiring the original to be re-decrypted when recvmsg() is called again.
  */
 static int rxrpc_verify_data(struct rxrpc_call *call, struct sk_buff *skb)
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+	int ret;
 
-	if (sp->flags & RXRPC_RX_VERIFIED)
+	if (call->rx_dec_seq == sp->hdr.seq && call->rx_dec_buffer)
 		return 0;
-	return call->security->verify_packet(call, skb);
+
+	if (call->rx_dec_bsize < sp->len) {
+		/* Make sure we can hold a 1412-byte jumbo subpacket and make
+		 * sure that the buffer size is aligned to a crypto blocksize.
+		 */
+		size_t size = umin(round_up(sp->len, 32), 2048);
+		void *buffer = krealloc(call->rx_dec_buffer, size, GFP_NOFS);
+
+		if (!buffer)
+			return -ENOMEM;
+		call->rx_dec_buffer = buffer;
+		call->rx_dec_bsize = size;
+	}
+
+	ret = -EFAULT;
+	if (skb_copy_bits(skb, sp->offset, call->rx_dec_buffer, sp->len) < 0)
+		goto err;
+
+	call->rx_dec_offset = 0;
+	call->rx_dec_len = sp->len;
+	call->rx_dec_seq = sp->hdr.seq;
+	ret = call->security->verify_packet(call, skb);
+	if (ret < 0)
+		goto err;
+	return 0;
+
+err:
+	kfree(call->rx_dec_buffer);
+	call->rx_dec_buffer = NULL;
+	call->rx_dec_bsize = 0;
+	call->rx_dec_offset = 0;
+	call->rx_dec_len = 0;
+	return ret;
 }
 
 /*
@@ -283,16 +323,19 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
 		if (msg)
 			sock_recv_timestamp(msg, sock->sk, skb);
 
-		if (rx_pkt_offset == 0) {
+		if (rx_pkt_offset == USHRT_MAX) {
 			ret2 = rxrpc_verify_data(call, skb);
 			trace_rxrpc_recvdata(call, rxrpc_recvmsg_next, seq,
-					     sp->offset, sp->len, ret2);
+					     call->rx_dec_offset,
+					     call->rx_dec_len, ret2);
 			if (ret2 < 0) {
 				ret = ret2;
 				goto out;
 			}
-			rx_pkt_offset = sp->offset;
-			rx_pkt_len = sp->len;
+			sp = rxrpc_skb(skb);
+			seq = sp->hdr.seq;
+			rx_pkt_offset = call->rx_dec_offset;
+			rx_pkt_len = call->rx_dec_len;
 		} else {
 			trace_rxrpc_recvdata(call, rxrpc_recvmsg_cont, seq,
 					     rx_pkt_offset, rx_pkt_len, 0);
@@ -304,10 +347,10 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
 		if (copy > remain)
 			copy = remain;
 		if (copy > 0) {
-			ret2 = skb_copy_datagram_iter(skb, rx_pkt_offset, iter,
-						      copy);
-			if (ret2 < 0) {
-				ret = ret2;
+			ret2 = copy_to_iter(call->rx_dec_buffer + rx_pkt_offset,
+					    copy, iter);
+			if (ret2 != copy) {
+				ret = -EFAULT;
 				goto out;
 			}
 
@@ -328,13 +371,14 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
 		/* The whole packet has been transferred. */
 		if (sp->hdr.flags & RXRPC_LAST_PACKET)
 			ret = 1;
-		rx_pkt_offset = 0;
+		rx_pkt_offset = USHRT_MAX;
 		rx_pkt_len = 0;
+		if (unlikely(flags & MSG_PEEK))
+			break;
 
 		skb = skb_peek_next(skb, &call->recvmsg_queue);
 
-		if (!(flags & MSG_PEEK))
-			rxrpc_rotate_rx_window(call);
+		rxrpc_rotate_rx_window(call);
 
 		if (!rx->app_ops &&
 		    !skb_queue_empty_lockless(&rx->recvmsg_oobq)) {
diff --git a/net/rxrpc/rxgk.c b/net/rxrpc/rxgk.c
index 0d5e654da918..88e651dd0e90 100644
--- a/net/rxrpc/rxgk.c
+++ b/net/rxrpc/rxgk.c
@@ -473,8 +473,9 @@ static int rxgk_verify_packet_integrity(struct rxrpc_call *call,
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxgk_header *hdr;
 	struct krb5_buffer metadata;
-	unsigned int offset = sp->offset, len = sp->len;
+	unsigned int offset = 0, len = call->rx_dec_len;
 	size_t data_offset = 0, data_len = len;
+	void *data = call->rx_dec_buffer;
 	u32 ac = 0;
 	int ret = -ENOMEM;
 
@@ -496,16 +497,16 @@ static int rxgk_verify_packet_integrity(struct rxrpc_call *call,
 
 	metadata.len = sizeof(*hdr);
 	metadata.data = hdr;
-	ret = rxgk_verify_mic_skb(gk->krb5, gk->rx_Kc, &metadata,
-				  skb, &offset, &len, &ac);
+	ret = rxgk_verify_mic(gk->krb5, gk->rx_Kc, &metadata,
+			      data, &offset, &len, &ac);
 	kfree(hdr);
 	if (ret < 0) {
 		if (ret != -ENOMEM)
 			rxrpc_abort_eproto(call, skb, ac,
 					   rxgk_abort_1_verify_mic_eproto);
 	} else {
-		sp->offset = offset;
-		sp->len = len;
+		call->rx_dec_offset = offset;
+		call->rx_dec_len = len;
 	}
 
 put_gk:
@@ -522,49 +523,45 @@ static int rxgk_verify_packet_encrypted(struct rxrpc_call *call,
 					struct sk_buff *skb)
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
-	struct rxgk_header hdr;
-	unsigned int offset = sp->offset, len = sp->len;
+	struct rxgk_header *hdr;
+	unsigned int offset = 0, len = call->rx_dec_len;
+	void *data = call->rx_dec_buffer;
 	int ret;
 	u32 ac = 0;
 
 	_enter("");
 
-	ret = rxgk_decrypt_skb(gk->krb5, gk->rx_enc, skb, &offset, &len, &ac);
+	ret = rxgk_decrypt(gk->krb5, gk->rx_enc, data, &offset, &len, &ac);
 	if (ret < 0) {
 		if (ret != -ENOMEM)
 			rxrpc_abort_eproto(call, skb, ac, rxgk_abort_2_decrypt_eproto);
 		goto error;
 	}
 
-	if (len < sizeof(hdr)) {
+	if (len < sizeof(*hdr)) {
 		ret = rxrpc_abort_eproto(call, skb, RXGK_PACKETSHORT,
 					 rxgk_abort_2_short_header);
 		goto error;
 	}
 
 	/* Extract the header from the skb */
-	ret = skb_copy_bits(skb, offset, &hdr, sizeof(hdr));
-	if (ret < 0) {
-		ret = rxrpc_abort_eproto(call, skb, RXGK_PACKETSHORT,
-					 rxgk_abort_2_short_encdata);
-		goto error;
-	}
-	offset += sizeof(hdr);
-	len -= sizeof(hdr);
-
-	if (ntohl(hdr.epoch)		!= call->conn->proto.epoch ||
-	    ntohl(hdr.cid)		!= call->cid ||
-	    ntohl(hdr.call_number)	!= call->call_id ||
-	    ntohl(hdr.seq)		!= sp->hdr.seq ||
-	    ntohl(hdr.sec_index)	!= call->security_ix ||
-	    ntohl(hdr.data_len)		> len) {
+	hdr = data + offset;
+	offset += sizeof(*hdr);
+	len -= sizeof(*hdr);
+
+	if (ntohl(hdr->epoch)		!= call->conn->proto.epoch ||
+	    ntohl(hdr->cid)		!= call->cid ||
+	    ntohl(hdr->call_number)	!= call->call_id ||
+	    ntohl(hdr->seq)		!= sp->hdr.seq ||
+	    ntohl(hdr->sec_index)	!= call->security_ix ||
+	    ntohl(hdr->data_len)	> len) {
 		ret = rxrpc_abort_eproto(call, skb, RXGK_SEALEDINCON,
 					 rxgk_abort_2_short_data);
 		goto error;
 	}
 
-	sp->offset = offset;
-	sp->len = ntohl(hdr.data_len);
+	call->rx_dec_offset = offset;
+	call->rx_dec_len = ntohl(hdr->data_len);
 	ret = 0;
 error:
 	rxgk_put(gk);
diff --git a/net/rxrpc/rxgk_common.h b/net/rxrpc/rxgk_common.h
index 1e257d7ab8ec..dc8b0f106104 100644
--- a/net/rxrpc/rxgk_common.h
+++ b/net/rxrpc/rxgk_common.h
@@ -105,6 +105,45 @@ int rxgk_decrypt_skb(const struct krb5_enctype *krb5,
 	return ret;
 }
 
+/*
+ * Apply decryption and checksumming functions to a flat data buffer.  The offset
+ * and length are updated to reflect the actual content of the encrypted
+ * region.
+ */
+static inline int rxgk_decrypt(const struct krb5_enctype *krb5,
+			       struct crypto_aead *aead,
+			       void *data,
+			       unsigned int *_offset, unsigned int *_len,
+			       int *_error_code)
+{
+	struct scatterlist sg[1];
+	size_t offset = 0, len = *_len;
+	int ret;
+
+	sg_init_one(sg, data, len);
+
+	ret = crypto_krb5_decrypt(krb5, aead, sg, 1, &offset, &len);
+	switch (ret) {
+	case 0:
+		*_offset += offset;
+		*_len = len;
+		break;
+	case -EBADMSG: /* Checksum mismatch. */
+	case -EPROTO:
+		*_error_code = RXGK_SEALEDINCON;
+		break;
+	case -EMSGSIZE:
+		*_error_code = RXGK_PACKETSHORT;
+		break;
+	case -ENOPKG: /* Would prefer RXGK_BADETYPE, but not available for YFS. */
+	default:
+		*_error_code = RXGK_INCONSISTENCY;
+		break;
+	}
+
+	return ret;
+}
+
 /*
  * Check the MIC on a region of an skbuff.  The offset and length are updated
  * to reflect the actual content of the secure region.
@@ -148,3 +187,43 @@ int rxgk_verify_mic_skb(const struct krb5_enctype *krb5,
 
 	return ret;
 }
+
+/*
+ * Check the MIC on a flat buffer.  The offset and length are updated to
+ * reflect the actual content of the secure region.
+ */
+static inline
+int rxgk_verify_mic(const struct krb5_enctype *krb5,
+		    struct crypto_shash *shash,
+		    const struct krb5_buffer *metadata,
+		    void *data,
+		    unsigned int *_offset, unsigned int *_len,
+		    u32 *_error_code)
+{
+	struct scatterlist sg[1];
+	size_t offset = 0, len = *_len;
+	int ret;
+
+	sg_init_one(sg, data, len);
+
+	ret = crypto_krb5_verify_mic(krb5, shash, metadata, sg, 1, &offset, &len);
+	switch (ret) {
+	case 0:
+		*_offset += offset;
+		*_len = len;
+		break;
+	case -EBADMSG: /* Checksum mismatch */
+	case -EPROTO:
+		*_error_code = RXGK_SEALEDINCON;
+		break;
+	case -EMSGSIZE:
+		*_error_code = RXGK_PACKETSHORT;
+		break;
+	case -ENOPKG: /* Would prefer RXGK_BADETYPE, but not available for YFS. */
+	default:
+		*_error_code = RXGK_INCONSISTENCY;
+		break;
+	}
+
+	return ret;
+}
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index cba7935977f0..075936337836 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -430,27 +430,25 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
 				 rxrpc_seq_t seq,
 				 struct skcipher_request *req)
 {
-	struct rxkad_level1_hdr sechdr;
+	struct rxkad_level1_hdr *sechdr;
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_crypt iv;
-	struct scatterlist sg[16];
-	u32 data_size, buf;
+	struct scatterlist sg[1];
+	void *data = call->rx_dec_buffer;
+	u32 len = sp->len, data_size, buf;
 	u16 check;
 	int ret;
 
 	_enter("");
 
-	if (sp->len < 8)
+	if (len < 8)
 		return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
 					  rxkad_abort_1_short_header);
 
 	/* Decrypt the skbuff in-place.  TODO: We really want to decrypt
 	 * directly into the target buffer.
 	 */
-	sg_init_table(sg, ARRAY_SIZE(sg));
-	ret = skb_to_sgvec(skb, sg, sp->offset, 8);
-	if (unlikely(ret < 0))
-		return ret;
+	sg_init_one(sg, data, len);
 
 	/* start the decryption afresh */
 	memset(&iv, 0, sizeof(iv));
@@ -464,13 +462,11 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
 		return ret;
 
 	/* Extract the decrypted packet length */
-	if (skb_copy_bits(skb, sp->offset, &sechdr, sizeof(sechdr)) < 0)
-		return rxrpc_abort_eproto(call, skb, RXKADDATALEN,
-					  rxkad_abort_1_short_encdata);
-	sp->offset += sizeof(sechdr);
-	sp->len    -= sizeof(sechdr);
+	sechdr = data;
+	call->rx_dec_offset = sizeof(*sechdr);
+	len -= sizeof(*sechdr);
 
-	buf = ntohl(sechdr.data_size);
+	buf = ntohl(sechdr->data_size);
 	data_size = buf & 0xffff;
 
 	check = buf >> 16;
@@ -479,10 +475,10 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
 	if (check != 0)
 		return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
 					  rxkad_abort_1_short_check);
-	if (data_size > sp->len)
+	if (data_size > len)
 		return rxrpc_abort_eproto(call, skb, RXKADDATALEN,
 					  rxkad_abort_1_short_data);
-	sp->len = data_size;
+	call->rx_dec_len = data_size;
 
 	_leave(" = 0 [dlen=%x]", data_size);
 	return 0;
@@ -496,43 +492,28 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
 				 struct skcipher_request *req)
 {
 	const struct rxrpc_key_token *token;
-	struct rxkad_level2_hdr sechdr;
+	struct rxkad_level2_hdr *sechdr;
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_crypt iv;
-	struct scatterlist _sg[4], *sg;
-	u32 data_size, buf;
+	struct scatterlist sg[1];
+	void *data = call->rx_dec_buffer;
+	u32 len = sp->len, data_size, buf;
 	u16 check;
-	int nsg, ret;
+	int ret;
 
-	_enter(",{%d}", sp->len);
+	_enter(",{%d}", len);
 
-	if (sp->len < 8)
+	if (len < 8)
 		return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
 					  rxkad_abort_2_short_header);
 
 	/* Don't let the crypto algo see a misaligned length. */
-	sp->len = round_down(sp->len, 8);
+	len = round_down(len, 8);
 
-	/* Decrypt the skbuff in-place.  TODO: We really want to decrypt
-	 * directly into the target buffer.
+	/* Decrypt in place in the call's decryption buffer.  TODO: We really
+	 * want to decrypt directly into the target buffer.
 	 */
-	sg = _sg;
-	nsg = skb_shinfo(skb)->nr_frags + 1;
-	if (nsg <= 4) {
-		nsg = 4;
-	} else {
-		sg = kmalloc_objs(*sg, nsg, GFP_NOIO);
-		if (!sg)
-			return -ENOMEM;
-	}
-
-	sg_init_table(sg, nsg);
-	ret = skb_to_sgvec(skb, sg, sp->offset, sp->len);
-	if (unlikely(ret < 0)) {
-		if (sg != _sg)
-			kfree(sg);
-		return ret;
-	}
+	sg_init_one(sg, data, len);
 
 	/* decrypt from the session key */
 	token = call->conn->key->payload.data[0];
@@ -540,11 +521,9 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
 
 	skcipher_request_set_sync_tfm(req, call->conn->rxkad.cipher);
 	skcipher_request_set_callback(req, 0, NULL, NULL);
-	skcipher_request_set_crypt(req, sg, sg, sp->len, iv.x);
+	skcipher_request_set_crypt(req, sg, sg, len, iv.x);
 	ret = crypto_skcipher_decrypt(req);
 	skcipher_request_zero(req);
-	if (sg != _sg)
-		kfree(sg);
 	if (ret < 0) {
 		if (ret == -ENOMEM)
 			return ret;
@@ -553,13 +532,11 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
 	}
 
 	/* Extract the decrypted packet length */
-	if (skb_copy_bits(skb, sp->offset, &sechdr, sizeof(sechdr)) < 0)
-		return rxrpc_abort_eproto(call, skb, RXKADDATALEN,
-					  rxkad_abort_2_short_len);
-	sp->offset += sizeof(sechdr);
-	sp->len    -= sizeof(sechdr);
+	sechdr = data;
+	call->rx_dec_offset = sizeof(*sechdr);
+	len -= sizeof(*sechdr);
 
-	buf = ntohl(sechdr.data_size);
+	buf = ntohl(sechdr->data_size);
 	data_size = buf & 0xffff;
 
 	check = buf >> 16;
@@ -569,17 +546,18 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
 		return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
 					  rxkad_abort_2_short_check);
 
-	if (data_size > sp->len)
+	if (data_size > len)
 		return rxrpc_abort_eproto(call, skb, RXKADDATALEN,
 					  rxkad_abort_2_short_data);
 
-	sp->len = data_size;
+	call->rx_dec_len = data_size;
 	_leave(" = 0 [dlen=%x]", data_size);
 	return 0;
 }
 
 /*
- * Verify the security on a received packet and the subpackets therein.
+ * Verify the security on a received (sub)packet.  If the packet needs
+ * modifying (e.g. decrypting), it must be copied.
  */
 static int rxkad_verify_packet(struct rxrpc_call *call, struct sk_buff *skb)
 {



* Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
  2026-05-11 16:07 ` [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg David Howells
@ 2026-05-12  7:58   ` Jeffrey Altman
  2026-05-13  8:01     ` David Howells
  2026-05-12 13:38   ` David Laight
  1 sibling, 1 reply; 10+ messages in thread
From: Jeffrey Altman @ 2026-05-12  7:58 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Hyunwoo Kim, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jiayuan Chen, stable




> On May 11, 2026, at 12:07 PM, David Howells <dhowells@redhat.com> wrote:
> 
> This improves the fix for CVE-2026-43500.
> 
> Fix the pagecache corruption caused by in-place decryption of a DATA packet
> transmitted locally by splice().  Get rid of the packet copying in the I/O
> thread and instead unconditionally extract the packet content into a bounce
> buffer, in which the data is then decrypted.  recvmsg() (or the kernel
> equivalent) then copies the data from the bounce buffer to the destination
> buffer, leaving the sk_buff unmodified.
> 
> This has an additional advantage: the packet data is then arranged in the
> buffer with the correct alignment for the crypto algorithms to process
> directly.  The crypto does seem to be a little faster and, surprisingly,
> the unencrypted performance doesn't seem to change much - possibly due to
> the removal of complexity from the I/O thread.
> 
> Yet another advantage is that the I/O thread no longer has to copy packets,
> which would otherwise slow down packet distribution, ACK generation, etc.
> 
> The buffer belongs to the call and is initially allocated at 2K,
> sufficiently large to hold a whole jumbo subpacket, but it will be grown if
> needed.  There is one downside: if a MSG_PEEK of more than one byte occurs,
> recvmsg() may move on to the next packet, replacing the content of the
> buffer.  In such a case, it has to go back and re-decrypt the current
> packet.
> 
> Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so
> switch to using USHRT_MAX to indicate an invalid offset.
> 
> Note also that I would generally prefer to replace the buffers of the
> current sk_buff with a new kmalloc'd buffer of the right size, ditching the
> old data and frags, as this would make the handling of MSG_PEEK easier and
> remove the double-decryption issue, but that looks like quite a complicated
> thing to achieve.  skb_morph() looks halfway to what I want, but I don't
> want to have to allocate a new sk_buff.

It might be useful to add a back-porting note indicating that the rxgk
changes can be safely dropped if the kernel version does not contain
rxgk support.

> 
> [...]
> 
> diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
> index 27c2aa2dd023..783367eea798 100644
> --- a/net/rxrpc/ar-internal.h
> +++ b/net/rxrpc/ar-internal.h
> @@ -213,8 +213,6 @@ struct rxrpc_skb_priv {
>  		struct {
>  			u16		offset;		/* Offset of data */
>  			u16		len;		/* Length of data */
> -			u8		flags;
> -#define RXRPC_RX_VERIFIED	0x01
>  		};
>  		struct {
>  			rxrpc_seq_t	first_ack;	/* First packet in acks table */
> @@ -774,6 +772,11 @@ struct rxrpc_call {
>  	struct sk_buff_head	recvmsg_queue;	/* Queue of packets ready for recvmsg() */
>  	struct sk_buff_head	rx_queue;	/* Queue of packets for this call to receive */
>  	struct sk_buff_head	rx_oos_queue;	/* Queue of out of sequence packets */
> +	void			*rx_dec_buffer;	/* Decryption buffer */
> +	unsigned short		rx_dec_bsize;	/* rx_dec_buffer size */
> +	unsigned short		rx_dec_offset;	/* Decrypted packet data offset */
> +	unsigned short		rx_dec_len;	/* Decrypted packet data len */
> +	rxrpc_seq_t		rx_dec_seq;	/* Packet in decryption buffer */
> 
>  	rxrpc_seq_t		rx_highest_seq;	/* Higest sequence number received */
>  	rxrpc_seq_t		rx_consumed;	/* Highest packet consumed */


Instead of allocating the storage within struct rxrpc_call, perhaps it
would be better to add these fields to struct rxrpc_channel.  Doing so
would reduce the allocation/deallocation churn.  The majority of calls
are short-lived (perhaps a single packet in each direction), but there
will be many calls in rapid succession.

> [...]
> 
> diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
> index e1f7513a46db..865e368381d5 100644
> --- a/net/rxrpc/recvmsg.c
> +++ b/net/rxrpc/recvmsg.c
> @@ -147,15 +147,55 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
> }
> 
> /*
> - * Decrypt and verify a DATA packet.
> + * Decrypt and verify a DATA packet.  The content of the packet is pulled out
> + * into a flat buffer rather than decrypting in place in the skbuff.  This also
> + * has the advantage of aligning the buffer correctly for the crypto routines.
> + *
> + * We keep track of the sequence number of the packet currently decrypted into
> + * the buffer in ->rx_dec_seq.  Unfortunately, this means that a MSG_PEEK of
> + * more than one byte may cause a later packet to be decrypted into the buffer,
> + * requiring the original to be re-decrypted when recvmsg() is called again.
>  */
> static int rxrpc_verify_data(struct rxrpc_call *call, struct sk_buff *skb)
> {
> struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> + int ret;
> 
> - if (sp->flags & RXRPC_RX_VERIFIED)
> + if (call->rx_dec_seq == sp->hdr.seq && call->rx_dec_buffer)
> return 0;
> - return call->security->verify_packet(call, skb);
> +
> + if (call->rx_dec_bsize < sp->len) {
> + /* Make sure we can hold a 1412-byte jumbo subpacket and make
> + * sure that the buffer size is aligned to a crypto blocksize.
> + */
> + size_t size = umin(round_up(sp->len, 32), 2048);

I think you meant to use max() here so that a minimum of 2048 bytes
is allocated.  

I think applying a cap on the allocation size would also be 
beneficial.  IBM/Transarc derived Rx implementations have a hard
upper-bound of 21180 (15 x 1412) bytes plus one 28 byte rx header.
Applying a cap of 32KiB seems prudent.

It is also worth noting that there are no current implementations
of Rx RPC which will send individual Rx DATA packets larger than 
1444 bytes including the Rx header.  Rx RESPONSE packets can be sent
as large as 16384 bytes (including the Rx header).  However, it is
extremely unlikely that this buffer, once allocated, would ever need
to be grown.

> + void *buffer = krealloc(call->rx_dec_buffer, size, GFP_NOFS);
> +
> + if (!buffer)
> + return -ENOMEM;
> + call->rx_dec_buffer = buffer;
> + call->rx_dec_bsize = size;
> + }
> +
> + ret = -EFAULT;
> + if (skb_copy_bits(skb, sp->offset, call->rx_dec_buffer, sp->len) < 0)
> + goto err;
> +
> + call->rx_dec_offset = 0;
> + call->rx_dec_len = sp->len;
> + call->rx_dec_seq = sp->hdr.seq;
> + ret = call->security->verify_packet(call, skb);
> + if (ret < 0)
> + goto err;
> + return 0;
> +
> +err:
> + kfree(call->rx_dec_buffer);

It might be better to avoid deallocating the buffer on the error
path and permit it to be freed during normal call (or call channel)
deallocation.

> + call->rx_dec_buffer = NULL;
> + call->rx_dec_bsize = 0;
> + call->rx_dec_offset = 0;
> + call->rx_dec_len = 0;
> + return ret;
> }
> 
> /*
> @@ -283,16 +323,19 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
> if (msg)
> sock_recv_timestamp(msg, sock->sk, skb);
> 
> - if (rx_pkt_offset == 0) {
> + if (rx_pkt_offset == USHRT_MAX) {
> ret2 = rxrpc_verify_data(call, skb);
> trace_rxrpc_recvdata(call, rxrpc_recvmsg_next, seq,
> -     sp->offset, sp->len, ret2);
> +     call->rx_dec_offset,
> +     call->rx_dec_len, ret2);
> if (ret2 < 0) {
> ret = ret2;
> goto out;
> }
> - rx_pkt_offset = sp->offset;
> - rx_pkt_len = sp->len;
> + sp = rxrpc_skb(skb);
> + seq = sp->hdr.seq;
> + rx_pkt_offset = call->rx_dec_offset;
> + rx_pkt_len = call->rx_dec_len;
> } else {
> trace_rxrpc_recvdata(call, rxrpc_recvmsg_cont, seq,
>     rx_pkt_offset, rx_pkt_len, 0);
> @@ -304,10 +347,10 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
> if (copy > remain)
> copy = remain;
> if (copy > 0) {
> - ret2 = skb_copy_datagram_iter(skb, rx_pkt_offset, iter,
> -      copy);
> - if (ret2 < 0) {
> - ret = ret2;
> + ret2 = copy_to_iter(call->rx_dec_buffer + rx_pkt_offset,
> +    copy, iter);
> + if (ret2 != copy) {
> + ret = -EFAULT;
> goto out;
> }
> 
> @@ -328,13 +371,14 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
> /* The whole packet has been transferred. */
> if (sp->hdr.flags & RXRPC_LAST_PACKET)
> ret = 1;
> - rx_pkt_offset = 0;
> + rx_pkt_offset = USHRT_MAX;
> rx_pkt_len = 0;
> + if (unlikely(flags & MSG_PEEK))
> + break;
> 
> skb = skb_peek_next(skb, &call->recvmsg_queue);
> 
> - if (!(flags & MSG_PEEK))
> - rxrpc_rotate_rx_window(call);
> + rxrpc_rotate_rx_window(call);
> 
> if (!rx->app_ops &&
>    !skb_queue_empty_lockless(&rx->recvmsg_oobq)) {
> diff --git a/net/rxrpc/rxgk.c b/net/rxrpc/rxgk.c
> index 0d5e654da918..88e651dd0e90 100644
> --- a/net/rxrpc/rxgk.c
> +++ b/net/rxrpc/rxgk.c
> @@ -473,8 +473,9 @@ static int rxgk_verify_packet_integrity(struct rxrpc_call *call,
> struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> struct rxgk_header *hdr;
> struct krb5_buffer metadata;
> - unsigned int offset = sp->offset, len = sp->len;
> + unsigned int offset = 0, len = call->rx_dec_len;
> size_t data_offset = 0, data_len = len;
> + void *data = call->rx_dec_buffer;
> u32 ac = 0;
> int ret = -ENOMEM;
> 
> @@ -496,16 +497,16 @@ static int rxgk_verify_packet_integrity(struct rxrpc_call *call,
> 
> metadata.len = sizeof(*hdr);
> metadata.data = hdr;
> - ret = rxgk_verify_mic_skb(gk->krb5, gk->rx_Kc, &metadata,
> -  skb, &offset, &len, &ac);
> + ret = rxgk_verify_mic(gk->krb5, gk->rx_Kc, &metadata,
> +      data, &offset, &len, &ac);
> kfree(hdr);
> if (ret < 0) {
> if (ret != -ENOMEM)
> rxrpc_abort_eproto(call, skb, ac,
>   rxgk_abort_1_verify_mic_eproto);
> } else {
> - sp->offset = offset;
> - sp->len = len;
> + call->rx_dec_offset = offset;
> + call->rx_dec_len = len;
> }
> 
> put_gk:
> @@ -522,49 +523,45 @@ static int rxgk_verify_packet_encrypted(struct rxrpc_call *call,
> struct sk_buff *skb)
> {
> struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> - struct rxgk_header hdr;
> - unsigned int offset = sp->offset, len = sp->len;
> + struct rxgk_header *hdr;
> + unsigned int offset = 0, len = call->rx_dec_len;
> + void *data = call->rx_dec_buffer;
> int ret;
> u32 ac = 0;
> 
> _enter("");
> 
> - ret = rxgk_decrypt_skb(gk->krb5, gk->rx_enc, skb, &offset, &len, &ac);
> + ret = rxgk_decrypt(gk->krb5, gk->rx_enc, data, &offset, &len, &ac);
> if (ret < 0) {
> if (ret != -ENOMEM)
> rxrpc_abort_eproto(call, skb, ac, rxgk_abort_2_decrypt_eproto);
> goto error;
> }
> 
> - if (len < sizeof(hdr)) {
> + if (len < sizeof(*hdr)) {
> ret = rxrpc_abort_eproto(call, skb, RXGK_PACKETSHORT,
> rxgk_abort_2_short_header);
> goto error;
> }
> 
> /* Extract the header from the skb */
> - ret = skb_copy_bits(skb, offset, &hdr, sizeof(hdr));
> - if (ret < 0) {
> - ret = rxrpc_abort_eproto(call, skb, RXGK_PACKETSHORT,
> - rxgk_abort_2_short_encdata);
> - goto error;
> - }
> - offset += sizeof(hdr);
> - len -= sizeof(hdr);
> -
> - if (ntohl(hdr.epoch) != call->conn->proto.epoch ||
> -    ntohl(hdr.cid) != call->cid ||
> -    ntohl(hdr.call_number) != call->call_id ||
> -    ntohl(hdr.seq) != sp->hdr.seq ||
> -    ntohl(hdr.sec_index) != call->security_ix ||
> -    ntohl(hdr.data_len) > len) {
> + hdr = data + offset;
> + offset += sizeof(*hdr);
> + len -= sizeof(*hdr);
> +
> + if (ntohl(hdr->epoch) != call->conn->proto.epoch ||
> +    ntohl(hdr->cid) != call->cid ||
> +    ntohl(hdr->call_number) != call->call_id ||
> +    ntohl(hdr->seq) != sp->hdr.seq ||
> +    ntohl(hdr->sec_index) != call->security_ix ||
> +    ntohl(hdr->data_len) > len) {
> ret = rxrpc_abort_eproto(call, skb, RXGK_SEALEDINCON,
> rxgk_abort_2_short_data);
> goto error;
> }
> 
> - sp->offset = offset;
> - sp->len = ntohl(hdr.data_len);
> + call->rx_dec_offset = offset;
> + call->rx_dec_len = ntohl(hdr->data_len);
> ret = 0;
> error:
> rxgk_put(gk);
> diff --git a/net/rxrpc/rxgk_common.h b/net/rxrpc/rxgk_common.h
> index 1e257d7ab8ec..dc8b0f106104 100644
> --- a/net/rxrpc/rxgk_common.h
> +++ b/net/rxrpc/rxgk_common.h
> @@ -105,6 +105,45 @@ int rxgk_decrypt_skb(const struct krb5_enctype *krb5,
> return ret;
> }
> 
> +/*
> + * Apply decryption and checksumming functions a flat data buffer.  The offset
> + * and length are updated to reflect the actual content of the encrypted
> + * region.
> + */
> +static inline int rxgk_decrypt(const struct krb5_enctype *krb5,
> +       struct crypto_aead *aead,
> +       void *data,
> +       unsigned int *_offset, unsigned int *_len,
> +       int *_error_code)
> +{
> + struct scatterlist sg[1];
> + size_t offset = 0, len = *_len;
> + int ret;
> +
> + sg_init_one(sg, data, len);
> +
> + ret = crypto_krb5_decrypt(krb5, aead, sg, 1, &offset, &len);
> + switch (ret) {
> + case 0:
> + *_offset += offset;
> + *_len = len;
> + break;
> + case -EBADMSG: /* Checksum mismatch. */
> + case -EPROTO:
> + *_error_code = RXGK_SEALEDINCON;
> + break;
> + case -EMSGSIZE:
> + *_error_code = RXGK_PACKETSHORT;
> + break;
> + case -ENOPKG: /* Would prefer RXGK_BADETYPE, but not available for YFS. */
> + default:
> + *_error_code = RXGK_INCONSISTENCY;
> + break;
> + }
> +
> + return ret;
> +}
> +
> /*
>  * Check the MIC on a region of an skbuff.  The offset and length are updated
>  * to reflect the actual content of the secure region.
> @@ -148,3 +187,43 @@ int rxgk_verify_mic_skb(const struct krb5_enctype *krb5,
> 
> return ret;
> }
> +
> +/*
> + * Check the MIC on a flat buffer.  The offset and length are updated to
> + * reflect the actual content of the secure region.
> + */
> +static inline
> +int rxgk_verify_mic(const struct krb5_enctype *krb5,
> +    struct crypto_shash *shash,
> +    const struct krb5_buffer *metadata,
> +    void *data,
> +    unsigned int *_offset, unsigned int *_len,
> +    u32 *_error_code)
> +{
> + struct scatterlist sg[1];
> + size_t offset = 0, len = *_len;
> + int ret;
> +
> + sg_init_one(sg, data, len);
> +
> + ret = crypto_krb5_verify_mic(krb5, shash, metadata, sg, 1, &offset, &len);
> + switch (ret) {
> + case 0:
> + *_offset += offset;
> + *_len = len;
> + break;
> + case -EBADMSG: /* Checksum mismatch */
> + case -EPROTO:
> + *_error_code = RXGK_SEALEDINCON;
> + break;
> + case -EMSGSIZE:
> + *_error_code = RXGK_PACKETSHORT;
> + break;
> + case -ENOPKG: /* Would prefer RXGK_BADETYPE, but not available for YFS. */
> + default:
> + *_error_code = RXGK_INCONSISTENCY;
> + break;
> + }
> +
> + return ret;
> +}
> diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
> index cba7935977f0..075936337836 100644
> --- a/net/rxrpc/rxkad.c
> +++ b/net/rxrpc/rxkad.c
> @@ -430,27 +430,25 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
> rxrpc_seq_t seq,
> struct skcipher_request *req)
> {
> - struct rxkad_level1_hdr sechdr;
> + struct rxkad_level1_hdr *sechdr;
> struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> struct rxrpc_crypt iv;
> - struct scatterlist sg[16];
> - u32 data_size, buf;
> + struct scatterlist sg[1];
> + void *data = call->rx_dec_buffer;
> + u32 len = sp->len, data_size, buf;
> u16 check;
> int ret;
> 
> _enter("");
> 
> - if (sp->len < 8)
> + if (len < 8)
> return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
>  rxkad_abort_1_short_header);
> 
> /* Decrypt the skbuff in-place.  TODO: We really want to decrypt
> * directly into the target buffer.
> */
> - sg_init_table(sg, ARRAY_SIZE(sg));
> - ret = skb_to_sgvec(skb, sg, sp->offset, 8);
> - if (unlikely(ret < 0))
> - return ret;
> + sg_init_one(sg, data, len);
> 
> /* start the decryption afresh */
> memset(&iv, 0, sizeof(iv));
> @@ -464,13 +462,11 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
> return ret;
> 
> /* Extract the decrypted packet length */
> - if (skb_copy_bits(skb, sp->offset, &sechdr, sizeof(sechdr)) < 0)
> - return rxrpc_abort_eproto(call, skb, RXKADDATALEN,
> -  rxkad_abort_1_short_encdata);
> - sp->offset += sizeof(sechdr);
> - sp->len    -= sizeof(sechdr);
> + sechdr = data;
> + call->rx_dec_offset = sizeof(*sechdr);
> + len -= sizeof(*sechdr);
> 
> - buf = ntohl(sechdr.data_size);
> + buf = ntohl(sechdr->data_size);
> data_size = buf & 0xffff;
> 
> check = buf >> 16;
> @@ -479,10 +475,10 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
> if (check != 0)
> return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
>  rxkad_abort_1_short_check);
> - if (data_size > sp->len)
> + if (data_size > len)
> return rxrpc_abort_eproto(call, skb, RXKADDATALEN,
>  rxkad_abort_1_short_data);
> - sp->len = data_size;
> + call->rx_dec_len = data_size;
> 
> _leave(" = 0 [dlen=%x]", data_size);
> return 0;
> @@ -496,43 +492,28 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
> struct skcipher_request *req)
> {
> const struct rxrpc_key_token *token;
> - struct rxkad_level2_hdr sechdr;
> + struct rxkad_level2_hdr *sechdr;
> struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> struct rxrpc_crypt iv;
> - struct scatterlist _sg[4], *sg;
> - u32 data_size, buf;
> + struct scatterlist sg[1];
> + void *data = call->rx_dec_buffer;
> + u32 len = sp->len, data_size, buf;
> u16 check;
> - int nsg, ret;
> + int ret;
> 
> - _enter(",{%d}", sp->len);
> + _enter(",{%d}", len);
> 
> - if (sp->len < 8)
> + if (len < 8)
> return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
>  rxkad_abort_2_short_header);
> 
> /* Don't let the crypto algo see a misaligned length. */
> - sp->len = round_down(sp->len, 8);
> + len = round_down(len, 8);
> 
> - /* Decrypt the skbuff in-place.  TODO: We really want to decrypt
> - * directly into the target buffer.
> + /* Decrypt in place in the call's decryption buffer.  TODO: We really
> + * want to decrypt directly into the target buffer.
> */
> - sg = _sg;
> - nsg = skb_shinfo(skb)->nr_frags + 1;
> - if (nsg <= 4) {
> - nsg = 4;
> - } else {
> - sg = kmalloc_objs(*sg, nsg, GFP_NOIO);
> - if (!sg)
> - return -ENOMEM;
> - }
> -
> - sg_init_table(sg, nsg);
> - ret = skb_to_sgvec(skb, sg, sp->offset, sp->len);
> - if (unlikely(ret < 0)) {
> - if (sg != _sg)
> - kfree(sg);
> - return ret;
> - }
> + sg_init_one(sg, data, len);
> 
> /* decrypt from the session key */
> token = call->conn->key->payload.data[0];
> @@ -540,11 +521,9 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
> 
> skcipher_request_set_sync_tfm(req, call->conn->rxkad.cipher);
> skcipher_request_set_callback(req, 0, NULL, NULL);
> - skcipher_request_set_crypt(req, sg, sg, sp->len, iv.x);
> + skcipher_request_set_crypt(req, sg, sg, len, iv.x);
> ret = crypto_skcipher_decrypt(req);
> skcipher_request_zero(req);
> - if (sg != _sg)
> - kfree(sg);
> if (ret < 0) {
> if (ret == -ENOMEM)
> return ret;
> @@ -553,13 +532,11 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
> }
> 
> /* Extract the decrypted packet length */
> - if (skb_copy_bits(skb, sp->offset, &sechdr, sizeof(sechdr)) < 0)
> - return rxrpc_abort_eproto(call, skb, RXKADDATALEN,
> -  rxkad_abort_2_short_len);
> - sp->offset += sizeof(sechdr);
> - sp->len    -= sizeof(sechdr);
> + sechdr = data;
> + call->rx_dec_offset = sizeof(*sechdr);
> + len -= sizeof(*sechdr);
> 
> - buf = ntohl(sechdr.data_size);
> + buf = ntohl(sechdr->data_size);
> data_size = buf & 0xffff;
> 
> check = buf >> 16;
> @@ -569,17 +546,18 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
> return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON,
>  rxkad_abort_2_short_check);
> 
> - if (data_size > sp->len)
> + if (data_size > len)
> return rxrpc_abort_eproto(call, skb, RXKADDATALEN,
>  rxkad_abort_2_short_data);
> 
> - sp->len = data_size;
> + call->rx_dec_len = data_size;
> _leave(" = 0 [dlen=%x]", data_size);
> return 0;
> }
> 
> /*
> - * Verify the security on a received packet and the subpackets therein.
> + * Verify the security on a received (sub)packet.  If the packet needs
> + * modifying (e.g. decrypting), it must be copied.
>  */
> static int rxkad_verify_packet(struct rxrpc_call *call, struct sk_buff *skb)
> {
> 

With the exception of the provided feedback this change looks good.

Thank you.

Jeffrey Altman



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 4120 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
  2026-05-11 16:07 ` [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg David Howells
  2026-05-12  7:58   ` Jeffrey Altman
@ 2026-05-12 13:38   ` David Laight
  2026-05-12 16:52     ` David Howells
  1 sibling, 1 reply; 10+ messages in thread
From: David Laight @ 2026-05-12 13:38 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Hyunwoo Kim, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jeffrey Altman, Jiayuan Chen, stable

On Mon, 11 May 2026 17:07:48 +0100
David Howells <dhowells@redhat.com> wrote:

> This improves the fix for CVE-2026-43500.
> 
> Fix the pagecache corruption from in-place decryption of a DATA packet
> transmitted locally by splice() by getting rid of the packet sharing in the
> I/O thread and unconditionally extracting the packet content into a bounce
> buffer in which the buffer is decrypted.  recvmsg() (or the kernel
> equivalent) then copies the data from the bounce buffer to the destination
> buffer.  The sk_buff then remains unmodified.
> 
> This has an additional advantage in that the packet is then arranged in the
> buffer with the correct alignment required for the crypto algorithms to
> process directly.  The performance of the crypto does seem to be a little
> faster and, surprisingly, the unencrypted performance doesn't seem to
> change much - possibly due to removing complexity from the I/O thread.

Yep, avoiding data copies is overrated :-)

> Yet another advantage is that the I/O thread doesn't have to copy packets
> which would slow down packet distribution, ACK generation, etc..
> 
> The buffer belongs to the call and is allocated initially at 2K,
> sufficiently large to hold a whole jumbo subpacket, but the buffer will be
> increased in size if needed.  There is one downside here, and that's if a
> MSG_PEEK of more than one byte occurs, it may move on to the next packet,
> replacing the content of the buffer.  In such a case, it has to go back and
> re-decrypt the current packet.
> 
> Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so
> switch to using USHRT_MAX to indicate an invalid offset.
> 
> Note also that I would generally prefer to replace the buffers of the
> current sk_buff with a new kmalloc'd buffer of the right size, ditching the
> old data and frags as this makes the handling of MSG_PEEK easier and
> removes the double-decryption issue, but this looks like quite a
> complicated thing to achieve.  skb_morph() looks half way to what I want,
> but I don't want to have to allocate a new sk_buff.

Wouldn't you need to do that anyway when the skb is shared - or can't
that happen?

> 
> Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
> Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
> Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Jeffrey Altman <jaltman@auristor.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Simon Horman <horms@kernel.org>
> cc: Jiayuan Chen <jiayuan.chen@linux.dev>
> cc: netdev@vger.kernel.org
> cc: linux-afs@lists.infradead.org
> cc: stable@vger.kernel.org
> ---
>  net/rxrpc/ar-internal.h |  7 +++-
>  net/rxrpc/call_event.c  | 22 +----------
>  net/rxrpc/call_object.c |  2 +
>  net/rxrpc/insecure.c    |  3 --
>  net/rxrpc/recvmsg.c     | 72 +++++++++++++++++++++++++++-------
>  net/rxrpc/rxgk.c        | 49 +++++++++++------------
>  net/rxrpc/rxgk_common.h | 79 +++++++++++++++++++++++++++++++++++++
>  net/rxrpc/rxkad.c       | 86 +++++++++++++++--------------------------
>  8 files changed, 200 insertions(+), 120 deletions(-)
> 
> diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
> index 27c2aa2dd023..783367eea798 100644
> --- a/net/rxrpc/ar-internal.h
> +++ b/net/rxrpc/ar-internal.h
> @@ -213,8 +213,6 @@ struct rxrpc_skb_priv {
>  		struct {
>  			u16		offset;		/* Offset of data */
>  			u16		len;		/* Length of data */
> -			u8		flags;
> -#define RXRPC_RX_VERIFIED	0x01
>  		};
>  		struct {
>  			rxrpc_seq_t	first_ack;	/* First packet in acks table */
> @@ -774,6 +772,11 @@ struct rxrpc_call {
>  	struct sk_buff_head	recvmsg_queue;	/* Queue of packets ready for recvmsg() */
>  	struct sk_buff_head	rx_queue;	/* Queue of packets for this call to receive */
>  	struct sk_buff_head	rx_oos_queue;	/* Queue of out of sequence packets */
> +	void			*rx_dec_buffer;	/* Decryption buffer */
> +	unsigned short		rx_dec_bsize;	/* rx_dec_buffer size */
> +	unsigned short		rx_dec_offset;	/* Decrypted packet data offset */
> +	unsigned short		rx_dec_len;	/* Decrypted packet data len */

Is it actually worth making those short rather than int?
I doubt the extra 4 bytes will matter and the generated code might be better.
(IIRC 32bit arm has a limited offset from 16 bit load/store, dunno about 64bit)

> +	rxrpc_seq_t		rx_dec_seq;	/* Packet in decryption buffer */
>  
>  	rxrpc_seq_t		rx_highest_seq;	/* Higest sequence number received */
>  	rxrpc_seq_t		rx_consumed;	/* Highest packet consumed */
> diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
> index 2b19b252225e..fec59d9338b9 100644
> --- a/net/rxrpc/call_event.c
> +++ b/net/rxrpc/call_event.c
> @@ -332,27 +332,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
>  
>  			saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK;
>  
> -			if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
> -			    sp->hdr.securityIndex != 0 &&
> -			    (skb_cloned(skb) ||
> -			     skb_has_frag_list(skb) ||
> -			     skb_has_shared_frag(skb))) {
> -				/* Unshare the packet so that it can be
> -				 * modified by in-place decryption.
> -				 */
> -				struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);
> -
> -				if (nskb) {
> -					rxrpc_new_skb(nskb, rxrpc_skb_new_unshared);
> -					rxrpc_input_call_packet(call, nskb);
> -					rxrpc_free_skb(nskb, rxrpc_skb_put_call_rx);
> -				} else {
> -					/* OOM - Drop the packet. */
> -					rxrpc_see_skb(skb, rxrpc_skb_see_unshare_nomem);
> -				}
> -			} else {
> -				rxrpc_input_call_packet(call, skb);
> -			}
> +			rxrpc_input_call_packet(call, skb);
>  			rxrpc_free_skb(skb, rxrpc_skb_put_call_rx);
>  			did_receive = true;
>  		}
> diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
> index f035f486c139..fcb9d38bb521 100644
> --- a/net/rxrpc/call_object.c
> +++ b/net/rxrpc/call_object.c
> @@ -152,6 +152,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
>  	spin_lock_init(&call->notify_lock);
>  	refcount_set(&call->ref, 1);
>  	call->debug_id		= debug_id;
> +	call->rx_pkt_offset	= USHRT_MAX;
>  	call->tx_total_len	= -1;
>  	call->tx_jumbo_max	= 1;
>  	call->next_rx_timo	= 20 * HZ;
> @@ -553,6 +554,7 @@ static void rxrpc_cleanup_rx_buffers(struct rxrpc_call *call)
>  	rxrpc_purge_queue(&call->recvmsg_queue);
>  	rxrpc_purge_queue(&call->rx_queue);
>  	rxrpc_purge_queue(&call->rx_oos_queue);
> +	kfree(call->rx_dec_buffer);
>  }
>  
>  /*
> diff --git a/net/rxrpc/insecure.c b/net/rxrpc/insecure.c
> index 0a260df45d25..7a26c6097d03 100644
> --- a/net/rxrpc/insecure.c
> +++ b/net/rxrpc/insecure.c
> @@ -32,9 +32,6 @@ static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
>  
>  static int none_verify_packet(struct rxrpc_call *call, struct sk_buff *skb)
>  {
> -	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> -
> -	sp->flags |= RXRPC_RX_VERIFIED;
>  	return 0;
>  }
>  
> diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
> index e1f7513a46db..865e368381d5 100644
> --- a/net/rxrpc/recvmsg.c
> +++ b/net/rxrpc/recvmsg.c
> @@ -147,15 +147,55 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
>  }
>  
>  /*
> - * Decrypt and verify a DATA packet.
> + * Decrypt and verify a DATA packet.  The content of the packet is pulled out
> + * into a flat buffer rather than decrypting in place in the skbuff.  This also
> + * has the advantage of aligning the buffer correctly for the crypto routines.
> + *
> + * We keep track of the sequence number of the packet currently decrypted into
> + * the buffer in ->rx_dec_seq.  Unfortunately, this means that a MSG_PEEK of
> + * more than one byte may cause a later packet to be decrypted into the buffer,
> + * requiring the original to be re-decrypted when recvmsg() is called again.
>   */
>  static int rxrpc_verify_data(struct rxrpc_call *call, struct sk_buff *skb)
>  {
>  	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> +	int ret;
>  
> -	if (sp->flags & RXRPC_RX_VERIFIED)
> +	if (call->rx_dec_seq == sp->hdr.seq && call->rx_dec_buffer)
>  		return 0;
> -	return call->security->verify_packet(call, skb);
> +
> +	if (call->rx_dec_bsize < sp->len) {

IMHO that test is backwards; the 'more constant' value should be on the right.

> +		/* Make sure we can hold a 1412-byte jumbo subpacket and make
> +		 * sure that the buffer size is aligned to a crypto blocksize.
> +		 */
> +		size_t size = umin(round_up(sp->len, 32), 2048);

Doesn't min() work?

> +		void *buffer = krealloc(call->rx_dec_buffer, size, GFP_NOFS);
> +
> +		if (!buffer)
> +			return -ENOMEM;
> +		call->rx_dec_buffer = buffer;
> +		call->rx_dec_bsize = size;
> +	}

That doesn't look right.
If sp->len is bigger than 2048 then you keep allocating a new buffer
and the call below overruns the allocated buffer.

> +
> +	ret = -EFAULT;
> +	if (skb_copy_bits(skb, sp->offset, call->rx_dec_buffer, sp->len) < 0)
> +		goto err;
> +
...

-- David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
  2026-05-12 13:38   ` David Laight
@ 2026-05-12 16:52     ` David Howells
  2026-05-12 21:36       ` David Laight
  0 siblings, 1 reply; 10+ messages in thread
From: David Howells @ 2026-05-12 16:52 UTC (permalink / raw)
  To: David Laight
  Cc: dhowells, netdev, Hyunwoo Kim, Marc Dionne, Jakub Kicinski,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	linux-afs, linux-kernel, Jeffrey Altman, Jiayuan Chen, stable

David Laight <david.laight.linux@gmail.com> wrote:

> > Note also that I would generally prefer to replace the buffers of the
> > current sk_buff with a new kmalloc'd buffer of the right size, ditching the
> > old data and frags as this makes the handling of MSG_PEEK easier and
> > removes the double-decryption issue, but this looks like quite a
> > complicated thing to achieve.  skb_morph() looks half way to what I want,
> > but I don't want to have to allocate a new sk_buff.
> 
> Wouldn't you need to do that anyway when the skb is shared - or can't
> that happen?

Hmmm...  That may well be the case - but if it's shared, do I own the
->next/prev pointers and the ->cb area?

> > +	unsigned short		rx_dec_bsize;	/* rx_dec_buffer size */
> > +	unsigned short		rx_dec_offset;	/* Decrypted packet data offset */
> > +	unsigned short		rx_dec_len;	/* Decrypted packet data len */
> 
> Is it actually worth making those short rather than int?
> I doubt the extra 4 bytes will matter and the generated code might be better.
> (IIRC 32bit arm has a limited offset from 16 bit load/store, dunno about 64bit)

Well, the capacity of a UDP packet less the rxrpc header can't reach 65535, so
on that basis this should be fine.  I'm a little worried about the rxrpc_call
struct's size - it's already ~1.3K with a lot of 8- and 16-bit
fields in it.  Of course, it's nowhere near as bit-for-bit optimised as
sk_buff, but I guess there are a lot more of those in a system.

> > +	if (call->rx_dec_bsize < sp->len) {
> 
> IMHO That test is backwards; the 'more constant' value should be on the right.

Actually, the thing you're testing should be on the left and the thing you're
testing against on the right - but, yes, I should switch them around.

> > +		size_t size = umin(round_up(sp->len, 32), 2048);
> 
> Doesn't min() work?

Actually, it should be umax() as I want the largest of the values (as Jeff
pointed out).  I prefer using umin/umax for values that are known to be
unsigned as you don't get casting errors (see the number of places we end up
using min/max_t(<unsigned-type>, ...) when we should use umin/umax() instead)
and the compiler may generate better code as we've told it that it doesn't
have to worry about negatives.

> That doesn't look right.
> If sp->len is bigger than 2048 the you keep allocating a new buffer
> and the call below overruns the allocated buffer.

Yep - see the aforementioned umax comment.

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
  2026-05-12 16:52     ` David Howells
@ 2026-05-12 21:36       ` David Laight
  0 siblings, 0 replies; 10+ messages in thread
From: David Laight @ 2026-05-12 21:36 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Hyunwoo Kim, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jeffrey Altman, Jiayuan Chen, stable

On Tue, 12 May 2026 17:52:03 +0100
David Howells <dhowells@redhat.com> wrote:

> David Laight <david.laight.linux@gmail.com> wrote:
...
> > > +		size_t size = umin(round_up(sp->len, 32), 2048);  
> > 
> > Doesn't min() work?  
> 
> Actually, it should be umax() as I want the largest of the values (as Jeff
> pointed out).  I prefer using umin/umax for values that are known to be
> unsigned as you don't get casting errors (see the number of places we end up
> using min/max_t(<unsigned-type>, ...) when we should use umin/umax() instead)
> and the compiler may generate better code as we've told it that it doesn't
> have to worry about negatives.

umin() and umax() are better than min_t() and max_t() (which is why I added
them); but you lose the compile-time check in min() and max() that rejects
comparisons where one side is unsigned and the compiler doesn't know that the 
other is always non-negative.

Basically if you compare a signed 32bit value and an unsigned 64bit one
with umin() the 32bit one is zero-extended to 64 bits.
OTOH min_t(u64) will sign-extend the 32bit value and then treat it as unsigned.
In both cases the onus is on the programmer to ensure the 32bit value isn't
negative.
For valid non-negative values the result is the same.
Zero-extending is usually free, sign-extending is particularly horrid on 32bit.

But it is better to use min() or max().
The compile-time tests will reject any cases where the integer promotion
rules could convert a negative value to a large positive one.
Note that the types no longer have to match.
Code like this is (usually) ok:
	unsigned int blk_len = ...;
	int rval = fun(...);
	while (rval > 0) {
		u32 len = min(rval, blk_len);
		// process len bytes;
		rval -= len;
	}
even though the types passed to min() differ in signedness the compiler's
value tracking means it knows that rval can never become a large unsigned
value - and min() uses that to allow it all through.
	
-- David

> 
> > That doesn't look right.
> > If sp->len is bigger than 2048 the you keep allocating a new buffer
> > and the call below overruns the allocated buffer.  
> 
> Yep - see the aforementioned umax comment.
> 
> David
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
  2026-05-12  7:58   ` Jeffrey Altman
@ 2026-05-13  8:01     ` David Howells
  2026-05-13  8:13       ` David Howells
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: David Howells @ 2026-05-13  8:01 UTC (permalink / raw)
  To: Jeffrey Altman
  Cc: dhowells, netdev, Hyunwoo Kim, Marc Dionne, Jakub Kicinski,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	linux-afs, linux-kernel, Jiayuan Chen, stable

Jeffrey Altman <jaltman@auristor.com> wrote:

> > + void *rx_dec_buffer; /* Decryption buffer */
> > + unsigned short rx_dec_bsize; /* rx_dec_buffer size */
> > + unsigned short rx_dec_offset; /* Decrypted packet data offset */
> > + unsigned short rx_dec_len; /* Decrypted packet data len */
> > + rxrpc_seq_t rx_dec_seq; /* Packet in decryption buffer */
> > 
> > rxrpc_seq_t rx_highest_seq; /* Highest sequence number received */
> > rxrpc_seq_t rx_consumed; /* Highest packet consumed */
> 
> 
> Instead of allocating the storage within struct rxrpc_call perhaps
> it would be better to add them to struct rxrpc_channel.  Doing so 
> would reduce the allocation/deallocation churn.  The majority of
> calls are short lived (perhaps a single packet in each direction)
> but there will be many calls in rapid succession.

I'm trying to keep the I/O side separate from the application side.  I don't
particularly want recvmsg (on the app side) reaching into the rxrpc_connection
struct (on the I/O side).

Further, by only looking at the rxrpc_call struct, I don't have to deal with
locking required for the possibility that the next call on that channel will
start before I've finished with this one (say an incoming call is aborted and
immediately followed up by the first packet of the next call).

> > + size_t size = umin(round_up(sp->len, 32), 2048);
> 
> I think you meant to use max() here so that a minimum of 2048 bytes
> is allocated.  

Yeah.

> I think applying a cap on the allocation size would also be 
> beneficial.  IBM/Transarc derived Rx implementations have a hard
> upper-bound of 21180 (15 x 1412) bytes plus one 28 byte rx header.
> Applying a cap of 32KiB seems prudent.

This would need checking earlier in the input path.  A DATA packet that's too
large would need to be rejected as it comes off of the UDP socket if we're not
going to be able to unpack it later.

> It is also worth noting that there are no current implementations
> of Rx RPC which will send individual Rx DATA packets larger than 
> 1444 bytes including the Rx header.  Rx RESPONSE packets can be sent
> as large as 16384 bytes (including the Rx header).  However, it is
> extremely unlikely that this buffer once allocated would ever need 
> to be grown.  

For Rx RESPONSE packets, I'm fine with allocating a buffer on the spur of the
moment and freeing it immediately.  Ideally, there would only be one RESPONSE
per connection anyway.  I could do a static buffer with a lock, I suppose, to
make sure I can process the things under memory pressure-based writeback.

> > + kfree(call->rx_dec_buffer);
> 
> It might be better to avoid deallocating the buffer on the error
> path and permit it to be freed during normal call (or call channel)
> deallocation.

Hmmm.  But I then need some other way to note that the buffer is no longer
occupied by valid data.  I suppose I could set ->rx_dec_offset to USHRT_MAX.

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
  2026-05-13  8:01     ` David Howells
@ 2026-05-13  8:13       ` David Howells
  2026-05-13  8:38       ` David Laight
  2026-05-13  9:48       ` Jeffrey Altman
  2 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2026-05-13  8:13 UTC (permalink / raw)
  To: Jeffrey Altman
  Cc: dhowells, netdev, Hyunwoo Kim, Marc Dionne, Jakub Kicinski,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	linux-afs, linux-kernel, Jiayuan Chen, stable

David Howells <dhowells@redhat.com> wrote:

> > > + kfree(call->rx_dec_buffer);
> > 
> > It might be better to avoid deallocating the buffer on the error
> > path and permit it to be freed during normal call (or call channel)
> > deallocation.
> 
> Hmmm.  But I then need some other way to note that the buffer is no longer
> occupied by valid data.  I suppose I could set ->rx_dec_offset to USHRT_MAX.

Actually, I'm not sure that just freeing the buffer is all that bad.

If skb_copy_bits() fails (ie. EFAULT), then the sk_buff is unrecoverably
broken somehow and the app may have to abandon the call.  Possibly the
call should be aborted directly here.  The case really shouldn't happen and
probably merits a pr_warn().

If ->verify_packet() fails with ENOMEM, then it's retryable.  Releasing the
buffer temporarily might help the system.

If ->verify_packet() fails with anything else, then the call should have been
aborted.

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
  2026-05-13  8:01     ` David Howells
  2026-05-13  8:13       ` David Howells
@ 2026-05-13  8:38       ` David Laight
  2026-05-13  9:48       ` Jeffrey Altman
  2 siblings, 0 replies; 10+ messages in thread
From: David Laight @ 2026-05-13  8:38 UTC (permalink / raw)
  To: David Howells
  Cc: Jeffrey Altman, netdev, Hyunwoo Kim, Marc Dionne, Jakub Kicinski,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	linux-afs, linux-kernel, Jiayuan Chen, stable

On Wed, 13 May 2026 09:01:14 +0100
David Howells <dhowells@redhat.com> wrote:

> Jeffrey Altman <jaltman@auristor.com> wrote:
> 
> > > + void *rx_dec_buffer; /* Decryption buffer */
> > > + unsigned short rx_dec_bsize; /* rx_dec_buffer size */
> > > + unsigned short rx_dec_offset; /* Decrypted packet data offset */
> > > + unsigned short rx_dec_len; /* Decrypted packet data len */
> > > + rxrpc_seq_t rx_dec_seq; /* Packet in decryption buffer */
> > > 
> > > rxrpc_seq_t rx_highest_seq; /* Highest sequence number received */
> > > rxrpc_seq_t rx_consumed; /* Highest packet consumed */  
> > 
> > 
> > Instead of allocating the storage within struct rxrpc_call perhaps
> > it would be better to add them to struct rxrpc_channel.  Doing so 
> > would reduce the allocation/deallocation churn.  The majority of
> > calls are short lived (perhaps a single packet in each direction)
> > but there will be many calls in rapid succession.  
> 
> I'm trying to keep the I/O side separate from the application side.  I don't
> particularly want recvmsg (on the app side) reaching into the rxrpc_connection
> struct (on the I/O side).
> 
> Further, by only looking at the rxrpc_call struct, I don't have to deal with
> locking required for the possibility that the next call on that channel will
> start before I've finished with this one (say an incoming call is aborted and
> immediately followed up by the first packet of the next call).

There are also loads of other allocations and frees (eg the skb itself).
One more isn't really going to be significant.
Especially for sub-page sizes that just come off a per-cpu list.

> 
> > > + size_t size = umin(round_up(sp->len, 32), 2048);  
> > 
> > I think you meant to use max() here so that a minimum of 2048 bytes
> > is allocated.    
> 
> Yeah.
> 
> > I think applying a cap on the allocation size would also be 
> > beneficial.  IBM/Transarc derived Rx implementations have a hard
> > upper-bound of 21180 (15 x 1412) bytes plus one 28 byte rx header.
> > Applying a cap of 32KiB seems prudent.  
> 
> This would need checking earlier in the input path.  A DATA packet that's too
> large would need to be rejected as it comes off of the UDP socket if we're not
> going to be able to unpack it later.
> 
> > It is also worth noting that there are no current implementations
> > of Rx RPC which will send individual Rx DATA packets larger than 
> > 1444 bytes including the Rx header.  Rx RESPONSE packets can be sent
> > as large as 16384 bytes (including the Rx header).  However, it is
> > extremely unlikely that this buffer once allocated would ever need 
> > to be grown.    
> 
> For Rx RESPONSE packets, I'm fine with allocating a buffer on the spur of the
> moment and freeing it immediately.  Ideally, there would only be one RESPONSE
> per connection anyway.  I could do a static buffer with a lock, I suppose, to
> make sure I can process the things under memory pressure-based writeback.

A 16K block of static data is rather a waste.
Under that much memory pressure something has to give.
Dropping a packet and forcing the remote to resend on timeout
may actually be the best thing to do.

-- David L

> 
> > > + kfree(call->rx_dec_buffer);  
> > 
> > It might be better to avoid deallocating the buffer on the error
> > path and permit it to be freed during normal call (or call channel)
> > deallocation.  
> 
> Hmmm.  But I then need some other way to note that the buffer is no longer
> occupied by valid data.  I suppose I could set ->rx_dec_offset to USHRT_MAX.
> 
> David
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
  2026-05-13  8:01     ` David Howells
  2026-05-13  8:13       ` David Howells
  2026-05-13  8:38       ` David Laight
@ 2026-05-13  9:48       ` Jeffrey Altman
  2 siblings, 0 replies; 10+ messages in thread
From: Jeffrey Altman @ 2026-05-13  9:48 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Hyunwoo Kim, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jiayuan Chen, stable




> On May 13, 2026, at 4:01 AM, David Howells <dhowells@redhat.com> wrote:
> 
> Jeffrey Altman <jaltman@auristor.com> wrote:
> 
>>> + void *rx_dec_buffer; /* Decryption buffer */
>>> + unsigned short rx_dec_bsize; /* rx_dec_buffer size */
>>> + unsigned short rx_dec_offset; /* Decrypted packet data offset */
>>> + unsigned short rx_dec_len; /* Decrypted packet data len */
>>> + rxrpc_seq_t rx_dec_seq; /* Packet in decryption buffer */
>>> 
>>> rxrpc_seq_t rx_highest_seq; /* Highest sequence number received */
>>> rxrpc_seq_t rx_consumed; /* Highest packet consumed */
>> 
>> 
>> Instead of allocating the storage within struct rxrpc_call perhaps
>> it would be better to add them to struct rxrpc_channel.  Doing so 
>> would reduce the allocation/deallocation churn.  The majority of
>> calls are short lived (perhaps a single packet in each direction)
>> but there will be many calls in rapid succession.
> 
> I'm trying to keep the I/O side separate from the application side.  I don't
> particularly want recvmsg (on the app side) reaching into the rxrpc_connection
> struct (on the I/O side).
> 
> Further, by only looking at the rxrpc_call struct, I don't have to deal with
> locking required for the possibility that the next call on that channel will
> start before I've finished with this one (say an incoming call is aborted and
> immediately followed up by the first packet of the next call).

There could only be one rxrpc_call structure at a time referring to the 
rxrpc_channel.  If rxrpc_input_packet_on_conn() receives a DATA packet for a
later call number and there is a prior rxrpc_call assigned to rxrpc_channel.call
then an RX_PACKET_TYPE_BUSY should be sent to the peer if the rx_call cannot be
terminated without blocking the processing of incoming packets.  The Rx BUSY
informs the initiating peer that the DATA packet has been dropped and that it
should be retransmitted (which will occur anyway due to the lack of an Rx ACK
packet).

The busy call channel logic is out of scope for this change and avoiding
per-call allocation churn is an optimization that can be implemented another
day if desired.

> 
>>> + size_t size = umin(round_up(sp->len, 32), 2048);
>> 
>> I think you meant to use max() here so that a minimum of 2048 bytes
>> is allocated.  
> 
> Yeah.
> 
>> I think applying a cap on the allocation size would also be 
>> beneficial.  IBM/Transarc derived Rx implementations have a hard
>> upper-bound of 21180 (15 x 1412) bytes plus one 28 byte rx header.
>> Applying a cap of 32KiB seems prudent.
> 
> This would need checking earlier in the input path.  A DATA packet that's too
> large would need to be rejected as it comes off of the UDP socket if we're not
> going to be able to unpack it later.

ok.

> 
>> It is also worth noting that there are no current implementations
>> of Rx RPC which will send individual Rx DATA packets larger than 
>> 1444 bytes including the Rx header.  Rx RESPONSE packets can be sent
>> as large as 16384 bytes (including the Rx header).  However, it is
>> extremely unlikely that this buffer once allocated would ever need 
>> to be grown.  
> 
> For Rx RESPONSE packets, I'm fine with allocating a buffer on the spur of the
> moment and freeing it immediately.  Ideally, there would only be one RESPONSE
> per connection anyway.  I could do a static buffer with a lock, I suppose, to
> make sure I can process the things under memory pressure-based writeback.

I agree.  The Rx RESPONSE packets belong to the rxrpc_connection and not to 
any particular call.  It is possible for there to be more than one Rx RESPONSE
packet received due to Rx CHALLENGE retransmission or because this peer decided
it can no longer trust the prior authentication and sends a new Rx CHALLENGE.
However, RESPONSE packets are few and far between.  There is no benefit to
adding another 16KB to the rxrpc_connection allocation.

> 
>>> + kfree(call->rx_dec_buffer);
>> 
>> It might be better to avoid deallocating the buffer on the error
>> path and permit it to be freed during normal call (or call channel)
>> deallocation.
> 
> Hmmm.  But I then need some other way to note that the buffer is no longer
> occupied by valid data.  I suppose I could set ->rx_dec_offset to USHRT_MAX.

That meaning was unclear; a code comment indicating that this is why the
allocation is being freed would be warranted.


> David
> 

Thanks.

Jeffrey Altman


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-05-13  9:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20260511160753.607296-1-dhowells@redhat.com>
2026-05-11 16:07 ` [PATCH net 1/3] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present David Howells
2026-05-11 16:07 ` [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg David Howells
2026-05-12  7:58   ` Jeffrey Altman
2026-05-13  8:01     ` David Howells
2026-05-13  8:13       ` David Howells
2026-05-13  8:38       ` David Laight
2026-05-13  9:48       ` Jeffrey Altman
2026-05-12 13:38   ` David Laight
2026-05-12 16:52     ` David Howells
2026-05-12 21:36       ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox