Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg

Linux kernel -stable discussions
 help / color / mirror / Atom feed

From: David Laight <david.laight.linux@gmail.com>
To: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org, Hyunwoo Kim <imv4bel@gmail.com>,
	Marc Dionne <marc.dionne@auristor.com>,
	Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
	linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org,
	Jeffrey Altman <jaltman@auristor.com>,
	Jiayuan Chen <jiayuan.chen@linux.dev>,
	stable@vger.kernel.org
Subject: Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
Date: Tue, 12 May 2026 14:38:35 +0100	[thread overview]
Message-ID: <20260512143835.13af8bc4@pumpkin> (raw)
In-Reply-To: <20260511160753.607296-3-dhowells@redhat.com>

On Mon, 11 May 2026 17:07:48 +0100
David Howells <dhowells@redhat.com> wrote:

> This improves the fix for CVE-2026-43500.
> 
> Fix the pagecache corruption from in-place decryption of a DATA packet
> transmitted locally by splice() by getting rid of the packet sharing in the
> I/O thread and unconditionally extracting the packet content into a bounce
> buffer in which the buffer is decrypted.  recvmsg() (or the kernel
> equivalent) then copies the data from the bounce buffer to the destination
> buffer.  The sk_buff then remains unmodified.
> 
> This has an additional advantage in that the packet is then arranged in the
> buffer with the correct alignment required for the crypto algorithms to
> process directly.  The performance of the crypto does seem to be a little
> faster and, surprisingly, the unencrypted performance doesn't seem to
> change much - possibly due to removing complexity from the I/O thread.

Yep, avoiding data copies is overrated :-)

> Yet another advantage is that the I/O thread doesn't have to copy packets
> which would slow down packet distribution, ACK generation, etc..
> 
> The buffer belongs to the call and is allocated initially at 2K,
> sufficiently large to hold a whole jumbo subpacket, but the buffer will be
> increased in size if needed.  There is one downside here, and that's if a
> MSG_PEEK of more than one byte occurs, it may move on to the next packet,
> replacing the content of the buffer.  In such a case, it has to go back and
> re-decrypt the current packet.
> 
> Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so
> switch to using USHRT_MAX to indicate an invalid offset.
> 
> Note also that I would generally prefer to replace the buffers of the
> current sk_buff with a new kmalloc'd buffer of the right size, ditching the
> old data and frags as this makes the handling of MSG_PEEK easier and
> removes the double-decryption issue, but this looks like quite a
> complicated thing to achieve.  skb_morph() looks half way to what I want,
> but I don't want to have to allocate a new sk_buff.

Wouldn't you need to do that anyway when the kkb is shared - or can't
that happen?

> 
> Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
> Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
> Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Jeffrey Altman <jaltman@auristor.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Simon Horman <horms@kernel.org>
> cc: Jiayuan Chen <jiayuan.chen@linux.dev>
> cc: netdev@vger.kernel.org
> cc: linux-afs@lists.infradead.org
> cc: stable@vger.kernel.org
> ---
>  net/rxrpc/ar-internal.h |  7 +++-
>  net/rxrpc/call_event.c  | 22 +----------
>  net/rxrpc/call_object.c |  2 +
>  net/rxrpc/insecure.c    |  3 --
>  net/rxrpc/recvmsg.c     | 72 +++++++++++++++++++++++++++-------
>  net/rxrpc/rxgk.c        | 49 +++++++++++------------
>  net/rxrpc/rxgk_common.h | 79 +++++++++++++++++++++++++++++++++++++
>  net/rxrpc/rxkad.c       | 86 +++++++++++++++--------------------------
>  8 files changed, 200 insertions(+), 120 deletions(-)
> 
> diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
> index 27c2aa2dd023..783367eea798 100644
> --- a/net/rxrpc/ar-internal.h
> +++ b/net/rxrpc/ar-internal.h
> @@ -213,8 +213,6 @@ struct rxrpc_skb_priv {
>  		struct {
>  			u16		offset;		/* Offset of data */
>  			u16		len;		/* Length of data */
> -			u8		flags;
> -#define RXRPC_RX_VERIFIED	0x01
>  		};
>  		struct {
>  			rxrpc_seq_t	first_ack;	/* First packet in acks table */
> @@ -774,6 +772,11 @@ struct rxrpc_call {
>  	struct sk_buff_head	recvmsg_queue;	/* Queue of packets ready for recvmsg() */
>  	struct sk_buff_head	rx_queue;	/* Queue of packets for this call to receive */
>  	struct sk_buff_head	rx_oos_queue;	/* Queue of out of sequence packets */
> +	void			*rx_dec_buffer;	/* Decryption buffer */
> +	unsigned short		rx_dec_bsize;	/* rx_dec_buffer size */
> +	unsigned short		rx_dec_offset;	/* Decrypted packet data offset */
> +	unsigned short		rx_dec_len;	/* Decrypted packet data len */

Is it actually worth making those short rather than int?
I doubt the extra 4 bytes will matter and the generated code might be better.
(IIRC 32bit arm has a limited offset from 16 bit load/store, dunno about 64bit)

> +	rxrpc_seq_t		rx_dec_seq;	/* Packet in decryption buffer */
>  
>  	rxrpc_seq_t		rx_highest_seq;	/* Higest sequence number received */
>  	rxrpc_seq_t		rx_consumed;	/* Highest packet consumed */
> diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
> index 2b19b252225e..fec59d9338b9 100644
> --- a/net/rxrpc/call_event.c
> +++ b/net/rxrpc/call_event.c
> @@ -332,27 +332,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
>  
>  			saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK;
>  
> -			if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
> -			    sp->hdr.securityIndex != 0 &&
> -			    (skb_cloned(skb) ||
> -			     skb_has_frag_list(skb) ||
> -			     skb_has_shared_frag(skb))) {
> -				/* Unshare the packet so that it can be
> -				 * modified by in-place decryption.
> -				 */
> -				struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);
> -
> -				if (nskb) {
> -					rxrpc_new_skb(nskb, rxrpc_skb_new_unshared);
> -					rxrpc_input_call_packet(call, nskb);
> -					rxrpc_free_skb(nskb, rxrpc_skb_put_call_rx);
> -				} else {
> -					/* OOM - Drop the packet. */
> -					rxrpc_see_skb(skb, rxrpc_skb_see_unshare_nomem);
> -				}
> -			} else {
> -				rxrpc_input_call_packet(call, skb);
> -			}
> +			rxrpc_input_call_packet(call, skb);
>  			rxrpc_free_skb(skb, rxrpc_skb_put_call_rx);
>  			did_receive = true;
>  		}
> diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
> index f035f486c139..fcb9d38bb521 100644
> --- a/net/rxrpc/call_object.c
> +++ b/net/rxrpc/call_object.c
> @@ -152,6 +152,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
>  	spin_lock_init(&call->notify_lock);
>  	refcount_set(&call->ref, 1);
>  	call->debug_id		= debug_id;
> +	call->rx_pkt_offset	= USHRT_MAX;
>  	call->tx_total_len	= -1;
>  	call->tx_jumbo_max	= 1;
>  	call->next_rx_timo	= 20 * HZ;
> @@ -553,6 +554,7 @@ static void rxrpc_cleanup_rx_buffers(struct rxrpc_call *call)
>  	rxrpc_purge_queue(&call->recvmsg_queue);
>  	rxrpc_purge_queue(&call->rx_queue);
>  	rxrpc_purge_queue(&call->rx_oos_queue);
> +	kfree(call->rx_dec_buffer);
>  }
>  
>  /*
> diff --git a/net/rxrpc/insecure.c b/net/rxrpc/insecure.c
> index 0a260df45d25..7a26c6097d03 100644
> --- a/net/rxrpc/insecure.c
> +++ b/net/rxrpc/insecure.c
> @@ -32,9 +32,6 @@ static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
>  
>  static int none_verify_packet(struct rxrpc_call *call, struct sk_buff *skb)
>  {
> -	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> -
> -	sp->flags |= RXRPC_RX_VERIFIED;
>  	return 0;
>  }
>  
> diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
> index e1f7513a46db..865e368381d5 100644
> --- a/net/rxrpc/recvmsg.c
> +++ b/net/rxrpc/recvmsg.c
> @@ -147,15 +147,55 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
>  }
>  
>  /*
> - * Decrypt and verify a DATA packet.
> + * Decrypt and verify a DATA packet.  The content of the packet is pulled out
> + * into a flat buffer rather than decrypting in place in the skbuff.  This also
> + * has the advantage of aligning the buffer correctly for the crypto routines.
> + *
> + * We keep track of the sequence number of the packet currently decrypted into
> + * the buffer in ->rx_dec_seq.  Unfortunately, this means that a MSG_PEEK of
> + * more than one byte may cause a later packet to be decrypted into the buffer,
> + * requiring the original to be re-decrypted when recvmsg() is called again.
>   */
>  static int rxrpc_verify_data(struct rxrpc_call *call, struct sk_buff *skb)
>  {
>  	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> +	int ret;
>  
> -	if (sp->flags & RXRPC_RX_VERIFIED)
> +	if (call->rx_dec_seq == sp->hdr.seq && call->rx_dec_buffer)
>  		return 0;
> -	return call->security->verify_packet(call, skb);
> +
> +	if (call->rx_dec_bsize < sp->len) {

IMHO That test is backwards; the 'more constant' value should be on the right.

> +		/* Make sure we can hold a 1412-byte jumbo subpacket and make
> +		 * sure that the buffer size is aligned to a crypto blocksize.
> +		 */
> +		size_t size = umin(round_up(sp->len, 32), 2048);

Doesn't min() work?

> +		void *buffer = krealloc(call->rx_dec_buffer, size, GFP_NOFS);
> +
> +		if (!buffer)
> +			return -ENOMEM;
> +		call->rx_dec_buffer = buffer;
> +		call->rx_dec_bsize = size;
> +	}

That doesn't look right.
If sp->len is bigger than 2048 the you keep allocating a new buffer
and the call below overruns the allocated buffer.

> +
> +	ret = -EFAULT;
> +	if (skb_copy_bits(skb, sp->offset, call->rx_dec_buffer, sp->len) < 0)
> +		goto err;
> +
...

-- David

next prev parent reply	other threads:[~2026-05-12 13:38 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260511160753.607296-1-dhowells@redhat.com>
2026-05-11 16:07 ` [PATCH net 1/3] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present David Howells
2026-05-11 16:07 ` [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg David Howells
2026-05-12  7:58   ` Jeffrey Altman
2026-05-12 13:38   ` David Laight [this message]
2026-05-12 16:52     ` David Howells
2026-05-12 21:36       ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260512143835.13af8bc4@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dhowells@redhat.com \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=imv4bel@gmail.com \
    --cc=jaltman@auristor.com \
    --cc=jiayuan.chen@linux.dev \
    --cc=kuba@kernel.org \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.dionne@auristor.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox