From: David Laight <david.laight.linux@gmail.com>
To: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org, Hyunwoo Kim <imv4bel@gmail.com>,
Marc Dionne <marc.dionne@auristor.com>,
Jakub Kicinski <kuba@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org,
Jeffrey Altman <jaltman@auristor.com>,
Jiayuan Chen <jiayuan.chen@linux.dev>,
stable@vger.kernel.org
Subject: Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
Date: Tue, 12 May 2026 14:38:35 +0100 [thread overview]
Message-ID: <20260512143835.13af8bc4@pumpkin> (raw)
In-Reply-To: <20260511160753.607296-3-dhowells@redhat.com>
On Mon, 11 May 2026 17:07:48 +0100
David Howells <dhowells@redhat.com> wrote:
> This improves the fix for CVE-2026-43500.
>
> Fix the pagecache corruption from in-place decryption of a DATA packet
> transmitted locally by splice() by getting rid of the packet sharing in the
> I/O thread and unconditionally extracting the packet content into a bounce
> buffer in which the buffer is decrypted. recvmsg() (or the kernel
> equivalent) then copies the data from the bounce buffer to the destination
> buffer. The sk_buff then remains unmodified.
>
> This has an additional advantage in that the packet is then arranged in the
> buffer with the correct alignment required for the crypto algorithms to
> process directly. The performance of the crypto does seem to be a little
> faster and, surprisingly, the unencrypted performance doesn't seem to
> change much - possibly due to removing complexity from the I/O thread.
Yep, avoiding data copies is overrated :-)
> Yet another advantage is that the I/O thread doesn't have to copy packets
> which would slow down packet distribution, ACK generation, etc..
>
> The buffer belongs to the call and is allocated initially at 2K,
> sufficiently large to hold a whole jumbo subpacket, but the buffer will be
> increased in size if needed. There is one downside here, and that's if a
> MSG_PEEK of more than one byte occurs, it may move on to the next packet,
> replacing the content of the buffer. In such a case, it has to go back and
> re-decrypt the current packet.
>
> Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so
> switch to using USHRT_MAX to indicate an invalid offset.
>
> Note also that I would generally prefer to replace the buffers of the
> current sk_buff with a new kmalloc'd buffer of the right size, ditching the
> old data and frags as this makes the handling of MSG_PEEK easier and
> removes the double-decryption issue, but this looks like quite a
> complicated thing to achieve. skb_morph() looks half way to what I want,
> but I don't want to have to allocate a new sk_buff.
Wouldn't you need to do that anyway when the skb is shared - or can't
that happen?
>
> Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
> Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
> Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Jeffrey Altman <jaltman@auristor.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Simon Horman <horms@kernel.org>
> cc: Jiayuan Chen <jiayuan.chen@linux.dev>
> cc: netdev@vger.kernel.org
> cc: linux-afs@lists.infradead.org
> cc: stable@vger.kernel.org
> ---
> net/rxrpc/ar-internal.h | 7 +++-
> net/rxrpc/call_event.c | 22 +----------
> net/rxrpc/call_object.c | 2 +
> net/rxrpc/insecure.c | 3 --
> net/rxrpc/recvmsg.c | 72 +++++++++++++++++++++++++++-------
> net/rxrpc/rxgk.c | 49 +++++++++++------------
> net/rxrpc/rxgk_common.h | 79 +++++++++++++++++++++++++++++++++++++
> net/rxrpc/rxkad.c | 86 +++++++++++++++--------------------------
> 8 files changed, 200 insertions(+), 120 deletions(-)
>
> diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
> index 27c2aa2dd023..783367eea798 100644
> --- a/net/rxrpc/ar-internal.h
> +++ b/net/rxrpc/ar-internal.h
> @@ -213,8 +213,6 @@ struct rxrpc_skb_priv {
> struct {
> u16 offset; /* Offset of data */
> u16 len; /* Length of data */
> - u8 flags;
> -#define RXRPC_RX_VERIFIED 0x01
> };
> struct {
> rxrpc_seq_t first_ack; /* First packet in acks table */
> @@ -774,6 +772,11 @@ struct rxrpc_call {
> struct sk_buff_head recvmsg_queue; /* Queue of packets ready for recvmsg() */
> struct sk_buff_head rx_queue; /* Queue of packets for this call to receive */
> struct sk_buff_head rx_oos_queue; /* Queue of out of sequence packets */
> + void *rx_dec_buffer; /* Decryption buffer */
> + unsigned short rx_dec_bsize; /* rx_dec_buffer size */
> + unsigned short rx_dec_offset; /* Decrypted packet data offset */
> + unsigned short rx_dec_len; /* Decrypted packet data len */
Is it actually worth making those short rather than int?
I doubt the few extra bytes will matter, and the generated code might be better with int.
(IIRC 32-bit arm has a limited offset range on 16-bit load/store instructions; dunno about 64-bit.)
> + rxrpc_seq_t rx_dec_seq; /* Packet in decryption buffer */
>
> rxrpc_seq_t rx_highest_seq; /* Higest sequence number received */
> rxrpc_seq_t rx_consumed; /* Highest packet consumed */
> diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
> index 2b19b252225e..fec59d9338b9 100644
> --- a/net/rxrpc/call_event.c
> +++ b/net/rxrpc/call_event.c
> @@ -332,27 +332,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
>
> saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK;
>
> - if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
> - sp->hdr.securityIndex != 0 &&
> - (skb_cloned(skb) ||
> - skb_has_frag_list(skb) ||
> - skb_has_shared_frag(skb))) {
> - /* Unshare the packet so that it can be
> - * modified by in-place decryption.
> - */
> - struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);
> -
> - if (nskb) {
> - rxrpc_new_skb(nskb, rxrpc_skb_new_unshared);
> - rxrpc_input_call_packet(call, nskb);
> - rxrpc_free_skb(nskb, rxrpc_skb_put_call_rx);
> - } else {
> - /* OOM - Drop the packet. */
> - rxrpc_see_skb(skb, rxrpc_skb_see_unshare_nomem);
> - }
> - } else {
> - rxrpc_input_call_packet(call, skb);
> - }
> + rxrpc_input_call_packet(call, skb);
> rxrpc_free_skb(skb, rxrpc_skb_put_call_rx);
> did_receive = true;
> }
> diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
> index f035f486c139..fcb9d38bb521 100644
> --- a/net/rxrpc/call_object.c
> +++ b/net/rxrpc/call_object.c
> @@ -152,6 +152,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
> spin_lock_init(&call->notify_lock);
> refcount_set(&call->ref, 1);
> call->debug_id = debug_id;
> + call->rx_pkt_offset = USHRT_MAX;
> call->tx_total_len = -1;
> call->tx_jumbo_max = 1;
> call->next_rx_timo = 20 * HZ;
> @@ -553,6 +554,7 @@ static void rxrpc_cleanup_rx_buffers(struct rxrpc_call *call)
> rxrpc_purge_queue(&call->recvmsg_queue);
> rxrpc_purge_queue(&call->rx_queue);
> rxrpc_purge_queue(&call->rx_oos_queue);
> + kfree(call->rx_dec_buffer);
> }
>
> /*
> diff --git a/net/rxrpc/insecure.c b/net/rxrpc/insecure.c
> index 0a260df45d25..7a26c6097d03 100644
> --- a/net/rxrpc/insecure.c
> +++ b/net/rxrpc/insecure.c
> @@ -32,9 +32,6 @@ static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
>
> static int none_verify_packet(struct rxrpc_call *call, struct sk_buff *skb)
> {
> - struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> -
> - sp->flags |= RXRPC_RX_VERIFIED;
> return 0;
> }
>
> diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
> index e1f7513a46db..865e368381d5 100644
> --- a/net/rxrpc/recvmsg.c
> +++ b/net/rxrpc/recvmsg.c
> @@ -147,15 +147,55 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
> }
>
> /*
> - * Decrypt and verify a DATA packet.
> + * Decrypt and verify a DATA packet. The content of the packet is pulled out
> + * into a flat buffer rather than decrypting in place in the skbuff. This also
> + * has the advantage of aligning the buffer correctly for the crypto routines.
> + *
> + * We keep track of the sequence number of the packet currently decrypted into
> + * the buffer in ->rx_dec_seq. Unfortunately, this means that a MSG_PEEK of
> + * more than one byte may cause a later packet to be decrypted into the buffer,
> + * requiring the original to be re-decrypted when recvmsg() is called again.
> */
> static int rxrpc_verify_data(struct rxrpc_call *call, struct sk_buff *skb)
> {
> struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> + int ret;
>
> - if (sp->flags & RXRPC_RX_VERIFIED)
> + if (call->rx_dec_seq == sp->hdr.seq && call->rx_dec_buffer)
> return 0;
> - return call->security->verify_packet(call, skb);
> +
> + if (call->rx_dec_bsize < sp->len) {
IMHO that test is backwards; the 'more constant' value should be on the
right, i.e. sp->len > call->rx_dec_bsize.
> + /* Make sure we can hold a 1412-byte jumbo subpacket and make
> + * sure that the buffer size is aligned to a crypto blocksize.
> + */
> + size_t size = umin(round_up(sp->len, 32), 2048);
Doesn't min() work?
> + void *buffer = krealloc(call->rx_dec_buffer, size, GFP_NOFS);
> +
> + if (!buffer)
> + return -ENOMEM;
> + call->rx_dec_buffer = buffer;
> + call->rx_dec_bsize = size;
> + }
That doesn't look right.
If sp->len is bigger than 2048 then the size is clamped to 2048, so you keep
reallocating a buffer that is still too small, and the skb_copy_bits() call
below overruns it.
> +
> + ret = -EFAULT;
> + if (skb_copy_bits(skb, sp->offset, call->rx_dec_buffer, sp->len) < 0)
> + goto err;
> +
...
-- David