From: David Laight <david.laight.linux@gmail.com>
To: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org, Hyunwoo Kim <imv4bel@gmail.com>,
Marc Dionne <marc.dionne@auristor.com>,
Jakub Kicinski <kuba@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org,
Jeffrey Altman <jaltman@auristor.com>,
Jiayuan Chen <jiayuan.chen@linux.dev>,
stable@vger.kernel.org
Subject: Re: [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
Date: Tue, 12 May 2026 14:38:35 +0100 [thread overview]
Message-ID: <20260512143835.13af8bc4@pumpkin> (raw)
In-Reply-To: <20260511160753.607296-3-dhowells@redhat.com>
On Mon, 11 May 2026 17:07:48 +0100
David Howells <dhowells@redhat.com> wrote:
> This improves the fix for CVE-2026-43500.
>
> Fix the pagecache corruption from in-place decryption of a DATA packet
> transmitted locally by splice() by getting rid of the packet sharing in the
> I/O thread and unconditionally extracting the packet content into a bounce
> buffer in which the buffer is decrypted. recvmsg() (or the kernel
> equivalent) then copies the data from the bounce buffer to the destination
> buffer. The sk_buff then remains unmodified.
>
> This has an additional advantage in that the packet is then arranged in the
> buffer with the correct alignment required for the crypto algorithms to
> process directly. The performance of the crypto does seem to be a little
> faster and, surprisingly, the unencrypted performance doesn't seem to
> change much - possibly due to removing complexity from the I/O thread.
Yep, avoiding data copies is overrated :-)
> Yet another advantage is that the I/O thread doesn't have to copy packets
> which would slow down packet distribution, ACK generation, etc..
>
> The buffer belongs to the call and is allocated initially at 2K,
> sufficiently large to hold a whole jumbo subpacket, but the buffer will be
> increased in size if needed. There is one downside here, and that's if a
> MSG_PEEK of more than one byte occurs, it may move on to the next packet,
> replacing the content of the buffer. In such a case, it has to go back and
> re-decrypt the current packet.
>
> Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so
> switch to using USHRT_MAX to indicate an invalid offset.
>
> Note also that I would generally prefer to replace the buffers of the
> current sk_buff with a new kmalloc'd buffer of the right size, ditching the
> old data and frags as this makes the handling of MSG_PEEK easier and
> removes the double-decryption issue, but this looks like quite a
> complicated thing to achieve. skb_morph() looks half way to what I want,
> but I don't want to have to allocate a new sk_buff.
Wouldn't you need to do that anyway when the kkb is shared - or can't
that happen?
>
> Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
> Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
> Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Jeffrey Altman <jaltman@auristor.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Simon Horman <horms@kernel.org>
> cc: Jiayuan Chen <jiayuan.chen@linux.dev>
> cc: netdev@vger.kernel.org
> cc: linux-afs@lists.infradead.org
> cc: stable@vger.kernel.org
> ---
> net/rxrpc/ar-internal.h | 7 +++-
> net/rxrpc/call_event.c | 22 +----------
> net/rxrpc/call_object.c | 2 +
> net/rxrpc/insecure.c | 3 --
> net/rxrpc/recvmsg.c | 72 +++++++++++++++++++++++++++-------
> net/rxrpc/rxgk.c | 49 +++++++++++------------
> net/rxrpc/rxgk_common.h | 79 +++++++++++++++++++++++++++++++++++++
> net/rxrpc/rxkad.c | 86 +++++++++++++++--------------------------
> 8 files changed, 200 insertions(+), 120 deletions(-)
>
> diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
> index 27c2aa2dd023..783367eea798 100644
> --- a/net/rxrpc/ar-internal.h
> +++ b/net/rxrpc/ar-internal.h
> @@ -213,8 +213,6 @@ struct rxrpc_skb_priv {
> struct {
> u16 offset; /* Offset of data */
> u16 len; /* Length of data */
> - u8 flags;
> -#define RXRPC_RX_VERIFIED 0x01
> };
> struct {
> rxrpc_seq_t first_ack; /* First packet in acks table */
> @@ -774,6 +772,11 @@ struct rxrpc_call {
> struct sk_buff_head recvmsg_queue; /* Queue of packets ready for recvmsg() */
> struct sk_buff_head rx_queue; /* Queue of packets for this call to receive */
> struct sk_buff_head rx_oos_queue; /* Queue of out of sequence packets */
> + void *rx_dec_buffer; /* Decryption buffer */
> + unsigned short rx_dec_bsize; /* rx_dec_buffer size */
> + unsigned short rx_dec_offset; /* Decrypted packet data offset */
> + unsigned short rx_dec_len; /* Decrypted packet data len */
Is it actually worth making those short rather than int?
I doubt the extra 4 bytes will matter and the generated code might be better.
(IIRC 32bit arm has a limited offset from 16 bit load/store, dunno about 64bit)
> + rxrpc_seq_t rx_dec_seq; /* Packet in decryption buffer */
>
> rxrpc_seq_t rx_highest_seq; /* Higest sequence number received */
> rxrpc_seq_t rx_consumed; /* Highest packet consumed */
> diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
> index 2b19b252225e..fec59d9338b9 100644
> --- a/net/rxrpc/call_event.c
> +++ b/net/rxrpc/call_event.c
> @@ -332,27 +332,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call)
>
> saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK;
>
> - if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
> - sp->hdr.securityIndex != 0 &&
> - (skb_cloned(skb) ||
> - skb_has_frag_list(skb) ||
> - skb_has_shared_frag(skb))) {
> - /* Unshare the packet so that it can be
> - * modified by in-place decryption.
> - */
> - struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);
> -
> - if (nskb) {
> - rxrpc_new_skb(nskb, rxrpc_skb_new_unshared);
> - rxrpc_input_call_packet(call, nskb);
> - rxrpc_free_skb(nskb, rxrpc_skb_put_call_rx);
> - } else {
> - /* OOM - Drop the packet. */
> - rxrpc_see_skb(skb, rxrpc_skb_see_unshare_nomem);
> - }
> - } else {
> - rxrpc_input_call_packet(call, skb);
> - }
> + rxrpc_input_call_packet(call, skb);
> rxrpc_free_skb(skb, rxrpc_skb_put_call_rx);
> did_receive = true;
> }
> diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
> index f035f486c139..fcb9d38bb521 100644
> --- a/net/rxrpc/call_object.c
> +++ b/net/rxrpc/call_object.c
> @@ -152,6 +152,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
> spin_lock_init(&call->notify_lock);
> refcount_set(&call->ref, 1);
> call->debug_id = debug_id;
> + call->rx_pkt_offset = USHRT_MAX;
> call->tx_total_len = -1;
> call->tx_jumbo_max = 1;
> call->next_rx_timo = 20 * HZ;
> @@ -553,6 +554,7 @@ static void rxrpc_cleanup_rx_buffers(struct rxrpc_call *call)
> rxrpc_purge_queue(&call->recvmsg_queue);
> rxrpc_purge_queue(&call->rx_queue);
> rxrpc_purge_queue(&call->rx_oos_queue);
> + kfree(call->rx_dec_buffer);
> }
>
> /*
> diff --git a/net/rxrpc/insecure.c b/net/rxrpc/insecure.c
> index 0a260df45d25..7a26c6097d03 100644
> --- a/net/rxrpc/insecure.c
> +++ b/net/rxrpc/insecure.c
> @@ -32,9 +32,6 @@ static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
>
> static int none_verify_packet(struct rxrpc_call *call, struct sk_buff *skb)
> {
> - struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> -
> - sp->flags |= RXRPC_RX_VERIFIED;
> return 0;
> }
>
> diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
> index e1f7513a46db..865e368381d5 100644
> --- a/net/rxrpc/recvmsg.c
> +++ b/net/rxrpc/recvmsg.c
> @@ -147,15 +147,55 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
> }
>
> /*
> - * Decrypt and verify a DATA packet.
> + * Decrypt and verify a DATA packet. The content of the packet is pulled out
> + * into a flat buffer rather than decrypting in place in the skbuff. This also
> + * has the advantage of aligning the buffer correctly for the crypto routines.
> + *
> + * We keep track of the sequence number of the packet currently decrypted into
> + * the buffer in ->rx_dec_seq. Unfortunately, this means that a MSG_PEEK of
> + * more than one byte may cause a later packet to be decrypted into the buffer,
> + * requiring the original to be re-decrypted when recvmsg() is called again.
> */
> static int rxrpc_verify_data(struct rxrpc_call *call, struct sk_buff *skb)
> {
> struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
> + int ret;
>
> - if (sp->flags & RXRPC_RX_VERIFIED)
> + if (call->rx_dec_seq == sp->hdr.seq && call->rx_dec_buffer)
> return 0;
> - return call->security->verify_packet(call, skb);
> +
> + if (call->rx_dec_bsize < sp->len) {
IMHO That test is backwards; the 'more constant' value should be on the right.
> + /* Make sure we can hold a 1412-byte jumbo subpacket and make
> + * sure that the buffer size is aligned to a crypto blocksize.
> + */
> + size_t size = umin(round_up(sp->len, 32), 2048);
Doesn't min() work?
> + void *buffer = krealloc(call->rx_dec_buffer, size, GFP_NOFS);
> +
> + if (!buffer)
> + return -ENOMEM;
> + call->rx_dec_buffer = buffer;
> + call->rx_dec_bsize = size;
> + }
That doesn't look right.
If sp->len is bigger than 2048 the you keep allocating a new buffer
and the call below overruns the allocated buffer.
> +
> + ret = -EFAULT;
> + if (skb_copy_bits(skb, sp->offset, call->rx_dec_buffer, sp->len) < 0)
> + goto err;
> +
...
-- David
next prev parent reply other threads:[~2026-05-12 13:38 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260511160753.607296-1-dhowells@redhat.com>
2026-05-11 16:07 ` [PATCH net 1/3] rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present David Howells
2026-05-11 16:07 ` [PATCH net 2/3] rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg David Howells
2026-05-12 7:58 ` Jeffrey Altman
2026-05-12 13:38 ` David Laight [this message]
2026-05-12 16:52 ` David Howells
2026-05-12 21:36 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260512143835.13af8bc4@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=davem@davemloft.net \
--cc=dhowells@redhat.com \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=imv4bel@gmail.com \
--cc=jaltman@auristor.com \
--cc=jiayuan.chen@linux.dev \
--cc=kuba@kernel.org \
--cc=linux-afs@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=marc.dionne@auristor.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox