From: Wilfred Mallawa <wilfred.opensource@gmail.com>
To: Sabrina Dubroca <sd@queasysnail.net>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, corbet@lwn.net,
john.fastabend@gmail.com, netdev@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
alistair.francis@wdc.com, dlemoal@kernel.org
Subject: Re: [PATCH v3] net/tls: support maximum record size limit
Date: Thu, 18 Sep 2025 10:52:14 +1000 [thread overview]
Message-ID: <05dc7efdb45363358825ff3782d3006ef9c6cea4.camel@gmail.com> (raw)
In-Reply-To: <aLgVCGbq0b6PJXbY@krikkit>
Hey Sabrina,
Sorry for the delay in getting back to this! Responded inline.
On Wed, 2025-09-03 at 12:14 +0200, Sabrina Dubroca wrote:
> note: since this is a new feature, the subject prefix should be
> "[PATCH net-next vN]" (ie add "net-next", the target tree for "new
> feature" changes)
>
> 2025-09-03, 11:47:57 +1000, Wilfred Mallawa wrote:
> > diff --git a/Documentation/networking/tls.rst
> > b/Documentation/networking/tls.rst
> > index 36cc7afc2527..0232df902320 100644
> > --- a/Documentation/networking/tls.rst
> > +++ b/Documentation/networking/tls.rst
> > @@ -280,6 +280,13 @@ If the record decrypted turns out to had been
> > padded or is not a data
> > record it will be decrypted again into a kernel buffer without
> > zero copy.
> > Such events are counted in the ``TlsDecryptRetry`` statistic.
> >
> > +TLS_TX_RECORD_SIZE_LIM
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +During a TLS handshake, an endpoint may use the record size limit
> > extension
> > +to specify a maximum record size. This allows enforcing the
> > specified record
> > +size limit, such that outgoing records do not exceed the limit
> > specified.
>
> Maybe worth adding a reference to the RFC that defines this
> extension?
> I'm not sure if that would be helpful to readers of this doc or not.
Good idea, I'll add that in.
>
>
> > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > index a3ccb3135e51..94237c97f062 100644
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> [...]
> > @@ -1022,6 +1075,7 @@ static int tls_init(struct sock *sk)
> >
> > ctx->tx_conf = TLS_BASE;
> > ctx->rx_conf = TLS_BASE;
> > + ctx->tx_record_size_limit = TLS_MAX_PAYLOAD_SIZE;
> > update_sk_prot(sk, ctx);
> > out:
> > write_unlock_bh(&sk->sk_callback_lock);
> > @@ -1065,7 +1119,7 @@ static u16 tls_user_config(struct tls_context
> > *ctx, bool tx)
> >
> > static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool
> > net_admin)
> > {
> > - u16 version, cipher_type;
> > + u16 version, cipher_type, tx_record_size_limit;
> > struct tls_context *ctx;
> > struct nlattr *start;
> > int err;
> > @@ -1110,7 +1164,13 @@ static int tls_get_info(struct sock *sk,
> > struct sk_buff *skb, bool net_admin)
> > if (err)
> > goto nla_failure;
> > }
> > -
> > + tx_record_size_limit = ctx->tx_record_size_limit;
> > + if (tx_record_size_limit) {
>
> You probably meant to update that to:
>
> tx_record_size_limit != TLS_MAX_PAYLOAD_SIZE
>
> Otherwise, now that the default is TLS_MAX_PAYLOAD_SIZE, it will
> always be exported - which is not wrong either. So I'd either update
> the conditional so that the attribute is only exported for non-
> default
> sizes (like in v2), or drop the if() and always export it.
>
Yeah, that makes sense I'll drop the If() so that it's always exported
then.
> > + err = nla_put_u16(skb,
> > TLS_INFO_TX_RECORD_SIZE_LIM,
> > + tx_record_size_limit);
> > + if (err)
> > + goto nla_failure;
> > + }
>
> [...]
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index bac65d0d4e3e..28fb796573d1 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -1079,7 +1079,7 @@ static int tls_sw_sendmsg_locked(struct sock
> > *sk, struct msghdr *msg,
> > orig_size = msg_pl->sg.size;
> > full_record = false;
> > try_to_copy = msg_data_left(msg);
> > - record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl-
> > >sg.size;
> > + record_room = tls_ctx->tx_record_size_limit -
> > msg_pl->sg.size;
>
> If we entered tls_sw_sendmsg_locked with an existing open record,
> this
> could end up being negative and confuse the rest of the code.
>
> send(MSG_MORE) returns with an open record of length len1
> setsockopt(TLS_INFO_TX_RECORD_SIZE_LIM, limit < len1)
> send() -> record_room < 0
>
>
> Possibly not a problem with a "well-behaved" userspace, but we can't
> rely on that.
Good catch! what if we don't allow tx_record_size_limit to be set if
there's a pending open record. This should avoid userspace from atleast
causing the record_room < 0 if we somehow end up there.
So for example:
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 7c0367dc5d40..34bb6690016c 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -841,20 +841,27 @@ static int
do_tls_setsockopt_tx_record_size(struct sock *sk, sockptr_t optval,
unsigned int optlen)
{
struct tls_context *ctx = tls_get_ctx(sk);
+ struct tls_sw_context_tx *sw_ctx = tls_sw_ctx_tx(ctx);
u16 value;
+ if (sw_ctx->open_rec)
+ return -EBUSY;
...
And to your follow up response:
```
> I suspect it's not a problem in practice because of what the TLS
> exchange between the peers setting up this extension looks like? (ie,
> there should never be an open record at this stage - unless userspace
> delays doing this setsockopt after getting the message from the peer,
> but then maybe we can call that a buggy userspace)
```
Yeah, record size limit extension occurs during a handshake
(Client/ServerHello). AFAIK, all of that is handled in tlshd/GnuTLS. We
shouldn't have any open records here at this point. For user-space
context, this is what support for record size limit looks like [1] in
tlshd.
If for whatever reason, as you mentioned, userspace decides to set it
later, change above could mitigate it for the open record case? I don't
think we need to try to fix things (or even can for records already
submitted to TCP) in the kernel.
[1]
WIP:https://github.com/twilfredo/ktls-utils/commit/73cb755acb4589ba31e4c42ef6b16cf5efdf3892
>
>
> Pushing out the pending "too big" record at the time we set
> tx_record_size_limit would likely make the peer close the connection
> (because it's already told us to limit our TX size), so I guess we'd
> have to split the pending record into tx_record_size_limit chunks
> before we start processing the new message (either directly at
> setsockopt(TLS_INFO_TX_RECORD_SIZE_LIM) time, or the next send/etc
> call). The final push during socket closing, and maybe some more
> codepaths that deal with ctx->open_rec, would also have to do that.
>
> I think additional selftests for
> send(MSG_MORE), TLS_INFO_TX_RECORD_SIZE_LIM, send
> and
> send(MSG_MORE), TLS_INFO_TX_RECORD_SIZE_LIM, close
> verifying the received record sizes would make sense, since it's a
> bit
> tricky to get that right.
Yeah I agree, I will work on that. Thanks for the feedback!
Regards,
Wilfred
next prev parent reply other threads:[~2025-09-18 0:52 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-03 1:47 [PATCH v3] net/tls: support maximum record size limit Wilfred Mallawa
2025-09-03 10:14 ` Sabrina Dubroca
2025-09-04 9:54 ` Sabrina Dubroca
2025-09-18 0:52 ` Wilfred Mallawa [this message]
2025-09-03 22:51 ` Jakub Kicinski
2025-09-04 23:41 ` Wilfred Mallawa
2025-09-04 10:10 ` Sabrina Dubroca
2025-09-04 13:33 ` Jakub Kicinski
2025-09-18 1:42 ` Wilfred Mallawa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=05dc7efdb45363358825ff3782d3006ef9c6cea4.camel@gmail.com \
--to=wilfred.opensource@gmail.com \
--cc=alistair.francis@wdc.com \
--cc=corbet@lwn.net \
--cc=davem@davemloft.net \
--cc=dlemoal@kernel.org \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sd@queasysnail.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).