All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 1/1] net: ipv4: bound TCP reordering sysctl writes
       [not found] <cover.1780979031.git.bronzed_45_vested@icloud.com>
@ 2026-06-10 16:18 ` Ren Wei
  2026-06-10 17:04   ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Ren Wei @ 2026-06-10 16:18 UTC (permalink / raw)
  To: netdev
  Cc: edumazet, pabeni, kuba, chia-yu.chang, ij, fmancera, yuuchihsu,
	idosch, herbert, yuantan098, zcliangcn, bird, bronzed_45_vested,
	n05ec

From: Wyatt Feng <bronzed_45_vested@icloud.com>

Reject invalid `net.ipv4.tcp_reordering` values before they reach TCP
socket state. The sysctl is stored as an `int` but copied into the
`u32` `tp->reordering` field for new sockets, so negative writes wrap
to large values.

With `tcp_mtu_probing=2`, the wrapped value can overflow the
`tcp_mtu_probe()` size calculation and drive the MTU probing path into
an out-of-bounds read. Route `tcp_reordering` writes through
`proc_dointvec_minmax()` and clamp them to the per-netns range
`[1, tcp_max_reordering]`.

Apply the matching cross-check to `tcp_max_reordering` so the existing
invariant `tcp_max_reordering >= tcp_reordering` is preserved. This
keeps the fix at the sysctl boundary and avoids changing the TCP fast
path.

Fixes: 91cc17c0e5e5 ("[TCP]: MTUprobe: receiver window & data available checks fixed")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Assisted-by: Codex:GPT-5.4
Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
 net/ipv4/sysctl_net_ipv4.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index c0e85cc171ae..c44afb9c321d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -215,6 +215,40 @@ static int ipv4_fwd_update_priority(const struct ctl_table *table, int write,
 	return ret;
 }
 
+static int proc_tcp_reordering(const struct ctl_table *table, int write,
+			       void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct ctl_table tmp = {
+		.data		= table->data,
+		.maxlen		= table->maxlen,
+		.mode		= table->mode,
+		.extra1		= SYSCTL_ONE,
+	};
+	struct net *net;
+
+	net = container_of(table->data, struct net, ipv4.sysctl_tcp_reordering);
+	tmp.extra2 = &net->ipv4.sysctl_tcp_max_reordering;
+
+	return proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+}
+
+static int proc_tcp_max_reordering(const struct ctl_table *table, int write,
+				   void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct ctl_table tmp = {
+		.data		= table->data,
+		.maxlen		= table->maxlen,
+		.mode		= table->mode,
+	};
+	struct net *net;
+
+	net = container_of(table->data, struct net,
+			   ipv4.sysctl_tcp_max_reordering);
+	tmp.extra1 = &net->ipv4.sysctl_tcp_reordering;
+
+	return proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+}
+
 static int proc_tcp_congestion_control(const struct ctl_table *ctl, int write,
 				       void *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -1058,7 +1092,7 @@ static struct ctl_table ipv4_net_table[] = {
 		.data		= &init_net.ipv4.sysctl_tcp_reordering,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_tcp_reordering
 	},
 	{
 		.procname	= "tcp_retries1",
@@ -1293,7 +1327,7 @@ static struct ctl_table ipv4_net_table[] = {
 		.data		= &init_net.ipv4.sysctl_tcp_max_reordering,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_tcp_max_reordering
 	},
 	{
 		.procname	= "tcp_dsack",
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/1] net: ipv4: bound TCP reordering sysctl writes
  2026-06-10 16:18 ` [PATCH net 1/1] net: ipv4: bound TCP reordering sysctl writes Ren Wei
@ 2026-06-10 17:04   ` Eric Dumazet
  2026-06-11  9:21     ` David Laight
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2026-06-10 17:04 UTC (permalink / raw)
  To: Ren Wei, Neal Cardwell, Kuniyuki Iwashima
  Cc: netdev, pabeni, kuba, chia-yu.chang, ij, fmancera, yuuchihsu,
	idosch, herbert, yuantan098, zcliangcn, bird, bronzed_45_vested

On Wed, Jun 10, 2026 at 9:18 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
>
> From: Wyatt Feng <bronzed_45_vested@icloud.com>
>
> Reject invalid `net.ipv4.tcp_reordering` values before they reach TCP
> socket state. The sysctl is stored as an `int` but copied into the
> `u32` `tp->reordering` field for new sockets, so negative writes wrap
> to large values.
>
> With `tcp_mtu_probing=2`, the wrapped value can overflow the
> `tcp_mtu_probe()` size calculation and drive the MTU probing path into
> an out-of-bounds read. Route `tcp_reordering` writes through
> `proc_dointvec_minmax()` and clamp them to the per-netns range
> `[1, tcp_max_reordering]`.
>
> Apply the matching cross-check to `tcp_max_reordering` so the existing
> invariant `tcp_max_reordering >= tcp_reordering` is preserved. This
> keeps the fix at the sysctl boundary and avoids changing the TCP fast
> path.
>
> Fixes: 91cc17c0e5e5 ("[TCP]: MTUprobe: receiver window & data available checks fixed")
> Cc: stable@vger.kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Assisted-by: Codex:GPT-5.4
> Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
>  net/ipv4/sysctl_net_ipv4.c | 38 ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index c0e85cc171ae..c44afb9c321d 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -215,6 +215,40 @@ static int ipv4_fwd_update_priority(const struct ctl_table *table, int write,
>         return ret;
>  }
>
> +static int proc_tcp_reordering(const struct ctl_table *table, int write,
> +                              void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +       struct ctl_table tmp = {
> +               .data           = table->data,
> +               .maxlen         = table->maxlen,
> +               .mode           = table->mode,
> +               .extra1         = SYSCTL_ONE,
> +       };
> +       struct net *net;
> +
> +       net = container_of(table->data, struct net, ipv4.sysctl_tcp_reordering);
> +       tmp.extra2 = &net->ipv4.sysctl_tcp_max_reordering;
> +
> +       return proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
> +}
> +
> +static int proc_tcp_max_reordering(const struct ctl_table *table, int write,
> +                                  void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +       struct ctl_table tmp = {
> +               .data           = table->data,
> +               .maxlen         = table->maxlen,
> +               .mode           = table->mode,
> +       };
> +       struct net *net;
> +
> +       net = container_of(table->data, struct net,
> +                          ipv4.sysctl_tcp_max_reordering);
> +       tmp.extra1 = &net->ipv4.sysctl_tcp_reordering;
> +
> +       return proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
> +}
> +
>  static int proc_tcp_congestion_control(const struct ctl_table *ctl, int write,
>                                        void *buffer, size_t *lenp, loff_t *ppos)
>  {
> @@ -1058,7 +1092,7 @@ static struct ctl_table ipv4_net_table[] = {
>                 .data           = &init_net.ipv4.sysctl_tcp_reordering,
>                 .maxlen         = sizeof(int),
>                 .mode           = 0644,
> -               .proc_handler   = proc_dointvec
> +               .proc_handler   = proc_tcp_reordering
>         },
>         {
>                 .procname       = "tcp_retries1",
> @@ -1293,7 +1327,7 @@ static struct ctl_table ipv4_net_table[] = {
>                 .data           = &init_net.ipv4.sysctl_tcp_max_reordering,
>                 .maxlen         = sizeof(int),
>                 .mode           = 0644,
> -               .proc_handler   = proc_dointvec
> +               .proc_handler   = proc_tcp_max_reordering
>         },
>         {
>                 .procname       = "tcp_dsack",
> --
> 2.47.3
>

Thanks for the patch. I think we can do better.

Also I do not see a fix in tcp_mtu_probe()?

What about:


diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index c0e85cc171aec099fd5d4897b1a623dd27eaee08..987bc87e255a81680ae5e23ddd01c1f0de4425ec
100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1058,6 +1058,9 @@ static struct ctl_table ipv4_net_table[] = {
                .data           = &init_net.ipv4.sysctl_tcp_reordering,
                .maxlen         = sizeof(int),
                .mode           = 0644,
+               .proc_handler   = proc_dointvec_minmax,
+               .extra1         = SYSCTL_ONE,
+               .extra2         = &init_net.ipv4.sysctl_tcp_max_reordering,
                .proc_handler   = proc_dointvec
        },
        {
@@ -1293,7 +1296,8 @@ static struct ctl_table ipv4_net_table[] = {
                .data           = &init_net.ipv4.sysctl_tcp_max_reordering,
                .maxlen         = sizeof(int),
                .mode           = 0644,
-               .proc_handler   = proc_dointvec
+               .proc_handler   = proc_dointvec_minmax,
+               .extra1         = SYSCTL_ONE,
        },
        {
                .procname       = "tcp_dsack",
@@ -1676,6 +1680,9 @@ static __net_init int
ipv4_sysctl_init_net(struct net *net)
                                 */
                                table[i].mode &= ~0222;
                        }
+               if (table[i].extra2 >= (void *)&init_net.ipv4 &&
+                   table[i].extra2 < (void *)(&init_net.ipv4 + 1))
+                       table[i].extra2 += (void *)net - (void *)&init_net;
                }
        }

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6e4bb411dc041b2a299f0fbc2109c530afa2a345..193637a58dcc1acf8e70fdfbde982c2a3eed290e
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2687,7 +2687,7 @@ static int tcp_mtu_probe(struct sock *sk)
        struct sk_buff *skb, *nskb, *next;
        struct net *net = sock_net(sk);
        int probe_size;
-       int size_needed;
+       u64 size_needed;
        int copy, len;
        int mss_now;
        int interval;
@@ -2711,7 +2711,7 @@ static int tcp_mtu_probe(struct sock *sk)
        mss_now = tcp_current_mss(sk);
        probe_size = tcp_mtu_to_mss(sk, (icsk->icsk_mtup.search_high +
                                    icsk->icsk_mtup.search_low) >> 1);
-       size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache;
+       size_needed = probe_size + (tp->reordering + 1) * (u64)tp->mss_cache;
        interval = icsk->icsk_mtup.search_high - icsk->icsk_mtup.search_low;
        /* When misfortune happens, we are reprobing actively,
         * and then reprobe timer has expired. We stick with current

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/1] net: ipv4: bound TCP reordering sysctl writes
  2026-06-10 17:04   ` Eric Dumazet
@ 2026-06-11  9:21     ` David Laight
  2026-06-11 17:13       ` Eric Dumazet
  2026-06-11 17:18       ` Eric Dumazet
  0 siblings, 2 replies; 6+ messages in thread
From: David Laight @ 2026-06-11  9:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ren Wei, Neal Cardwell, Kuniyuki Iwashima, netdev, pabeni, kuba,
	chia-yu.chang, ij, fmancera, yuuchihsu, idosch, herbert,
	yuantan098, zcliangcn, bird, bronzed_45_vested

On Wed, 10 Jun 2026 10:04:06 -0700
Eric Dumazet <edumazet@google.com> wrote:

> On Wed, Jun 10, 2026 at 9:18 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
> >
> > From: Wyatt Feng <bronzed_45_vested@icloud.com>
> >
> > Reject invalid `net.ipv4.tcp_reordering` values before they reach TCP
> > socket state. The sysctl is stored as an `int` but copied into the
> > `u32` `tp->reordering` field for new sockets, so negative writes wrap
> > to large values.
> >
> > With `tcp_mtu_probing=2`, the wrapped value can overflow the
> > `tcp_mtu_probe()` size calculation and drive the MTU probing path into
> > an out-of-bounds read. Route `tcp_reordering` writes through
> > `proc_dointvec_minmax()` and clamp them to the per-netns range
> > `[1, tcp_max_reordering]`.
> >
...
> Thanks for the patch. I think we can do better.
> 
> Also I do not see a fix in tcp_mtu_probe()?
> 
> What about:
> 
> 
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index c0e85cc171aec099fd5d4897b1a623dd27eaee08..987bc87e255a81680ae5e23ddd01c1f0de4425ec
> 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -1058,6 +1058,9 @@ static struct ctl_table ipv4_net_table[] = {
>                 .data           = &init_net.ipv4.sysctl_tcp_reordering,
>                 .maxlen         = sizeof(int),
>                 .mode           = 0644,
> +               .proc_handler   = proc_dointvec_minmax,
> +               .extra1         = SYSCTL_ONE,
> +               .extra2         = &init_net.ipv4.sysctl_tcp_max_reordering,
>                 .proc_handler   = proc_dointvec
>         },
>         {
> @@ -1293,7 +1296,8 @@ static struct ctl_table ipv4_net_table[] = {
>                 .data           = &init_net.ipv4.sysctl_tcp_max_reordering,
>                 .maxlen         = sizeof(int),
>                 .mode           = 0644,
> -               .proc_handler   = proc_dointvec
> +               .proc_handler   = proc_dointvec_minmax,
> +               .extra1         = SYSCTL_ONE,

Would it be reasonable to put in a 'sanity' upper bound here?

Since tcp_max_reordering can be lowered after tcp_reordering is set
the bound set here isn't 'hard'.
So the same sanity limit for both may be enough?

I also found this 'gem':
void tcp_enter_loss(struct sock *sk)
{
...
	u8 reordering;
...
	reordering = READ_ONCE(net->ipv4.sysctl_tcp_reordering);

Which makes be think that someone expected these values to be small.

I don't know what the sane bounds are for either sysctl or tp->reordering
itself (which is u32).

-- David



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/1] net: ipv4: bound TCP reordering sysctl writes
  2026-06-11  9:21     ` David Laight
@ 2026-06-11 17:13       ` Eric Dumazet
  2026-06-12  2:01         ` Kuniyuki Iwashima
  2026-06-11 17:18       ` Eric Dumazet
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2026-06-11 17:13 UTC (permalink / raw)
  To: David Laight
  Cc: Ren Wei, Neal Cardwell, Kuniyuki Iwashima, netdev, pabeni, kuba,
	chia-yu.chang, ij, fmancera, yuuchihsu, idosch, herbert,
	yuantan098, zcliangcn, bird, bronzed_45_vested

On Thu, Jun 11, 2026 at 2:21 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Wed, 10 Jun 2026 10:04:06 -0700
> Eric Dumazet <edumazet@google.com> wrote:
>
> > On Wed, Jun 10, 2026 at 9:18 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
> > >
> > > From: Wyatt Feng <bronzed_45_vested@icloud.com>
> > >
> > > Reject invalid `net.ipv4.tcp_reordering` values before they reach TCP
> > > socket state. The sysctl is stored as an `int` but copied into the
> > > `u32` `tp->reordering` field for new sockets, so negative writes wrap
> > > to large values.
> > >
> > > With `tcp_mtu_probing=2`, the wrapped value can overflow the
> > > `tcp_mtu_probe()` size calculation and drive the MTU probing path into
> > > an out-of-bounds read. Route `tcp_reordering` writes through
> > > `proc_dointvec_minmax()` and clamp them to the per-netns range
> > > `[1, tcp_max_reordering]`.
> > >
> ...
> > Thanks for the patch. I think we can do better.
> >
> > Also I do not see a fix in tcp_mtu_probe()?
> >
> > What about:
> >
> >
> > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> > index c0e85cc171aec099fd5d4897b1a623dd27eaee08..987bc87e255a81680ae5e23ddd01c1f0de4425ec
> > 100644
> > --- a/net/ipv4/sysctl_net_ipv4.c
> > +++ b/net/ipv4/sysctl_net_ipv4.c
> > @@ -1058,6 +1058,9 @@ static struct ctl_table ipv4_net_table[] = {
> >                 .data           = &init_net.ipv4.sysctl_tcp_reordering,
> >                 .maxlen         = sizeof(int),
> >                 .mode           = 0644,
> > +               .proc_handler   = proc_dointvec_minmax,
> > +               .extra1         = SYSCTL_ONE,
> > +               .extra2         = &init_net.ipv4.sysctl_tcp_max_reordering,
> >                 .proc_handler   = proc_dointvec
> >         },
> >         {
> > @@ -1293,7 +1296,8 @@ static struct ctl_table ipv4_net_table[] = {
> >                 .data           = &init_net.ipv4.sysctl_tcp_max_reordering,
> >                 .maxlen         = sizeof(int),
> >                 .mode           = 0644,
> > -               .proc_handler   = proc_dointvec
> > +               .proc_handler   = proc_dointvec_minmax,
> > +               .extra1         = SYSCTL_ONE,
>
> Would it be reasonable to put in a 'sanity' upper bound here?
>
> Since tcp_max_reordering can be lowered after tcp_reordering is set
> the bound set here isn't 'hard'.
> So the same sanity limit for both may be enough?
>
> I also found this 'gem':
> void tcp_enter_loss(struct sock *sk)
> {
> ...
>         u8 reordering;
> ...
>         reordering = READ_ONCE(net->ipv4.sysctl_tcp_reordering);
>
> Which makes be think that someone expected these values to be small.
>
> I don't know what the sane bounds are for either sysctl or tp->reordering
> itself (which is u32).


Kuniyuki, can you please send a fix ? You used u8 in your commit :/

Default value of this sysctl is 3, but it could potentially exceed 255.

Thanks.

commit 46778cd16e6a5ad1b2e3a91f6c057c907379418e
Author: Kuniyuki Iwashima <kuniyu@google.com>
Date:   Fri Jul 15 10:17:49 2022 -0700

    tcp: Fix data-races around sysctl_tcp_reordering.

    While reading sysctl_tcp_reordering, it can be changed concurrently.
    Thus, we need to add READ_ONCE() to its readers.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/1] net: ipv4: bound TCP reordering sysctl writes
  2026-06-11  9:21     ` David Laight
  2026-06-11 17:13       ` Eric Dumazet
@ 2026-06-11 17:18       ` Eric Dumazet
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2026-06-11 17:18 UTC (permalink / raw)
  To: David Laight
  Cc: Ren Wei, Neal Cardwell, Kuniyuki Iwashima, netdev, pabeni, kuba,
	chia-yu.chang, ij, fmancera, yuuchihsu, idosch, herbert,
	yuantan098, zcliangcn, bird, bronzed_45_vested

On Thu, Jun 11, 2026 at 2:21 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Wed, 10 Jun 2026 10:04:06 -0700
> Eric Dumazet <edumazet@google.com> wrote:
>
> > On Wed, Jun 10, 2026 at 9:18 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
> > >
> > > From: Wyatt Feng <bronzed_45_vested@icloud.com>
> > >
> > > Reject invalid `net.ipv4.tcp_reordering` values before they reach TCP
> > > socket state. The sysctl is stored as an `int` but copied into the
> > > `u32` `tp->reordering` field for new sockets, so negative writes wrap
> > > to large values.
> > >
> > > With `tcp_mtu_probing=2`, the wrapped value can overflow the
> > > `tcp_mtu_probe()` size calculation and drive the MTU probing path into
> > > an out-of-bounds read. Route `tcp_reordering` writes through
> > > `proc_dointvec_minmax()` and clamp them to the per-netns range
> > > `[1, tcp_max_reordering]`.
> > >
> ...
> > Thanks for the patch. I think we can do better.
> >
> > Also I do not see a fix in tcp_mtu_probe()?
> >
> > What about:
> >
> >
> > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> > index c0e85cc171aec099fd5d4897b1a623dd27eaee08..987bc87e255a81680ae5e23ddd01c1f0de4425ec
> > 100644
> > --- a/net/ipv4/sysctl_net_ipv4.c
> > +++ b/net/ipv4/sysctl_net_ipv4.c
> > @@ -1058,6 +1058,9 @@ static struct ctl_table ipv4_net_table[] = {
> >                 .data           = &init_net.ipv4.sysctl_tcp_reordering,
> >                 .maxlen         = sizeof(int),
> >                 .mode           = 0644,
> > +               .proc_handler   = proc_dointvec_minmax,
> > +               .extra1         = SYSCTL_ONE,
> > +               .extra2         = &init_net.ipv4.sysctl_tcp_max_reordering,
> >                 .proc_handler   = proc_dointvec
> >         },
> >         {
> > @@ -1293,7 +1296,8 @@ static struct ctl_table ipv4_net_table[] = {
> >                 .data           = &init_net.ipv4.sysctl_tcp_max_reordering,
> >                 .maxlen         = sizeof(int),
> >                 .mode           = 0644,
> > -               .proc_handler   = proc_dointvec
> > +               .proc_handler   = proc_dointvec_minmax,
> > +               .extra1         = SYSCTL_ONE,
>
> Would it be reasonable to put in a 'sanity' upper bound here?

Probably, I am not even sure we use this much with RACK anyway.

Neal, what do you think?

>
> Since tcp_max_reordering can be lowered after tcp_reordering is set
> the bound set here isn't 'hard'.
> So the same sanity limit for both may be enough?
>
> I also found this 'gem':
> void tcp_enter_loss(struct sock *sk)
> {
> ...
>         u8 reordering;
> ...
>         reordering = READ_ONCE(net->ipv4.sysctl_tcp_reordering);
>
> Which makes be think that someone expected these values to be small.
>
> I don't know what the sane bounds are for either sysctl or tp->reordering
> itself (which is u32).
>
> -- David
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/1] net: ipv4: bound TCP reordering sysctl writes
  2026-06-11 17:13       ` Eric Dumazet
@ 2026-06-12  2:01         ` Kuniyuki Iwashima
  0 siblings, 0 replies; 6+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-12  2:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Laight, Ren Wei, Neal Cardwell, netdev, pabeni, kuba,
	chia-yu.chang, ij, fmancera, yuuchihsu, idosch, herbert,
	yuantan098, zcliangcn, bird, bronzed_45_vested

On Thu, Jun 11, 2026 at 10:14 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Jun 11, 2026 at 2:21 AM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Wed, 10 Jun 2026 10:04:06 -0700
> > Eric Dumazet <edumazet@google.com> wrote:
> >
> > > On Wed, Jun 10, 2026 at 9:18 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
> > > >
> > > > From: Wyatt Feng <bronzed_45_vested@icloud.com>
> > > >
> > > > Reject invalid `net.ipv4.tcp_reordering` values before they reach TCP
> > > > socket state. The sysctl is stored as an `int` but copied into the
> > > > `u32` `tp->reordering` field for new sockets, so negative writes wrap
> > > > to large values.
> > > >
> > > > With `tcp_mtu_probing=2`, the wrapped value can overflow the
> > > > `tcp_mtu_probe()` size calculation and drive the MTU probing path into
> > > > an out-of-bounds read. Route `tcp_reordering` writes through
> > > > `proc_dointvec_minmax()` and clamp them to the per-netns range
> > > > `[1, tcp_max_reordering]`.
> > > >
> > ...
> > > Thanks for the patch. I think we can do better.
> > >
> > > Also I do not see a fix in tcp_mtu_probe()?
> > >
> > > What about:
> > >
> > >
> > > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> > > index c0e85cc171aec099fd5d4897b1a623dd27eaee08..987bc87e255a81680ae5e23ddd01c1f0de4425ec
> > > 100644
> > > --- a/net/ipv4/sysctl_net_ipv4.c
> > > +++ b/net/ipv4/sysctl_net_ipv4.c
> > > @@ -1058,6 +1058,9 @@ static struct ctl_table ipv4_net_table[] = {
> > >                 .data           = &init_net.ipv4.sysctl_tcp_reordering,
> > >                 .maxlen         = sizeof(int),
> > >                 .mode           = 0644,
> > > +               .proc_handler   = proc_dointvec_minmax,
> > > +               .extra1         = SYSCTL_ONE,
> > > +               .extra2         = &init_net.ipv4.sysctl_tcp_max_reordering,
> > >                 .proc_handler   = proc_dointvec
> > >         },
> > >         {
> > > @@ -1293,7 +1296,8 @@ static struct ctl_table ipv4_net_table[] = {
> > >                 .data           = &init_net.ipv4.sysctl_tcp_max_reordering,
> > >                 .maxlen         = sizeof(int),
> > >                 .mode           = 0644,
> > > -               .proc_handler   = proc_dointvec
> > > +               .proc_handler   = proc_dointvec_minmax,
> > > +               .extra1         = SYSCTL_ONE,
> >
> > Would it be reasonable to put in a 'sanity' upper bound here?
> >
> > Since tcp_max_reordering can be lowered after tcp_reordering is set
> > the bound set here isn't 'hard'.
> > So the same sanity limit for both may be enough?
> >
> > I also found this 'gem':
> > void tcp_enter_loss(struct sock *sk)
> > {
> > ...
> >         u8 reordering;
> > ...
> >         reordering = READ_ONCE(net->ipv4.sysctl_tcp_reordering);
> >
> > Which makes be think that someone expected these values to be small.
> >
> > I don't know what the sane bounds are for either sysctl or tp->reordering
> > itself (which is u32).
>
>
> Kuniyuki, can you please send a fix ? You used u8 in your commit :/

Oh sorry, I'll change it to int.


>
> Default value of this sysctl is 3, but it could potentially exceed 255.
>
> Thanks.
>
> commit 46778cd16e6a5ad1b2e3a91f6c057c907379418e
> Author: Kuniyuki Iwashima <kuniyu@google.com>
> Date:   Fri Jul 15 10:17:49 2022 -0700
>
>     tcp: Fix data-races around sysctl_tcp_reordering.
>
>     While reading sysctl_tcp_reordering, it can be changed concurrently.
>     Thus, we need to add READ_ONCE() to its readers.
>
>     Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>     Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-06-12  2:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <cover.1780979031.git.bronzed_45_vested@icloud.com>
2026-06-10 16:18 ` [PATCH net 1/1] net: ipv4: bound TCP reordering sysctl writes Ren Wei
2026-06-10 17:04   ` Eric Dumazet
2026-06-11  9:21     ` David Laight
2026-06-11 17:13       ` Eric Dumazet
2026-06-12  2:01         ` Kuniyuki Iwashima
2026-06-11 17:18       ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.