netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] tcp: show the right value of TCP_MIB_RTOMIN
@ 2024-06-06 15:03 Jason Xing
  2024-06-06 15:03 ` [PATCH net-next 1/2] tcp: fix showing wrong rtomin in snmp file when using route option Jason Xing
  2024-06-06 15:03 ` [PATCH net-next 2/2] tcp: fix showing wrong rtomin in snmp file when setting sysctl_tcp_rto_min_us Jason Xing
  0 siblings, 2 replies; 8+ messages in thread
From: Jason Xing @ 2024-06-06 15:03 UTC (permalink / raw)
  To: edumazet, kuba, pabeni, davem, dsahern, ncardwell
  Cc: netdev, kerneljasonxing, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

When doing 'cat /proc/net/snmp' if we already tune the tcp rto min by
using 'sysctl -w' or 'ip route', the result always the same 200 ms.
However, it cannot reflect the real situation because the minimum value
of rto min can be changed smaller than TCP_RTO_MIN.

Jason Xing (2):
  tcp: fix showing wrong rtomin in snmp file when using route option
  tcp: fix showing wrong rtomin in snmp file when setting
    sysctl_tcp_rto_min_us

 include/net/tcp.h  |  2 ++
 net/ipv4/metrics.c |  4 ++++
 net/ipv4/proc.c    | 15 +++++++++++++++
 3 files changed, 21 insertions(+)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next 1/2] tcp: fix showing wrong rtomin in snmp file when using route option
  2024-06-06 15:03 [PATCH net-next 0/2] tcp: show the right value of TCP_MIB_RTOMIN Jason Xing
@ 2024-06-06 15:03 ` Jason Xing
  2024-06-06 15:14   ` Eric Dumazet
  2024-06-06 15:03 ` [PATCH net-next 2/2] tcp: fix showing wrong rtomin in snmp file when setting sysctl_tcp_rto_min_us Jason Xing
  1 sibling, 1 reply; 8+ messages in thread
From: Jason Xing @ 2024-06-06 15:03 UTC (permalink / raw)
  To: edumazet, kuba, pabeni, davem, dsahern, ncardwell
  Cc: netdev, kerneljasonxing, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

TCP_MIB_RTOMIN implemented in tcp mib definitions is always 200, which
is true if without any method to tune rto min. In 2007, we got a way to
tune it globaly when setting rto_min route option, but TCP_MIB_RTOMIN
in /proc/net/snmp still shows the same, namely, 200.

As RFC 1213 said:
  "tcpRtoMin
   ...
   The minimum value permitted by a TCP implementation for the
   retransmission timeout, measured in milliseconds."

Since the lower bound of rto can be changed, we should accordingly
adjust the output of /proc/net/snmp.

Fixes: 05bb1fad1cde ("[TCP]: Allow minimum RTO to be configurable via routing metrics.")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/tcp.h  | 2 ++
 net/ipv4/metrics.c | 4 ++++
 net/ipv4/proc.c    | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a70fc39090fe..a111a5d151b7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX);
 extern int sysctl_tcp_max_orphans;
 extern long sysctl_tcp_mem[3];
 
+extern unsigned int tcp_rtax_rtomin;
+
 #define TCP_RACK_LOSS_DETECTION  0x1 /* Use RACK to detect losses */
 #define TCP_RACK_STATIC_REO_WND  0x2 /* Use static RACK reo wnd */
 #define TCP_RACK_NO_DUPTHRESH    0x4 /* Do not use DUPACK threshold in RACK */
diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c
index 8ddac1f595ed..61ca949b8281 100644
--- a/net/ipv4/metrics.c
+++ b/net/ipv4/metrics.c
@@ -7,6 +7,8 @@
 #include <net/net_namespace.h>
 #include <net/tcp.h>
 
+unsigned int tcp_rtax_rtomin __read_mostly;
+
 static int ip_metrics_convert(struct nlattr *fc_mx,
 			      int fc_mx_len, u32 *metrics,
 			      struct netlink_ext_ack *extack)
@@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx,
 	if (ecn_ca)
 		metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
 
+	tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1];
+
 	return 0;
 }
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 6c4664c681ca..ce387081a3c9 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
 		/* MaxConn field is signed, RFC 2012 */
 		if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN)
 			seq_printf(seq, " %ld", buff[i]);
+		else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMIN)
+			seq_printf(seq, " %lu",
+				   tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]);
 		else
 			seq_printf(seq, " %lu", buff[i]);
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next 2/2] tcp: fix showing wrong rtomin in snmp file when setting sysctl_tcp_rto_min_us
  2024-06-06 15:03 [PATCH net-next 0/2] tcp: show the right value of TCP_MIB_RTOMIN Jason Xing
  2024-06-06 15:03 ` [PATCH net-next 1/2] tcp: fix showing wrong rtomin in snmp file when using route option Jason Xing
@ 2024-06-06 15:03 ` Jason Xing
  2024-06-06 15:17   ` Eric Dumazet
  1 sibling, 1 reply; 8+ messages in thread
From: Jason Xing @ 2024-06-06 15:03 UTC (permalink / raw)
  To: edumazet, kuba, pabeni, davem, dsahern, ncardwell
  Cc: netdev, kerneljasonxing, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

A few days ago, sysctl_tcp_rto_min_us has been introduced to allow user to
tune the rto min value per netns. But the RtoMin field in /proc/net/snmp
should have been adjusted accordingly. Or else, it will show 200 which is
TCP_RTO_MIN.

This patch can show the correct value even when user sets though using both
'ip route' and 'sysctl -w'. The priority from high to low like what
tcp_rto_min() shows to us is:
1) ip route option rto_min
2) icsk->icsk_rto_min

Fixes: f086edef71be ("tcp: add sysctl_tcp_rto_min_us")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/ipv4/proc.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index ce387081a3c9..4aeef3118442 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -409,6 +409,19 @@ static int snmp_seq_show_ipstats(struct seq_file *seq, void *v)
 	return 0;
 }
 
+static void snmp_seq_show_tcp_rtomin(struct seq_file *seq, struct net *net,
+				     unsigned long val)
+{
+	int sysctl_rtomin = READ_ONCE(net->ipv4.sysctl_tcp_rto_min_us);
+
+	if (tcp_rtax_rtomin)
+		seq_printf(seq, " %u", tcp_rtax_rtomin);
+	else if (sysctl_rtomin != jiffies_to_usecs(TCP_RTO_MIN))
+		seq_printf(seq, " %lu", usecs_to_jiffies(sysctl_rtomin));
+	else
+		seq_printf(seq, " %lu", val);
+}
+
 static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
 {
 	unsigned long buff[TCPUDP_MIB_MAX];
@@ -429,8 +442,7 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
 		if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN)
 			seq_printf(seq, " %ld", buff[i]);
 		else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMIN)
-			seq_printf(seq, " %lu",
-				   tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]);
+			snmp_seq_show_tcp_rtomin(seq, net, buff[i]);
 		else
 			seq_printf(seq, " %lu", buff[i]);
 	}
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 1/2] tcp: fix showing wrong rtomin in snmp file when using route option
  2024-06-06 15:03 ` [PATCH net-next 1/2] tcp: fix showing wrong rtomin in snmp file when using route option Jason Xing
@ 2024-06-06 15:14   ` Eric Dumazet
  2024-06-06 15:40     ` Jason Xing
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2024-06-06 15:14 UTC (permalink / raw)
  To: Jason Xing; +Cc: kuba, pabeni, davem, dsahern, ncardwell, netdev, Jason Xing

On Thu, Jun 6, 2024 at 5:03 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> TCP_MIB_RTOMIN implemented in tcp mib definitions is always 200, which
> is true if without any method to tune rto min. In 2007, we got a way to
> tune it globaly when setting rto_min route option, but TCP_MIB_RTOMIN
> in /proc/net/snmp still shows the same, namely, 200.
>
> As RFC 1213 said:
>   "tcpRtoMin
>    ...
>    The minimum value permitted by a TCP implementation for the
>    retransmission timeout, measured in milliseconds."
>
> Since the lower bound of rto can be changed, we should accordingly
> adjust the output of /proc/net/snmp.
>
> Fixes: 05bb1fad1cde ("[TCP]: Allow minimum RTO to be configurable via routing metrics.")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  include/net/tcp.h  | 2 ++
>  net/ipv4/metrics.c | 4 ++++
>  net/ipv4/proc.c    | 3 +++
>  3 files changed, 9 insertions(+)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index a70fc39090fe..a111a5d151b7 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX);
>  extern int sysctl_tcp_max_orphans;
>  extern long sysctl_tcp_mem[3];
>
> +extern unsigned int tcp_rtax_rtomin;
> +
>  #define TCP_RACK_LOSS_DETECTION  0x1 /* Use RACK to detect losses */
>  #define TCP_RACK_STATIC_REO_WND  0x2 /* Use static RACK reo wnd */
>  #define TCP_RACK_NO_DUPTHRESH    0x4 /* Do not use DUPACK threshold in RACK */
> diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c
> index 8ddac1f595ed..61ca949b8281 100644
> --- a/net/ipv4/metrics.c
> +++ b/net/ipv4/metrics.c
> @@ -7,6 +7,8 @@
>  #include <net/net_namespace.h>
>  #include <net/tcp.h>
>
> +unsigned int tcp_rtax_rtomin __read_mostly;
> +
>  static int ip_metrics_convert(struct nlattr *fc_mx,
>                               int fc_mx_len, u32 *metrics,
>                               struct netlink_ext_ack *extack)
> @@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx,
>         if (ecn_ca)
>                 metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
>
> +       tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1];
> +
>         return 0;
>  }
>
> diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
> index 6c4664c681ca..ce387081a3c9 100644
> --- a/net/ipv4/proc.c
> +++ b/net/ipv4/proc.c
> @@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
>                 /* MaxConn field is signed, RFC 2012 */
>                 if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN)
>                         seq_printf(seq, " %ld", buff[i]);
> +               else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMTIN)
> +                       seq_printf(seq, " %lu",
> +                                  tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]);
>                 else
>                         seq_printf(seq, " %lu", buff[i]);
>         }
> --
> 2.37.3
>

I do not think we can accept this patch.

1) You might have missed that we have multiple network namespaces..

2) You might have missed that we can have thousands of routes, with
different metrics.

I would leave /proc/net/snmp as it is, because no value can possibly be right.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 2/2] tcp: fix showing wrong rtomin in snmp file when setting sysctl_tcp_rto_min_us
  2024-06-06 15:03 ` [PATCH net-next 2/2] tcp: fix showing wrong rtomin in snmp file when setting sysctl_tcp_rto_min_us Jason Xing
@ 2024-06-06 15:17   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2024-06-06 15:17 UTC (permalink / raw)
  To: Jason Xing; +Cc: kuba, pabeni, davem, dsahern, ncardwell, netdev, Jason Xing

On Thu, Jun 6, 2024 at 5:03 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> A few days ago, sysctl_tcp_rto_min_us has been introduced to allow user to
> tune the rto min value per netns. But the RtoMin field in /proc/net/snmp
> should have been adjusted accordingly. Or else, it will show 200 which is
> TCP_RTO_MIN.
>
> This patch can show the correct value even when user sets though using both
> 'ip route' and 'sysctl -w'. The priority from high to low like what
> tcp_rto_min() shows to us is:
> 1) ip route option rto_min
> 2) icsk->icsk_rto_min
>
> Fixes: f086edef71be ("tcp: add sysctl_tcp_rto_min_us")

Same remark as for the prior patch.

We can not 'fix' the issue of an old MIB that can not express the
variety of situations.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 1/2] tcp: fix showing wrong rtomin in snmp file when using route option
  2024-06-06 15:14   ` Eric Dumazet
@ 2024-06-06 15:40     ` Jason Xing
  2024-06-06 15:43       ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Xing @ 2024-06-06 15:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: kuba, pabeni, davem, dsahern, ncardwell, netdev, Jason Xing

On Thu, Jun 6, 2024 at 11:14 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Jun 6, 2024 at 5:03 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > TCP_MIB_RTOMIN implemented in tcp mib definitions is always 200, which
> > is true if without any method to tune rto min. In 2007, we got a way to
> > tune it globaly when setting rto_min route option, but TCP_MIB_RTOMIN
> > in /proc/net/snmp still shows the same, namely, 200.
> >
> > As RFC 1213 said:
> >   "tcpRtoMin
> >    ...
> >    The minimum value permitted by a TCP implementation for the
> >    retransmission timeout, measured in milliseconds."
> >
> > Since the lower bound of rto can be changed, we should accordingly
> > adjust the output of /proc/net/snmp.
> >
> > Fixes: 05bb1fad1cde ("[TCP]: Allow minimum RTO to be configurable via routing metrics.")
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  include/net/tcp.h  | 2 ++
> >  net/ipv4/metrics.c | 4 ++++
> >  net/ipv4/proc.c    | 3 +++
> >  3 files changed, 9 insertions(+)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index a70fc39090fe..a111a5d151b7 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX);
> >  extern int sysctl_tcp_max_orphans;
> >  extern long sysctl_tcp_mem[3];
> >
> > +extern unsigned int tcp_rtax_rtomin;
> > +
> >  #define TCP_RACK_LOSS_DETECTION  0x1 /* Use RACK to detect losses */
> >  #define TCP_RACK_STATIC_REO_WND  0x2 /* Use static RACK reo wnd */
> >  #define TCP_RACK_NO_DUPTHRESH    0x4 /* Do not use DUPACK threshold in RACK */
> > diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c
> > index 8ddac1f595ed..61ca949b8281 100644
> > --- a/net/ipv4/metrics.c
> > +++ b/net/ipv4/metrics.c
> > @@ -7,6 +7,8 @@
> >  #include <net/net_namespace.h>
> >  #include <net/tcp.h>
> >
> > +unsigned int tcp_rtax_rtomin __read_mostly;
> > +
> >  static int ip_metrics_convert(struct nlattr *fc_mx,
> >                               int fc_mx_len, u32 *metrics,
> >                               struct netlink_ext_ack *extack)
> > @@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx,
> >         if (ecn_ca)
> >                 metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
> >
> > +       tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1];
> > +
> >         return 0;
> >  }
> >
> > diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
> > index 6c4664c681ca..ce387081a3c9 100644
> > --- a/net/ipv4/proc.c
> > +++ b/net/ipv4/proc.c
> > @@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
> >                 /* MaxConn field is signed, RFC 2012 */
> >                 if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN)
> >                         seq_printf(seq, " %ld", buff[i]);
> > +               else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMTIN)
> > +                       seq_printf(seq, " %lu",
> > +                                  tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]);
> >                 else
> >                         seq_printf(seq, " %lu", buff[i]);
> >         }
> > --
> > 2.37.3
> >
>
> I do not think we can accept this patch.
>
> 1) You might have missed that we have multiple network namespaces..

Thanks for the review.

For this patch, indeed, I think I need to consider namespaces...
For the other one, I did.

>
> 2) You might have missed that we can have thousands of routes, with
> different metrics.

Oh, that's really complicated...

>
> I would leave /proc/net/snmp as it is, because no value can possibly be right.

It cannot be right if someone tries to set rto min by 'ip route' or 'sysctl -w'.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 1/2] tcp: fix showing wrong rtomin in snmp file when using route option
  2024-06-06 15:40     ` Jason Xing
@ 2024-06-06 15:43       ` Eric Dumazet
  2024-06-06 15:50         ` Jason Xing
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2024-06-06 15:43 UTC (permalink / raw)
  To: Jason Xing; +Cc: kuba, pabeni, davem, dsahern, ncardwell, netdev, Jason Xing

On Thu, Jun 6, 2024 at 5:41 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 11:14 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Jun 6, 2024 at 5:03 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > TCP_MIB_RTOMIN implemented in tcp mib definitions is always 200, which
> > > is true if without any method to tune rto min. In 2007, we got a way to
> > > tune it globaly when setting rto_min route option, but TCP_MIB_RTOMIN
> > > in /proc/net/snmp still shows the same, namely, 200.
> > >
> > > As RFC 1213 said:
> > >   "tcpRtoMin
> > >    ...
> > >    The minimum value permitted by a TCP implementation for the
> > >    retransmission timeout, measured in milliseconds."
> > >
> > > Since the lower bound of rto can be changed, we should accordingly
> > > adjust the output of /proc/net/snmp.
> > >
> > > Fixes: 05bb1fad1cde ("[TCP]: Allow minimum RTO to be configurable via routing metrics.")
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >  include/net/tcp.h  | 2 ++
> > >  net/ipv4/metrics.c | 4 ++++
> > >  net/ipv4/proc.c    | 3 +++
> > >  3 files changed, 9 insertions(+)
> > >
> > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > index a70fc39090fe..a111a5d151b7 100644
> > > --- a/include/net/tcp.h
> > > +++ b/include/net/tcp.h
> > > @@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX);
> > >  extern int sysctl_tcp_max_orphans;
> > >  extern long sysctl_tcp_mem[3];
> > >
> > > +extern unsigned int tcp_rtax_rtomin;
> > > +
> > >  #define TCP_RACK_LOSS_DETECTION  0x1 /* Use RACK to detect losses */
> > >  #define TCP_RACK_STATIC_REO_WND  0x2 /* Use static RACK reo wnd */
> > >  #define TCP_RACK_NO_DUPTHRESH    0x4 /* Do not use DUPACK threshold in RACK */
> > > diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c
> > > index 8ddac1f595ed..61ca949b8281 100644
> > > --- a/net/ipv4/metrics.c
> > > +++ b/net/ipv4/metrics.c
> > > @@ -7,6 +7,8 @@
> > >  #include <net/net_namespace.h>
> > >  #include <net/tcp.h>
> > >
> > > +unsigned int tcp_rtax_rtomin __read_mostly;
> > > +
> > >  static int ip_metrics_convert(struct nlattr *fc_mx,
> > >                               int fc_mx_len, u32 *metrics,
> > >                               struct netlink_ext_ack *extack)
> > > @@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx,
> > >         if (ecn_ca)
> > >                 metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
> > >
> > > +       tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1];
> > > +
> > >         return 0;
> > >  }
> > >
> > > diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
> > > index 6c4664c681ca..ce387081a3c9 100644
> > > --- a/net/ipv4/proc.c
> > > +++ b/net/ipv4/proc.c
> > > @@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
> > >                 /* MaxConn field is signed, RFC 2012 */
> > >                 if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN)
> > >                         seq_printf(seq, " %ld", buff[i]);
> > > +               else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMTIN)
> > > +                       seq_printf(seq, " %lu",
> > > +                                  tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]);
> > >                 else
> > >                         seq_printf(seq, " %lu", buff[i]);
> > >         }
> > > --
> > > 2.37.3
> > >
> >
> > I do not think we can accept this patch.
> >
> > 1) You might have missed that we have multiple network namespaces..
>
> Thanks for the review.
>
> For this patch, indeed, I think I need to consider namespaces...
> For the other one, I did.
>
> >
> > 2) You might have missed that we can have thousands of routes, with
> > different metrics.
>
> Oh, that's really complicated...
>
> >
> > I would leave /proc/net/snmp as it is, because no value can possibly be right.
>
> It cannot be right if someone tries to set rto min by 'ip route' or 'sysctl -w'.
>

Or eBPF.

There is no way a /proc/net/snmp value can be right.

Why would anyone care, since thousands of TCP sockets can all have
different values ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 1/2] tcp: fix showing wrong rtomin in snmp file when using route option
  2024-06-06 15:43       ` Eric Dumazet
@ 2024-06-06 15:50         ` Jason Xing
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Xing @ 2024-06-06 15:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: kuba, pabeni, davem, dsahern, ncardwell, netdev, Jason Xing

On Thu, Jun 6, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Jun 6, 2024 at 5:41 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Thu, Jun 6, 2024 at 11:14 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Thu, Jun 6, 2024 at 5:03 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > TCP_MIB_RTOMIN implemented in tcp mib definitions is always 200, which
> > > > is true if without any method to tune rto min. In 2007, we got a way to
> > > > tune it globaly when setting rto_min route option, but TCP_MIB_RTOMIN
> > > > in /proc/net/snmp still shows the same, namely, 200.
> > > >
> > > > As RFC 1213 said:
> > > >   "tcpRtoMin
> > > >    ...
> > > >    The minimum value permitted by a TCP implementation for the
> > > >    retransmission timeout, measured in milliseconds."
> > > >
> > > > Since the lower bound of rto can be changed, we should accordingly
> > > > adjust the output of /proc/net/snmp.
> > > >
> > > > Fixes: 05bb1fad1cde ("[TCP]: Allow minimum RTO to be configurable via routing metrics.")
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > >  include/net/tcp.h  | 2 ++
> > > >  net/ipv4/metrics.c | 4 ++++
> > > >  net/ipv4/proc.c    | 3 +++
> > > >  3 files changed, 9 insertions(+)
> > > >
> > > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > > index a70fc39090fe..a111a5d151b7 100644
> > > > --- a/include/net/tcp.h
> > > > +++ b/include/net/tcp.h
> > > > @@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX);
> > > >  extern int sysctl_tcp_max_orphans;
> > > >  extern long sysctl_tcp_mem[3];
> > > >
> > > > +extern unsigned int tcp_rtax_rtomin;
> > > > +
> > > >  #define TCP_RACK_LOSS_DETECTION  0x1 /* Use RACK to detect losses */
> > > >  #define TCP_RACK_STATIC_REO_WND  0x2 /* Use static RACK reo wnd */
> > > >  #define TCP_RACK_NO_DUPTHRESH    0x4 /* Do not use DUPACK threshold in RACK */
> > > > diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c
> > > > index 8ddac1f595ed..61ca949b8281 100644
> > > > --- a/net/ipv4/metrics.c
> > > > +++ b/net/ipv4/metrics.c
> > > > @@ -7,6 +7,8 @@
> > > >  #include <net/net_namespace.h>
> > > >  #include <net/tcp.h>
> > > >
> > > > +unsigned int tcp_rtax_rtomin __read_mostly;
> > > > +
> > > >  static int ip_metrics_convert(struct nlattr *fc_mx,
> > > >                               int fc_mx_len, u32 *metrics,
> > > >                               struct netlink_ext_ack *extack)
> > > > @@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx,
> > > >         if (ecn_ca)
> > > >                 metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
> > > >
> > > > +       tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1];
> > > > +
> > > >         return 0;
> > > >  }
> > > >
> > > > diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
> > > > index 6c4664c681ca..ce387081a3c9 100644
> > > > --- a/net/ipv4/proc.c
> > > > +++ b/net/ipv4/proc.c
> > > > @@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v)
> > > >                 /* MaxConn field is signed, RFC 2012 */
> > > >                 if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN)
> > > >                         seq_printf(seq, " %ld", buff[i]);
> > > > +               else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMTIN)
> > > > +                       seq_printf(seq, " %lu",
> > > > +                                  tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]);
> > > >                 else
> > > >                         seq_printf(seq, " %lu", buff[i]);
> > > >         }
> > > > --
> > > > 2.37.3
> > > >
> > >
> > > I do not think we can accept this patch.
> > >
> > > 1) You might have missed that we have multiple network namespaces..
> >
> > Thanks for the review.
> >
> > For this patch, indeed, I think I need to consider namespaces...
> > For the other one, I did.
> >
> > >
> > > 2) You might have missed that we can have thousands of routes, with
> > > different metrics.
> >
> > Oh, that's really complicated...
> >
> > >
> > > I would leave /proc/net/snmp as it is, because no value can possibly be right.
> >
> > It cannot be right if someone tries to set rto min by 'ip route' or 'sysctl -w'.
> >
>
> Or eBPF.
>
> There is no way a /proc/net/snmp value can be right.
>
> Why would anyone care, since thousands of TCP sockets can all have
> different values ?

I think you're right because there are too many kinds of situations.

For kernel developers or knowledgable admins, they are aware of this
'issue' (If I remember correctly, there is one comment in iproute
saying this field is obsolete.).
For some normal users, they are confused because someone reported this
'issue' to me.

I'll drop this series. There's no good way to solve it.

Thanks, Eric.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-06-06 15:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-06 15:03 [PATCH net-next 0/2] tcp: show the right value of TCP_MIB_RTOMIN Jason Xing
2024-06-06 15:03 ` [PATCH net-next 1/2] tcp: fix showing wrong rtomin in snmp file when using route option Jason Xing
2024-06-06 15:14   ` Eric Dumazet
2024-06-06 15:40     ` Jason Xing
2024-06-06 15:43       ` Eric Dumazet
2024-06-06 15:50         ` Jason Xing
2024-06-06 15:03 ` [PATCH net-next 2/2] tcp: fix showing wrong rtomin in snmp file when setting sysctl_tcp_rto_min_us Jason Xing
2024-06-06 15:17   ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).