netdev.vger.kernel.org archive mirror
* [PATCH 0/3] Namespaceify two sysctls related with route
@ 2022-08-22  4:53 cgel.zte
  2022-08-22  4:54 ` [PATCH 1/3] ipv4: Namespaceify route/error_cost knob cgel.zte
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: cgel.zte @ 2022-08-22  4:53 UTC (permalink / raw)
  To: davem, kuba, yoshfuji, dsahern; +Cc: netdev, linl, xu.xin16

From: xu xin <xu.xin16@zte.com.cn>

With the rise of cloud native, more and more container applications are
deployed, and the network namespace is one of the foundations of a
container. The error_cost and error_burst sysctls are important knobs
that control how often IPv4 sends ICMP_DEST_UNREACH packets. When
different containers have different requirements for tuning error_cost
and error_burst, these sysctls should, for the host's security, exist
per network namespace.

Different network namespaces have different requirements for the
settings of error_cost and error_burst, which together limit the rate
of ICMP_DEST_UNREACH packets. Enable them to be configured per netns.

xu xin (3):
  ipv4: Namespaceify route/error_cost knob
  ipv4: Namespaceify route/error_burst knob
  ipv4: add documentation of two sysctls about icmp

 Documentation/networking/ip-sysctl.rst | 17 ++++++++++
 include/net/netns/ipv4.h               |  2 ++
 net/ipv4/route.c                       | 45 ++++++++++++++------------
 3 files changed, 44 insertions(+), 20 deletions(-)

-- 
2.25.1


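A minimal userspace sketch of how one sysctl table can serve every namespace. It assumes, from a reading of net/ipv4/route.c rather than from this series, that the per-netns registration copies ipv4_route_netns_table for each non-initial namespace and re-points every .data field from init_net into the new struct net. All names below (net_model, ctl_entry_model, rebase_table, HZ_MODEL) are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-ins for struct netns_ipv4, struct net and
 * struct ctl_table: only the fields this sketch needs. */
struct netns_ipv4_model {
	int ip_rt_error_cost;
	int ip_rt_error_burst;
};

struct net_model {
	struct netns_ipv4_model ipv4;
};

struct ctl_entry_model {
	const char *procname;
	void *data;
};

#define HZ_MODEL 1000 /* assumption: a kernel built with CONFIG_HZ=1000 */

/* Plays the role of init_net. */
static struct net_model init_net_model = {
	.ipv4 = { .ip_rt_error_cost = HZ_MODEL,
		  .ip_rt_error_burst = 5 * HZ_MODEL },
};

/* Template table whose .data fields point into the initial namespace,
 * like ".data = &init_net.ipv4.ip_rt_error_cost" in the patches. */
static const struct ctl_entry_model route_netns_template[] = {
	{ "error_cost",  &init_net_model.ipv4.ip_rt_error_cost },
	{ "error_burst", &init_net_model.ipv4.ip_rt_error_burst },
};

#define N_ENTRIES \
	(sizeof(route_netns_template) / sizeof(route_netns_template[0]))

/* Fill out[] with a copy of the template whose .data fields point at the
 * same members inside *net instead of inside init_net_model. */
static void rebase_table(struct ctl_entry_model out[], struct net_model *net)
{
	assert(net != NULL);
	for (size_t i = 0; i < N_ENTRIES; i++) {
		/* Offset of the member within the struct, computed inside
		 * init_net_model so the pointer subtraction stays within
		 * one object. */
		ptrdiff_t off = (const char *)route_netns_template[i].data -
				(const char *)&init_net_model;

		out[i].procname = route_netns_template[i].procname;
		out[i].data = (char *)net + off;
	}
}
```

In this model, writing through a rebased entry changes only that namespace's field, while init_net_model keeps its own value; per-namespace defaults would come from an init hook, as netns_ip_rt_init() does in the patches. The kernel code, as I understand it, adds the difference between the two struct addresses to each pointer directly; the sketch takes the member offset first to stay within portable C pointer arithmetic.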

* [PATCH 1/3] ipv4: Namespaceify route/error_cost knob
  2022-08-22  4:53 [PATCH 0/3] Namespaceify two sysctls related with route cgel.zte
@ 2022-08-22  4:54 ` cgel.zte
  2022-08-22  4:55 ` [PATCH 2/3] ipv4: Namespaceify route/error_burst knob cgel.zte
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: cgel.zte @ 2022-08-22  4:54 UTC (permalink / raw)
  To: davem, kuba, yoshfuji, dsahern
  Cc: netdev, linl, xu.xin16, Yunkai Zhang, CGEL ZTE

From: xu xin <xu.xin16@zte.com.cn>

Different network namespaces have different requirements for the
error_cost sysctl, which, together with error_burst, limits the maximum
rate of ICMP_DEST_UNREACH packets. Put simply, error_cost is the minimum
interval between two consecutive ICMP_DEST_UNREACH packets sent to the
same peer in the steady state, as opposed to the burst allowed after a
long quiet period.

Enable error_cost to be configured per network namespace.

Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
---
 include/net/netns/ipv4.h |  1 +
 net/ipv4/route.c         | 23 +++++++++++++----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index c7320ef356d9..319395bbad3c 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -85,6 +85,7 @@ struct netns_ipv4 {
 	u32 ip_rt_min_pmtu;
 	int ip_rt_mtu_expires;
 	int ip_rt_min_advmss;
+	int ip_rt_error_cost;
 
 	struct local_ports ip_local_ports;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 795cbe1de912..b022ae749640 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -118,7 +118,6 @@ static int ip_rt_max_size;
 static int ip_rt_redirect_number __read_mostly	= 9;
 static int ip_rt_redirect_load __read_mostly	= HZ / 50;
 static int ip_rt_redirect_silence __read_mostly	= ((HZ / 50) << (9 + 1));
-static int ip_rt_error_cost __read_mostly	= HZ;
 static int ip_rt_error_burst __read_mostly	= 5 * HZ;
 
 static int ip_rt_gc_timeout __read_mostly	= RT_GC_TIMEOUT;
@@ -949,6 +948,7 @@ static int ip_error(struct sk_buff *skb)
 	SKB_DR(reason);
 	bool send;
 	int code;
+	int error_cost;
 
 	if (netif_is_l3_master(skb->dev)) {
 		dev = __dev_get_by_index(dev_net(skb->dev), IPCB(skb)->iif);
@@ -1002,11 +1002,13 @@ static int ip_error(struct sk_buff *skb)
 	if (peer) {
 		now = jiffies;
 		peer->rate_tokens += now - peer->rate_last;
+		error_cost = READ_ONCE(net->ipv4.ip_rt_error_cost);
+
 		if (peer->rate_tokens > ip_rt_error_burst)
 			peer->rate_tokens = ip_rt_error_burst;
 		peer->rate_last = now;
-		if (peer->rate_tokens >= ip_rt_error_cost)
-			peer->rate_tokens -= ip_rt_error_cost;
+		if (peer->rate_tokens >= error_cost)
+			peer->rate_tokens -= error_cost;
 		else
 			send = false;
 		inet_putpeer(peer);
@@ -3535,13 +3537,6 @@ static struct ctl_table ipv4_route_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
-	{
-		.procname	= "error_cost",
-		.data		= &ip_rt_error_cost,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
 	{
 		.procname	= "error_burst",
 		.data		= &ip_rt_error_burst,
@@ -3590,6 +3585,13 @@ static struct ctl_table ipv4_route_netns_table[] = {
 		.mode       = 0644,
 		.proc_handler   = proc_dointvec,
 	},
+	{
+		.procname   = "error_cost",
+		.data       = &init_net.ipv4.ip_rt_error_cost,
+		.maxlen     = sizeof(int),
+		.mode       = 0644,
+		.proc_handler   = proc_dointvec,
+	},
 	{ },
 };
 
@@ -3653,6 +3655,7 @@ static __net_init int netns_ip_rt_init(struct net *net)
 	net->ipv4.ip_rt_min_pmtu = DEFAULT_MIN_PMTU;
 	net->ipv4.ip_rt_mtu_expires = DEFAULT_MTU_EXPIRES;
 	net->ipv4.ip_rt_min_advmss = DEFAULT_MIN_ADVMSS;
+	net->ipv4.ip_rt_error_cost = HZ;
 	return 0;
 }
 
-- 
2.25.1


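The rate limit in ip_error() is a per-peer token bucket measured in jiffies. Below is a simplified userspace model of the route.c hunk in this patch, assuming plain unsigned long counters (the real struct inet_peer field types may differ); peer_model and icmp_unreach_allowed are hypothetical names. The error_cost argument stands for the per-namespace value that ip_error() now reads with READ_ONCE(net->ipv4.ip_rt_error_cost).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-peer fields ip_error() updates
 * (rate_tokens, rate_last); all times are in jiffies. */
struct peer_model {
	unsigned long rate_tokens;
	unsigned long rate_last;
};

/* One call per would-be ICMP_DEST_UNREACH to this peer. Mirrors the
 * bucket logic in ip_error(): refill by elapsed jiffies, cap at
 * error_burst, then spend error_cost if enough tokens are available. */
static bool icmp_unreach_allowed(struct peer_model *peer, unsigned long now,
				 int error_cost, int error_burst)
{
	assert(error_cost >= 0 && error_burst >= 0);

	peer->rate_tokens += now - peer->rate_last;
	if (peer->rate_tokens > (unsigned long)error_burst)
		peer->rate_tokens = (unsigned long)error_burst;
	peer->rate_last = now;

	if (peer->rate_tokens >= (unsigned long)error_cost) {
		peer->rate_tokens -= (unsigned long)error_cost;
		return true;	/* the ICMP message may be sent */
	}
	return false;		/* rate limited: no ICMP is sent */
}
```

With error_cost = 1000 and error_burst = 5000 (the defaults if HZ were 1000), a peer starting from an empty bucket at jiffy 0 gets a message at jiffy 1000, is refused at 1500, and gets another at 2000. In the steady state, messages to one peer are therefore spaced at least error_cost jiffies apart.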

* [PATCH 2/3] ipv4: Namespaceify route/error_burst knob
  2022-08-22  4:53 [PATCH 0/3] Namespaceify two sysctls related with route cgel.zte
  2022-08-22  4:54 ` [PATCH 1/3] ipv4: Namespaceify route/error_cost knob cgel.zte
@ 2022-08-22  4:55 ` cgel.zte
  2022-08-22  4:56 ` [PATCH 3/3] ipv4: add documentation of two sysctls about icmp cgel.zte
  2022-08-23  2:05 ` [PATCH 0/3] Namespaceify two sysctls related with route Jakub Kicinski
  3 siblings, 0 replies; 5+ messages in thread
From: cgel.zte @ 2022-08-22  4:55 UTC (permalink / raw)
  To: davem, kuba, yoshfuji, dsahern
  Cc: netdev, linl, xu.xin16, Yunkai Zhang, CGEL ZTE

From: xu xin <xu.xin16@zte.com.cn>

Different network namespaces have different requirements for the
error_burst sysctl, which, together with error_cost, limits the rate of
ICMP_DEST_UNREACH packets. Put simply, the larger the ratio of
error_burst to error_cost, the more ICMP_DEST_UNREACH packets may be
sent in a burst after a long quiet period (one in which no
destination-unreachable ICMP packets were sent).

Enable error_burst to be configured per network namespace.

Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
---
 include/net/netns/ipv4.h |  1 +
 net/ipv4/route.c         | 24 +++++++++++++-----------
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 319395bbad3c..03d16cf32508 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -86,6 +86,7 @@ struct netns_ipv4 {
 	int ip_rt_mtu_expires;
 	int ip_rt_min_advmss;
 	int ip_rt_error_cost;
+	int ip_rt_error_burst;
 
 	struct local_ports ip_local_ports;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b022ae749640..86fbb2b511c1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -114,11 +114,11 @@
 #define DEFAULT_MIN_PMTU (512 + 20 + 20)
 #define DEFAULT_MTU_EXPIRES (10 * 60 * HZ)
 #define DEFAULT_MIN_ADVMSS 256
+#define DEFAULT_ERROR_BURST	(5 * HZ)
 static int ip_rt_max_size;
 static int ip_rt_redirect_number __read_mostly	= 9;
 static int ip_rt_redirect_load __read_mostly	= HZ / 50;
 static int ip_rt_redirect_silence __read_mostly	= ((HZ / 50) << (9 + 1));
-static int ip_rt_error_burst __read_mostly	= 5 * HZ;
 
 static int ip_rt_gc_timeout __read_mostly	= RT_GC_TIMEOUT;
 
@@ -948,7 +948,7 @@ static int ip_error(struct sk_buff *skb)
 	SKB_DR(reason);
 	bool send;
 	int code;
-	int error_cost;
+	int error_cost, error_burst;
 
 	if (netif_is_l3_master(skb->dev)) {
 		dev = __dev_get_by_index(dev_net(skb->dev), IPCB(skb)->iif);
@@ -1003,9 +1003,10 @@ static int ip_error(struct sk_buff *skb)
 		now = jiffies;
 		peer->rate_tokens += now - peer->rate_last;
 		error_cost = READ_ONCE(net->ipv4.ip_rt_error_cost);
+		error_burst = READ_ONCE(net->ipv4.ip_rt_error_burst);
 
-		if (peer->rate_tokens > ip_rt_error_burst)
-			peer->rate_tokens = ip_rt_error_burst;
+		if (peer->rate_tokens > error_burst)
+			peer->rate_tokens = error_burst;
 		peer->rate_last = now;
 		if (peer->rate_tokens >= error_cost)
 			peer->rate_tokens -= error_cost;
@@ -3537,13 +3538,6 @@ static struct ctl_table ipv4_route_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
-	{
-		.procname	= "error_burst",
-		.data		= &ip_rt_error_burst,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
 	{
 		.procname	= "gc_elasticity",
 		.data		= &ip_rt_gc_elasticity,
@@ -3592,6 +3586,13 @@ static struct ctl_table ipv4_route_netns_table[] = {
 		.mode       = 0644,
 		.proc_handler   = proc_dointvec,
 	},
+	{
+		.procname   = "error_burst",
+		.data       = &init_net.ipv4.ip_rt_error_burst,
+		.maxlen     = sizeof(int),
+		.mode       = 0644,
+		.proc_handler   = proc_dointvec,
+	},
 	{ },
 };
 
@@ -3656,6 +3657,7 @@ static __net_init int netns_ip_rt_init(struct net *net)
 	net->ipv4.ip_rt_mtu_expires = DEFAULT_MTU_EXPIRES;
 	net->ipv4.ip_rt_min_advmss = DEFAULT_MIN_ADVMSS;
 	net->ipv4.ip_rt_error_cost = HZ;
+	net->ipv4.ip_rt_error_burst = DEFAULT_ERROR_BURST;
 	return 0;
 }
 
-- 
2.25.1


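To make the ratio rule in this commit message concrete, here is a hypothetical helper (not kernel code) that counts how many messages one peer can receive back to back once its bucket has filled to the error_burst cap. It repeats only the spend step of ip_error()'s token bucket, assuming no jiffies elapse between the messages.

```c
#include <assert.h>

/* Hypothetical helper: how many ICMP_DEST_UNREACH messages one peer can
 * receive back to back (no jiffies elapsing in between) once its bucket
 * has been refilled to the error_burst cap by a long quiet period. */
static int max_burst(int error_cost, int error_burst)
{
	assert(error_cost > 0 && error_burst >= 0);

	long tokens = error_burst;	/* full bucket */
	int sent = 0;

	while (tokens >= error_cost) {	/* same test as in ip_error() */
		tokens -= error_cost;
		sent++;
	}
	return sent;
}
```

With the defaults (error_cost = HZ, error_burst = 5 * HZ) this gives 5 regardless of CONFIG_HZ. The model also exposes an edge case: if error_burst is set below error_cost, the bucket is capped below the cost of a single message, so no ICMP_DEST_UNREACH would ever be sent to a peer tracked this way.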

* [PATCH 3/3] ipv4: add documentation of two sysctls about icmp
  2022-08-22  4:53 [PATCH 0/3] Namespaceify two sysctls related with route cgel.zte
  2022-08-22  4:54 ` [PATCH 1/3] ipv4: Namespaceify route/error_cost knob cgel.zte
  2022-08-22  4:55 ` [PATCH 2/3] ipv4: Namespaceify route/error_burst knob cgel.zte
@ 2022-08-22  4:56 ` cgel.zte
  2022-08-23  2:05 ` [PATCH 0/3] Namespaceify two sysctls related with route Jakub Kicinski
  3 siblings, 0 replies; 5+ messages in thread
From: cgel.zte @ 2022-08-22  4:56 UTC (permalink / raw)
  To: corbet, davem, kuba, yoshfuji, dsahern
  Cc: netdev, linl, xu.xin16, Yunkai Zhang, CGEL ZTE

From: xu xin <xu.xin16@zte.com.cn>

Add descriptions of the error_cost and error_burst sysctls to
Documentation/networking/ip-sysctl.rst.

Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
---
 Documentation/networking/ip-sysctl.rst | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 56cd4ea059b2..c113a34a4115 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -156,6 +156,23 @@ route/max_size - INTEGER
 	From linux kernel 3.6 onwards, this is deprecated for ipv4
 	as route cache is no longer used.
 
+route/error_cost - INTEGER
+	Minimum interval, in jiffies, between two consecutive ICMP
+	Destination Unreachable messages sent to the same peer in the
+	steady state. The higher the value, the lower the overall rate
+	at which ICMP Destination Unreachable messages are sent.
+
+	Default: HZ (one second)
+
+route/error_burst - INTEGER
+	Together with error_cost, controls the maximum number of ICMP
+	Destination Unreachable messages that may be sent in a burst
+	after a long quiet period (one in which none were sent). The
+	higher the ratio of error_burst to error_cost, the larger the
+	allowed burst.
+
+	Default: 5 * HZ (five seconds)
+
 neigh/default/gc_thresh1 - INTEGER
 	Minimum number of entries to keep.  Garbage collector will not
 	purge entries if there are fewer than this number.
-- 
2.25.1


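Because both entries use proc_dointvec rather than a jiffies-converting handler, the values written are raw jiffies, so their wall-clock meaning depends on the kernel's CONFIG_HZ. Below is a small hypothetical helper for picking values, assuming the caller already knows the kernel's HZ (sysconf(_SC_CLK_TCK) reports USER_HZ, which is not necessarily the same value).

```c
#include <assert.h>

/* Hypothetical helper: convert milliseconds to jiffies for a given kernel
 * HZ, rounding up so the interval is never shortened. */
static long ms_to_jiffies(long ms, long hz)
{
	assert(ms >= 0 && hz > 0);
	return (ms * hz + 999) / 1000;
}

/* Hypothetical helper: choose error_cost/error_burst for "at most one
 * message per interval_ms in the steady state, bursts of up to
 * burst_msgs after a quiet period". */
static void pick_error_sysctls(long interval_ms, int burst_msgs, long hz,
			       long *error_cost, long *error_burst)
{
	assert(burst_msgs >= 1);
	*error_cost = ms_to_jiffies(interval_ms, hz);
	*error_burst = *error_cost * burst_msgs;
}
```

For example, on an HZ=1000 kernel, 100 ms spacing with bursts of 10 gives error_cost = 100 and error_burst = 1000; the same intent on an HZ=250 kernel needs error_cost = 25 and error_burst = 250.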

* Re: [PATCH 0/3] Namespaceify two sysctls related with route
  2022-08-22  4:53 [PATCH 0/3] Namespaceify two sysctls related with route cgel.zte
                   ` (2 preceding siblings ...)
  2022-08-22  4:56 ` [PATCH 3/3] ipv4: add documentation of two sysctls about icmp cgel.zte
@ 2022-08-23  2:05 ` Jakub Kicinski
  3 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2022-08-23  2:05 UTC (permalink / raw)
  To: cgel.zte; +Cc: davem, yoshfuji, dsahern, netdev, linl, xu.xin16

On Mon, 22 Aug 2022 04:53:10 +0000 cgel.zte@gmail.com wrote:
> With the rise of cloud native, more and more container applications are
> deployed, and the network namespace is one of the foundations of a
> container. The error_cost and error_burst sysctls are important knobs
> that control how often IPv4 sends ICMP_DEST_UNREACH packets. When
> different containers have different requirements for tuning error_cost
> and error_burst, these sysctls should, for the host's security, exist
> per network namespace.
>
> Different network namespaces have different requirements for the
> settings of error_cost and error_burst, which together limit the rate
> of ICMP_DEST_UNREACH packets. Enable them to be configured per netns.

I'm not familiar with the IPv6 implementation either, but someone needs
to explain to me why the knob is important in v4 while entirely absent
in v6. On the surface this makes no sense. There may be a good reason,
it just needs to be stated.

Also from the patches:

Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>

Bots / teams can't sign off patches, I've been over this with your
colleagues. Please put your team's name in your signoff, e.g.

Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>

Thanks!

