All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Andrea Mayer <andrea.mayer@uniroma2.it>,
	Jon Maxwell <jmaxwell37@gmail.com>,
	David Ahern <dsahern@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	Suraj Jitindar Singh <surajjs@amazon.com>
Subject: [PATCH 4.19 25/25] ipv6: remove max_size check inline with ipv4
Date: Sat, 13 Jan 2024 10:50:06 +0100	[thread overview]
Message-ID: <20240113094205.833099566@linuxfoundation.org> (raw)
In-Reply-To: <20240113094205.025407355@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jon Maxwell <jmaxwell37@gmail.com>

commit af6d10345ca76670c1b7c37799f0d5576ccef277 upstream.

In ip6_dst_gc() replace:

  if (entries > gc_thresh)

With:

  if (entries > ops->gc_thresh)

Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
these warnings:

[1]   99.187805] dst_alloc: 7728 callbacks suppressed
[2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
.
.
[300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.

When this happens the packet is dropped and sendto() gets a network is
unreachable error:

remaining pkt 200557 errno 101
remaining pkt 196462 errno 101
.
.
remaining pkt 126821 errno 101

Implement David Aherns suggestion to remove max_size check seeing that Ipv6
has a GC to manage memory usage. Ipv4 already does not check max_size.

Here are some memory comparisons for Ipv4 vs Ipv6 with the patch:

Test by running 5 instances of a program that sends UDP packets to a raw
socket 5000000 times. Compare Ipv4 and Ipv6 performance with a similar
program.

Ipv4:

Before test:

MemFree:        29427108 kB
Slab:             237612 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        2881   3990    192   42    2 : tunables    0    0    0

During test:

MemFree:        29417608 kB
Slab:             247712 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache       44394  44394    192   42    2 : tunables    0    0    0

After test:

MemFree:        29422308 kB
Slab:             238104 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

Ipv6 with patch:

Errno 101 errors are not observed anymore with the patch.

Before test:

MemFree:        29422308 kB
Slab:             238104 kB

ip6_dst_cache       1912   2528    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

During Test:

MemFree:        29431516 kB
Slab:             240940 kB

ip6_dst_cache      11980  12064    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

After Test:

MemFree:        29441816 kB
Slab:             238132 kB

ip6_dst_cache       1902   2432    256   32    2 : tunables    0    0    0
xfrm_dst_cache         0      0    320   25    2 : tunables    0    0    0
ip_dst_cache        3048   4116    192   42    2 : tunables    0    0    0

Tested-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230112012532.311021-1-jmaxwell37@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Cc: <stable@vger.kernel.org> # 4.19.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/dst_ops.h |    2 +-
 net/core/dst.c        |    8 ++------
 net/ipv6/route.c      |   13 +++++--------
 3 files changed, 8 insertions(+), 15 deletions(-)

--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -16,7 +16,7 @@ struct dst_ops {
 	unsigned short		family;
 	unsigned int		gc_thresh;
 
-	int			(*gc)(struct dst_ops *ops);
+	void			(*gc)(struct dst_ops *ops);
 	struct dst_entry *	(*check)(struct dst_entry *, __u32 cookie);
 	unsigned int		(*default_advmss)(const struct dst_entry *);
 	unsigned int		(*mtu)(const struct dst_entry *);
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -99,12 +99,8 @@ void *dst_alloc(struct dst_ops *ops, str
 
 	if (ops->gc &&
 	    !(flags & DST_NOCOUNT) &&
-	    dst_entries_get_fast(ops) > ops->gc_thresh) {
-		if (ops->gc(ops)) {
-			pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
-			return NULL;
-		}
-	}
+	    dst_entries_get_fast(ops) > ops->gc_thresh)
+		ops->gc(ops);
 
 	dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
 	if (!dst)
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -92,7 +92,7 @@ static struct dst_entry *ip6_negative_ad
 static void		ip6_dst_destroy(struct dst_entry *);
 static void		ip6_dst_ifdown(struct dst_entry *,
 				       struct net_device *dev, int how);
-static int		 ip6_dst_gc(struct dst_ops *ops);
+static void		 ip6_dst_gc(struct dst_ops *ops);
 
 static int		ip6_pkt_discard(struct sk_buff *skb);
 static int		ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
@@ -2767,11 +2767,10 @@ out:
 	return dst;
 }
 
-static int ip6_dst_gc(struct dst_ops *ops)
+static void ip6_dst_gc(struct dst_ops *ops)
 {
 	struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
 	int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
-	int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
 	int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
 	int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
 	unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
@@ -2779,11 +2778,10 @@ static int ip6_dst_gc(struct dst_ops *op
 	int entries;
 
 	entries = dst_entries_get_fast(ops);
-	if (entries > rt_max_size)
+	if (entries > ops->gc_thresh)
 		entries = dst_entries_get_slow(ops);
 
-	if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
-	    entries <= rt_max_size)
+	if (time_after(rt_last_gc + rt_min_interval, jiffies))
 		goto out;
 
 	fib6_run_gc(atomic_inc_return(&net->ipv6.ip6_rt_gc_expire), net, true);
@@ -2793,7 +2791,6 @@ static int ip6_dst_gc(struct dst_ops *op
 out:
 	val = atomic_read(&net->ipv6.ip6_rt_gc_expire);
 	atomic_set(&net->ipv6.ip6_rt_gc_expire, val - (val >> rt_elasticity));
-	return entries > rt_max_size;
 }
 
 static int ip6_convert_metrics(struct net *net, struct fib6_info *rt,
@@ -5336,7 +5333,7 @@ static int __net_init ip6_route_net_init
 #endif
 
 	net->ipv6.sysctl.flush_delay = 0;
-	net->ipv6.sysctl.ip6_rt_max_size = 4096;
+	net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
 	net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
 	net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
 	net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;



  parent reply	other threads:[~2024-01-13  9:55 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-13  9:49 [PATCH 4.19 00/25] 4.19.305-rc1 review Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 01/25] nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 02/25] i40e: Fix filter input checks to prevent config with invalid values Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 03/25] net: sched: em_text: fix possible memory leak in em_text_destroy() Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 04/25] ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 05/25] net: bcmgenet: Fix FCS generation for fragmented skbuffs Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 06/25] net: Save and restore msg_namelen in sock_sendmsg Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 07/25] i40e: fix use-after-free in i40e_aqc_add_filters() Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 08/25] i40e: Restore VF MSI-X state during PCI reset Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 09/25] net/qla3xxx: switch from pci_ to dma_ API Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 10/25] net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 11/25] asix: Add check for usbnet_get_endpoints Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 12/25] bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters() Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 13/25] mm/memory-failure: check the mapcount of the precise page Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 14/25] firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 15/25] mm: fix unmap_mapping_range high bits shift bug Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 16/25] mmc: rpmb: fixes pause retune on all RPMB partitions Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 17/25] mmc: core: Cancel delayed work before releasing host Greg Kroah-Hartman
2024-01-13  9:49 ` [PATCH 4.19 18/25] fuse: nlookup missing decrement in fuse_direntplus_link Greg Kroah-Hartman
2024-01-13  9:50 ` [PATCH 4.19 19/25] netfilter: nf_tables: Reject tables of unsupported family Greg Kroah-Hartman
2024-01-13  9:50 ` [PATCH 4.19 20/25] PCI: Extract ATS disabling to a helper function Greg Kroah-Hartman
2024-01-13  9:50 ` [PATCH 4.19 21/25] PCI: Disable ATS for specific Intel IPU E2000 devices Greg Kroah-Hartman
2024-01-13  9:50 ` [PATCH 4.19 22/25] net: add a route cache full diagnostic message Greg Kroah-Hartman
2024-01-13  9:50 ` [PATCH 4.19 23/25] net/dst: use a smaller percpu_counter batch for dst entries accounting Greg Kroah-Hartman
2024-01-13  9:50 ` [PATCH 4.19 24/25] ipv6: make ip6_rt_gc_expire an atomic_t Greg Kroah-Hartman
2024-01-13  9:50 ` Greg Kroah-Hartman [this message]
2024-01-14 10:32 ` [PATCH 4.19 00/25] 4.19.305-rc1 review Pavel Machek
2024-01-15  8:57 ` Naresh Kamboju
2024-01-15 10:23 ` Jon Hunter
2024-01-15 16:24 ` Harshit Mogalapalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240113094205.833099566@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=andrea.mayer@uniroma2.it \
    --cc=dsahern@kernel.org \
    --cc=jmaxwell37@gmail.com \
    --cc=kuba@kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=surajjs@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.