From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Andrea Mayer <andrea.mayer@uniroma2.it>,
Jon Maxwell <jmaxwell37@gmail.com>,
David Ahern <dsahern@kernel.org>,
Jakub Kicinski <kuba@kernel.org>,
"Jitindar Singh, Suraj" <surajjs@amazon.com>
Subject: [PATCH 6.1 4/4] ipv6: remove max_size check inline with ipv4
Date: Sat, 13 Jan 2024 10:50:42 +0100 [thread overview]
Message-ID: <20240113094204.182490438@linuxfoundation.org> (raw)
In-Reply-To: <20240113094204.017594027@linuxfoundation.org>
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jon Maxwell <jmaxwell37@gmail.com>
commit af6d10345ca76670c1b7c37799f0d5576ccef277 upstream.
In ip6_dst_gc() replace:
if (entries > gc_thresh)
With:
if (entries > ops->gc_thresh)
Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
these warnings:
[1] 99.187805] dst_alloc: 7728 callbacks suppressed
[2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
.
.
[300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
When this happens the packet is dropped and sendto() gets a network is
unreachable error:
remaining pkt 200557 errno 101
remaining pkt 196462 errno 101
.
.
remaining pkt 126821 errno 101
Implement David Aherns suggestion to remove max_size check seeing that Ipv6
has a GC to manage memory usage. Ipv4 already does not check max_size.
Here are some memory comparisons for Ipv4 vs Ipv6 with the patch:
Test by running 5 instances of a program that sends UDP packets to a raw
socket 5000000 times. Compare Ipv4 and Ipv6 performance with a similar
program.
Ipv4:
Before test:
MemFree: 29427108 kB
Slab: 237612 kB
ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0
xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0
ip_dst_cache 2881 3990 192 42 2 : tunables 0 0 0
During test:
MemFree: 29417608 kB
Slab: 247712 kB
ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0
xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0
ip_dst_cache 44394 44394 192 42 2 : tunables 0 0 0
After test:
MemFree: 29422308 kB
Slab: 238104 kB
ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0
xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0
ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0
Ipv6 with patch:
Errno 101 errors are not observed anymore with the patch.
Before test:
MemFree: 29422308 kB
Slab: 238104 kB
ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0
xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0
ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0
During Test:
MemFree: 29431516 kB
Slab: 240940 kB
ip6_dst_cache 11980 12064 256 32 2 : tunables 0 0 0
xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0
ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0
After Test:
MemFree: 29441816 kB
Slab: 238132 kB
ip6_dst_cache 1902 2432 256 32 2 : tunables 0 0 0
xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0
ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0
Tested-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230112012532.311021-1-jmaxwell37@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cc: "Jitindar Singh, Suraj" <surajjs@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/net/dst_ops.h | 2 +-
net/core/dst.c | 8 ++------
net/ipv6/route.c | 13 +++++--------
3 files changed, 8 insertions(+), 15 deletions(-)
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -16,7 +16,7 @@ struct dst_ops {
unsigned short family;
unsigned int gc_thresh;
- int (*gc)(struct dst_ops *ops);
+ void (*gc)(struct dst_ops *ops);
struct dst_entry * (*check)(struct dst_entry *, __u32 cookie);
unsigned int (*default_advmss)(const struct dst_entry *);
unsigned int (*mtu)(const struct dst_entry *);
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, str
if (ops->gc &&
!(flags & DST_NOCOUNT) &&
- dst_entries_get_fast(ops) > ops->gc_thresh) {
- if (ops->gc(ops)) {
- pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
- return NULL;
- }
- }
+ dst_entries_get_fast(ops) > ops->gc_thresh)
+ ops->gc(ops);
dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
if (!dst)
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_ad
static void ip6_dst_destroy(struct dst_entry *);
static void ip6_dst_ifdown(struct dst_entry *,
struct net_device *dev, int how);
-static int ip6_dst_gc(struct dst_ops *ops);
+static void ip6_dst_gc(struct dst_ops *ops);
static int ip6_pkt_discard(struct sk_buff *skb);
static int ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
@@ -3288,11 +3288,10 @@ out:
return dst;
}
-static int ip6_dst_gc(struct dst_ops *ops)
+static void ip6_dst_gc(struct dst_ops *ops)
{
struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
- int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
@@ -3300,11 +3299,10 @@ static int ip6_dst_gc(struct dst_ops *op
int entries;
entries = dst_entries_get_fast(ops);
- if (entries > rt_max_size)
+ if (entries > ops->gc_thresh)
entries = dst_entries_get_slow(ops);
- if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
- entries <= rt_max_size)
+ if (time_after(rt_last_gc + rt_min_interval, jiffies))
goto out;
fib6_run_gc(atomic_inc_return(&net->ipv6.ip6_rt_gc_expire), net, true);
@@ -3314,7 +3312,6 @@ static int ip6_dst_gc(struct dst_ops *op
out:
val = atomic_read(&net->ipv6.ip6_rt_gc_expire);
atomic_set(&net->ipv6.ip6_rt_gc_expire, val - (val >> rt_elasticity));
- return entries > rt_max_size;
}
static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
@@ -6517,7 +6514,7 @@ static int __net_init ip6_route_net_init
#endif
net->ipv6.sysctl.flush_delay = 0;
- net->ipv6.sysctl.ip6_rt_max_size = 4096;
+ net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
next prev parent reply other threads:[~2024-01-13 10:02 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-13 9:50 [PATCH 6.1 0/4] 6.1.73-rc1 review Greg Kroah-Hartman
2024-01-13 9:50 ` [PATCH 6.1 1/4] cifs: fix flushing folio regression for 6.1 backport Greg Kroah-Hartman
2024-01-13 20:08 ` Salvatore Bonaccorso
2024-01-13 20:19 ` Greg Kroah-Hartman
2024-01-13 20:24 ` Salvatore Bonaccorso
2024-01-13 20:39 ` Greg Kroah-Hartman
2024-01-13 9:50 ` [PATCH 6.1 2/4] Revert "nfsd: call nfsd_last_thread() before final nfsd_put()" Greg Kroah-Hartman
2024-01-13 9:50 ` [PATCH 6.1 3/4] Revert "nfsd: separate nfsd_last_thread() from nfsd_put()" Greg Kroah-Hartman
2024-01-13 9:50 ` Greg Kroah-Hartman [this message]
2024-01-13 18:43 ` [PATCH 6.1 0/4] 6.1.73-rc1 review SeongJae Park
2024-01-14 10:34 ` Pavel Machek
2024-01-14 19:38 ` Ron Economos
2024-01-14 22:01 ` Slade Watkins
2024-01-15 8:50 ` Naresh Kamboju
2024-01-15 10:23 ` Jon Hunter
2024-01-15 11:34 ` Shreeya Patel
2024-01-15 12:20 ` Conor Dooley
2024-01-15 19:09 ` Florian Fainelli
2024-01-15 19:46 ` Allen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240113094204.182490438@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=andrea.mayer@uniroma2.it \
--cc=dsahern@kernel.org \
--cc=jmaxwell37@gmail.com \
--cc=kuba@kernel.org \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
--cc=surajjs@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.