* [PATCH net-next v3 0/5] Remove expired routes with a separated list of routes.
@ 2024-02-02 8:21 thinker.li
2024-02-02 8:21 ` [PATCH net-next v3 1/5] net/ipv6: set expires in rt6_add_dflt_router() thinker.li
` (4 more replies)
0 siblings, 5 replies; 18+ messages in thread
From: thinker.li @ 2024-02-02 8:21 UTC (permalink / raw)
To: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng, Kui-Feng Lee
From: Kui-Feng Lee <thinker.li@gmail.com>
This patchset is resent due to previous reverting. [1]
FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.
Background
==========
The size of a Linux IPv6 routing table can become a big problem if not
managed appropriately. Now, Linux has a garbage collector to remove
expired routes periodically. However, this may lead to a situation in
which the routing path is blocked for a long period due to an
excessive number of routes.
For example, years ago, there is a commit c7bb4b89033b ("ipv6: tcp:
drop silly ICMPv6 packet too big messages"). The root cause is that
malicious ICMPv6 packets were sent back for every small packet sent to
them. These packets add routes with an expiration time that prompts
the GC to periodically check all routes in the tables, including
permanent ones.
Why Route Expires
=================
Users can add IPv6 routes with an expiration time manually. However,
the Neighbor Discovery protocol may also generate routes that can
expire. For example, Router Advertisement (RA) messages may create a
default route with an expiration time. [RFC 4861] For IPv4, it is not
possible to set an expiration time for a route, and there is no RA, so
there is no need to worry about such issues.
Create Routes with Expires
==========================
You can create routes with expires with the command.
For example,
ip -6 route add 2001:b000:591::3 via fe80::5054:ff:fe12:3457 \
dev enp0s3 expires 30
The route that has been generated will be deleted automatically in 30
seconds.
GC of FIB6
==========
The function called fib6_run_gc() is responsible for performing
garbage collection (GC) for the Linux IPv6 stack. It checks for the
expiration of every route by traversing the trees of routing
tables. The time taken to traverse a routing table increases with its
size. Holding the routing table lock during traversal is particularly
undesirable. Therefore, it is preferable to keep the lock for the
shortest possible duration.
Solution
========
The cause of the issue is keeping the routing table locked during the
traversal of large trees. To solve this problem, we can create a separate
list of routes that have expiration. This will prevent GC from checking
permanent routes.
Result
======
We conducted a test to measure the execution times of fib6_gc_timer_cb()
and observed that it enhances the GC of FIB6. During the test, we added
permanent routes with the following numbers: 1000, 3000, 6000, and
9000. Additionally, we added a route with an expiration time.
Here are the average execution times for the kernel without the patch.
- 120020 ns with 1000 permanent routes
- 308920 ns with 3000 ...
- 581470 ns with 6000 ...
- 855310 ns with 9000 ...
The kernel with the patch consistently takes around 14000 ns to execute,
regardless of the number of permanent routes that are installed.
Major changes from v2:
- Refactory the boilerplate checks in the test case.
- check_rt_num() and check_rt_num_clean()
Major changes from v1:
- Reduce the numbers of routes (5) in the test cases to work with
slow environments. Due to the failure on patchwork.
- Remove systemd related commands in the test case.
Major changes from the previous patchset [2]:
- Split helpers.
- fib6_set_expires() -> fib6_set_expires() and fib6_add_gc_list().
- fib6_clean_expires() -> fib6_clean_expires() and
fib6_remove_gc_list().
- Fix rt6_add_dflt_router() to avoid racing of setting expires.
- Remove unnecessary calling to fib6_clean_expires() in
ip6_route_info_create().
- Add test cases of toggling routes between permanent and temporary
and handling routes from RA messages.
- Clean up routes by deleting the existing device and adding a new
one.
- Fix a potential issue in modify_prefix_route().
---
[1] https://lore.kernel.org/all/20231219030243.25687-1-dsahern@kernel.org/
[2] https://lore.kernel.org/all/20230815180706.772638-1-thinker.li@gmail.com/
v1: https://lore.kernel.org/all/20240131064041.3445212-1-thinker.li@gmail.com/
v2: https://lore.kernel.org/all/20240201082024.1018011-1-thinker.li@gmail.com/
Kui-Feng Lee (5):
net/ipv6: set expires in rt6_add_dflt_router().
net/ipv6: Remove unnecessary clean.
net/ipv6: Remove expired routes with a separated list of routes.
net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set.
selftests/net: Adding test cases of replacing routes and route
advertisements.
include/net/ip6_fib.h | 35 +++++-
include/net/ip6_route.h | 3 +-
net/ipv6/addrconf.c | 50 ++++++--
net/ipv6/ip6_fib.c | 58 ++++++++-
net/ipv6/ndisc.c | 14 ++-
net/ipv6/route.c | 20 ++-
tools/testing/selftests/net/fib_tests.sh | 148 +++++++++++++++++++----
7 files changed, 285 insertions(+), 43 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH net-next v3 1/5] net/ipv6: set expires in rt6_add_dflt_router().
2024-02-02 8:21 [PATCH net-next v3 0/5] Remove expired routes with a separated list of routes thinker.li
@ 2024-02-02 8:21 ` thinker.li
2024-02-02 11:52 ` Hangbin Liu
2024-02-05 4:42 ` David Ahern
2024-02-02 8:21 ` [PATCH net-next v3 2/5] net/ipv6: Remove unnecessary clean thinker.li
` (3 subsequent siblings)
4 siblings, 2 replies; 18+ messages in thread
From: thinker.li @ 2024-02-02 8:21 UTC (permalink / raw)
To: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng, Kui-Feng Lee
From: Kui-Feng Lee <thinker.li@gmail.com>
Pass the duration of a lifetime (in seconds) to the function
rt6_add_dflt_router() so that it can properly set the expiration time.
The function ndisc_router_discovery() is the only one that calls
rt6_add_dflt_router(), and it will later set the expiration time for the
route created by rt6_add_dflt_router(). However, there is a gap of time
between calling rt6_add_dflt_router() and setting the expiration time in
ndisc_router_discovery(). During this period, there is a possibility that a
new route may be removed from the routing table. By setting the correct
expiration time in rt6_add_dflt_router(), we can prevent this from
happening. The reason for setting RTF_EXPIRES in rt6_add_dflt_router() is
to start the Garbage Collection (GC) timer, as it only activates when a
route with RTF_EXPIRES is added to a table.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
include/net/ip6_route.h | 3 ++-
net/ipv6/ndisc.c | 3 ++-
net/ipv6/route.c | 4 +++-
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 28b065790261..52a51c69aa9d 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -170,7 +170,8 @@ struct fib6_info *rt6_get_dflt_router(struct net *net,
struct fib6_info *rt6_add_dflt_router(struct net *net,
const struct in6_addr *gwaddr,
struct net_device *dev, unsigned int pref,
- u32 defrtr_usr_metric);
+ u32 defrtr_usr_metric,
+ int lifetime);
void rt6_purge_dflt_routers(struct net *net);
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index a19999b30bc0..a68462668158 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1382,7 +1382,8 @@ static enum skb_drop_reason ndisc_router_discovery(struct sk_buff *skb)
neigh_release(neigh);
rt = rt6_add_dflt_router(net, &ipv6_hdr(skb)->saddr,
- skb->dev, pref, defrtr_usr_metric);
+ skb->dev, pref, defrtr_usr_metric,
+ lifetime);
if (!rt) {
ND_PRINTK(0, err,
"RA: %s failed to add default route\n",
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 63b4c6056582..98abba8f15cd 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4355,7 +4355,8 @@ struct fib6_info *rt6_add_dflt_router(struct net *net,
const struct in6_addr *gwaddr,
struct net_device *dev,
unsigned int pref,
- u32 defrtr_usr_metric)
+ u32 defrtr_usr_metric,
+ int lifetime)
{
struct fib6_config cfg = {
.fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_DFLT,
@@ -4368,6 +4369,7 @@ struct fib6_info *rt6_add_dflt_router(struct net *net,
.fc_nlinfo.portid = 0,
.fc_nlinfo.nlh = NULL,
.fc_nlinfo.nl_net = net,
+ .fc_expires = jiffies_to_clock_t(lifetime * HZ),
};
cfg.fc_gateway = *gwaddr;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next v3 2/5] net/ipv6: Remove unnecessary clean.
2024-02-02 8:21 [PATCH net-next v3 0/5] Remove expired routes with a separated list of routes thinker.li
2024-02-02 8:21 ` [PATCH net-next v3 1/5] net/ipv6: set expires in rt6_add_dflt_router() thinker.li
@ 2024-02-02 8:21 ` thinker.li
2024-02-05 4:42 ` David Ahern
2024-02-02 8:21 ` [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes thinker.li
` (2 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: thinker.li @ 2024-02-02 8:21 UTC (permalink / raw)
To: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng, Kui-Feng Lee
From: Kui-Feng Lee <thinker.li@gmail.com>
The route here is newly created. It is unnecessary to call
fib6_clean_expires() on it.
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
net/ipv6/route.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 98abba8f15cd..dd6ff5b20918 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3765,8 +3765,6 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
if (cfg->fc_flags & RTF_EXPIRES)
fib6_set_expires(rt, jiffies +
clock_t_to_jiffies(cfg->fc_expires));
- else
- fib6_clean_expires(rt);
if (cfg->fc_protocol == RTPROT_UNSPEC)
cfg->fc_protocol = RTPROT_BOOT;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes.
2024-02-02 8:21 [PATCH net-next v3 0/5] Remove expired routes with a separated list of routes thinker.li
2024-02-02 8:21 ` [PATCH net-next v3 1/5] net/ipv6: set expires in rt6_add_dflt_router() thinker.li
2024-02-02 8:21 ` [PATCH net-next v3 2/5] net/ipv6: Remove unnecessary clean thinker.li
@ 2024-02-02 8:21 ` thinker.li
2024-02-04 10:29 ` Hangbin Liu
2024-02-05 4:45 ` David Ahern
2024-02-02 8:21 ` [PATCH net-next v3 4/5] net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set thinker.li
2024-02-02 8:22 ` [PATCH net-next v3 5/5] selftests/net: Adding test cases of replacing routes and route advertisements thinker.li
4 siblings, 2 replies; 18+ messages in thread
From: thinker.li @ 2024-02-02 8:21 UTC (permalink / raw)
To: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng, Kui-Feng Lee
From: Kui-Feng Lee <thinker.li@gmail.com>
FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
include/net/ip6_fib.h | 35 +++++++++++++++++++++++++-
net/ipv6/addrconf.c | 50 +++++++++++++++++++++++++++++++------
net/ipv6/ip6_fib.c | 58 +++++++++++++++++++++++++++++++++++++++----
net/ipv6/ndisc.c | 11 +++++++-
net/ipv6/route.c | 14 +++++++++--
5 files changed, 152 insertions(+), 16 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 360b12e61850..fc581aa7802f 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -173,6 +173,9 @@ struct fib6_info {
refcount_t fib6_ref;
unsigned long expires;
+
+ struct hlist_node gc_link;
+
struct dst_metrics *fib6_metrics;
#define fib6_pmtu fib6_metrics->metrics[RTAX_MTU-1]
@@ -241,12 +244,18 @@ static inline bool fib6_requires_src(const struct fib6_info *rt)
return rt->fib6_src.plen > 0;
}
+/* The callers should hold f6i->fib6_table->tb6_lock if a route has ever
+ * been added to a table before.
+ */
static inline void fib6_clean_expires(struct fib6_info *f6i)
{
f6i->fib6_flags &= ~RTF_EXPIRES;
f6i->expires = 0;
}
+/* The callers should hold f6i->fib6_table->tb6_lock if a route has ever
+ * been added to a table before.
+ */
static inline void fib6_set_expires(struct fib6_info *f6i,
unsigned long expires)
{
@@ -327,8 +336,10 @@ static inline bool fib6_info_hold_safe(struct fib6_info *f6i)
static inline void fib6_info_release(struct fib6_info *f6i)
{
- if (f6i && refcount_dec_and_test(&f6i->fib6_ref))
+ if (f6i && refcount_dec_and_test(&f6i->fib6_ref)) {
+ DEBUG_NET_WARN_ON_ONCE(!hlist_unhashed(&f6i->gc_link));
call_rcu(&f6i->rcu, fib6_info_destroy_rcu);
+ }
}
enum fib6_walk_state {
@@ -382,6 +393,7 @@ struct fib6_table {
struct inet_peer_base tb6_peers;
unsigned int flags;
unsigned int fib_seq;
+ struct hlist_head tb6_gc_hlist; /* GC candidates */
#define RT6_TABLE_HAS_DFLT_ROUTER BIT(0)
};
@@ -498,6 +510,27 @@ void fib6_gc_cleanup(void);
int fib6_init(void);
+/* Add the route to the gc list if it is not already there
+ *
+ * The callers should hold f6i->fib6_table->tb6_lock and make sure the
+ * route is on a table.
+ */
+static inline void fib6_add_gc_list(struct fib6_info *f6i)
+{
+ if (hlist_unhashed(&f6i->gc_link))
+ hlist_add_head(&f6i->gc_link, &f6i->fib6_table->tb6_gc_hlist);
+}
+
+/* Remove the route from the gc list if it is on the list.
+ *
+ * The callers should hold f6i->fib6_table->tb6_lock.
+ */
+static inline void fib6_remove_gc_list(struct fib6_info *f6i)
+{
+ if (!hlist_unhashed(&f6i->gc_link))
+ hlist_del_init(&f6i->gc_link);
+}
+
struct ipv6_route_iter {
struct seq_net_private p;
struct fib6_walker w;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 733ace18806c..36bfa987c314 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1255,6 +1255,7 @@ static void
cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires,
bool del_rt, bool del_peer)
{
+ struct fib6_table *table;
struct fib6_info *f6i;
f6i = addrconf_get_prefix_route(del_peer ? &ifp->peer_addr : &ifp->addr,
@@ -1264,8 +1265,18 @@ cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires,
if (del_rt)
ip6_del_rt(dev_net(ifp->idev->dev), f6i, false);
else {
- if (!(f6i->fib6_flags & RTF_EXPIRES))
+ if (!(f6i->fib6_flags & RTF_EXPIRES)) {
+ table = f6i->fib6_table;
+ spin_lock_bh(&table->tb6_lock);
fib6_set_expires(f6i, expires);
+ /* If fib6_node is null, the f6i is just
+ * removed from the table.
+ */
+ if (rcu_dereference_protected(f6i->fib6_node,
+ lockdep_is_held(&table->tb6_lock)))
+ fib6_add_gc_list(f6i);
+ spin_unlock_bh(&table->tb6_lock);
+ }
fib6_info_release(f6i);
}
}
@@ -2706,6 +2717,7 @@ EXPORT_SYMBOL_GPL(addrconf_prefix_rcv_add_addr);
void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
{
struct prefix_info *pinfo;
+ struct fib6_table *table;
__u32 valid_lft;
__u32 prefered_lft;
int addr_type, err;
@@ -2782,11 +2794,23 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
if (valid_lft == 0) {
ip6_del_rt(net, rt, false);
rt = NULL;
- } else if (addrconf_finite_timeout(rt_expires)) {
- /* not infinity */
- fib6_set_expires(rt, jiffies + rt_expires);
} else {
- fib6_clean_expires(rt);
+ table = rt->fib6_table;
+ spin_lock_bh(&table->tb6_lock);
+ if (addrconf_finite_timeout(rt_expires)) {
+ /* not infinity */
+ fib6_set_expires(rt, jiffies + rt_expires);
+ /* If fib6_node is null, the f6i is
+ * just removed from the table.
+ */
+ if (rcu_dereference_protected(rt->fib6_node,
+ lockdep_is_held(&table->tb6_lock)))
+ fib6_add_gc_list(rt);
+ } else {
+ fib6_clean_expires(rt);
+ fib6_remove_gc_list(rt);
+ }
+ spin_unlock_bh(&table->tb6_lock);
}
} else if (valid_lft) {
clock_t expires = 0;
@@ -4741,6 +4765,7 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp,
unsigned long expires, u32 flags,
bool modify_peer)
{
+ struct fib6_table *table;
struct fib6_info *f6i;
u32 prio;
@@ -4761,10 +4786,21 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp,
ifp->rt_priority, ifp->idev->dev,
expires, flags, GFP_KERNEL);
} else {
- if (!expires)
+ table = f6i->fib6_table;
+ spin_lock_bh(&table->tb6_lock);
+ if (!expires) {
fib6_clean_expires(f6i);
- else
+ fib6_remove_gc_list(f6i);
+ } else {
fib6_set_expires(f6i, expires);
+ /* If fib6_node is null, the f6i is just removed
+ * from the table.
+ */
+ if (rcu_dereference_protected(f6i->fib6_node,
+ lockdep_is_held(&table->tb6_lock)))
+ fib6_add_gc_list(f6i);
+ }
+ spin_unlock_bh(&table->tb6_lock);
fib6_info_release(f6i);
}
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 38a0348b1d17..d53dc519d317 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -160,6 +160,8 @@ struct fib6_info *fib6_info_alloc(gfp_t gfp_flags, bool with_fib6_nh)
INIT_LIST_HEAD(&f6i->fib6_siblings);
refcount_set(&f6i->fib6_ref, 1);
+ INIT_HLIST_NODE(&f6i->gc_link);
+
return f6i;
}
@@ -246,6 +248,7 @@ static struct fib6_table *fib6_alloc_table(struct net *net, u32 id)
net->ipv6.fib6_null_entry);
table->tb6_root.fn_flags = RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO;
inet_peer_base_init(&table->tb6_peers);
+ INIT_HLIST_HEAD(&table->tb6_gc_hlist);
}
return table;
@@ -1055,6 +1058,9 @@ static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn,
lockdep_is_held(&table->tb6_lock));
}
}
+
+ fib6_clean_expires(rt);
+ fib6_remove_gc_list(rt);
}
/*
@@ -1115,10 +1121,13 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
rt->fib6_nsiblings = 0;
if (!(iter->fib6_flags & RTF_EXPIRES))
return -EEXIST;
- if (!(rt->fib6_flags & RTF_EXPIRES))
+ if (!(rt->fib6_flags & RTF_EXPIRES)) {
fib6_clean_expires(iter);
- else
+ fib6_remove_gc_list(iter);
+ } else {
fib6_set_expires(iter, rt->expires);
+ fib6_add_gc_list(iter);
+ }
if (rt->fib6_pmtu)
fib6_metric_set(iter, RTAX_MTU,
@@ -1477,6 +1486,10 @@ int fib6_add(struct fib6_node *root, struct fib6_info *rt,
if (rt->nh)
list_add(&rt->nh_list, &rt->nh->f6i_list);
__fib6_update_sernum_upto_root(rt, fib6_new_sernum(info->nl_net));
+
+ if (rt->fib6_flags & RTF_EXPIRES)
+ fib6_add_gc_list(rt);
+
fib6_start_gc(info->nl_net, rt);
}
@@ -2280,9 +2293,8 @@ static void fib6_flush_trees(struct net *net)
* Garbage collection
*/
-static int fib6_age(struct fib6_info *rt, void *arg)
+static int fib6_age(struct fib6_info *rt, struct fib6_gc_args *gc_args)
{
- struct fib6_gc_args *gc_args = arg;
unsigned long now = jiffies;
/*
@@ -2307,6 +2319,40 @@ static int fib6_age(struct fib6_info *rt, void *arg)
return 0;
}
+static void fib6_gc_table(struct net *net,
+ struct fib6_table *tb6,
+ struct fib6_gc_args *gc_args)
+{
+ struct fib6_info *rt;
+ struct hlist_node *n;
+ struct nl_info info = {
+ .nl_net = net,
+ .skip_notify = false,
+ };
+
+ hlist_for_each_entry_safe(rt, n, &tb6->tb6_gc_hlist, gc_link)
+ if (fib6_age(rt, gc_args) == -1)
+ fib6_del(rt, &info);
+}
+
+static void fib6_gc_all(struct net *net, struct fib6_gc_args *gc_args)
+{
+ struct fib6_table *table;
+ struct hlist_head *head;
+ unsigned int h;
+
+ rcu_read_lock();
+ for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
+ head = &net->ipv6.fib_table_hash[h];
+ hlist_for_each_entry_rcu(table, head, tb6_hlist) {
+ spin_lock_bh(&table->tb6_lock);
+ fib6_gc_table(net, table, gc_args);
+ spin_unlock_bh(&table->tb6_lock);
+ }
+ }
+ rcu_read_unlock();
+}
+
void fib6_run_gc(unsigned long expires, struct net *net, bool force)
{
struct fib6_gc_args gc_args;
@@ -2322,7 +2368,7 @@ void fib6_run_gc(unsigned long expires, struct net *net, bool force)
net->ipv6.sysctl.ip6_rt_gc_interval;
gc_args.more = 0;
- fib6_clean_all(net, fib6_age, &gc_args);
+ fib6_gc_all(net, &gc_args);
now = jiffies;
net->ipv6.ip6_rt_last_gc = now;
@@ -2382,6 +2428,7 @@ static int __net_init fib6_net_init(struct net *net)
net->ipv6.fib6_main_tbl->tb6_root.fn_flags =
RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO;
inet_peer_base_init(&net->ipv6.fib6_main_tbl->tb6_peers);
+ INIT_HLIST_HEAD(&net->ipv6.fib6_main_tbl->tb6_gc_hlist);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
net->ipv6.fib6_local_tbl = kzalloc(sizeof(*net->ipv6.fib6_local_tbl),
@@ -2394,6 +2441,7 @@ static int __net_init fib6_net_init(struct net *net)
net->ipv6.fib6_local_tbl->tb6_root.fn_flags =
RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO;
inet_peer_base_init(&net->ipv6.fib6_local_tbl->tb6_peers);
+ INIT_HLIST_HEAD(&net->ipv6.fib6_local_tbl->tb6_gc_hlist);
#endif
fib6_tables_init(net);
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index a68462668158..5ca9fd4f7945 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1410,8 +1410,17 @@ static enum skb_drop_reason ndisc_router_discovery(struct sk_buff *skb)
inet6_rt_notify(RTM_NEWROUTE, rt, &nlinfo, NLM_F_REPLACE);
}
- if (rt)
+ if (rt) {
+ spin_lock_bh(&rt->fib6_table->tb6_lock);
fib6_set_expires(rt, jiffies + (HZ * lifetime));
+ /* If fib6_node is null, the f6i is just removed from the
+ * table.
+ */
+ if (rcu_dereference_protected(rt->fib6_node,
+ lockdep_is_held(&rt->fib6_table->tb6_lock)))
+ fib6_add_gc_list(rt);
+ spin_unlock_bh(&rt->fib6_table->tb6_lock);
+ }
if (in6_dev->cnf.accept_ra_min_hop_limit < 256 &&
ra_msg->icmph.icmp6_hop_limit) {
if (in6_dev->cnf.accept_ra_min_hop_limit <= ra_msg->icmph.icmp6_hop_limit) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index dd6ff5b20918..cfaf226ecf98 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -989,10 +989,20 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len,
(rt->fib6_flags & ~RTF_PREF_MASK) | RTF_PREF(pref);
if (rt) {
- if (!addrconf_finite_timeout(lifetime))
+ spin_lock_bh(&rt->fib6_table->tb6_lock);
+ if (!addrconf_finite_timeout(lifetime)) {
fib6_clean_expires(rt);
- else
+ fib6_remove_gc_list(rt);
+ } else {
fib6_set_expires(rt, jiffies + HZ * lifetime);
+ /* If fib6_node is null, the f6i is just removed
+ * from the table.
+ */
+ if (rcu_dereference_protected(rt->fib6_node,
+ lockdep_is_held(&rt->fib6_table->tb6_lock)))
+ fib6_add_gc_list(rt);
+ }
+ spin_unlock_bh(&rt->fib6_table->tb6_lock);
fib6_info_release(rt);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next v3 4/5] net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set.
2024-02-02 8:21 [PATCH net-next v3 0/5] Remove expired routes with a separated list of routes thinker.li
` (2 preceding siblings ...)
2024-02-02 8:21 ` [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes thinker.li
@ 2024-02-02 8:21 ` thinker.li
2024-02-02 12:16 ` Hangbin Liu
2024-02-02 8:22 ` [PATCH net-next v3 5/5] selftests/net: Adding test cases of replacing routes and route advertisements thinker.li
4 siblings, 1 reply; 18+ messages in thread
From: thinker.li @ 2024-02-02 8:21 UTC (permalink / raw)
To: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng, Kui-Feng Lee
From: Kui-Feng Lee <thinker.li@gmail.com>
Make the decision to set or clean the expires of a route based on the
RTF_EXPIRES flag, rather than the value of the "expires" argument.
The function inet6_addr_modify() is the only caller of
modify_prefix_route(), and it passes the RTF_EXPIRES flag and an expiration
value. The RTF_EXPIRES flag is turned on or off based on the value of
valid_lft. The RTF_EXPIRES flag is turned on if valid_lft is a finite value
(not infinite, not 0xffffffff). Even if valid_lft is 0, the RTF_EXPIRES
flag remains on. The expiration value being passed is equal to the
valid_lft value if the flag is on. However, if the valid_lft value is
infinite, the expiration value becomes 0 and the RTF_EXPIRES flag is turned
off. Despite this, modify_prefix_route() decides to set the expiration
value if the received expiration value is not zero. This mixing of infinite
and zero cases creates an inconsistency.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
net/ipv6/addrconf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 36bfa987c314..2f6cf6314646 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4788,7 +4788,7 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp,
} else {
table = f6i->fib6_table;
spin_lock_bh(&table->tb6_lock);
- if (!expires) {
+ if (!(flags & RTF_EXPIRES)) {
fib6_clean_expires(f6i);
fib6_remove_gc_list(f6i);
} else {
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next v3 5/5] selftests/net: Adding test cases of replacing routes and route advertisements.
2024-02-02 8:21 [PATCH net-next v3 0/5] Remove expired routes with a separated list of routes thinker.li
` (3 preceding siblings ...)
2024-02-02 8:21 ` [PATCH net-next v3 4/5] net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set thinker.li
@ 2024-02-02 8:22 ` thinker.li
2024-02-02 12:37 ` Hangbin Liu
4 siblings, 1 reply; 18+ messages in thread
From: thinker.li @ 2024-02-02 8:22 UTC (permalink / raw)
To: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng, Kui-Feng Lee
From: Kui-Feng Lee <thinker.li@gmail.com>
Add tests of changing permanent routes to temporary routes and the reversed
case to make sure GC working correctly in these cases. Add tests for the
temporary routes from RA.
The existing device will be deleted between tests to remove all routes
associated with it, so that the earlier tests don't mess up the later ones.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
tools/testing/selftests/net/fib_tests.sh | 148 +++++++++++++++++++----
1 file changed, 126 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index b3ecccbbfcd2..b983462e2819 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -743,6 +743,43 @@ fib_notify_test()
cleanup &> /dev/null
}
+# Create a new dummy_10 to remove all associated routes.
+reset_dummy_10()
+{
+ $IP link del dev dummy_10
+
+ $IP link add dummy_10 type dummy
+ $IP link set dev dummy_10 up
+ $IP -6 address add 2001:10::1/64 dev dummy_10
+}
+
+check_rt_num()
+{
+ local expected=$1
+ local num=$2
+
+ if [ $num -ne $expected ]; then
+ echo "FAIL: Expected $expected routes, got $num"
+ ret=1
+ else
+ ret=0
+ fi
+}
+
+check_rt_num_clean()
+{
+ local expected=$1
+ local num=$2
+
+ if [ $num -ne $expected ]; then
+ log_test 1 0 "expected $expected routes, got $num"
+ set +e
+ cleanup &> /dev/null
+ return 1
+ fi
+ return 0
+}
+
fib6_gc_test()
{
setup
@@ -751,7 +788,7 @@ fib6_gc_test()
echo "Fib6 garbage collection test"
set -e
- EXPIRE=3
+ EXPIRE=5
# Check expiration of routes every $EXPIRE seconds (GC)
$NS_EXEC sysctl -wq net.ipv6.route.gc_interval=$EXPIRE
@@ -763,44 +800,111 @@ fib6_gc_test()
$NS_EXEC sysctl -wq net.ipv6.route.flush=1
# Temporary routes
- for i in $(seq 1 1000); do
+ for i in $(seq 1 5); do
# Expire route after $EXPIRE seconds
$IP -6 route add 2001:20::$i \
via 2001:10::2 dev dummy_10 expires $EXPIRE
done
- sleep $(($EXPIRE * 2))
- N_EXP_SLEEP=$($IP -6 route list |grep expires|wc -l)
- if [ $N_EXP_SLEEP -ne 0 ]; then
- echo "FAIL: expected 0 routes with expires, got $N_EXP_SLEEP"
- ret=1
- else
- ret=0
- fi
+ sleep $(($EXPIRE * 2 + 1))
+ check_rt_num 0 $($IP -6 route list |grep expires|wc -l)
+ log_test $ret 0 "ipv6 route garbage collection"
+
+ reset_dummy_10
# Permanent routes
- for i in $(seq 1 5000); do
+ for i in $(seq 1 5); do
$IP -6 route add 2001:30::$i \
via 2001:10::2 dev dummy_10
done
# Temporary routes
- for i in $(seq 1 1000); do
+ for i in $(seq 1 5); do
# Expire route after $EXPIRE seconds
$IP -6 route add 2001:20::$i \
via 2001:10::2 dev dummy_10 expires $EXPIRE
done
- sleep $(($EXPIRE * 2))
- N_EXP_SLEEP=$($IP -6 route list |grep expires|wc -l)
- if [ $N_EXP_SLEEP -ne 0 ]; then
- echo "FAIL: expected 0 routes with expires," \
- "got $N_EXP_SLEEP (5000 permanent routes)"
- ret=1
- else
- ret=0
+ sleep $(($EXPIRE * 2 + 1))
+ check_rt_num 0 $($IP -6 route list |grep expires|wc -l)
+ log_test $ret 0 "ipv6 route garbage collection (with permanent routes)"
+
+ reset_dummy_10
+
+ # Permanent routes
+ for i in $(seq 1 5); do
+ $IP -6 route add 2001:20::$i \
+ via 2001:10::2 dev dummy_10
+ done
+ # Replace with temporary routes
+ for i in $(seq 1 5); do
+ # Expire route after $EXPIRE seconds
+ $IP -6 route replace 2001:20::$i \
+ via 2001:10::2 dev dummy_10 expires $EXPIRE
+ done
+ check_rt_num_clean 5 $($IP -6 route list |grep expires|wc -l) || return
+ # Wait for GC
+ sleep $(($EXPIRE * 2 + 1))
+ check_rt_num 0 $($IP -6 route list |grep expires|wc -l)
+ log_test $ret 0 "ipv6 route garbage collection (replace with expires)"
+
+ reset_dummy_10
+
+ # Temporary routes
+ for i in $(seq 1 5); do
+ # Expire route after $EXPIRE seconds
+ $IP -6 route add 2001:20::$i \
+ via 2001:10::2 dev dummy_10 expires $EXPIRE
+ done
+ # Replace with permanent routes
+ for i in $(seq 1 5); do
+ $IP -6 route replace 2001:20::$i \
+ via 2001:10::2 dev dummy_10
+ done
+ check_rt_num_clean 0 $($IP -6 route list |grep expires|wc -l) || return
+
+ # Wait for GC
+ sleep $(($EXPIRE * 2 + 1))
+
+ check_rt_num 5 $($IP -6 route list |grep -v expires|grep 2001:20::|wc -l)
+ log_test $ret 0 "ipv6 route garbage collection (replace with permanent)"
+
+ # ra6 is required for the next test. (ipv6toolkit)
+ if [ ! -x "$(command -v ra6)" ]; then
+ echo "SKIP: ra6 not found."
+ set +e
+ cleanup &> /dev/null
+ return
fi
- set +e
+ # Delete dummy_10 and remove all routes
+ $IP link del dev dummy_10
- log_test $ret 0 "ipv6 route garbage collection"
+ # Create a pair of veth devices to send a RA message from one
+ # device to another.
+ $IP link add veth1 type veth peer name veth2
+ $IP link set dev veth1 up
+ $IP link set dev veth2 up
+ $IP -6 address add 2001:10::1/64 dev veth1 nodad
+ $IP -6 address add 2001:10::2/64 dev veth2 nodad
+
+ # Make veth1 ready to receive RA messages.
+ $NS_EXEC sysctl -wq net.ipv6.conf.veth1.accept_ra=2
+
+ # Send a RA message with a route from veth2 to veth1.
+ $NS_EXEC ra6 -i veth2 -d 2001:10::1 -t $EXPIRE
+
+ # Wait for the RA message.
+ sleep 1
+
+ # systemd may mess up the test. You syould make sure that
+ # systemd-networkd.service and systemd-networkd.socket are stopped.
+ check_rt_num_clean 1 $($IP -6 route list|grep expires|wc -l) || return
+
+ # Wait for GC
+ sleep $(($EXPIRE * 2 + 1))
+
+ check_rt_num 0 $($IP -6 route list |grep expires|wc -l)
+ log_test $ret 0 "ipv6 route garbage collection (RA message)"
+
+ set +e
cleanup &> /dev/null
}
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 1/5] net/ipv6: set expires in rt6_add_dflt_router().
2024-02-02 8:21 ` [PATCH net-next v3 1/5] net/ipv6: set expires in rt6_add_dflt_router() thinker.li
@ 2024-02-02 11:52 ` Hangbin Liu
2024-02-05 4:42 ` David Ahern
1 sibling, 0 replies; 18+ messages in thread
From: Hangbin Liu @ 2024-02-02 11:52 UTC (permalink / raw)
To: thinker.li
Cc: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, sinquersw, kuifeng
On Fri, Feb 02, 2024 at 12:21:56AM -0800, thinker.li@gmail.com wrote:
> From: Kui-Feng Lee <thinker.li@gmail.com>
>
> Pass the duration of a lifetime (in seconds) to the function
> rt6_add_dflt_router() so that it can properly set the expiration time.
>
> The function ndisc_router_discovery() is the only one that calls
> rt6_add_dflt_router(), and it will later set the expiration time for the
> route created by rt6_add_dflt_router(). However, there is a gap of time
> between calling rt6_add_dflt_router() and setting the expiration time in
> ndisc_router_discovery(). During this period, there is a possibility that a
> new route may be removed from the routing table. By setting the correct
> expiration time in rt6_add_dflt_router(), we can prevent this from
> happening. The reason for setting RTF_EXPIRES in rt6_add_dflt_router() is
> to start the Garbage Collection (GC) timer, as it only activates when a
> route with RTF_EXPIRES is added to a table.
>
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> ---
> include/net/ip6_route.h | 3 ++-
> net/ipv6/ndisc.c | 3 ++-
> net/ipv6/route.c | 4 +++-
> 3 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index 28b065790261..52a51c69aa9d 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h
> @@ -170,7 +170,8 @@ struct fib6_info *rt6_get_dflt_router(struct net *net,
> struct fib6_info *rt6_add_dflt_router(struct net *net,
> const struct in6_addr *gwaddr,
> struct net_device *dev, unsigned int pref,
> - u32 defrtr_usr_metric);
> + u32 defrtr_usr_metric,
> + int lifetime);
>
> void rt6_purge_dflt_routers(struct net *net);
>
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index a19999b30bc0..a68462668158 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -1382,7 +1382,8 @@ static enum skb_drop_reason ndisc_router_discovery(struct sk_buff *skb)
> neigh_release(neigh);
>
> rt = rt6_add_dflt_router(net, &ipv6_hdr(skb)->saddr,
> - skb->dev, pref, defrtr_usr_metric);
> + skb->dev, pref, defrtr_usr_metric,
> + lifetime);
> if (!rt) {
> ND_PRINTK(0, err,
> "RA: %s failed to add default route\n",
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 63b4c6056582..98abba8f15cd 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4355,7 +4355,8 @@ struct fib6_info *rt6_add_dflt_router(struct net *net,
> const struct in6_addr *gwaddr,
> struct net_device *dev,
> unsigned int pref,
> - u32 defrtr_usr_metric)
> + u32 defrtr_usr_metric,
> + int lifetime)
> {
> struct fib6_config cfg = {
> .fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_DFLT,
> @@ -4368,6 +4369,7 @@ struct fib6_info *rt6_add_dflt_router(struct net *net,
> .fc_nlinfo.portid = 0,
> .fc_nlinfo.nlh = NULL,
> .fc_nlinfo.nl_net = net,
> + .fc_expires = jiffies_to_clock_t(lifetime * HZ),
> };
>
> cfg.fc_gateway = *gwaddr;
> --
> 2.34.1
>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 4/5] net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set.
2024-02-02 8:21 ` [PATCH net-next v3 4/5] net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set thinker.li
@ 2024-02-02 12:16 ` Hangbin Liu
2024-02-02 17:57 ` Kui-Feng Lee
0 siblings, 1 reply; 18+ messages in thread
From: Hangbin Liu @ 2024-02-02 12:16 UTC (permalink / raw)
To: thinker.li
Cc: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, sinquersw, kuifeng
On Fri, Feb 02, 2024 at 12:21:59AM -0800, thinker.li@gmail.com wrote:
> From: Kui-Feng Lee <thinker.li@gmail.com>
>
> Make the decision to set or clean the expires of a route based on the
> RTF_EXPIRES flag, rather than the value of the "expires" argument.
>
> The function inet6_addr_modify() is the only caller of
> modify_prefix_route(), and it passes the RTF_EXPIRES flag and an expiration
> value. The RTF_EXPIRES flag is turned on or off based on the value of
> valid_lft. The RTF_EXPIRES flag is turned on if valid_lft is a finite value
> (not infinite, not 0xffffffff). Even if valid_lft is 0, the RTF_EXPIRES
> flag remains on. The expiration value being passed is equal to the
> valid_lft value if the flag is on. However, if the valid_lft value is
> infinite, the expiration value becomes 0 and the RTF_EXPIRES flag is turned
> off. Despite this, modify_prefix_route() decides to set the expiration
> value if the received expiration value is not zero. This mixing of infinite
> and zero cases creates an inconsistency.
>
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> ---
> net/ipv6/addrconf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 36bfa987c314..2f6cf6314646 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -4788,7 +4788,7 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp,
> } else {
> table = f6i->fib6_table;
> spin_lock_bh(&table->tb6_lock);
> - if (!expires) {
> + if (!(flags & RTF_EXPIRES)) {
Hi Kui-Feng,
I may missed something. But I still could not get why we shouldn't use
expires for checking? If expires == 0, but RTF_EXPIRES is on,
shouldn't we call fib6_clean_expires()?
Thanks
Hangbin
> fib6_clean_expires(f6i);
> fib6_remove_gc_list(f6i);
> } else {
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 5/5] selftests/net: Adding test cases of replacing routes and route advertisements.
2024-02-02 8:22 ` [PATCH net-next v3 5/5] selftests/net: Adding test cases of replacing routes and route advertisements thinker.li
@ 2024-02-02 12:37 ` Hangbin Liu
0 siblings, 0 replies; 18+ messages in thread
From: Hangbin Liu @ 2024-02-02 12:37 UTC (permalink / raw)
To: thinker.li
Cc: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, sinquersw, kuifeng
On Fri, Feb 02, 2024 at 12:22:00AM -0800, thinker.li@gmail.com wrote:
> From: Kui-Feng Lee <thinker.li@gmail.com>
>
> Add tests of changing permanent routes to temporary routes and the reversed
> case to make sure GC working correctly in these cases. Add tests for the
> temporary routes from RA.
>
> The existing device will be deleted between tests to remove all routes
> associated with it, so that the earlier tests don't mess up the later ones.
>
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 4/5] net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set.
2024-02-02 12:16 ` Hangbin Liu
@ 2024-02-02 17:57 ` Kui-Feng Lee
2024-02-04 10:17 ` Hangbin Liu
0 siblings, 1 reply; 18+ messages in thread
From: Kui-Feng Lee @ 2024-02-02 17:57 UTC (permalink / raw)
To: Hangbin Liu, thinker.li
Cc: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, kuifeng
On 2/2/24 04:16, Hangbin Liu wrote:
> On Fri, Feb 02, 2024 at 12:21:59AM -0800, thinker.li@gmail.com wrote:
>> From: Kui-Feng Lee <thinker.li@gmail.com>
>>
>> Make the decision to set or clean the expires of a route based on the
>> RTF_EXPIRES flag, rather than the value of the "expires" argument.
>>
>> The function inet6_addr_modify() is the only caller of
>> modify_prefix_route(), and it passes the RTF_EXPIRES flag and an expiration
>> value. The RTF_EXPIRES flag is turned on or off based on the value of
>> valid_lft. The RTF_EXPIRES flag is turned on if valid_lft is a finite value
>> (not infinite, not 0xffffffff). Even if valid_lft is 0, the RTF_EXPIRES
>> flag remains on. The expiration value being passed is equal to the
>> valid_lft value if the flag is on. However, if the valid_lft value is
>> infinite, the expiration value becomes 0 and the RTF_EXPIRES flag is turned
>> off. Despite this, modify_prefix_route() decides to set the expiration
>> value if the received expiration value is not zero. This mixing of infinite
>> and zero cases creates an inconsistency.
>>
>> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
>> ---
>> net/ipv6/addrconf.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 36bfa987c314..2f6cf6314646 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -4788,7 +4788,7 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp,
>> } else {
>> table = f6i->fib6_table;
>> spin_lock_bh(&table->tb6_lock);
>> - if (!expires) {
>> + if (!(flags & RTF_EXPIRES)) {
>
> Hi Kui-Feng,
>
> I may missed something. But I still could not get why we shouldn't use
> expires for checking? If expires == 0, but RTF_EXPIRES is on,
> shouldn't we call fib6_clean_expires()?
The case that expires == 0 and RTF_EXPIES is on never happens since
inet6_addr_modify() rejects valid_lft == 0 at the beginning. This
patch doesn't make difference logically, but make inet6_addr_modify()
and modify_prefix_route() consistent.
Does that make sense to you?
>
> Thanks
> Hangbin
>> fib6_clean_expires(f6i);
>> fib6_remove_gc_list(f6i);
>> } else {
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 4/5] net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set.
2024-02-02 17:57 ` Kui-Feng Lee
@ 2024-02-04 10:17 ` Hangbin Liu
2024-02-05 18:59 ` Kui-Feng Lee
0 siblings, 1 reply; 18+ messages in thread
From: Hangbin Liu @ 2024-02-04 10:17 UTC (permalink / raw)
To: Kui-Feng Lee
Cc: thinker.li, netdev, ast, martin.lau, kernel-team, davem, dsahern,
edumazet, kuba, pabeni, kuifeng
On Fri, Feb 02, 2024 at 09:57:46AM -0800, Kui-Feng Lee wrote:
> > Hi Kui-Feng,
> >
> > I may missed something. But I still could not get why we shouldn't use
> > expires for checking? If expires == 0, but RTF_EXPIRES is on,
> > shouldn't we call fib6_clean_expires()?
>
>
> The case that expires == 0 and RTF_EXPIES is on never happens since
> inet6_addr_modify() rejects valid_lft == 0 at the beginning. This
> patch doesn't make difference logically, but make inet6_addr_modify()
> and modify_prefix_route() consistent.
>
> Does that make sense to you?
Thanks, this does make sense to me. If there will be a new version. It would
be good to add the following sentence in the description.
"""
This patch doesn't make difference logically, but make inet6_addr_modify()
and modify_prefix_route() consistent.
"""
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Regards
Hangbin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes.
2024-02-02 8:21 ` [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes thinker.li
@ 2024-02-04 10:29 ` Hangbin Liu
2024-02-04 16:10 ` David Ahern
2024-02-05 4:45 ` David Ahern
1 sibling, 1 reply; 18+ messages in thread
From: Hangbin Liu @ 2024-02-04 10:29 UTC (permalink / raw)
To: thinker.li
Cc: netdev, ast, martin.lau, kernel-team, davem, dsahern, edumazet,
kuba, pabeni, sinquersw, kuifeng
On Fri, Feb 02, 2024 at 12:21:58AM -0800, thinker.li@gmail.com wrote:
> From: Kui-Feng Lee <thinker.li@gmail.com>
>
> FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
> can be expensive if the number of routes in a table is big, even if most of
> them are permanent. Checking routes in a separated list of routes having
> expiration will avoid this potential issue.
>
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
FYI, I will wait for David Ahern to help review this patch.
Thanks
Hangbin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes.
2024-02-04 10:29 ` Hangbin Liu
@ 2024-02-04 16:10 ` David Ahern
0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2024-02-04 16:10 UTC (permalink / raw)
To: Hangbin Liu, thinker.li
Cc: netdev, ast, martin.lau, kernel-team, davem, edumazet, kuba,
pabeni, sinquersw, kuifeng
On 2/4/24 3:29 AM, Hangbin Liu wrote:
> FYI, I will wait for David Ahern to help review this patch.
I will get to it later today
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 1/5] net/ipv6: set expires in rt6_add_dflt_router().
2024-02-02 8:21 ` [PATCH net-next v3 1/5] net/ipv6: set expires in rt6_add_dflt_router() thinker.li
2024-02-02 11:52 ` Hangbin Liu
@ 2024-02-05 4:42 ` David Ahern
1 sibling, 0 replies; 18+ messages in thread
From: David Ahern @ 2024-02-05 4:42 UTC (permalink / raw)
To: thinker.li, netdev, ast, martin.lau, kernel-team, davem, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng
On 2/2/24 1:21 AM, thinker.li@gmail.com wrote:
> From: Kui-Feng Lee <thinker.li@gmail.com>
>
> Pass the duration of a lifetime (in seconds) to the function
> rt6_add_dflt_router() so that it can properly set the expiration time.
>
> The function ndisc_router_discovery() is the only one that calls
> rt6_add_dflt_router(), and it will later set the expiration time for the
> route created by rt6_add_dflt_router(). However, there is a gap of time
> between calling rt6_add_dflt_router() and setting the expiration time in
> ndisc_router_discovery(). During this period, there is a possibility that a
> new route may be removed from the routing table. By setting the correct
> expiration time in rt6_add_dflt_router(), we can prevent this from
> happening. The reason for setting RTF_EXPIRES in rt6_add_dflt_router() is
> to start the Garbage Collection (GC) timer, as it only activates when a
> route with RTF_EXPIRES is added to a table.
>
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> ---
> include/net/ip6_route.h | 3 ++-
> net/ipv6/ndisc.c | 3 ++-
> net/ipv6/route.c | 4 +++-
> 3 files changed, 7 insertions(+), 3 deletions(-)
>
Suggested-by: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 2/5] net/ipv6: Remove unnecessary clean.
2024-02-02 8:21 ` [PATCH net-next v3 2/5] net/ipv6: Remove unnecessary clean thinker.li
@ 2024-02-05 4:42 ` David Ahern
0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2024-02-05 4:42 UTC (permalink / raw)
To: thinker.li, netdev, ast, martin.lau, kernel-team, davem, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng
On 2/2/24 1:21 AM, thinker.li@gmail.com wrote:
> From: Kui-Feng Lee <thinker.li@gmail.com>
>
> The route here is newly created. It is unnecessary to call
> fib6_clean_expires() on it.
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
> ---
> net/ipv6/route.c | 2 --
> 1 file changed, 2 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes.
2024-02-02 8:21 ` [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes thinker.li
2024-02-04 10:29 ` Hangbin Liu
@ 2024-02-05 4:45 ` David Ahern
2024-02-05 15:24 ` David Ahern
1 sibling, 1 reply; 18+ messages in thread
From: David Ahern @ 2024-02-05 4:45 UTC (permalink / raw)
To: thinker.li, netdev, ast, martin.lau, kernel-team, davem, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng
On 2/2/24 1:21 AM, thinker.li@gmail.com wrote:
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 733ace18806c..36bfa987c314 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1255,6 +1255,7 @@ static void
> cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires,
> bool del_rt, bool del_peer)
> {
> + struct fib6_table *table;
> struct fib6_info *f6i;
>
> f6i = addrconf_get_prefix_route(del_peer ? &ifp->peer_addr : &ifp->addr,
addrconf_get_prefix_route walks the table, so you know it is already
there ...
> @@ -1264,8 +1265,18 @@ cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires,
> if (del_rt)
> ip6_del_rt(dev_net(ifp->idev->dev), f6i, false);
> else {
> - if (!(f6i->fib6_flags & RTF_EXPIRES))
> + if (!(f6i->fib6_flags & RTF_EXPIRES)) {
> + table = f6i->fib6_table;
> + spin_lock_bh(&table->tb6_lock);
> fib6_set_expires(f6i, expires);
> + /* If fib6_node is null, the f6i is just
> + * removed from the table.
> + */
> + if (rcu_dereference_protected(f6i->fib6_node,
... meaning this check should not be needed
> + lockdep_is_held(&table->tb6_lock)))
> + fib6_add_gc_list(f6i);
> + spin_unlock_bh(&table->tb6_lock);
> + }
> fib6_info_release(f6i);
> }
> }
> @@ -2706,6 +2717,7 @@ EXPORT_SYMBOL_GPL(addrconf_prefix_rcv_add_addr);
> void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
> {
> struct prefix_info *pinfo;
> + struct fib6_table *table;
> __u32 valid_lft;
> __u32 prefered_lft;
> int addr_type, err;
> @@ -2782,11 +2794,23 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
> if (valid_lft == 0) {
> ip6_del_rt(net, rt, false);
> rt = NULL;
> - } else if (addrconf_finite_timeout(rt_expires)) {
> - /* not infinity */
> - fib6_set_expires(rt, jiffies + rt_expires);
> } else {
> - fib6_clean_expires(rt);
> + table = rt->fib6_table;
> + spin_lock_bh(&table->tb6_lock);
when it comes to locking, I prefer the lock and unlock lines to *pop* -
meaning newline on both sides so it is clear and stands out.
> + if (addrconf_finite_timeout(rt_expires)) {
> + /* not infinity */
> + fib6_set_expires(rt, jiffies + rt_expires);
> + /* If fib6_node is null, the f6i is
> + * just removed from the table.
> + */
> + if (rcu_dereference_protected(rt->fib6_node,
similarly here, this code is entered because rt is set based on
addrconf_get_prefix_route.
> + lockdep_is_held(&table->tb6_lock)))
> + fib6_add_gc_list(rt);
> + } else {
> + fib6_clean_expires(rt);
> + fib6_remove_gc_list(rt);
> + }
> + spin_unlock_bh(&table->tb6_lock);
> }
> } else if (valid_lft) {
> clock_t expires = 0;
> @@ -4741,6 +4765,7 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp,
> unsigned long expires, u32 flags,
> bool modify_peer)
> {
> + struct fib6_table *table;
> struct fib6_info *f6i;
> u32 prio;
>
> @@ -4761,10 +4786,21 @@ static int modify_prefix_route(struct inet6_ifaddr *ifp,
> ifp->rt_priority, ifp->idev->dev,
> expires, flags, GFP_KERNEL);
> } else {
> - if (!expires)
> + table = f6i->fib6_table;
> + spin_lock_bh(&table->tb6_lock);
> + if (!expires) {
> fib6_clean_expires(f6i);
> - else
> + fib6_remove_gc_list(f6i);
> + } else {
> fib6_set_expires(f6i, expires);
> + /* If fib6_node is null, the f6i is just removed
> + * from the table.
> + */
> + if (rcu_dereference_protected(f6i->fib6_node,
and here as well. f6i is set based on a table lookup.
> + lockdep_is_held(&table->tb6_lock)))
> + fib6_add_gc_list(f6i);
> + }
> + spin_unlock_bh(&table->tb6_lock);
>
> fib6_info_release(f6i);
> }
...
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index a68462668158..5ca9fd4f7945 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -1410,8 +1410,17 @@ static enum skb_drop_reason ndisc_router_discovery(struct sk_buff *skb)
> inet6_rt_notify(RTM_NEWROUTE, rt, &nlinfo, NLM_F_REPLACE);
> }
>
> - if (rt)
> + if (rt) {
> + spin_lock_bh(&rt->fib6_table->tb6_lock);
> fib6_set_expires(rt, jiffies + (HZ * lifetime));
> + /* If fib6_node is null, the f6i is just removed from the
> + * table.
> + */
How about:
/* If fib6_node is NULL, the route was removed between
* the rt6_get_dflt_router or rt6_add_dflt_router calls
* above and here.
*/
> + if (rcu_dereference_protected(rt->fib6_node,> + lockdep_is_held(&rt->fib6_table->tb6_lock)))
> + fib6_add_gc_list(rt);
> + spin_unlock_bh(&rt->fib6_table->tb6_lock);
> + }
> if (in6_dev->cnf.accept_ra_min_hop_limit < 256 &&
> ra_msg->icmph.icmp6_hop_limit) {
> if (in6_dev->cnf.accept_ra_min_hop_limit <= ra_msg->icmph.icmp6_hop_limit) {
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index dd6ff5b20918..cfaf226ecf98 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -989,10 +989,20 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len,
> (rt->fib6_flags & ~RTF_PREF_MASK) | RTF_PREF(pref);
>
> if (rt) {
> - if (!addrconf_finite_timeout(lifetime))
> + spin_lock_bh(&rt->fib6_table->tb6_lock);
> + if (!addrconf_finite_timeout(lifetime)) {
> fib6_clean_expires(rt);
> - else
> + fib6_remove_gc_list(rt);
> + } else {
> fib6_set_expires(rt, jiffies + HZ * lifetime);
> + /* If fib6_node is null, the f6i is just removed
> + * from the table.
> + */
Similarly, enhance the comment:
/* If fib6_node is NULL, the route was removed
* between the get or add calls above and here.
*/
> + if (rcu_dereference_protected(rt->fib6_node,
> + lockdep_is_held(&rt->fib6_table->tb6_lock)))
> + fib6_add_gc_list(rt);
> + }
> + spin_unlock_bh(&rt->fib6_table->tb6_lock);
>
> fib6_info_release(rt);
> }
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes.
2024-02-05 4:45 ` David Ahern
@ 2024-02-05 15:24 ` David Ahern
0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2024-02-05 15:24 UTC (permalink / raw)
To: thinker.li, netdev, ast, martin.lau, kernel-team, davem, edumazet,
kuba, pabeni, liuhangbin
Cc: sinquersw, kuifeng
On 2/4/24 9:45 PM, David Ahern wrote:
>> @@ -1264,8 +1265,18 @@ cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires,
>> if (del_rt)
>> ip6_del_rt(dev_net(ifp->idev->dev), f6i, false);
>> else {
>> - if (!(f6i->fib6_flags & RTF_EXPIRES))
>> + if (!(f6i->fib6_flags & RTF_EXPIRES)) {
>> + table = f6i->fib6_table;
>> + spin_lock_bh(&table->tb6_lock);
>> fib6_set_expires(f6i, expires);
>> + /* If fib6_node is null, the f6i is just
>> + * removed from the table.
>> + */
>> + if (rcu_dereference_protected(f6i->fib6_node,
>
> ... meaning this check should not be needed
reviewing this patch again this morning, and yes, I believe this check
is needed here and other places. Given that all of the instances check
if the route entry is still in the table, it should be consolidated into
fb6_add_gc_list.
>
>> + lockdep_is_held(&table->tb6_lock)))
>> + fib6_add_gc_list(f6i);
>> + spin_unlock_bh(&table->tb6_lock);
>> + }
>> fib6_info_release(f6i);
>> }
>> }
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next v3 4/5] net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set.
2024-02-04 10:17 ` Hangbin Liu
@ 2024-02-05 18:59 ` Kui-Feng Lee
0 siblings, 0 replies; 18+ messages in thread
From: Kui-Feng Lee @ 2024-02-05 18:59 UTC (permalink / raw)
To: Hangbin Liu
Cc: thinker.li, netdev, ast, martin.lau, kernel-team, davem, dsahern,
edumazet, kuba, pabeni, kuifeng
On 2/4/24 02:17, Hangbin Liu wrote:
> On Fri, Feb 02, 2024 at 09:57:46AM -0800, Kui-Feng Lee wrote:
>>> Hi Kui-Feng,
>>>
>>> I may missed something. But I still could not get why we shouldn't use
>>> expires for checking? If expires == 0, but RTF_EXPIRES is on,
>>> shouldn't we call fib6_clean_expires()?
>>
>>
>> The case that expires == 0 and RTF_EXPIES is on never happens since
>> inet6_addr_modify() rejects valid_lft == 0 at the beginning. This
>> patch doesn't make difference logically, but make inet6_addr_modify()
>> and modify_prefix_route() consistent.
>>
>> Does that make sense to you?
>
> Thanks, this does make sense to me. If there will be a new version. It would
> be good to add the following sentence in the description.
>
> """
> This patch doesn't make difference logically, but make inet6_addr_modify()
> and modify_prefix_route() consistent.
> """
>
Sure, I will add it to the commit message.
> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
>
> Regards
> Hangbin
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-02-05 18:59 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-02 8:21 [PATCH net-next v3 0/5] Remove expired routes with a separated list of routes thinker.li
2024-02-02 8:21 ` [PATCH net-next v3 1/5] net/ipv6: set expires in rt6_add_dflt_router() thinker.li
2024-02-02 11:52 ` Hangbin Liu
2024-02-05 4:42 ` David Ahern
2024-02-02 8:21 ` [PATCH net-next v3 2/5] net/ipv6: Remove unnecessary clean thinker.li
2024-02-05 4:42 ` David Ahern
2024-02-02 8:21 ` [PATCH net-next v3 3/5] net/ipv6: Remove expired routes with a separated list of routes thinker.li
2024-02-04 10:29 ` Hangbin Liu
2024-02-04 16:10 ` David Ahern
2024-02-05 4:45 ` David Ahern
2024-02-05 15:24 ` David Ahern
2024-02-02 8:21 ` [PATCH net-next v3 4/5] net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set thinker.li
2024-02-02 12:16 ` Hangbin Liu
2024-02-02 17:57 ` Kui-Feng Lee
2024-02-04 10:17 ` Hangbin Liu
2024-02-05 18:59 ` Kui-Feng Lee
2024-02-02 8:22 ` [PATCH net-next v3 5/5] selftests/net: Adding test cases of replacing routes and route advertisements thinker.li
2024-02-02 12:37 ` Hangbin Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).