* [PATCH net-next 0/3] dst_cache: cope with device removal
@ 2024-05-30 17:18 Paolo Abeni
2024-05-30 17:21 ` Paolo Abeni
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Paolo Abeni @ 2024-05-30 17:18 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski
Eric reported a net device refcount leak and diagnosed the root cause
as the dst_cache not coping well with the underlying device removal.
To address such issue, this series introduces the infrastructure to let
the existing uncached list handle the relevant cleanup.
Patch 1 and 2 are preparation changes to make the uncached list infra
more flexible for the new use-case, and patch 3 addresses the issue.
---
Targeting net-next as the addressed problem is quite ancient and I fear
some unexpected side effects for patch 2.
Paolo Abeni (3):
ipv6: use a new flag to indicate elevated refcount.
ipv4: obsolete routes moved out of per cpu cache
dst_cache: let rt_uncached cope with dst_cache cleanup
include/net/ip6_fib.h | 3 +++
net/core/dst_cache.c | 8 ++++++++
net/ipv4/route.c | 2 +-
net/ipv6/route.c | 4 ++--
4 files changed, 14 insertions(+), 3 deletions(-)
--
2.43.2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net-next 0/3] dst_cache: cope with device removal
2024-05-30 17:18 [PATCH net-next 0/3] dst_cache: cope with device removal Paolo Abeni
@ 2024-05-30 17:21 ` Paolo Abeni
2024-05-30 17:21 ` [PATCH net-next 1/3] ipv6: use a new flag to indicate elevated refcount Paolo Abeni
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2024-05-30 17:21 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Eric Dumazet, David Ahern, Jakub Kicinski
Eric reported a net device refcount leak and diagnosed the root cause
as the dst_cache not coping well with the underlying device removal.
To address such issue, this series introduces the infrastructure to let
the existing uncached list handle the relevant cleanup.
Patch 1 and 2 are preparation changes to make the uncached list infra
more flexible for the new use-case, and patch 3 addresses the issue.
---
Targeting net-next as the addressed problem is quite ancient and I fear
some unexpected side effects for patch 2.
Paolo Abeni (3):
ipv6: use a new flag to indicate elevated refcount.
ipv4: obsolete routes moved out of per cpu cache
dst_cache: let rt_uncached cope with dst_cache cleanup
include/net/ip6_fib.h | 3 +++
net/core/dst_cache.c | 8 ++++++++
net/ipv4/route.c | 2 +-
net/ipv6/route.c | 4 ++--
4 files changed, 14 insertions(+), 3 deletions(-)
--
2.43.2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net-next 1/3] ipv6: use a new flag to indicate elevated refcount.
2024-05-30 17:18 [PATCH net-next 0/3] dst_cache: cope with device removal Paolo Abeni
2024-05-30 17:21 ` Paolo Abeni
@ 2024-05-30 17:21 ` Paolo Abeni
2024-05-30 17:21 ` [PATCH net-next 2/3] ipv4: obsolete routes moved out of per cpu cache Paolo Abeni
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2024-05-30 17:21 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Eric Dumazet, David Ahern, Jakub Kicinski
ip6_pol_route() can return a dst entry with elevated reference count
even when the caller ask for the RT6_LOOKUP_F_DST_NOREF flag.
Currently the caller uses the rt_uncached list entry field to detect
such scenario: the reference is elevated only for entry in the uncached
list.
Soon we are going to insert in the uncached list even entry held by
the dst_cache(s), potentially fooling the above check and causing
reference underflow.
To avoid such issue, introduce and use a new field to mark the entries
with refcount elevated. No functional change intended.
Before:
pahole -EC rt6_info
/* size: 224, cachelines: 4, members: 9 */
/* sum members: 218, holes: 1, sum holes: 4 */
After:
pahole: -EC rt6_info
/* size: 224, cachelines: 4, members: 10 */
/* sum members: 219, holes: 1, sum holes: 4 */
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/net/ip6_fib.h | 3 +++
net/ipv6/route.c | 4 ++--
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 6cb867ce4878..eb997af5523c 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -216,6 +216,9 @@ struct rt6_info {
/* more non-fragment space at head required */
unsigned short rt6i_nfheader_len;
+
+ /* route lookup always acquires a reference */
+ bool rt6i_count_held;
};
struct fib6_result {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index bbc2a0dd9314..3b729ab86c55 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2251,6 +2251,7 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
* this refcnt is always returned to the caller even
* if caller sets RT6_LOOKUP_F_DST_NOREF flag.
*/
+ rt->rt6i_count_held = true;
rt6_uncached_list_add(rt);
rcu_read_unlock();
@@ -2648,8 +2649,7 @@ struct dst_entry *ip6_route_output_flags(struct net *net,
rcu_read_lock();
dst = ip6_route_output_flags_noref(net, sk, fl6, flags);
rt6 = dst_rt6_info(dst);
- /* For dst cached in uncached_list, refcnt is already taken. */
- if (list_empty(&rt6->dst.rt_uncached) && !dst_hold_safe(dst)) {
+ if (!rt6->rt6i_count_held && !dst_hold_safe(dst)) {
dst = &net->ipv6.ip6_null_entry->dst;
dst_hold(dst);
}
--
2.43.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 2/3] ipv4: obsolete routes moved out of per cpu cache
2024-05-30 17:18 [PATCH net-next 0/3] dst_cache: cope with device removal Paolo Abeni
2024-05-30 17:21 ` Paolo Abeni
2024-05-30 17:21 ` [PATCH net-next 1/3] ipv6: use a new flag to indicate elevated refcount Paolo Abeni
@ 2024-05-30 17:21 ` Paolo Abeni
2024-05-30 17:21 ` [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup Paolo Abeni
2024-05-30 17:38 ` [PATCH net-next 0/3] dst_cache: cope with device removal Eric Dumazet
4 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2024-05-30 17:21 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Eric Dumazet, David Ahern, Jakub Kicinski
When a new entry replaces an existing one in the next hop per cpu cache,
the old entry is added to the uncached list.
Soon we are going to insert in the uncached list even entries held by
the dst_cache(s), the above could cause double add.
Avoid the potential issue obsoleting the old entry instead. This
additionally make the stack more consistent with ipv6, as the latter
already calls dst_dev_put() when replacing per cpu cached entries.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/ipv4/route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5fd54103174f..506452f1395d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1473,7 +1473,7 @@ static bool rt_cache_route(struct fib_nh_common *nhc, struct rtable *rt)
prev = cmpxchg(p, orig, rt);
if (prev == orig) {
if (orig) {
- rt_add_uncached_list(orig);
+ dst_dev_put(&orig->dst);
dst_release(&orig->dst);
}
} else {
--
2.43.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup
2024-05-30 17:18 [PATCH net-next 0/3] dst_cache: cope with device removal Paolo Abeni
` (2 preceding siblings ...)
2024-05-30 17:21 ` [PATCH net-next 2/3] ipv4: obsolete routes moved out of per cpu cache Paolo Abeni
@ 2024-05-30 17:21 ` Paolo Abeni
2024-05-30 17:46 ` Eric Dumazet
` (2 more replies)
2024-05-30 17:38 ` [PATCH net-next 0/3] dst_cache: cope with device removal Eric Dumazet
4 siblings, 3 replies; 9+ messages in thread
From: Paolo Abeni @ 2024-05-30 17:21 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Eric Dumazet, David Ahern, Jakub Kicinski
Eric reported that dst_cache don't cope correctly with device removal,
keeping the cached dst unmodified even when the underlining device is
deleted and the dst itself is not uncached.
The above causes the infamous 'unregistering netdevice' hangup.
Address the issue by adding each entry held by the dst_caches to the
'uncached' list, so that the dst core will cleanup the device reference
at device removal time.
Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: 911362c70df5 ("net: add dst_cache support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/core/dst_cache.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index 6a0482e676d3..d1cb852d5748 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -11,6 +11,7 @@
#include <net/route.h>
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ip6_fib.h>
+#include <net/ip6_route.h>
#endif
#include <uapi/linux/in.h>
@@ -28,6 +29,7 @@ static void dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
struct dst_entry *dst, u32 cookie)
{
dst_release(dst_cache->dst);
+
if (dst)
dst_hold(dst);
@@ -98,6 +100,9 @@ void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
idst = this_cpu_ptr(dst_cache->cache);
dst_cache_per_cpu_dst_set(idst, dst, 0);
+ if (dst && list_empty(&dst->rt_uncached))
+ rt_add_uncached_list(dst_rtable(dst));
+
idst->in_saddr.s_addr = saddr;
}
EXPORT_SYMBOL_GPL(dst_cache_set_ip4);
@@ -114,6 +119,9 @@ void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
idst = this_cpu_ptr(dst_cache->cache);
dst_cache_per_cpu_dst_set(idst, dst,
rt6_get_cookie(dst_rt6_info(dst)));
+ if (dst && list_empty(&dst->rt_uncached))
+ rt6_uncached_list_add(dst_rt6_info(dst));
+
idst->in6_saddr = *saddr;
}
EXPORT_SYMBOL_GPL(dst_cache_set_ip6);
--
2.43.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 0/3] dst_cache: cope with device removal
2024-05-30 17:18 [PATCH net-next 0/3] dst_cache: cope with device removal Paolo Abeni
` (3 preceding siblings ...)
2024-05-30 17:21 ` [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup Paolo Abeni
@ 2024-05-30 17:38 ` Eric Dumazet
4 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2024-05-30 17:38 UTC (permalink / raw)
To: Paolo Abeni; +Cc: netdev, David S. Miller, David Ahern, Jakub Kicinski
On Thu, May 30, 2024 at 7:21 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Eric reported a net device refcount leak and diagnosed the root cause
> as the dst_cache not coping well with the underlying device removal.
>
> To address such issue, this series introduces the infrastructure to let
> the existing uncached list handle the relevant cleanup.
>
> Patch 1 and 2 are preparation changes to make the uncached list infra
> more flexible for the new use-case, and patch 3 addresses the issue.
>
> ---
> Targeting net-next as the addressed problem is quite ancient and I fear
> some unexpected side effects for patch 2.
Thanks Paolo, I am going to test this ASAP.
BTW I found suspect dst_cache uses from lwtunnel in the output path.
AFAIK lwtunnel_output() does not block BH.
Either we change lwtunnel_output() or replace some of the ->output
methods to use local_bh_disable() ?
If BH is already held, I do not think
preempt_disable()/preempt_enable(); pairs are necessary.
diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
index 7563f8c6aa87cf9f7841ee78dcea2a16f60ac344..bf7120ecea1ebe834e70073710be0c1692d7ad1d
100644
--- a/net/ipv6/ioam6_iptunnel.c
+++ b/net/ipv6/ioam6_iptunnel.c
@@ -351,9 +351,9 @@ static int ioam6_output(struct net *net, struct
sock *sk, struct sk_buff *skb)
goto drop;
if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) {
- preempt_disable();
+ local_bh_disable();
dst = dst_cache_get(&ilwt->cache);
- preempt_enable();
+ local_bh_enable();
if (unlikely(!dst)) {
struct ipv6hdr *hdr = ipv6_hdr(skb);
@@ -373,9 +373,9 @@ static int ioam6_output(struct net *net, struct
sock *sk, struct sk_buff *skb)
goto drop;
}
- preempt_disable();
+ local_bh_disable();
dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr);
- preempt_enable();
+ local_bh_enable();
}
skb_dst_drop(skb);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup
2024-05-30 17:21 ` [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup Paolo Abeni
@ 2024-05-30 17:46 ` Eric Dumazet
2024-05-31 10:53 ` kernel test robot
2024-05-31 17:32 ` kernel test robot
2 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2024-05-30 17:46 UTC (permalink / raw)
To: Paolo Abeni; +Cc: netdev, David S. Miller, David Ahern, Jakub Kicinski
On Thu, May 30, 2024 at 7:21 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Eric reported that dst_cache don't cope correctly with device removal,
> keeping the cached dst unmodified even when the underlining device is
> deleted and the dst itself is not uncached.
>
> The above causes the infamous 'unregistering netdevice' hangup.
>
> Address the issue by adding each entry held by the dst_caches to the
> 'uncached' list, so that the dst core will cleanup the device reference
> at device removal time.
>
> Reported-by: Eric Dumazet <edumazet@google.com>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Fixes: 911362c70df5 ("net: add dst_cache support")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/core/dst_cache.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
> index 6a0482e676d3..d1cb852d5748 100644
> --- a/net/core/dst_cache.c
> +++ b/net/core/dst_cache.c
> @@ -11,6 +11,7 @@
> #include <net/route.h>
> #if IS_ENABLED(CONFIG_IPV6)
> #include <net/ip6_fib.h>
> +#include <net/ip6_route.h>
> #endif
> #include <uapi/linux/in.h>
>
> @@ -28,6 +29,7 @@ static void dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
> struct dst_entry *dst, u32 cookie)
> {
> dst_release(dst_cache->dst);
> +
> if (dst)
> dst_hold(dst);
>
> @@ -98,6 +100,9 @@ void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
>
> idst = this_cpu_ptr(dst_cache->cache);
> dst_cache_per_cpu_dst_set(idst, dst, 0);
> + if (dst && list_empty(&dst->rt_uncached))
> + rt_add_uncached_list(dst_rtable(dst));
> +
> idst->in_saddr.s_addr = saddr;
> }
> EXPORT_SYMBOL_GPL(dst_cache_set_ip4);
> @@ -114,6 +119,9 @@ void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
> idst = this_cpu_ptr(dst_cache->cache);
> dst_cache_per_cpu_dst_set(idst, dst,
> rt6_get_cookie(dst_rt6_info(dst)));
> + if (dst && list_empty(&dst->rt_uncached))
> + rt6_uncached_list_add(dst_rt6_info(dst));
This probably will not compile if CONFIG_IPV6=m ?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup
2024-05-30 17:21 ` [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup Paolo Abeni
2024-05-30 17:46 ` Eric Dumazet
@ 2024-05-31 10:53 ` kernel test robot
2024-05-31 17:32 ` kernel test robot
2 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2024-05-31 10:53 UTC (permalink / raw)
To: Paolo Abeni, netdev
Cc: oe-kbuild-all, Eric Dumazet, David Ahern, Jakub Kicinski
Hi Paolo,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/ipv6-use-a-new-flag-to-indicate-elevated-refcount/20240531-012716
base: net-next/main
patch link: https://lore.kernel.org/r/cd710487a34149654a5ff73a8c0df9b1d3fc73a9.1717087015.git.pabeni%40redhat.com
patch subject: [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup
config: arm64-defconfig (https://download.01.org/0day-ci/archive/20240531/202405311808.vqBTwxEf-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240531/202405311808.vqBTwxEf-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405311808.vqBTwxEf-lkp@intel.com/
All errors (new ones prefixed by >>):
aarch64-linux-ld: Unexpected GOT/PLT entries detected!
aarch64-linux-ld: Unexpected run-time procedure linkages detected!
aarch64-linux-ld: net/core/dst_cache.o: in function `dst_cache_set_ip6':
>> net/core/dst_cache.c:123:(.text+0x240): undefined reference to `rt6_uncached_list_add'
vim +123 net/core/dst_cache.c
109
110 #if IS_ENABLED(CONFIG_IPV6)
111 void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
112 const struct in6_addr *saddr)
113 {
114 struct dst_cache_pcpu *idst;
115
116 if (!dst_cache->cache)
117 return;
118
119 idst = this_cpu_ptr(dst_cache->cache);
120 dst_cache_per_cpu_dst_set(idst, dst,
121 rt6_get_cookie(dst_rt6_info(dst)));
122 if (dst && list_empty(&dst->rt_uncached))
> 123 rt6_uncached_list_add(dst_rt6_info(dst));
124
125 idst->in6_saddr = *saddr;
126 }
127 EXPORT_SYMBOL_GPL(dst_cache_set_ip6);
128
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup
2024-05-30 17:21 ` [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup Paolo Abeni
2024-05-30 17:46 ` Eric Dumazet
2024-05-31 10:53 ` kernel test robot
@ 2024-05-31 17:32 ` kernel test robot
2 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2024-05-31 17:32 UTC (permalink / raw)
To: Paolo Abeni, netdev
Cc: oe-kbuild-all, Eric Dumazet, David Ahern, Jakub Kicinski
Hi Paolo,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/ipv6-use-a-new-flag-to-indicate-elevated-refcount/20240531-012716
base: net-next/main
patch link: https://lore.kernel.org/r/cd710487a34149654a5ff73a8c0df9b1d3fc73a9.1717087015.git.pabeni%40redhat.com
patch subject: [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup
config: m68k-defconfig (https://download.01.org/0day-ci/archive/20240601/202406010139.PSZGIJkQ-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240601/202406010139.PSZGIJkQ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406010139.PSZGIJkQ-lkp@intel.com/
All errors (new ones prefixed by >>):
m68k-linux-ld: net/core/dst_cache.o: in function `dst_cache_set_ip6':
>> dst_cache.c:(.text+0x21e): undefined reference to `rt6_uncached_list_add'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-05-31 17:33 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-30 17:18 [PATCH net-next 0/3] dst_cache: cope with device removal Paolo Abeni
2024-05-30 17:21 ` Paolo Abeni
2024-05-30 17:21 ` [PATCH net-next 1/3] ipv6: use a new flag to indicate elevated refcount Paolo Abeni
2024-05-30 17:21 ` [PATCH net-next 2/3] ipv4: obsolete routes moved out of per cpu cache Paolo Abeni
2024-05-30 17:21 ` [PATCH net-next 3/3] dst_cache: let rt_uncached cope with dst_cache cleanup Paolo Abeni
2024-05-30 17:46 ` Eric Dumazet
2024-05-31 10:53 ` kernel test robot
2024-05-31 17:32 ` kernel test robot
2024-05-30 17:38 ` [PATCH net-next 0/3] dst_cache: cope with device removal Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).