* [PATCH net-next 1/4] net: no longer assume RTNL is held in flush_all_backlogs()
2025-01-07 17:38 [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice() Eric Dumazet
@ 2025-01-07 17:38 ` Eric Dumazet
2025-01-07 17:38 ` [PATCH net-next 2/4] net: no longer hold RTNL while calling flush_all_backlogs() Eric Dumazet
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2025-01-07 17:38 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Simon Horman, eric.dumazet, Eric Dumazet
flush_all_backlogs() uses per-cpu data to hold its
temporary daya, on the assumption it is called under RTNL
protection.
Following patch in the series will break this assumption.
Use instead a dynamically allocated piece of memory.
In the unlikely case the allocation fails,
use a boot-time allocated memory.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 53 +++++++++++++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 18 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 073f682a9653a212198b12bae17fafe7b46f96e9..9e1eb272e4feaf40dc87defd54d691634e0902e5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5961,8 +5961,6 @@ void netif_receive_skb_list(struct list_head *head)
}
EXPORT_SYMBOL(netif_receive_skb_list);
-static DEFINE_PER_CPU(struct work_struct, flush_works);
-
/* Network device is going away, flush any packets still pending */
static void flush_backlog(struct work_struct *work)
{
@@ -6019,36 +6017,54 @@ static bool flush_required(int cpu)
return true;
}
+struct flush_backlogs {
+ cpumask_t flush_cpus;
+ struct work_struct w[];
+};
+
+static struct flush_backlogs *flush_backlogs_alloc(void)
+{
+ return kmalloc(struct_size_t(struct flush_backlogs, w, nr_cpu_ids),
+ GFP_KERNEL);
+}
+
+static struct flush_backlogs *flush_backlogs_fallback;
+static DEFINE_MUTEX(flush_backlogs_mutex);
+
static void flush_all_backlogs(void)
{
- static cpumask_t flush_cpus;
+ struct flush_backlogs *ptr = flush_backlogs_alloc();
unsigned int cpu;
- /* since we are under rtnl lock protection we can use static data
- * for the cpumask and avoid allocating on stack the possibly
- * large mask
- */
- ASSERT_RTNL();
+ if (!ptr) {
+ mutex_lock(&flush_backlogs_mutex);
+ ptr = flush_backlogs_fallback;
+ }
+ cpumask_clear(&ptr->flush_cpus);
cpus_read_lock();
- cpumask_clear(&flush_cpus);
for_each_online_cpu(cpu) {
if (flush_required(cpu)) {
- queue_work_on(cpu, system_highpri_wq,
- per_cpu_ptr(&flush_works, cpu));
- cpumask_set_cpu(cpu, &flush_cpus);
+ INIT_WORK(&ptr->w[cpu], flush_backlog);
+ queue_work_on(cpu, system_highpri_wq, &ptr->w[cpu]);
+ __cpumask_set_cpu(cpu, &ptr->flush_cpus);
}
}
/* we can have in flight packet[s] on the cpus we are not flushing,
* synchronize_net() in unregister_netdevice_many() will take care of
- * them
+ * them.
*/
- for_each_cpu(cpu, &flush_cpus)
- flush_work(per_cpu_ptr(&flush_works, cpu));
+ for_each_cpu(cpu, &ptr->flush_cpus)
+ flush_work(&ptr->w[cpu]);
cpus_read_unlock();
+
+ if (ptr != flush_backlogs_fallback)
+ kfree(ptr);
+ else
+ mutex_unlock(&flush_backlogs_mutex);
}
static void net_rps_send_ipi(struct softnet_data *remsd)
@@ -12237,12 +12253,13 @@ static int __init net_dev_init(void)
* Initialise the packet receive queues.
*/
+ flush_backlogs_fallback = flush_backlogs_alloc();
+ if (!flush_backlogs_fallback)
+ goto out;
+
for_each_possible_cpu(i) {
- struct work_struct *flush = per_cpu_ptr(&flush_works, i);
struct softnet_data *sd = &per_cpu(softnet_data, i);
- INIT_WORK(flush, flush_backlog);
-
skb_queue_head_init(&sd->input_pkt_queue);
skb_queue_head_init(&sd->process_queue);
#ifdef CONFIG_XFRM_OFFLOAD
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH net-next 2/4] net: no longer hold RTNL while calling flush_all_backlogs()
2025-01-07 17:38 [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice() Eric Dumazet
2025-01-07 17:38 ` [PATCH net-next 1/4] net: no longer assume RTNL is held in flush_all_backlogs() Eric Dumazet
@ 2025-01-07 17:38 ` Eric Dumazet
2025-01-07 17:38 ` [PATCH net-next 3/4] net: expedite synchronize_net() for cleanup_net() Eric Dumazet
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2025-01-07 17:38 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Simon Horman, eric.dumazet, Eric Dumazet
flush_all_backlogs() is called from unregister_netdevice_many_notify()
as part of netdevice dismantles.
This is currently called under RTNL, and can last up to 20ms on busy hosts.
There is no reason to block RTNL at this stage.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 9e1eb272e4feaf40dc87defd54d691634e0902e5..ef6426aad84dc00740a1716c8fd4cfd48ee17cf3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11526,8 +11526,10 @@ void unregister_netdevice_many_notify(struct list_head *head,
unlist_netdevice(dev);
WRITE_ONCE(dev->reg_state, NETREG_UNREGISTERING);
}
- flush_all_backlogs();
+ __rtnl_unlock();
+ flush_all_backlogs();
+ rtnl_lock();
synchronize_net();
list_for_each_entry(dev, head, unreg_list) {
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH net-next 3/4] net: expedite synchronize_net() for cleanup_net()
2025-01-07 17:38 [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice() Eric Dumazet
2025-01-07 17:38 ` [PATCH net-next 1/4] net: no longer assume RTNL is held in flush_all_backlogs() Eric Dumazet
2025-01-07 17:38 ` [PATCH net-next 2/4] net: no longer hold RTNL while calling flush_all_backlogs() Eric Dumazet
@ 2025-01-07 17:38 ` Eric Dumazet
2025-01-09 17:42 ` kernel test robot
2025-01-09 18:14 ` kernel test robot
2025-01-07 17:38 ` [PATCH net-next 4/4] net: reduce RTNL hold duration in unregister_netdevice_many_notify() Eric Dumazet
2025-01-07 20:11 ` [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice() Jakub Kicinski
4 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2025-01-07 17:38 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Simon Horman, eric.dumazet, Eric Dumazet
cleanup_net() is the single thread responsible
for netns dismantles, and a potential bottleneck.
Before we can get per-netns RTNL, make sure
all synchronize_net() called from this thread
are using rcu_synchronize_expedited().
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/net_namespace.h | 2 ++
net/core/dev.c | 2 +-
net/core/net_namespace.c | 5 +++++
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5a2a0df8ad91b677b515b392869c6c755be5c868..a3009bdd7efec0a3b665cbf51c159c323458410a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -565,4 +565,6 @@ void net_ns_init(void);
static inline void net_ns_init(void) {}
#endif
+extern struct task_struct *cleanup_net_task;
+
#endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index ef6426aad84dc00740a1716c8fd4cfd48ee17cf3..342ab7d6001da8983db450f50327fc7915b0a8ba 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11423,7 +11423,7 @@ EXPORT_SYMBOL_GPL(alloc_netdev_dummy);
void synchronize_net(void)
{
might_sleep();
- if (rtnl_is_locked())
+ if (current == cleanup_net_task || rtnl_is_locked())
synchronize_rcu_expedited();
else
synchronize_rcu();
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index b5cd3ae4f04cf28d43f8401a3dafebac4a297123..cb39a12b2f8295c605f08b5589932932150a1644 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -588,6 +588,8 @@ static void unhash_nsid(struct net *net, struct net *last)
static LLIST_HEAD(cleanup_list);
+struct task_struct *cleanup_net_task;
+
static void cleanup_net(struct work_struct *work)
{
const struct pernet_operations *ops;
@@ -596,6 +598,8 @@ static void cleanup_net(struct work_struct *work)
LIST_HEAD(net_exit_list);
LIST_HEAD(dev_kill_list);
+ cleanup_net_task = current;
+
/* Atomically snapshot the list of namespaces to cleanup */
net_kill_list = llist_del_all(&cleanup_list);
@@ -670,6 +674,7 @@ static void cleanup_net(struct work_struct *work)
put_user_ns(net->user_ns);
net_free(net);
}
+ cleanup_net_task = NULL;
}
/**
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH net-next 3/4] net: expedite synchronize_net() for cleanup_net()
2025-01-07 17:38 ` [PATCH net-next 3/4] net: expedite synchronize_net() for cleanup_net() Eric Dumazet
@ 2025-01-09 17:42 ` kernel test robot
2025-01-09 18:14 ` kernel test robot
1 sibling, 0 replies; 11+ messages in thread
From: kernel test robot @ 2025-01-09 17:42 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: oe-kbuild-all, netdev, Simon Horman, eric.dumazet, Eric Dumazet
Hi Eric,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-no-longer-assume-RTNL-is-held-in-flush_all_backlogs/20250108-014049
base: net-next/main
patch link: https://lore.kernel.org/r/20250107173838.1130187-4-edumazet%40google.com
patch subject: [PATCH net-next 3/4] net: expedite synchronize_net() for cleanup_net()
config: mips-bmips_be_defconfig (https://download.01.org/0day-ci/archive/20250110/202501100127.LhRwhYMs-lkp@intel.com/config)
compiler: mips-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250110/202501100127.LhRwhYMs-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501100127.LhRwhYMs-lkp@intel.com/
All errors (new ones prefixed by >>):
mips-linux-ld: net/core/dev.o: in function `synchronize_net':
dev.c:(.text+0x2250): undefined reference to `cleanup_net_task'
>> mips-linux-ld: dev.c:(.text+0x225c): undefined reference to `cleanup_net_task'
mips-linux-ld: net/core/dev.o: in function `free_netdev':
dev.c:(.text+0x5824): undefined reference to `cleanup_net_task'
mips-linux-ld: net/core/dev.o: in function `netdev_rx_handler_unregister':
dev.c:(.text+0x6204): undefined reference to `cleanup_net_task'
mips-linux-ld: dev.c:(.text+0x620c): undefined reference to `cleanup_net_task'
mips-linux-ld: net/core/dev.o:dev.c:(.text+0x6764): more undefined references to `cleanup_net_task' follow
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next 3/4] net: expedite synchronize_net() for cleanup_net()
2025-01-07 17:38 ` [PATCH net-next 3/4] net: expedite synchronize_net() for cleanup_net() Eric Dumazet
2025-01-09 17:42 ` kernel test robot
@ 2025-01-09 18:14 ` kernel test robot
1 sibling, 0 replies; 11+ messages in thread
From: kernel test robot @ 2025-01-09 18:14 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: oe-kbuild-all, netdev, Simon Horman, eric.dumazet, Eric Dumazet
Hi Eric,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-no-longer-assume-RTNL-is-held-in-flush_all_backlogs/20250108-014049
base: net-next/main
patch link: https://lore.kernel.org/r/20250107173838.1130187-4-edumazet%40google.com
patch subject: [PATCH net-next 3/4] net: expedite synchronize_net() for cleanup_net()
config: powerpc-fsp2_defconfig (https://download.01.org/0day-ci/archive/20250110/202501100222.pLf6muKs-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250110/202501100222.pLf6muKs-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501100222.pLf6muKs-lkp@intel.com/
All errors (new ones prefixed by >>):
powerpc-linux-ld: net/core/dev.o: in function `synchronize_net':
net/core/dev.c:11426:(.text+0x1be2): undefined reference to `cleanup_net_task'
>> powerpc-linux-ld: net/core/dev.c:11426:(.text+0x1be6): undefined reference to `cleanup_net_task'
powerpc-linux-ld: net/core/dev.c:11426:(.text+0x68ce): undefined reference to `cleanup_net_task'
powerpc-linux-ld: net/core/dev.c:11426:(.text+0x68d2): undefined reference to `cleanup_net_task'
powerpc-linux-ld: net/core/dev.o: in function `free_netdev':
net/core/dev.c:11372:(.text+0x6ae2): undefined reference to `cleanup_net_task'
powerpc-linux-ld: net/core/dev.o:net/core/dev.c:11372: more undefined references to `cleanup_net_task' follow
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net-next 4/4] net: reduce RTNL hold duration in unregister_netdevice_many_notify()
2025-01-07 17:38 [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice() Eric Dumazet
` (2 preceding siblings ...)
2025-01-07 17:38 ` [PATCH net-next 3/4] net: expedite synchronize_net() for cleanup_net() Eric Dumazet
@ 2025-01-07 17:38 ` Eric Dumazet
2025-01-07 20:11 ` [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice() Jakub Kicinski
4 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2025-01-07 17:38 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Simon Horman, eric.dumazet, Eric Dumazet
Two synchronize_net() calls are currently done while holding RTNL.
This is source of RTNL contention in workloads adding and deleting
many network namespaces per second, because synchronize_rcu()
and synchronize_rcu_expedited() can use 10+ ms in some cases.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 342ab7d6001da8983db450f50327fc7915b0a8ba..9e93b13b9a76bd256d93d05a13d21dca883d6ab8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11529,8 +11529,8 @@ void unregister_netdevice_many_notify(struct list_head *head,
__rtnl_unlock();
flush_all_backlogs();
- rtnl_lock();
synchronize_net();
+ rtnl_lock();
list_for_each_entry(dev, head, unreg_list) {
struct sk_buff *skb = NULL;
@@ -11590,7 +11590,9 @@ void unregister_netdevice_many_notify(struct list_head *head,
#endif
}
+ __rtnl_unlock();
synchronize_net();
+ rtnl_lock();
list_for_each_entry(dev, head, unreg_list) {
netdev_put(dev, &dev->dev_registered_tracker);
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice()
2025-01-07 17:38 [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice() Eric Dumazet
` (3 preceding siblings ...)
2025-01-07 17:38 ` [PATCH net-next 4/4] net: reduce RTNL hold duration in unregister_netdevice_many_notify() Eric Dumazet
@ 2025-01-07 20:11 ` Jakub Kicinski
2025-01-07 20:22 ` Eric Dumazet
4 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2025-01-07 20:11 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, netdev, Simon Horman, eric.dumazet
On Tue, 7 Jan 2025 17:38:34 +0000 Eric Dumazet wrote:
> One major source of RTNL contention resides in unregister_netdevice()
>
> Due to RCU protection of various network structures, and
> unregister_netdevice() being a synchronous function,
> it is calling potentially slow functions while holding RTNL.
>
> I think we can release RTNL in two points, so that three
> slow functions are called while RTNL can be used
> by other threads.
I think we'll need:
diff --git a/net/devlink/port.c b/net/devlink/port.c
index 939081a0e615..cdfa22453a55 100644
--- a/net/devlink/port.c
+++ b/net/devlink/port.c
@@ -1311,6 +1311,7 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
__devlink_port_type_set(devlink_port, devlink_port->type,
netdev);
break;
+ case NETDEV_UNREGISTERING:
case NETDEV_UNREGISTER:
if (devlink_net(devlink) != dev_net(netdev))
return NOTIFY_OK;
There is no other way to speed things up? Use RT prio for the work?
Maybe WRITE_ONCE() a special handler into backlog.poll, and schedule it?
I'm not gonna stand in your way but in general re-taking caller locks
in a callee is a bit ugly :(
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice()
2025-01-07 20:11 ` [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice() Jakub Kicinski
@ 2025-01-07 20:22 ` Eric Dumazet
2025-01-07 20:46 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2025-01-07 20:22 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, netdev, Simon Horman, eric.dumazet
On Tue, Jan 7, 2025 at 9:11 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 7 Jan 2025 17:38:34 +0000 Eric Dumazet wrote:
> > One major source of RTNL contention resides in unregister_netdevice()
> >
> > Due to RCU protection of various network structures, and
> > unregister_netdevice() being a synchronous function,
> > it is calling potentially slow functions while holding RTNL.
> >
> > I think we can release RTNL in two points, so that three
> > slow functions are called while RTNL can be used
> > by other threads.
>
> I think we'll need:
>
> diff --git a/net/devlink/port.c b/net/devlink/port.c
> index 939081a0e615..cdfa22453a55 100644
> --- a/net/devlink/port.c
> +++ b/net/devlink/port.c
> @@ -1311,6 +1311,7 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
> __devlink_port_type_set(devlink_port, devlink_port->type,
> netdev);
> break;
> + case NETDEV_UNREGISTERING:
Not sure I follow ?
> case NETDEV_UNREGISTER:
> if (devlink_net(devlink) != dev_net(netdev))
> return NOTIFY_OK;
>
>
> There is no other way to speed things up? Use RT prio for the work?
> Maybe WRITE_ONCE() a special handler into backlog.poll, and schedule it?
>
> I'm not gonna stand in your way but in general re-taking caller locks
> in a callee is a bit ugly :(
We might restrict this stuff to cleanup_net() caller only, we know the
netns are disappearing
and that no other thread can mess with them.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice()
2025-01-07 20:22 ` Eric Dumazet
@ 2025-01-07 20:46 ` Eric Dumazet
2025-01-07 21:09 ` Jakub Kicinski
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2025-01-07 20:46 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, netdev, Simon Horman, eric.dumazet
On Tue, Jan 7, 2025 at 9:22 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jan 7, 2025 at 9:11 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 7 Jan 2025 17:38:34 +0000 Eric Dumazet wrote:
> > > One major source of RTNL contention resides in unregister_netdevice()
> > >
> > > Due to RCU protection of various network structures, and
> > > unregister_netdevice() being a synchronous function,
> > > it is calling potentially slow functions while holding RTNL.
> > >
> > > I think we can release RTNL in two points, so that three
> > > slow functions are called while RTNL can be used
> > > by other threads.
> >
> > I think we'll need:
> >
> > diff --git a/net/devlink/port.c b/net/devlink/port.c
> > index 939081a0e615..cdfa22453a55 100644
> > --- a/net/devlink/port.c
> > +++ b/net/devlink/port.c
> > @@ -1311,6 +1311,7 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
> > __devlink_port_type_set(devlink_port, devlink_port->type,
> > netdev);
> > break;
> > + case NETDEV_UNREGISTERING:
>
> Not sure I follow ?
>
> > case NETDEV_UNREGISTER:
> > if (devlink_net(devlink) != dev_net(netdev))
> > return NOTIFY_OK;
> >
> >
> > There is no other way to speed things up? Use RT prio for the work?
> > Maybe WRITE_ONCE() a special handler into backlog.poll, and schedule it?
> >
> > I'm not gonna stand in your way but in general re-taking caller locks
> > in a callee is a bit ugly :(
>
> We might restrict this stuff to cleanup_net() caller only, we know the
> netns are disappearing
> and that no other thread can mess with them.
ie something like:
diff --git a/net/core/dev.c b/net/core/dev.c
index 9e93b13b9a76bd256d93d05a13d21dca883d6ab8..a555e82adbeda90672c72700e9235a5d271be8fd
100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11414,6 +11414,23 @@ struct net_device *alloc_netdev_dummy(int sizeof_priv)
}
EXPORT_SYMBOL_GPL(alloc_netdev_dummy);
+static bool from_cleanup_net(void)
+{
+ return current == cleanup_net_task;
+}
+
+static void rtnl_drop_if_cleanup(void)
+{
+ if (from_cleanup_net())
+ __rtnl_unlock();
+}
+
+static void rtnl_acquire_if_cleanup(void)
+{
+ if (from_cleanup_net())
+ rtnl_lock();
+}
+
/**
* synchronize_net - Synchronize with packet receive processing
*
@@ -11423,7 +11440,7 @@ EXPORT_SYMBOL_GPL(alloc_netdev_dummy);
void synchronize_net(void)
{
might_sleep();
- if (current == cleanup_net_task || rtnl_is_locked())
+ if (from_cleanup_net() || rtnl_is_locked())
synchronize_rcu_expedited();
else
synchronize_rcu();
@@ -11527,10 +11544,10 @@ void unregister_netdevice_many_notify(struct
list_head *head,
WRITE_ONCE(dev->reg_state, NETREG_UNREGISTERING);
}
- __rtnl_unlock();
+ rtnl_drop_if_cleanup();
flush_all_backlogs();
synchronize_net();
- rtnl_lock();
+ rnl_acquire_if_cleanup();
list_for_each_entry(dev, head, unreg_list) {
struct sk_buff *skb = NULL;
@@ -11590,9 +11607,9 @@ void unregister_netdevice_many_notify(struct
list_head *head,
#endif
}
- __rtnl_unlock();
+ rtnl_drop_if_cleanup();
synchronize_net();
- rtnl_lock();
+ rnl_acquire_if_cleanup();
list_for_each_entry(dev, head, unreg_list) {
netdev_put(dev, &dev->dev_registered_tracker);
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH net-next 0/4] net: reduce RTNL pressure in unregister_netdevice()
2025-01-07 20:46 ` Eric Dumazet
@ 2025-01-07 21:09 ` Jakub Kicinski
0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2025-01-07 21:09 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, netdev, Simon Horman, eric.dumazet
On Tue, 7 Jan 2025 21:46:41 +0100 Eric Dumazet wrote:
> > > I think we'll need:
> > >
> > > diff --git a/net/devlink/port.c b/net/devlink/port.c
> > > index 939081a0e615..cdfa22453a55 100644
> > > --- a/net/devlink/port.c
> > > +++ b/net/devlink/port.c
> > > @@ -1311,6 +1311,7 @@ int devlink_port_netdevice_event(struct notifier_block *nb,
> > > __devlink_port_type_set(devlink_port, devlink_port->type,
> > > netdev);
> > > break;
> > > + case NETDEV_UNREGISTERING:
> >
> > Not sure I follow ?
I was worried some code assumed devlink_port->netdev is safe to access
under rtnl_lock. But looking closer it's only used in trivial ways, so
you can ignore that.
> > > case NETDEV_UNREGISTER:
> > > if (devlink_net(devlink) != dev_net(netdev))
> > > return NOTIFY_OK;
> > >
> > >
> > > There is no other way to speed things up? Use RT prio for the work?
> > > Maybe WRITE_ONCE() a special handler into backlog.poll, and schedule it?
> > >
> > > I'm not gonna stand in your way but in general re-taking caller locks
> > > in a callee is a bit ugly :(
> >
> > We might restrict this stuff to cleanup_net() caller only, we know the
> > netns are disappearing and that no other thread can mess with them.
Unless the interface has a peer in another netns. But that should be
fine, interface will be de-listed. I'm slightly more concerned that some
random code in the kernel assumes its stashed netdev pointer to be valid
under rtnl_lock, as long as it has not seen an NETDEV_UNREGISTER event.
I guess we'll find out..? :)
> ie something like:
> [...]
LGTM!
^ permalink raw reply [flat|nested] 11+ messages in thread