* [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl().
@ 2025-04-18 0:32 Kuniyuki Iwashima
2025-04-18 0:32 ` [PATCH v2 net-next 1/3] net: Drop hold_rtnl arg from ops_undo_list() Kuniyuki Iwashima
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-18 0:32 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
Patch 1 drops the hold_rtnl arg in ops_undo_list() as suggested by Jakub.
Patch 2 & 3 apply ->exit_rtnl() to pfcp and ppp.
Changes:
v2:
* Patch 3
* Fix build failure by forward declaration
v1: https://lore.kernel.org/netdev/20250415022258.11491-1-kuniyu@amazon.com/
Kuniyuki Iwashima (3):
net: Drop hold_rtnl arg from ops_undo_list().
pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
ppp: Split ppp_exit_net() to ->exit_rtnl().
drivers/net/pfcp.c | 23 +++++++----------------
drivers/net/ppp/ppp_generic.c | 25 ++++++++++---------------
net/core/net_namespace.c | 14 ++++++++------
3 files changed, 25 insertions(+), 37 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 net-next 1/3] net: Drop hold_rtnl arg from ops_undo_list().
2025-04-18 0:32 [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl() Kuniyuki Iwashima
@ 2025-04-18 0:32 ` Kuniyuki Iwashima
2025-04-18 0:32 ` [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl() Kuniyuki Iwashima
` (2 subsequent siblings)
3 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-18 0:32 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
ops_undo_list() first iterates over ops_list for ->pre_exit().
Let's check if any of the ops has ->exit_rtnl() there and drop
the hold_rtnl argument.
Note that nexthop uses ->exit_rtnl() and is built-in, so hold_rtnl
is always true for setup_net() and cleanup_net() for now.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250414170148.21f3523c@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/core/net_namespace.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 0a2b24af4028..48dd6dc603c9 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -220,17 +220,20 @@ static void ops_free_list(const struct pernet_operations *ops,
static void ops_undo_list(const struct list_head *ops_list,
const struct pernet_operations *ops,
struct list_head *net_exit_list,
- bool expedite_rcu, bool hold_rtnl)
+ bool expedite_rcu)
{
const struct pernet_operations *saved_ops;
+ bool hold_rtnl = false;
if (!ops)
ops = list_entry(ops_list, typeof(*ops), list);
saved_ops = ops;
- list_for_each_entry_continue_reverse(ops, ops_list, list)
+ list_for_each_entry_continue_reverse(ops, ops_list, list) {
+ hold_rtnl |= !!ops->exit_rtnl;
ops_pre_exit_list(ops, net_exit_list);
+ }
/* Another CPU might be rcu-iterating the list, wait for it.
* This needs to be before calling the exit() notifiers, so the
@@ -257,11 +260,10 @@ static void ops_undo_list(const struct list_head *ops_list,
static void ops_undo_single(struct pernet_operations *ops,
struct list_head *net_exit_list)
{
- bool hold_rtnl = !!ops->exit_rtnl;
LIST_HEAD(ops_list);
list_add(&ops->list, &ops_list);
- ops_undo_list(&ops_list, NULL, net_exit_list, false, hold_rtnl);
+ ops_undo_list(&ops_list, NULL, net_exit_list, false);
list_del(&ops->list);
}
@@ -452,7 +454,7 @@ static __net_init int setup_net(struct net *net)
* for the pernet modules whose init functions did not fail.
*/
list_add(&net->exit_list, &net_exit_list);
- ops_undo_list(&pernet_list, ops, &net_exit_list, false, true);
+ ops_undo_list(&pernet_list, ops, &net_exit_list, false);
rcu_barrier();
goto out;
}
@@ -681,7 +683,7 @@ static void cleanup_net(struct work_struct *work)
list_add_tail(&net->exit_list, &net_exit_list);
}
- ops_undo_list(&pernet_list, NULL, &net_exit_list, true, true);
+ ops_undo_list(&pernet_list, NULL, &net_exit_list, true);
up_read(&pernet_ops_rwsem);
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-18 0:32 [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl() Kuniyuki Iwashima
2025-04-18 0:32 ` [PATCH v2 net-next 1/3] net: Drop hold_rtnl arg from ops_undo_list() Kuniyuki Iwashima
@ 2025-04-18 0:32 ` Kuniyuki Iwashima
2025-04-23 2:47 ` Jakub Kicinski
2025-04-23 13:40 ` Jakub Kicinski
2025-04-18 0:32 ` [PATCH v2 net-next 3/3] ppp: Split ppp_exit_net() " Kuniyuki Iwashima
2025-04-23 2:50 ` [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl() patchwork-bot+netdevbpf
3 siblings, 2 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-18 0:32 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
pfcp_net_exit() holds RTNL and cleans up all devices in the netns
and other devices tied to sockets in the netns.
We can use ->exit_rtnl() to save RTNL dance for all dying netns.
Note that we delegate the for_each_netdev() part to
default_device_exit_batch() to avoid a list corruption splat
like the one reported in commit 4ccacf86491d ("gtp: Suppress
list corruption splat in gtp_net_exit_batch_rtnl().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
drivers/net/pfcp.c | 23 +++++++----------------
1 file changed, 7 insertions(+), 16 deletions(-)
diff --git a/drivers/net/pfcp.c b/drivers/net/pfcp.c
index f873a92d2445..28e6bc4a1f14 100644
--- a/drivers/net/pfcp.c
+++ b/drivers/net/pfcp.c
@@ -245,30 +245,21 @@ static int __net_init pfcp_net_init(struct net *net)
return 0;
}
-static void __net_exit pfcp_net_exit(struct net *net)
+static void __net_exit pfcp_net_exit_rtnl(struct net *net,
+ struct list_head *dev_to_kill)
{
struct pfcp_net *pn = net_generic(net, pfcp_net_id);
struct pfcp_dev *pfcp, *pfcp_next;
- struct net_device *dev;
- LIST_HEAD(list);
-
- rtnl_lock();
- for_each_netdev(net, dev)
- if (dev->rtnl_link_ops == &pfcp_link_ops)
- pfcp_dellink(dev, &list);
list_for_each_entry_safe(pfcp, pfcp_next, &pn->pfcp_dev_list, list)
- pfcp_dellink(pfcp->dev, &list);
-
- unregister_netdevice_many(&list);
- rtnl_unlock();
+ pfcp_dellink(pfcp->dev, dev_to_kill);
}
static struct pernet_operations pfcp_net_ops = {
- .init = pfcp_net_init,
- .exit = pfcp_net_exit,
- .id = &pfcp_net_id,
- .size = sizeof(struct pfcp_net),
+ .init = pfcp_net_init,
+ .exit_rtnl = pfcp_net_exit_rtnl,
+ .id = &pfcp_net_id,
+ .size = sizeof(struct pfcp_net),
};
static int __init pfcp_init(void)
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 net-next 3/3] ppp: Split ppp_exit_net() to ->exit_rtnl().
2025-04-18 0:32 [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl() Kuniyuki Iwashima
2025-04-18 0:32 ` [PATCH v2 net-next 1/3] net: Drop hold_rtnl arg from ops_undo_list() Kuniyuki Iwashima
2025-04-18 0:32 ` [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl() Kuniyuki Iwashima
@ 2025-04-18 0:32 ` Kuniyuki Iwashima
2025-04-23 2:50 ` [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl() patchwork-bot+netdevbpf
3 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-18 0:32 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
ppp_exit_net() unregisters devices related to the netns under
RTNL and destroys lists and IDR.
Let's use ->exit_rtnl() for the device unregistration part to
save RTNL dances for each netns.
Note that we delegate the for_each_netdev_safe() part to
default_device_exit_batch() and replace unregister_netdevice_queue()
with ppp_nl_dellink() to align with bond, geneve, gtp, and pfcp.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
v2: Fix build failure by forward declaration
---
drivers/net/ppp/ppp_generic.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 53463767cc43..def84e87e05b 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1131,6 +1131,8 @@ static const struct file_operations ppp_device_fops = {
.llseek = noop_llseek,
};
+static void ppp_nl_dellink(struct net_device *dev, struct list_head *head);
+
static __net_init int ppp_init_net(struct net *net)
{
struct ppp_net *pn = net_generic(net, ppp_net_id);
@@ -1146,28 +1148,20 @@ static __net_init int ppp_init_net(struct net *net)
return 0;
}
-static __net_exit void ppp_exit_net(struct net *net)
+static __net_exit void ppp_exit_rtnl_net(struct net *net,
+ struct list_head *dev_to_kill)
{
struct ppp_net *pn = net_generic(net, ppp_net_id);
- struct net_device *dev;
- struct net_device *aux;
struct ppp *ppp;
- LIST_HEAD(list);
int id;
- rtnl_lock();
- for_each_netdev_safe(net, dev, aux) {
- if (dev->netdev_ops == &ppp_netdev_ops)
- unregister_netdevice_queue(dev, &list);
- }
-
idr_for_each_entry(&pn->units_idr, ppp, id)
- /* Skip devices already unregistered by previous loop */
- if (!net_eq(dev_net(ppp->dev), net))
- unregister_netdevice_queue(ppp->dev, &list);
+ ppp_nl_dellink(ppp->dev, dev_to_kill);
+}
- unregister_netdevice_many(&list);
- rtnl_unlock();
+static __net_exit void ppp_exit_net(struct net *net)
+{
+ struct ppp_net *pn = net_generic(net, ppp_net_id);
mutex_destroy(&pn->all_ppp_mutex);
idr_destroy(&pn->units_idr);
@@ -1177,6 +1171,7 @@ static __net_exit void ppp_exit_net(struct net *net)
static struct pernet_operations ppp_net_ops = {
.init = ppp_init_net,
+ .exit_rtnl = ppp_exit_rtnl_net,
.exit = ppp_exit_net,
.id = &ppp_net_id,
.size = sizeof(struct ppp_net),
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-18 0:32 ` [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl() Kuniyuki Iwashima
@ 2025-04-23 2:47 ` Jakub Kicinski
2025-04-23 8:37 ` Michal Swiatkowski
2025-04-23 13:40 ` Jakub Kicinski
1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2025-04-23 2:47 UTC (permalink / raw)
To: Wojciech Drewek, Michal Swiatkowski
Cc: Kuniyuki Iwashima, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, Kuniyuki Iwashima, netdev
On Thu, 17 Apr 2025 17:32:33 -0700 Kuniyuki Iwashima wrote:
> drivers/net/pfcp.c | 23 +++++++----------------
Wojciech, Michał, does anyone use this driver?
It seems that it hooks dev_get_tstats64 as ndo_get_stats64
but it never allocates tstats, so any ifconfig / ip link
run after the device is create immediately crashes the kernel.
I don't see any tstats in this driver history or UDP tunnel
code so I'm moderately confused as how this worked / when
it broke.
If I'm not missing anything and indeed this driver was always
broken we should just delete it ?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl().
2025-04-18 0:32 [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl() Kuniyuki Iwashima
` (2 preceding siblings ...)
2025-04-18 0:32 ` [PATCH v2 net-next 3/3] ppp: Split ppp_exit_net() " Kuniyuki Iwashima
@ 2025-04-23 2:50 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-04-23 2:50 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms, kuni1840,
netdev
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 17 Apr 2025 17:32:31 -0700 you wrote:
> Patch 1 drops the hold_rtnl arg in ops_undo_list() as suggested by Jakub.
> Patch 2 & 3 apply ->exit_rtnl() to pfcp and ppp.
>
>
> Changes:
> v2:
> * Patch 3
> * Fix build failure by forward declaration
>
> [...]
Here is the summary with links:
- [v2,net-next,1/3] net: Drop hold_rtnl arg from ops_undo_list().
https://git.kernel.org/netdev/net-next/c/434efd3d0cdd
- [v2,net-next,2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
https://git.kernel.org/netdev/net-next/c/81eccc131bc1
- [v2,net-next,3/3] ppp: Split ppp_exit_net() to ->exit_rtnl().
https://git.kernel.org/netdev/net-next/c/7ee32072c732
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-23 2:47 ` Jakub Kicinski
@ 2025-04-23 8:37 ` Michal Swiatkowski
2025-04-23 13:33 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Michal Swiatkowski @ 2025-04-23 8:37 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Wojciech Drewek, Michal Swiatkowski, Kuniyuki Iwashima,
Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Kuniyuki Iwashima, netdev, marcin.szycik
On Tue, Apr 22, 2025 at 07:47:57PM -0700, Jakub Kicinski wrote:
> On Thu, 17 Apr 2025 17:32:33 -0700 Kuniyuki Iwashima wrote:
> > drivers/net/pfcp.c | 23 +++++++----------------
>
> Wojciech, Michał, does anyone use this driver?
> It seems that it hooks dev_get_tstats64 as ndo_get_stats64
> but it never allocates tstats, so any ifconfig / ip link
> run after the device is create immediately crashes the kernel.
> I don't see any tstats in this driver history or UDP tunnel
> code so I'm moderately confused as how this worked / when
> it broke.
>
> If I'm not missing anything and indeed this driver was always
> broken we should just delete it ?
Uh, I remember that we used it to add tc filter. Maybe we can fix it?
Adding Marcin, as he was working on it.
Thanks
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-23 8:37 ` Michal Swiatkowski
@ 2025-04-23 13:33 ` Jakub Kicinski
2025-04-24 4:40 ` Michal Swiatkowski
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2025-04-23 13:33 UTC (permalink / raw)
To: Michal Swiatkowski
Cc: Wojciech Drewek, Kuniyuki Iwashima, Andrew Lunn, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
netdev, marcin.szycik
On Wed, 23 Apr 2025 10:37:05 +0200 Michal Swiatkowski wrote:
> On Tue, Apr 22, 2025 at 07:47:57PM -0700, Jakub Kicinski wrote:
> > On Thu, 17 Apr 2025 17:32:33 -0700 Kuniyuki Iwashima wrote:
> > > drivers/net/pfcp.c | 23 +++++++----------------
> >
> > Wojciech, Michał, does anyone use this driver?
> > It seems that it hooks dev_get_tstats64 as ndo_get_stats64
> > but it never allocates tstats, so any ifconfig / ip link
> > run after the device is create immediately crashes the kernel.
> > I don't see any tstats in this driver history or UDP tunnel
> > code so I'm moderately confused as how this worked / when
> > it broke.
> >
> > If I'm not missing anything and indeed this driver was always
> > broken we should just delete it ?
>
> Uh, I remember that we used it to add tc filter. Maybe we can fix it?
If it really was broken for over a year and nobody noticed -
my preference would be to delete it. I don't think you need
an actual tunnel dev to add TC filters?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-18 0:32 ` [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl() Kuniyuki Iwashima
2025-04-23 2:47 ` Jakub Kicinski
@ 2025-04-23 13:40 ` Jakub Kicinski
2025-04-23 14:12 ` Kuniyuki Iwashima
1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2025-04-23 13:40 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Kuniyuki Iwashima, netdev
On Thu, 17 Apr 2025 17:32:33 -0700 Kuniyuki Iwashima wrote:
> -static void __net_exit pfcp_net_exit(struct net *net)
> +static void __net_exit pfcp_net_exit_rtnl(struct net *net,
> + struct list_head *dev_to_kill)
> {
> struct pfcp_net *pn = net_generic(net, pfcp_net_id);
> struct pfcp_dev *pfcp, *pfcp_next;
> - struct net_device *dev;
> - LIST_HEAD(list);
> -
> - rtnl_lock();
> - for_each_netdev(net, dev)
> - if (dev->rtnl_link_ops == &pfcp_link_ops)
> - pfcp_dellink(dev, &list);
>
> list_for_each_entry_safe(pfcp, pfcp_next, &pn->pfcp_dev_list, list)
> - pfcp_dellink(pfcp->dev, &list);
> -
> - unregister_netdevice_many(&list);
> - rtnl_unlock();
> + pfcp_dellink(pfcp->dev, dev_to_kill);
Kuniyuki, I got distracted by the fact the driver is broken but I think
this isn't right. The devices do not migrate to the local pcfp_dev_list
when their netns is changed. They always stay on the list of original
netns. Which I guess may make some sense as that's where their socket
is? So we need to scan both the pcfp_dev_list _and_ the local netdevs
that are pfcp. Am I misunderstanding something?
For gtp was the problem perhaps that we were walking multiple
netns'es? I'm my understanding is correct then 4ccacf86491 needs
a redo :(
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-23 13:40 ` Jakub Kicinski
@ 2025-04-23 14:12 ` Kuniyuki Iwashima
2025-04-23 22:33 ` Jakub Kicinski
2025-04-23 22:52 ` Jakub Kicinski
0 siblings, 2 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-23 14:12 UTC (permalink / raw)
To: kuba
Cc: andrew+netdev, davem, edumazet, horms, kuni1840, kuniyu, netdev,
pabeni
From: Jakub Kicinski <kuba@kernel.org>
Date: Wed, 23 Apr 2025 06:40:26 -0700
> On Thu, 17 Apr 2025 17:32:33 -0700 Kuniyuki Iwashima wrote:
> > -static void __net_exit pfcp_net_exit(struct net *net)
> > +static void __net_exit pfcp_net_exit_rtnl(struct net *net,
> > + struct list_head *dev_to_kill)
> > {
> > struct pfcp_net *pn = net_generic(net, pfcp_net_id);
> > struct pfcp_dev *pfcp, *pfcp_next;
> > - struct net_device *dev;
> > - LIST_HEAD(list);
> > -
> > - rtnl_lock();
> > - for_each_netdev(net, dev)
> > - if (dev->rtnl_link_ops == &pfcp_link_ops)
> > - pfcp_dellink(dev, &list);
> >
> > list_for_each_entry_safe(pfcp, pfcp_next, &pn->pfcp_dev_list, list)
> > - pfcp_dellink(pfcp->dev, &list);
> > -
> > - unregister_netdevice_many(&list);
> > - rtnl_unlock();
> > + pfcp_dellink(pfcp->dev, dev_to_kill);
>
> Kuniyuki, I got distracted by the fact the driver is broken but I think
> this isn't right.
I guess it was broken recently ? at least I didn't see null-deref
while testing ffc90e9ca61b ("pfcp: Destroy device along with udp
socket's netns dismantle.").
> The devices do not migrate to the local pcfp_dev_list
> when their netns is changed. They always stay on the list of original
> netns. Which I guess may make some sense as that's where their socket
> is? So we need to scan both the pcfp_dev_list _and_ the local netdevs
> that are pfcp. Am I misunderstanding something?
If the netns is queued up for cleanup_net(), the local netdevs
are handled later by default_device_exit_batch().
If the module is unloaded, we call pfcp_net_exit_rtnl() for
all netns including all local netdevs.
I remember bareudp did like that and I changed geneve and gtp
as well.
I applied the quivalent diff below on ffc90e9ca61b, and during
netns dismantle, 2 local netdevs were removed properly (one
was originally created there, another was moved to the netns).
---8<---
[root@localhost ~]# ip netns add ns1
[root@localhost ~]# ip netns add ns2
[root@localhost ~]# ip -n ns1 link add netns ns2 name pfcp0 type pfcp
[ 74.564522] newlink: dev: pfcp0 at ff11000103ca9000
[root@localhost ~]# ip -n ns2 link add netns ns2 name pfcp1 type pfcp
[ 79.510334] newlink: dev: pfcp1 at ff11000103caa000
[root@localhost ~]# ip netns del ns2
[ 83.953871] dellink: dev: pfcp1 at ff11000103caa000 from cleanup_net+0x20e/0x3b0
[ 83.980520] dellink: dev: pfcp0 at ff11000103ca9000 from default_device_exit_batch+0x244/0x300
---8<---
---8<---
diff --git a/drivers/net/pfcp.c b/drivers/net/pfcp.c
index 68d0d9e92a22..f7f254d7d031 100644
--- a/drivers/net/pfcp.c
+++ b/drivers/net/pfcp.c
@@ -206,6 +206,7 @@ static int pfcp_newlink(struct net *net, struct net_device *dev,
goto exit_del_pfcp_sock;
}
+ printk(KERN_ERR "newlink: dev: %s at %px\n", dev->name, dev);
pn = net_generic(net, pfcp_net_id);
list_add(&pfcp->list, &pn->pfcp_dev_list);
@@ -224,6 +225,7 @@ static void pfcp_dellink(struct net_device *dev, struct list_head *head)
{
struct pfcp_dev *pfcp = netdev_priv(dev);
+ printk(KERN_ERR "dellink: dev: %s at %px from %pS\n", dev->name, dev, __builtin_return_address(0));
list_del(&pfcp->list);
unregister_netdevice_queue(dev, head);
}
@@ -244,28 +246,26 @@ static int __net_init pfcp_net_init(struct net *net)
return 0;
}
-static void __net_exit pfcp_net_exit(struct net *net)
+static void __net_exit pfcp_net_exit(struct net *net, struct list_head *list)
{
struct pfcp_net *pn = net_generic(net, pfcp_net_id);
struct pfcp_dev *pfcp, *pfcp_next;
- struct net_device *dev;
- LIST_HEAD(list);
-
- rtnl_lock();
- for_each_netdev(net, dev)
- if (dev->rtnl_link_ops == &pfcp_link_ops)
- pfcp_dellink(dev, &list);
list_for_each_entry_safe(pfcp, pfcp_next, &pn->pfcp_dev_list, list)
- pfcp_dellink(pfcp->dev, &list);
+ pfcp_dellink(pfcp->dev, list);
+}
+
+static void __net_exit pfcp_net_exit_batch_rtnl(struct list_head *net_list, struct list_head *head)
+{
+ struct net *net;
- unregister_netdevice_many(&list);
- rtnl_unlock();
+ list_for_each_entry(net, net_list, exit_list)
+ pfcp_net_exit(net, head);
}
static struct pernet_operations pfcp_net_ops = {
.init = pfcp_net_init,
- .exit = pfcp_net_exit,
+ .exit_batch_rtnl = pfcp_net_exit_batch_rtnl,
.id = &pfcp_net_id,
.size = sizeof(struct pfcp_net),
};
---8<---
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-23 14:12 ` Kuniyuki Iwashima
@ 2025-04-23 22:33 ` Jakub Kicinski
2025-04-23 22:52 ` Jakub Kicinski
1 sibling, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2025-04-23 22:33 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: andrew+netdev, davem, edumazet, horms, kuni1840, netdev, pabeni
On Wed, 23 Apr 2025 07:12:55 -0700 Kuniyuki Iwashima wrote:
> If the netns is queued up for cleanup_net(), the local netdevs
> are handled later by default_device_exit_batch().
Missed default_device_exit_batch(), I love notifiers so much!
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-23 14:12 ` Kuniyuki Iwashima
2025-04-23 22:33 ` Jakub Kicinski
@ 2025-04-23 22:52 ` Jakub Kicinski
2025-04-24 2:23 ` Kuniyuki Iwashima
1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2025-04-23 22:52 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: andrew+netdev, davem, edumazet, horms, kuni1840, netdev, pabeni
On Wed, 23 Apr 2025 07:12:55 -0700 Kuniyuki Iwashima wrote:
> > > list_for_each_entry_safe(pfcp, pfcp_next, &pn->pfcp_dev_list, list)
> > > - pfcp_dellink(pfcp->dev, &list);
> > > -
> > > - unregister_netdevice_many(&list);
> > > - rtnl_unlock();
> > > + pfcp_dellink(pfcp->dev, dev_to_kill);
> >
> > Kuniyuki, I got distracted by the fact the driver is broken but I think
> > this isn't right.
>
> I guess it was broken recently ? at least I didn't see null-deref
> while testing ffc90e9ca61b ("pfcp: Destroy device along with udp
> socket's netns dismantle.").
Not sure, nothing seems to have changed since?
I wiped my kernel, rebuilt net-next and still triggers:
vng -b \
-f tools/testing/selftests/net/config \
-f kernel/configs/debug.config \
-f tools/testing/selftests/drivers/net/config \
--configitem CONFIG_PFCP=y
vng -r arch/x86/boot/bzImage -v --user root
bash-5.1# ip link add type pfcp
[ 62.098561][ T662] BUG: unable to handle page fault for address: ffe21c00131f5000
[ 62.099024][ T662] #PF: supervisor read access in kernel mode
[ 62.099024][ T662] #PF: error_code(0x0000) - not-present page
[ 62.099024][ T662] PGD 3fdf1067 P4D 3fdf0067 PUD 3fdef067 PMD 0
[ 62.099024][ T662] Oops: Oops: 0000 [#1] SMP KASAN NOPTI
[ 62.099024][ T662] CPU: 3 UID: 0 PID: 662 Comm: ip Not tainted 6.15.0-rc2-virtme #1 PREEMPT(full)
[ 62.099024][ T662] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 62.099024][ T662] RIP: 0010:kasan_check_range+0x161/0x1c0
[ 62.099024][ T662] Code: 2c 48 89 c2 48 85 c0 75 b0 48 89 da 4c 89 d8 4c 29 da e9 49 ff ff ff 48 85 d2 74 b3 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 a5 <80> 38 00 74 f2 e9 74 ff ff ff b8 01 00 00 00 e9 e6 36 1e ff 48 29
[ 62.099024][ T662] RSP: 0018:ffa0000002b8ee00 EFLAGS: 00010282
[ 62.099024][ T662] RAX: ffe21c00131f5000 RBX: ffe21c00131f5001 RCX: ffffffff8f04f9da
[ 62.099024][ T662] RDX: ffe21c00131f5001 RSI: 0000000000000008 RDI: ff11000098fa8000
[ 62.099024][ T662] RBP: ffe21c00131f5000 R08: 0000000000000000 R09: ffe21c00131f5000
[ 62.099024][ T662] R10: ff11000098fa8007 R11: ffffffff9118a620 R12: ff11000098fa8000
[ 62.099024][ T662] R13: ff1100003c8fb8f8 R14: ff1100003c8fb8f8 R15: dffffc0000000000
[ 62.099024][ T662] FS: 00007f033a740440(0000) GS:ff11000099128000(0000) knlGS:0000000000000000
[ 62.099024][ T662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.099024][ T662] CR2: ffe21c00131f5000 CR3: 0000000010fca003 CR4: 0000000000771ef0
[ 62.099024][ T662] PKRU: 55555554
[ 62.099024][ T662] Call Trace:
[ 62.099024][ T662] <TASK>
[ 62.099024][ T662] dev_fetch_sw_netstats+0xea/0x310
[ 62.099024][ T662] dev_get_stats+0x98/0xc30
[ 62.099024][ T662] rtnl_fill_stats+0x40/0xa70
[ 62.099024][ T662] rtnl_fill_ifinfo.constprop.0+0x120c/0x2d50
[ 62.099024][ T662] rtmsg_ifinfo_build_skb+0x134/0x230
[ 62.099024][ T662] rtmsg_ifinfo_event.part.0+0x2d/0x120
[ 62.099024][ T662] rtmsg_ifinfo+0x5b/0xa0
[ 62.099024][ T662] __dev_notify_flags+0x1b8/0x250
[ 62.099024][ T662] rtnl_configure_link+0x1d3/0x260
[ 62.099024][ T662] rtnl_newlink_create+0x358/0x8f0
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-23 22:52 ` Jakub Kicinski
@ 2025-04-24 2:23 ` Kuniyuki Iwashima
2025-04-24 22:24 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-24 2:23 UTC (permalink / raw)
To: kuba
Cc: andrew+netdev, davem, edumazet, horms, kuni1840, kuniyu, netdev,
pabeni
From: Jakub Kicinski <kuba@kernel.org>
Date: Wed, 23 Apr 2025 15:52:33 -0700
> On Wed, 23 Apr 2025 07:12:55 -0700 Kuniyuki Iwashima wrote:
> > > > list_for_each_entry_safe(pfcp, pfcp_next, &pn->pfcp_dev_list, list)
> > > > - pfcp_dellink(pfcp->dev, &list);
> > > > -
> > > > - unregister_netdevice_many(&list);
> > > > - rtnl_unlock();
> > > > + pfcp_dellink(pfcp->dev, dev_to_kill);
> > >
> > > Kuniyuki, I got distracted by the fact the driver is broken but I think
> > > this isn't right.
> >
> > I guess it was broken recently ? at least I didn't see null-deref
> > while testing ffc90e9ca61b ("pfcp: Destroy device along with udp
> > socket's netns dismantle.").
>
> Not sure, nothing seems to have changed since?
It's been broken since the first commit of pfcp, but the bug seems
to be exposed recently by the commit below, which changed the per-cpu
variable section address from 0 to relative address.
$ git bisect good
9d7de2aa8b41407bc96d89a80dc1fd637d389d42 is the first bad commit
commit 9d7de2aa8b41407bc96d89a80dc1fd637d389d42
Author: Brian Gerst <brgerst@gmail.com>
Date: Thu Jan 23 14:07:40 2025 -0500
x86/percpu/64: Use relative percpu offsets
Looks like before this commit 0 was a valid per-cpu variable address
on x86, and that's why accessing per_cpu_ptr(NULL, cpu) was handled
(im)properly.
The fix is one-liner assigning pcpu_stat_type, but no one have used
pfcp for the recent 3 months and haven't noticed the wrong stats nor
used stats for a year.
Do we want to fix it or remove ? :)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-23 13:33 ` Jakub Kicinski
@ 2025-04-24 4:40 ` Michal Swiatkowski
2025-04-24 22:26 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Michal Swiatkowski @ 2025-04-24 4:40 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Michal Swiatkowski, Wojciech Drewek, Kuniyuki Iwashima,
Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Kuniyuki Iwashima, netdev, marcin.szycik
On Wed, Apr 23, 2025 at 06:33:50AM -0700, Jakub Kicinski wrote:
> On Wed, 23 Apr 2025 10:37:05 +0200 Michal Swiatkowski wrote:
> > On Tue, Apr 22, 2025 at 07:47:57PM -0700, Jakub Kicinski wrote:
> > > On Thu, 17 Apr 2025 17:32:33 -0700 Kuniyuki Iwashima wrote:
> > > > drivers/net/pfcp.c | 23 +++++++----------------
> > >
> > > Wojciech, Michał, does anyone use this driver?
> > > It seems that it hooks dev_get_tstats64 as ndo_get_stats64
> > > but it never allocates tstats, so any ifconfig / ip link
> > > run after the device is create immediately crashes the kernel.
> > > I don't see any tstats in this driver history or UDP tunnel
> > > code so I'm moderately confused as how this worked / when
> > > it broke.
> > >
> > > If I'm not missing anything and indeed this driver was always
> > > broken we should just delete it ?
> >
> > Uh, I remember that we used it to add tc filter. Maybe we can fix it?
>
> If it really was broken for over a year and nobody noticed -
> my preference would be to delete it. I don't think you need
> an actual tunnel dev to add TC filters?
Our approach was to follow scheme from exsisting ones.
For example, vxlan filter:
tc filter add dev vxlan ingress protocol ip ...
PFCP filter:
tc filter add dev pfcp ingress protocol ip ...
so in this case we need sth to point and pass the information that this
tunnel is PFCP. If you have an idea how to do it without actual tunnel
we are willing to implement it. AFAIR simple matching on specific port
number isn't good solution as tunnel specific fields can't be passed in
such scenario.
Thanks
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-24 2:23 ` Kuniyuki Iwashima
@ 2025-04-24 22:24 ` Jakub Kicinski
0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2025-04-24 22:24 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: andrew+netdev, davem, edumazet, horms, kuni1840, netdev, pabeni
On Wed, 23 Apr 2025 19:23:28 -0700 Kuniyuki Iwashima wrote:
> > > I guess it was broken recently ? at least I didn't see null-deref
> > > while testing ffc90e9ca61b ("pfcp: Destroy device along with udp
> > > socket's netns dismantle.").
> >
> > Not sure, nothing seems to have changed since?
>
> It's been broken since the first commit of pfcp, but the bug seems
> to be exposed recently by the commit below, which changed the per-cpu
> variable section address from 0 to relative address.
>
> $ git bisect good
> 9d7de2aa8b41407bc96d89a80dc1fd637d389d42 is the first bad commit
> commit 9d7de2aa8b41407bc96d89a80dc1fd637d389d42
> Author: Brian Gerst <brgerst@gmail.com>
> Date: Thu Jan 23 14:07:40 2025 -0500
>
> x86/percpu/64: Use relative percpu offsets
>
> Looks like before this commit 0 was a valid per-cpu variable address
> on x86, and that's why accessing per_cpu_ptr(NULL, cpu) was handled
> (im)properly.
Interesting! I guess in most cases, then, we'd access random data
and just show crazy interface stats prior to that commit?
> The fix is one-liner assigning pcpu_stat_type
Or remove the ndo, and re-add in net-next cause I don't see any actual
stats being counted.
> but no one have used > pfcp for the recent 3 months and haven't
> noticed the wrong stats nor used stats for a year.
>
> Do we want to fix it or remove ? :)
That would be very pleasant indeed :) Let me answer in the other
sub-thread..
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-24 4:40 ` Michal Swiatkowski
@ 2025-04-24 22:26 ` Jakub Kicinski
2025-04-25 4:28 ` Michal Swiatkowski
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2025-04-24 22:26 UTC (permalink / raw)
To: Michal Swiatkowski
Cc: Wojciech Drewek, Kuniyuki Iwashima, Andrew Lunn, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Kuniyuki Iwashima,
netdev, marcin.szycik
On Thu, 24 Apr 2025 06:40:36 +0200 Michal Swiatkowski wrote:
> > > Uh, I remember that we used it to add tc filter. Maybe we can fix it?
> >
> > If it really was broken for over a year and nobody noticed -
> > my preference would be to delete it. I don't think you need
> > an actual tunnel dev to add TC filters?
>
> Our approach was to follow scheme from exsisting ones.
> For example, vxlan filter:
> tc filter add dev vxlan ingress protocol ip ...
> PFCP filter:
> tc filter add dev pfcp ingress protocol ip ...
>
> so in this case we need sth to point and pass the information that this
> tunnel is PFCP. If you have an idea how to do it without actual tunnel
> we are willing to implement it. AFAIR simple matching on specific port
> number isn't good solution as tunnel specific fields can't be passed in
> such scenario.
You're right, not sure what I was thinking.. probably about
the offloaded flow.
Could you please fix this and provide a selftests for offloaded
and non-offloaded operation? To make sure this code is exercised?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl().
2025-04-24 22:26 ` Jakub Kicinski
@ 2025-04-25 4:28 ` Michal Swiatkowski
0 siblings, 0 replies; 17+ messages in thread
From: Michal Swiatkowski @ 2025-04-25 4:28 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Michal Swiatkowski, Wojciech Drewek, Kuniyuki Iwashima,
Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Kuniyuki Iwashima, netdev, marcin.szycik
On Thu, Apr 24, 2025 at 03:26:38PM -0700, Jakub Kicinski wrote:
> On Thu, 24 Apr 2025 06:40:36 +0200 Michal Swiatkowski wrote:
> > > > Uh, I remember that we used it to add tc filter. Maybe we can fix it?
> > >
> > > If it really was broken for over a year and nobody noticed -
> > > my preference would be to delete it. I don't think you need
> > > an actual tunnel dev to add TC filters?
> >
> > Our approach was to follow scheme from exsisting ones.
> > For example, vxlan filter:
> > tc filter add dev vxlan ingress protocol ip ...
> > PFCP filter:
> > tc filter add dev pfcp ingress protocol ip ...
> >
> > so in this case we need sth to point and pass the information that this
> > tunnel is PFCP. If you have an idea how to do it without actual tunnel
> > we are willing to implement it. AFAIR simple matching on specific port
> > number isn't good solution as tunnel specific fields can't be passed in
> > such scenario.
>
> You're right, not sure what I was thinking.. probably about
> the offloaded flow.
>
> Could you please fix this and provide a selftests for offloaded
> and non-offloaded operation? To make sure this code is exercised?
Sure, I will do that. I am going for a two week vacation from today, so it
can take some time for v1.
Thanks
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-04-25 4:28 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-18 0:32 [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl() Kuniyuki Iwashima
2025-04-18 0:32 ` [PATCH v2 net-next 1/3] net: Drop hold_rtnl arg from ops_undo_list() Kuniyuki Iwashima
2025-04-18 0:32 ` [PATCH v2 net-next 2/3] pfcp: Convert pfcp_net_exit() to ->exit_rtnl() Kuniyuki Iwashima
2025-04-23 2:47 ` Jakub Kicinski
2025-04-23 8:37 ` Michal Swiatkowski
2025-04-23 13:33 ` Jakub Kicinski
2025-04-24 4:40 ` Michal Swiatkowski
2025-04-24 22:26 ` Jakub Kicinski
2025-04-25 4:28 ` Michal Swiatkowski
2025-04-23 13:40 ` Jakub Kicinski
2025-04-23 14:12 ` Kuniyuki Iwashima
2025-04-23 22:33 ` Jakub Kicinski
2025-04-23 22:52 ` Jakub Kicinski
2025-04-24 2:23 ` Kuniyuki Iwashima
2025-04-24 22:24 ` Jakub Kicinski
2025-04-18 0:32 ` [PATCH v2 net-next 3/3] ppp: Split ppp_exit_net() " Kuniyuki Iwashima
2025-04-23 2:50 ` [PATCH v2 net-next 0/3] net: Followup series for ->exit_rtnl() patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).