netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload
@ 2025-05-09 16:00 Taehee Yoo
  2025-05-09 19:43 ` Mina Almasry
  2025-05-09 22:41 ` Stanislav Fomichev
  0 siblings, 2 replies; 8+ messages in thread
From: Taehee Yoo @ 2025-05-09 16:00 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, horms, almasrymina, sdf, netdev
  Cc: asml.silence, dw, skhawaja, kaiyuanz, jdamato, ap420073

Kernel panic occurs when a devmem TCP socket is closed after NIC module
is unloaded.

This is Devmem TCP unregistration scenarios. number is an order.
(a)netlink socket close    (b)pp destroy    (c)uninstall    result
1                          2                3               OK
1                          3                2               (d)Impossible
2                          1                3               OK
3                          1                2               (e)Kernel panic
2                          3                1               (d)Impossible
3                          2                1               (d)Impossible

(a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
    closed.
(b) page_pool_destroy() is called when the interface is down.
(c) mp_ops->uninstall() is called when an interface is unregistered.
(d) There is no scenario in mp_ops->uninstall() is called before
    page_pool_destroy().
    Because unregister_netdevice_many_notify() closes interfaces first
    and then calls mp_ops->uninstall().
(e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
    netdev_lock().
    But if the interface module has already been removed, net_device
    pointer is invalid, so it causes kernel panic.

In summary, there are only 3 possible scenarios.
 A. sk close -> pp destroy -> uninstall.
 B. pp destroy -> sk close -> uninstall.
 C. pp destroy -> uninstall -> sk close.

Case C is a kernel panic scenario.

In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
binding->dev to NULL.
It indicates an bound net_device was unregistered.

It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
if binding->dev is NULL.

A new binding->lock is added to protect members of a binding.

Tests:
Scenario A:
    ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
        -v 7 -t 1 -q 1 &
    pid=$!
    sleep 10
    kill $pid
    ip link set $interface down
    modprobe -rv $module

Scenario B:
    ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
        -v 7 -t 1 -q 1 &
    pid=$!
    sleep 10
    ip link set $interface down
    kill $pid
    modprobe -rv $module

Scenario C:
    ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
        -v 7 -t 1 -q 1 &
    pid=$!
    sleep 10
    modprobe -rv $module
    sleep 5
    kill $pid

Splat looks like:
Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
Tainted: [B]=BAD_PAGE, [W]=WARN
RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
 ...
 netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
 genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
 ...
 netlink_release (net/netlink/af_netlink.c:737)
 ...
 __sock_release (net/socket.c:647)
 sock_close (net/socket.c:1393)

Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---

v3:
 - Add binding->lock for protecting members of a binding.
 - Add a net_devmem_unset_dev() helper function.
 - Do not reorder locks.
 - Fix build failure.

v2:
 - Fix commit message.
 - Correct Fixes tag.
 - Inverse locking order.
 - Do not put a reference count of binding in
   mp_dmabuf_devmem_uninstall().

In order to test this patch, driver side implementation of devmem TCP[1]
is needed to be applied.

[1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u

 net/core/devmem.c      | 14 +++++++++++---
 net/core/devmem.h      |  2 ++
 net/core/netdev-genl.c | 13 +++++++++++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 6e27a47d0493..ffbf50337413 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -33,6 +33,13 @@ bool net_is_devmem_iov(struct net_iov *niov)
 	return niov->pp->mp_ops == &dmabuf_devmem_ops;
 }
 
+static void net_devmem_unset_dev(struct net_devmem_dmabuf_binding *binding)
+{
+	mutex_lock(&binding->lock);
+	binding->dev = NULL;
+	mutex_unlock(&binding->lock);
+}
+
 static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
 					       struct gen_pool_chunk *chunk,
 					       void *not_used)
@@ -117,9 +124,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
 	unsigned long xa_idx;
 	unsigned int rxq_idx;
 
-	if (binding->list.next)
-		list_del(&binding->list);
-
 	xa_for_each(&binding->bound_rxqs, xa_idx, rxq) {
 		const struct pp_memory_provider_params mp_params = {
 			.mp_priv	= binding,
@@ -200,6 +204,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 
 	refcount_set(&binding->ref, 1);
 
+	mutex_init(&binding->lock);
+
 	binding->dmabuf = dmabuf;
 
 	binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent);
@@ -379,6 +385,8 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
 	xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
 		if (bound_rxq == rxq) {
 			xa_erase(&binding->bound_rxqs, xa_idx);
+			if (xa_empty(&binding->bound_rxqs))
+				net_devmem_unset_dev(binding);
 			break;
 		}
 	}
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 7fc158d52729..b69adca6cd44 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -20,6 +20,8 @@ struct net_devmem_dmabuf_binding {
 	struct sg_table *sgt;
 	struct net_device *dev;
 	struct gen_pool *chunk_pool;
+	/* Protect all members */
+	struct mutex lock;
 
 	/* The user holds a ref (via the netlink API) for as long as they want
 	 * the binding to remain alive. Each page pool using this binding holds
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index dae9f0d432fb..bd5d58604ec0 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -979,14 +979,27 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
 {
 	struct net_devmem_dmabuf_binding *binding;
 	struct net_devmem_dmabuf_binding *temp;
+	netdevice_tracker dev_tracker;
 	struct net_device *dev;
 
 	mutex_lock(&priv->lock);
 	list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
+		list_del(&binding->list);
+
+		mutex_lock(&binding->lock);
 		dev = binding->dev;
+		if (!dev) {
+			mutex_unlock(&binding->lock);
+			net_devmem_unbind_dmabuf(binding);
+			continue;
+		}
+		netdev_hold(dev, &dev_tracker, GFP_KERNEL);
+		mutex_unlock(&binding->lock);
+
 		netdev_lock(dev);
 		net_devmem_unbind_dmabuf(binding);
 		netdev_unlock(dev);
+		netdev_put(dev, &dev_tracker);
 	}
 	mutex_unlock(&priv->lock);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload
  2025-05-09 16:00 [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload Taehee Yoo
@ 2025-05-09 19:43 ` Mina Almasry
  2025-05-09 22:32   ` Jakub Kicinski
  2025-05-12  0:56   ` Taehee Yoo
  2025-05-09 22:41 ` Stanislav Fomichev
  1 sibling, 2 replies; 8+ messages in thread
From: Mina Almasry @ 2025-05-09 19:43 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, kuba, pabeni, edumazet, horms, sdf, netdev, asml.silence,
	dw, skhawaja, kaiyuanz, jdamato

On Fri, May 9, 2025 at 9:01 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> Kernel panic occurs when a devmem TCP socket is closed after NIC module
> is unloaded.
>
> This is Devmem TCP unregistration scenarios. number is an order.
> (a)netlink socket close    (b)pp destroy    (c)uninstall    result
> 1                          2                3               OK
> 1                          3                2               (d)Impossible
> 2                          1                3               OK
> 3                          1                2               (e)Kernel panic
> 2                          3                1               (d)Impossible
> 3                          2                1               (d)Impossible
>
> (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
>     closed.
> (b) page_pool_destroy() is called when the interface is down.
> (c) mp_ops->uninstall() is called when an interface is unregistered.
> (d) There is no scenario in mp_ops->uninstall() is called before
>     page_pool_destroy().
>     Because unregister_netdevice_many_notify() closes interfaces first
>     and then calls mp_ops->uninstall().
> (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
>     netdev_lock().
>     But if the interface module has already been removed, net_device
>     pointer is invalid, so it causes kernel panic.
>
> In summary, there are only 3 possible scenarios.
>  A. sk close -> pp destroy -> uninstall.
>  B. pp destroy -> sk close -> uninstall.
>  C. pp destroy -> uninstall -> sk close.
>
> Case C is a kernel panic scenario.
>
> In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> binding->dev to NULL.
> It indicates an bound net_device was unregistered.
>
> It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> if binding->dev is NULL.
>
> A new binding->lock is added to protect members of a binding.
>
> Tests:
> Scenario A:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     kill $pid
>     ip link set $interface down
>     modprobe -rv $module
>
> Scenario B:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     ip link set $interface down
>     kill $pid
>     modprobe -rv $module
>
> Scenario C:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     modprobe -rv $module
>     sleep 5
>     kill $pid
>
> Splat looks like:
> Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> Tainted: [B]=BAD_PAGE, [W]=WARN
> RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> PKRU: 55555554
> Call Trace:
> <TASK>
>  ...
>  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
>  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
>  ...
>  netlink_release (net/netlink/af_netlink.c:737)
>  ...
>  __sock_release (net/socket.c:647)
>  sock_close (net/socket.c:1393)
>
> Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>
> v3:
>  - Add binding->lock for protecting members of a binding.
>  - Add a net_devmem_unset_dev() helper function.
>  - Do not reorder locks.
>  - Fix build failure.
>

Thanks for addressing the feedback Taehee. I think this diff looks
much much nicer.

> v2:
>  - Fix commit message.
>  - Correct Fixes tag.
>  - Inverse locking order.
>  - Do not put a reference count of binding in
>    mp_dmabuf_devmem_uninstall().
>
> In order to test this patch, driver side implementation of devmem TCP[1]
> is needed to be applied.
>
> [1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u
>
>  net/core/devmem.c      | 14 +++++++++++---
>  net/core/devmem.h      |  2 ++
>  net/core/netdev-genl.c | 13 +++++++++++++
>  3 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 6e27a47d0493..ffbf50337413 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -33,6 +33,13 @@ bool net_is_devmem_iov(struct net_iov *niov)
>         return niov->pp->mp_ops == &dmabuf_devmem_ops;
>  }
>
> +static void net_devmem_unset_dev(struct net_devmem_dmabuf_binding *binding)
> +{
> +       mutex_lock(&binding->lock);
> +       binding->dev = NULL;
> +       mutex_unlock(&binding->lock);
> +}
> +
>  static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
>                                                struct gen_pool_chunk *chunk,
>                                                void *not_used)
> @@ -117,9 +124,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
>         unsigned long xa_idx;
>         unsigned int rxq_idx;
>
> -       if (binding->list.next)
> -               list_del(&binding->list);
> -

Unfortunately if you're going to delete this, then you need to do
list_del in _all_ the callers of net_devmem_unbind_dmabuf, and I think
there is a callsite in netdev_nl_bind_rx_doit that is missed?

But also, it may rough to continually have to remember to always do
list_del when we do unbind. AFAIR Jakub asked for uniformity in the
bind/unbind functions. Can we instead do the list_add inside of
net_devmem_bind_dmabuf? So net_devmem_bind_dmabuf can take the struct
list_head as an arg and do the list add, then the unbind can do the
list_del, so it is uniform, but we don't have to remember to do
list_add/del everytime we call bind/unbind.

Also, I suspect that clean up can be a separate patch.

>         xa_for_each(&binding->bound_rxqs, xa_idx, rxq) {
>                 const struct pp_memory_provider_params mp_params = {
>                         .mp_priv        = binding,
> @@ -200,6 +204,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
>
>         refcount_set(&binding->ref, 1);
>
> +       mutex_init(&binding->lock);
> +
>         binding->dmabuf = dmabuf;
>
>         binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent);
> @@ -379,6 +385,8 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
>         xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
>                 if (bound_rxq == rxq) {
>                         xa_erase(&binding->bound_rxqs, xa_idx);
> +                       if (xa_empty(&binding->bound_rxqs))
> +                               net_devmem_unset_dev(binding);
>                         break;
>                 }
>         }
> diff --git a/net/core/devmem.h b/net/core/devmem.h
> index 7fc158d52729..b69adca6cd44 100644
> --- a/net/core/devmem.h
> +++ b/net/core/devmem.h
> @@ -20,6 +20,8 @@ struct net_devmem_dmabuf_binding {
>         struct sg_table *sgt;
>         struct net_device *dev;
>         struct gen_pool *chunk_pool;
> +       /* Protect all members */

nit: i would say here "Protect *dev". Protect all members implies this
lock should be acquired before accessing any members, which is not
true as of this patch, right? binding->lock needs to be acquired only
for accessing binding->dev (other members are either thread safe, like
the genpool, or have other concurrency guarantees, like netdev_lock).

> +       struct mutex lock;
>
>         /* The user holds a ref (via the netlink API) for as long as they want
>          * the binding to remain alive. Each page pool using this binding holds
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index dae9f0d432fb..bd5d58604ec0 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -979,14 +979,27 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
>  {
>         struct net_devmem_dmabuf_binding *binding;
>         struct net_devmem_dmabuf_binding *temp;
> +       netdevice_tracker dev_tracker;
>         struct net_device *dev;
>
>         mutex_lock(&priv->lock);
>         list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
> +               list_del(&binding->list);
> +
> +               mutex_lock(&binding->lock);
>                 dev = binding->dev;
> +               if (!dev) {
> +                       mutex_unlock(&binding->lock);
> +                       net_devmem_unbind_dmabuf(binding);
> +                       continue;
> +               }
> +               netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> +               mutex_unlock(&binding->lock);
> +


Consider writing the above lines as something like:

mutex_lock(&binding->lock);
if (binding->dev) {
    netdev_hold(binding->dev, &dev_tracker, GPF_KERNEL);
}

net_devmem_unbind_dmabuf(binding);

if (binding->dev) {
   netdev_put(binding->dev, &dev_tracker);
}
mutex_unlock(&binding->lock);

i.e., don't duplicate the net_devmem_unbind_dmabuf(binding); call.


Other than that, I could not find issues. I checked lock ordering. The
lock hierarchy is:

priv->lock
  binding->lock
    netdev_lock(dev)

and AFAICT it is not violated anywhere. I ran my regression tests and
did not see issues. Just holding my reviewed-by because I see the
issue with list_del. I recommend that Stan also takes a look, since he
implemented the locking change.

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload
  2025-05-09 19:43 ` Mina Almasry
@ 2025-05-09 22:32   ` Jakub Kicinski
  2025-05-09 23:55     ` Mina Almasry
  2025-05-12  1:05     ` Taehee Yoo
  2025-05-12  0:56   ` Taehee Yoo
  1 sibling, 2 replies; 8+ messages in thread
From: Jakub Kicinski @ 2025-05-09 22:32 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Taehee Yoo, davem, pabeni, edumazet, horms, sdf, netdev,
	asml.silence, dw, skhawaja, kaiyuanz, jdamato

On Fri, 9 May 2025 12:43:42 -0700 Mina Almasry wrote:
> > @@ -117,9 +124,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
> >         unsigned long xa_idx;
> >         unsigned int rxq_idx;
> >
> > -       if (binding->list.next)
> > -               list_del(&binding->list);
> > -  
> 
> Unfortunately if you're going to delete this, then you need to do
> list_del in _all_ the callers of net_devmem_unbind_dmabuf, and I think
> there is a callsite in netdev_nl_bind_rx_doit that is missed?
> 
> But also, it may rough to continually have to remember to always do
> list_del when we do unbind. AFAIR Jakub asked for uniformity in the
> bind/unbind functions. Can we instead do the list_add inside of
> net_devmem_bind_dmabuf? So net_devmem_bind_dmabuf can take the struct
> list_head as an arg and do the list add, then the unbind can do the
> list_del, so it is uniform, but we don't have to remember to do
> list_add/del everytime we call bind/unbind.
> 
> Also, I suspect that clean up can be a separate patch.

Right. Let's leave it for a cleanup. And you can also inline
net_devmem_unset_dev() in that case. My ask was to separate
devmem logic from socket logic more clearly but the "new lock"
approach doesn't really go in such direction. It's good enough
for the fix tho.

> > +       struct mutex lock;
> >
> >         /* The user holds a ref (via the netlink API) for as long as they want
> >          * the binding to remain alive. Each page pool using this binding holds
> > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > index dae9f0d432fb..bd5d58604ec0 100644
> > --- a/net/core/netdev-genl.c
> > +++ b/net/core/netdev-genl.c
> > @@ -979,14 +979,27 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
> >  {
> >         struct net_devmem_dmabuf_binding *binding;
> >         struct net_devmem_dmabuf_binding *temp;
> > +       netdevice_tracker dev_tracker;
> >         struct net_device *dev;
> >
> >         mutex_lock(&priv->lock);
> >         list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
> > +               list_del(&binding->list);
> > +
> > +               mutex_lock(&binding->lock);
> >                 dev = binding->dev;
> > +               if (!dev) {
> > +                       mutex_unlock(&binding->lock);
> > +                       net_devmem_unbind_dmabuf(binding);
> > +                       continue;
> > +               }
> > +               netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> > +               mutex_unlock(&binding->lock);
> > +  
> 
> Consider writing the above lines as something like:
> 
> mutex_lock(&binding->lock);
> if (binding->dev) {
>     netdev_hold(binding->dev, &dev_tracker, GPF_KERNEL);
> }
> 
> net_devmem_unbind_dmabuf(binding);
> 
> if (binding->dev) {
>    netdev_put(binding->dev, &dev_tracker);
> }
> mutex_unlock(&binding->lock);
> 
> i.e., don't duplicate the net_devmem_unbind_dmabuf(binding); call.

I think it's fine as is.

> Other than that, I could not find issues. I checked lock ordering. The
> lock hierarchy is:
> 
> priv->lock
>   binding->lock
>     netdev_lock(dev)

Did you mean:

  priv->lock
    netdev_lock(dev)
      binding->lock

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload
  2025-05-09 16:00 [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload Taehee Yoo
  2025-05-09 19:43 ` Mina Almasry
@ 2025-05-09 22:41 ` Stanislav Fomichev
  2025-05-12  1:22   ` Taehee Yoo
  1 sibling, 1 reply; 8+ messages in thread
From: Stanislav Fomichev @ 2025-05-09 22:41 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, kuba, pabeni, edumazet, horms, almasrymina, sdf, netdev,
	asml.silence, dw, skhawaja, kaiyuanz, jdamato

On 05/09, Taehee Yoo wrote:
> Kernel panic occurs when a devmem TCP socket is closed after NIC module
> is unloaded.
> 
> This is Devmem TCP unregistration scenarios. number is an order.
> (a)netlink socket close    (b)pp destroy    (c)uninstall    result
> 1                          2                3               OK
> 1                          3                2               (d)Impossible
> 2                          1                3               OK
> 3                          1                2               (e)Kernel panic
> 2                          3                1               (d)Impossible
> 3                          2                1               (d)Impossible
> 
> (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
>     closed.
> (b) page_pool_destroy() is called when the interface is down.
> (c) mp_ops->uninstall() is called when an interface is unregistered.
> (d) There is no scenario in mp_ops->uninstall() is called before
>     page_pool_destroy().
>     Because unregister_netdevice_many_notify() closes interfaces first
>     and then calls mp_ops->uninstall().
> (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
>     netdev_lock().
>     But if the interface module has already been removed, net_device
>     pointer is invalid, so it causes kernel panic.
> 
> In summary, there are only 3 possible scenarios.
>  A. sk close -> pp destroy -> uninstall.
>  B. pp destroy -> sk close -> uninstall.
>  C. pp destroy -> uninstall -> sk close.
> 
> Case C is a kernel panic scenario.
> 
> In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> binding->dev to NULL.
> It indicates an bound net_device was unregistered.
> 
> It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> if binding->dev is NULL.
> 
> A new binding->lock is added to protect members of a binding.
> 
> Tests:
> Scenario A:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     kill $pid
>     ip link set $interface down
>     modprobe -rv $module
> 
> Scenario B:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     ip link set $interface down
>     kill $pid
>     modprobe -rv $module
> 
> Scenario C:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     modprobe -rv $module
>     sleep 5
>     kill $pid
> 
> Splat looks like:
> Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> Tainted: [B]=BAD_PAGE, [W]=WARN
> RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> PKRU: 55555554
> Call Trace:
> <TASK>
>  ...
>  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
>  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
>  ...
>  netlink_release (net/netlink/af_netlink.c:737)
>  ...
>  __sock_release (net/socket.c:647)
>  sock_close (net/socket.c:1393)
> 
> Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
> 
> v3:
>  - Add binding->lock for protecting members of a binding.
>  - Add a net_devmem_unset_dev() helper function.
>  - Do not reorder locks.
>  - Fix build failure.
> 
> v2:
>  - Fix commit message.
>  - Correct Fixes tag.
>  - Inverse locking order.
>  - Do not put a reference count of binding in
>    mp_dmabuf_devmem_uninstall().
> 
> In order to test this patch, driver side implementation of devmem TCP[1]
> is needed to be applied.
> 
> [1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u
> 
>  net/core/devmem.c      | 14 +++++++++++---
>  net/core/devmem.h      |  2 ++
>  net/core/netdev-genl.c | 13 +++++++++++++
>  3 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 6e27a47d0493..ffbf50337413 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -33,6 +33,13 @@ bool net_is_devmem_iov(struct net_iov *niov)
>  	return niov->pp->mp_ops == &dmabuf_devmem_ops;
>  }
>  
> +static void net_devmem_unset_dev(struct net_devmem_dmabuf_binding *binding)
> +{
> +	mutex_lock(&binding->lock);
> +	binding->dev = NULL;
> +	mutex_unlock(&binding->lock);
> +}

nit: there is just one place where we call net_devmem_unset_dev, why do
we need an extra function? IMHO it makes it harder to read wrt
locking.. Jakub is also hinting the same in
https://lore.kernel.org/netdev/20250509153252.76f08c14@kernel.org/#t ?

>  static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
>  					       struct gen_pool_chunk *chunk,
>  					       void *not_used)
> @@ -117,9 +124,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
>  	unsigned long xa_idx;
>  	unsigned int rxq_idx;
>  
> -	if (binding->list.next)
> -		list_del(&binding->list);
> -
>  	xa_for_each(&binding->bound_rxqs, xa_idx, rxq) {
>  		const struct pp_memory_provider_params mp_params = {
>  			.mp_priv	= binding,
> @@ -200,6 +204,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
>  
>  	refcount_set(&binding->ref, 1);
>  
> +	mutex_init(&binding->lock);
> +
>  	binding->dmabuf = dmabuf;
>  
>  	binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent);
> @@ -379,6 +385,8 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
>  	xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
>  		if (bound_rxq == rxq) {
>  			xa_erase(&binding->bound_rxqs, xa_idx);
> +			if (xa_empty(&binding->bound_rxqs))
> +				net_devmem_unset_dev(binding);
>  			break;
>  		}
>  	}
> diff --git a/net/core/devmem.h b/net/core/devmem.h
> index 7fc158d52729..b69adca6cd44 100644
> --- a/net/core/devmem.h
> +++ b/net/core/devmem.h
> @@ -20,6 +20,8 @@ struct net_devmem_dmabuf_binding {
>  	struct sg_table *sgt;
>  	struct net_device *dev;
>  	struct gen_pool *chunk_pool;
> +	/* Protect all members */
> +	struct mutex lock;
>  
>  	/* The user holds a ref (via the netlink API) for as long as they want
>  	 * the binding to remain alive. Each page pool using this binding holds
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index dae9f0d432fb..bd5d58604ec0 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -979,14 +979,27 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
>  {
>  	struct net_devmem_dmabuf_binding *binding;
>  	struct net_devmem_dmabuf_binding *temp;
> +	netdevice_tracker dev_tracker;
>  	struct net_device *dev;
>  
>  	mutex_lock(&priv->lock);
>  	list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
> +		list_del(&binding->list);
> +
> +		mutex_lock(&binding->lock);
>  		dev = binding->dev;
> +		if (!dev) {
> +			mutex_unlock(&binding->lock);
> +			net_devmem_unbind_dmabuf(binding);
> +			continue;
> +		}
> +		netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> +		mutex_unlock(&binding->lock);
> +

Same suggestion as in v2: let's have a short comment here on the lock
ordering (netdev outer, binding inner)?

>  		netdev_lock(dev);
>  		net_devmem_unbind_dmabuf(binding);
>  		netdev_unlock(dev);
> +		netdev_put(dev, &dev_tracker);
>  	}
>  	mutex_unlock(&priv->lock);
>  }
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload
  2025-05-09 22:32   ` Jakub Kicinski
@ 2025-05-09 23:55     ` Mina Almasry
  2025-05-12  1:05     ` Taehee Yoo
  1 sibling, 0 replies; 8+ messages in thread
From: Mina Almasry @ 2025-05-09 23:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Taehee Yoo, davem, pabeni, edumazet, horms, sdf, netdev,
	asml.silence, dw, skhawaja, kaiyuanz, jdamato

On Fri, May 9, 2025 at 3:32 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > Other than that, I could not find issues. I checked lock ordering. The
> > lock hierarchy is:
> >
> > priv->lock
> >   binding->lock
> >     netdev_lock(dev)
>
> Did you mean:
>
>   priv->lock
>     netdev_lock(dev)
>       binding->lock

Yep, that's it, sorry. I mixed up what netdev_hold and netdev_lock do
on the first read :(

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload
  2025-05-09 19:43 ` Mina Almasry
  2025-05-09 22:32   ` Jakub Kicinski
@ 2025-05-12  0:56   ` Taehee Yoo
  1 sibling, 0 replies; 8+ messages in thread
From: Taehee Yoo @ 2025-05-12  0:56 UTC (permalink / raw)
  To: Mina Almasry
  Cc: davem, kuba, pabeni, edumazet, horms, sdf, netdev, asml.silence,
	dw, skhawaja, kaiyuanz, jdamato

On Sat, May 10, 2025 at 4:43 AM Mina Almasry <almasrymina@google.com> wrote:
>

Hi Mina,
Thanks a lot for the review!

> On Fri, May 9, 2025 at 9:01 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > Kernel panic occurs when a devmem TCP socket is closed after NIC module
> > is unloaded.
> >
> > This is Devmem TCP unregistration scenarios. number is an order.
> > (a)netlink socket close    (b)pp destroy    (c)uninstall    result
> > 1                          2                3               OK
> > 1                          3                2               (d)Impossible
> > 2                          1                3               OK
> > 3                          1                2               (e)Kernel panic
> > 2                          3                1               (d)Impossible
> > 3                          2                1               (d)Impossible
> >
> > (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
> >     closed.
> > (b) page_pool_destroy() is called when the interface is down.
> > (c) mp_ops->uninstall() is called when an interface is unregistered.
> > (d) There is no scenario in mp_ops->uninstall() is called before
> >     page_pool_destroy().
> >     Because unregister_netdevice_many_notify() closes interfaces first
> >     and then calls mp_ops->uninstall().
> > (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
> >     netdev_lock().
> >     But if the interface module has already been removed, net_device
> >     pointer is invalid, so it causes kernel panic.
> >
> > In summary, there are only 3 possible scenarios.
> >  A. sk close -> pp destroy -> uninstall.
> >  B. pp destroy -> sk close -> uninstall.
> >  C. pp destroy -> uninstall -> sk close.
> >
> > Case C is a kernel panic scenario.
> >
> > In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> > binding->dev to NULL.
> > It indicates an bound net_device was unregistered.
> >
> > It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> > if binding->dev is NULL.
> >
> > A new binding->lock is added to protect members of a binding.
> >
> > Tests:
> > Scenario A:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     kill $pid
> >     ip link set $interface down
> >     modprobe -rv $module
> >
> > Scenario B:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     ip link set $interface down
> >     kill $pid
> >     modprobe -rv $module
> >
> > Scenario C:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     modprobe -rv $module
> >     sleep 5
> >     kill $pid
> >
> > Splat looks like:
> > Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> > CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> > Tainted: [B]=BAD_PAGE, [W]=WARN
> > RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> > Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> > RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> > RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> > RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> > RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> > R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> > R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> > FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> >  ...
> >  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
> >  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
> >  ...
> >  netlink_release (net/netlink/af_netlink.c:737)
> >  ...
> >  __sock_release (net/socket.c:647)
> >  sock_close (net/socket.c:1393)
> >
> > Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v3:
> >  - Add binding->lock for protecting members of a binding.
> >  - Add a net_devmem_unset_dev() helper function.
> >  - Do not reorder locks.
> >  - Fix build failure.
> >
>
> Thanks for addressing the feedback Taehee. I think this diff looks
> much much nicer.
>
> > v2:
> >  - Fix commit message.
> >  - Correct Fixes tag.
> >  - Inverse locking order.
> >  - Do not put a reference count of binding in
> >    mp_dmabuf_devmem_uninstall().
> >
> > In order to test this patch, driver side implementation of devmem TCP[1]
> > is needed to be applied.
> >
> > [1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u
> >
> >  net/core/devmem.c      | 14 +++++++++++---
> >  net/core/devmem.h      |  2 ++
> >  net/core/netdev-genl.c | 13 +++++++++++++
> >  3 files changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 6e27a47d0493..ffbf50337413 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -33,6 +33,13 @@ bool net_is_devmem_iov(struct net_iov *niov)
> >         return niov->pp->mp_ops == &dmabuf_devmem_ops;
> >  }
> >
> > +static void net_devmem_unset_dev(struct net_devmem_dmabuf_binding *binding)
> > +{
> > +       mutex_lock(&binding->lock);
> > +       binding->dev = NULL;
> > +       mutex_unlock(&binding->lock);
> > +}
> > +
> >  static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
> >                                                struct gen_pool_chunk *chunk,
> >                                                void *not_used)
> > @@ -117,9 +124,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
> >         unsigned long xa_idx;
> >         unsigned int rxq_idx;
> >
> > -       if (binding->list.next)
> > -               list_del(&binding->list);
> > -
>
> Unfortunately if you're going to delete this, then you need to do
> list_del in _all_ the callers of net_devmem_unbind_dmabuf, and I think
> there is a callsite in netdev_nl_bind_rx_doit that is missed?

Thanks for catching it, I missed applying it to netdev_nl_bind_rx_doit().
I will fix this in the next patch.

>
> But also, it may rough to continually have to remember to always do
> list_del when we do unbind. AFAIR Jakub asked for uniformity in the
> bind/unbind functions. Can we instead do the list_add inside of
> net_devmem_bind_dmabuf? So net_devmem_bind_dmabuf can take the struct
> list_head as an arg and do the list add, then the unbind can do the
> list_del, so it is uniform, but we don't have to remember to do
> list_add/del everytime we call bind/unbind.
>
> Also, I suspect that clean up can be a separate patch.

Okay, it's good to me.
I think there is no problem with changing it in the next patch.

>
> >         xa_for_each(&binding->bound_rxqs, xa_idx, rxq) {
> >                 const struct pp_memory_provider_params mp_params = {
> >                         .mp_priv        = binding,
> > @@ -200,6 +204,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> >
> >         refcount_set(&binding->ref, 1);
> >
> > +       mutex_init(&binding->lock);
> > +
> >         binding->dmabuf = dmabuf;
> >
> >         binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent);
> > @@ -379,6 +385,8 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
> >         xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
> >                 if (bound_rxq == rxq) {
> >                         xa_erase(&binding->bound_rxqs, xa_idx);
> > +                       if (xa_empty(&binding->bound_rxqs))
> > +                               net_devmem_unset_dev(binding);
> >                         break;
> >                 }
> >         }
> > diff --git a/net/core/devmem.h b/net/core/devmem.h
> > index 7fc158d52729..b69adca6cd44 100644
> > --- a/net/core/devmem.h
> > +++ b/net/core/devmem.h
> > @@ -20,6 +20,8 @@ struct net_devmem_dmabuf_binding {
> >         struct sg_table *sgt;
> >         struct net_device *dev;
> >         struct gen_pool *chunk_pool;
> > +       /* Protect all members */
>
> nit: i would say here "Protect *dev". Protect all members implies this
> lock should be acquired before accessing any members, which is not
> true as of this patch, right? binding->lock needs to be acquired only
> for accessing binding->dev (other members are either thread safe, like
> the genpool, or have other concurrency guarantees, like netdev_lock).

Okay, I will change it too.

>
> > +       struct mutex lock;
> >
> >         /* The user holds a ref (via the netlink API) for as long as they want
> >          * the binding to remain alive. Each page pool using this binding holds
> > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > index dae9f0d432fb..bd5d58604ec0 100644
> > --- a/net/core/netdev-genl.c
> > +++ b/net/core/netdev-genl.c
> > @@ -979,14 +979,27 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
> >  {
> >         struct net_devmem_dmabuf_binding *binding;
> >         struct net_devmem_dmabuf_binding *temp;
> > +       netdevice_tracker dev_tracker;
> >         struct net_device *dev;
> >
> >         mutex_lock(&priv->lock);
> >         list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
> > +               list_del(&binding->list);
> > +
> > +               mutex_lock(&binding->lock);
> >                 dev = binding->dev;
> > +               if (!dev) {
> > +                       mutex_unlock(&binding->lock);
> > +                       net_devmem_unbind_dmabuf(binding);
> > +                       continue;
> > +               }
> > +               netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> > +               mutex_unlock(&binding->lock);
> > +
>
>
> Consider writing the above lines as something like:
>
> mutex_lock(&binding->lock);
> if (binding->dev) {
>     netdev_hold(binding->dev, &dev_tracker, GPF_KERNEL);
> }
>
> net_devmem_unbind_dmabuf(binding);
>
> if (binding->dev) {
>    netdev_put(binding->dev, &dev_tracker);
> }
> mutex_unlock(&binding->lock);
>
> i.e., don't duplicate the net_devmem_unbind_dmabuf(binding); call.
>
>
> Other than that, I could not find issues. I checked lock ordering. The
> lock hierarchy is:
>
> priv->lock
>   binding->lock
>     netdev_lock(dev)
>
> and AFAICT it is not violated anywhere. I ran my regression tests and
> did not see issues. Just holding my reviewed-by because I see the
> issue with list_del. I recommend that Stan also takes a look, since he
> implemented the locking change.
>
> --
> Thanks,
> Mina

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload
  2025-05-09 22:32   ` Jakub Kicinski
  2025-05-09 23:55     ` Mina Almasry
@ 2025-05-12  1:05     ` Taehee Yoo
  1 sibling, 0 replies; 8+ messages in thread
From: Taehee Yoo @ 2025-05-12  1:05 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mina Almasry, davem, pabeni, edumazet, horms, sdf, netdev,
	asml.silence, dw, skhawaja, kaiyuanz, jdamato

On Sat, May 10, 2025 at 7:32 AM Jakub Kicinski <kuba@kernel.org> wrote:
>

Hi Jakub,
Thanks a lot for the review!

> On Fri, 9 May 2025 12:43:42 -0700 Mina Almasry wrote:
> > > @@ -117,9 +124,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
> > >         unsigned long xa_idx;
> > >         unsigned int rxq_idx;
> > >
> > > -       if (binding->list.next)
> > > -               list_del(&binding->list);
> > > -
> >
> > Unfortunately if you're going to delete this, then you need to do
> > list_del in _all_ the callers of net_devmem_unbind_dmabuf, and I think
> > there is a callsite in netdev_nl_bind_rx_doit that is missed?
> >
> > But also, it may rough to continually have to remember to always do
> > list_del when we do unbind. AFAIR Jakub asked for uniformity in the
> > bind/unbind functions. Can we instead do the list_add inside of
> > net_devmem_bind_dmabuf? So net_devmem_bind_dmabuf can take the struct
> > list_head as an arg and do the list add, then the unbind can do the
> > list_del, so it is uniform, but we don't have to remember to do
> > list_add/del everytime we call bind/unbind.
> >
> > Also, I suspect that clean up can be a separate patch.
>
> Right. Let's leave it for a cleanup. And you can also inline
> net_devmem_unset_dev() in that case. My ask was to separate
> devmem logic from socket logic more clearly but the "new lock"
> approach doesn't really go in such direction. It's good enough
> for the fix tho.

Thanks!
I will make net_devmem_unset_dev() the inline function.

>
> > > +       struct mutex lock;
> > >
> > >         /* The user holds a ref (via the netlink API) for as long as they want
> > >          * the binding to remain alive. Each page pool using this binding holds
> > > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > > index dae9f0d432fb..bd5d58604ec0 100644
> > > --- a/net/core/netdev-genl.c
> > > +++ b/net/core/netdev-genl.c
> > > @@ -979,14 +979,27 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
> > >  {
> > >         struct net_devmem_dmabuf_binding *binding;
> > >         struct net_devmem_dmabuf_binding *temp;
> > > +       netdevice_tracker dev_tracker;
> > >         struct net_device *dev;
> > >
> > >         mutex_lock(&priv->lock);
> > >         list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
> > > +               list_del(&binding->list);
> > > +
> > > +               mutex_lock(&binding->lock);
> > >                 dev = binding->dev;
> > > +               if (!dev) {
> > > +                       mutex_unlock(&binding->lock);
> > > +                       net_devmem_unbind_dmabuf(binding);
> > > +                       continue;
> > > +               }
> > > +               netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> > > +               mutex_unlock(&binding->lock);
> > > +
> >
> > Consider writing the above lines as something like:
> >
> > mutex_lock(&binding->lock);
> > if (binding->dev) {
> >     netdev_hold(binding->dev, &dev_tracker, GPF_KERNEL);
> > }
> >
> > net_devmem_unbind_dmabuf(binding);
> >
> > if (binding->dev) {
> >    netdev_put(binding->dev, &dev_tracker);
> > }
> > mutex_unlock(&binding->lock);
> >
> > i.e., don't duplicate the net_devmem_unbind_dmabuf(binding); call.
>
> I think it's fine as is.
>
> > Other than that, I could not find issues. I checked lock ordering. The
> > lock hierarchy is:
> >
> > priv->lock
> >   binding->lock
> >     netdev_lock(dev)
>
> Did you mean:
>
>   priv->lock
>     netdev_lock(dev)
>       binding->lock

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload
  2025-05-09 22:41 ` Stanislav Fomichev
@ 2025-05-12  1:22   ` Taehee Yoo
  0 siblings, 0 replies; 8+ messages in thread
From: Taehee Yoo @ 2025-05-12  1:22 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: davem, kuba, pabeni, edumazet, horms, almasrymina, sdf, netdev,
	asml.silence, dw, skhawaja, kaiyuanz, jdamato

On Sat, May 10, 2025 at 7:41 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>

Hi Stanislav,
Thanks a lot for the review!

> On 05/09, Taehee Yoo wrote:
> > Kernel panic occurs when a devmem TCP socket is closed after NIC module
> > is unloaded.
> >
> > This is Devmem TCP unregistration scenarios. number is an order.
> > (a)netlink socket close    (b)pp destroy    (c)uninstall    result
> > 1                          2                3               OK
> > 1                          3                2               (d)Impossible
> > 2                          1                3               OK
> > 3                          1                2               (e)Kernel panic
> > 2                          3                1               (d)Impossible
> > 3                          2                1               (d)Impossible
> >
> > (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
> >     closed.
> > (b) page_pool_destroy() is called when the interface is down.
> > (c) mp_ops->uninstall() is called when an interface is unregistered.
> > (d) There is no scenario in mp_ops->uninstall() is called before
> >     page_pool_destroy().
> >     Because unregister_netdevice_many_notify() closes interfaces first
> >     and then calls mp_ops->uninstall().
> > (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
> >     netdev_lock().
> >     But if the interface module has already been removed, net_device
> >     pointer is invalid, so it causes kernel panic.
> >
> > In summary, there are only 3 possible scenarios.
> >  A. sk close -> pp destroy -> uninstall.
> >  B. pp destroy -> sk close -> uninstall.
> >  C. pp destroy -> uninstall -> sk close.
> >
> > Case C is a kernel panic scenario.
> >
> > In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> > binding->dev to NULL.
> > It indicates an bound net_device was unregistered.
> >
> > It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> > if binding->dev is NULL.
> >
> > A new binding->lock is added to protect members of a binding.
> >
> > Tests:
> > Scenario A:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     kill $pid
> >     ip link set $interface down
> >     modprobe -rv $module
> >
> > Scenario B:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     ip link set $interface down
> >     kill $pid
> >     modprobe -rv $module
> >
> > Scenario C:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     modprobe -rv $module
> >     sleep 5
> >     kill $pid
> >
> > Splat looks like:
> > Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> > CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> > Tainted: [B]=BAD_PAGE, [W]=WARN
> > RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> > Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> > RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> > RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> > RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> > RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> > R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> > R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> > FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> >  ...
> >  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
> >  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
> >  ...
> >  netlink_release (net/netlink/af_netlink.c:737)
> >  ...
> >  __sock_release (net/socket.c:647)
> >  sock_close (net/socket.c:1393)
> >
> > Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v3:
> >  - Add binding->lock for protecting members of a binding.
> >  - Add a net_devmem_unset_dev() helper function.
> >  - Do not reorder locks.
> >  - Fix build failure.
> >
> > v2:
> >  - Fix commit message.
> >  - Correct Fixes tag.
> >  - Inverse locking order.
> >  - Do not put a reference count of binding in
> >    mp_dmabuf_devmem_uninstall().
> >
> > In order to test this patch, driver side implementation of devmem TCP[1]
> > is needed to be applied.
> >
> > [1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u
> >
> >  net/core/devmem.c      | 14 +++++++++++---
> >  net/core/devmem.h      |  2 ++
> >  net/core/netdev-genl.c | 13 +++++++++++++
> >  3 files changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 6e27a47d0493..ffbf50337413 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -33,6 +33,13 @@ bool net_is_devmem_iov(struct net_iov *niov)
> >       return niov->pp->mp_ops == &dmabuf_devmem_ops;
> >  }
> >
> > +static void net_devmem_unset_dev(struct net_devmem_dmabuf_binding *binding)
> > +{
> > +     mutex_lock(&binding->lock);
> > +     binding->dev = NULL;
> > +     mutex_unlock(&binding->lock);
> > +}
>
> nit: there is just one place where we call net_devmem_unset_dev, why do
> we need an extra function? IMHO it makes it harder to read wrt
> locking.. Jakub is also hinting the same in
> https://lore.kernel.org/netdev/20250509153252.76f08c14@kernel.org/#t ?

Okay, I think it needs to be removed, not made inline.

>
> >  static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
> >                                              struct gen_pool_chunk *chunk,
> >                                              void *not_used)
> > @@ -117,9 +124,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
> >       unsigned long xa_idx;
> >       unsigned int rxq_idx;
> >
> > -     if (binding->list.next)
> > -             list_del(&binding->list);
> > -
> >       xa_for_each(&binding->bound_rxqs, xa_idx, rxq) {
> >               const struct pp_memory_provider_params mp_params = {
> >                       .mp_priv        = binding,
> > @@ -200,6 +204,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> >
> >       refcount_set(&binding->ref, 1);
> >
> > +     mutex_init(&binding->lock);
> > +
> >       binding->dmabuf = dmabuf;
> >
> >       binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent);
> > @@ -379,6 +385,8 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
> >       xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
> >               if (bound_rxq == rxq) {
> >                       xa_erase(&binding->bound_rxqs, xa_idx);
> > +                     if (xa_empty(&binding->bound_rxqs))
> > +                             net_devmem_unset_dev(binding);
> >                       break;
> >               }
> >       }
> > diff --git a/net/core/devmem.h b/net/core/devmem.h
> > index 7fc158d52729..b69adca6cd44 100644
> > --- a/net/core/devmem.h
> > +++ b/net/core/devmem.h
> > @@ -20,6 +20,8 @@ struct net_devmem_dmabuf_binding {
> >       struct sg_table *sgt;
> >       struct net_device *dev;
> >       struct gen_pool *chunk_pool;
> > +     /* Protect all members */
> > +     struct mutex lock;
> >
> >       /* The user holds a ref (via the netlink API) for as long as they want
> >        * the binding to remain alive. Each page pool using this binding holds
> > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > index dae9f0d432fb..bd5d58604ec0 100644
> > --- a/net/core/netdev-genl.c
> > +++ b/net/core/netdev-genl.c
> > @@ -979,14 +979,27 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
> >  {
> >       struct net_devmem_dmabuf_binding *binding;
> >       struct net_devmem_dmabuf_binding *temp;
> > +     netdevice_tracker dev_tracker;
> >       struct net_device *dev;
> >
> >       mutex_lock(&priv->lock);
> >       list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
> > +             list_del(&binding->list);
> > +
> > +             mutex_lock(&binding->lock);
> >               dev = binding->dev;
> > +             if (!dev) {
> > +                     mutex_unlock(&binding->lock);
> > +                     net_devmem_unbind_dmabuf(binding);
> > +                     continue;
> > +             }
> > +             netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> > +             mutex_unlock(&binding->lock);
> > +
>
> Same suggestion as in v2: let's have a short comment here on the lock
> ordering (netdev outer, binding inner)?

Okay, I will add a comment about it.

>
> >               netdev_lock(dev);
> >               net_devmem_unbind_dmabuf(binding);
> >               netdev_unlock(dev);
> > +             netdev_put(dev, &dev_tracker);
> >       }
> >       mutex_unlock(&priv->lock);
> >  }
> > --
> > 2.34.1
> >

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-05-12  1:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-09 16:00 [PATCH net v3] net: devmem: fix kernel panic when netlink socket close after module unload Taehee Yoo
2025-05-09 19:43 ` Mina Almasry
2025-05-09 22:32   ` Jakub Kicinski
2025-05-09 23:55     ` Mina Almasry
2025-05-12  1:05     ` Taehee Yoo
2025-05-12  0:56   ` Taehee Yoo
2025-05-09 22:41 ` Stanislav Fomichev
2025-05-12  1:22   ` Taehee Yoo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).