netdev.vger.kernel.org archive mirror
* [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
@ 2025-05-06 14:08 Taehee Yoo
  2025-05-06 17:41 ` Stanislav Fomichev
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Taehee Yoo @ 2025-05-06 14:08 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, horms, almasrymina,
	sdf, netdev
  Cc: asml.silence, dw, skhawaja, willemb, jdamato, ap420073

A kernel panic occurs when a devmem TCP socket is closed after the NIC
module has been unloaded.

These are the devmem TCP unregistration scenarios; the number indicates
the order of each step.
(a)socket close    (b)pp destroy    (c)uninstall    result
1                  2                3               OK
1                  3                2               (d)Impossible
2                  1                3               OK
3                  1                2               (e)Kernel panic
2                  3                1               (d)Impossible
3                  2                1               (d)Impossible

(a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
    closed.
(b) page_pool_destroy() is called when the interface is down.
(c) mp_ops->uninstall() is called when an interface is unregistered.
(d) There is no scenario in which mp_ops->uninstall() is called before
    page_pool_destroy(), because unregister_netdevice_many_notify()
    closes interfaces first and then calls mp_ops->uninstall().
(e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
    netdev_lock().
    But if the interface module has already been removed, the net_device
    pointer is invalid, so it causes a kernel panic.

In summary, there are only 3 possible scenarios.
 A. sk close -> pp destroy -> uninstall.
 B. pp destroy -> sk close -> uninstall.
 C. pp destroy -> uninstall -> sk close.

Case C is a kernel panic scenario.

To fix this problem, make mp_dmabuf_devmem_uninstall() set
binding->dev to NULL.
This indicates that the bound net_device has been unregistered.

Make netdev_nl_sock_priv_destroy() not acquire netdev_lock()
if binding->dev is NULL.

It inverts the socket/netdev lock order as shown below:
    netdev_lock();
    mutex_lock(&priv->lock);
    mutex_unlock(&priv->lock);
    netdev_unlock();

Because of this inversion of the locking order,
mp_dmabuf_devmem_uninstall() now acquires the socket lock.

Tests:
Scenario A:
    ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
        -v 7 -t 1 -q 1 &
    pid=$!
    sleep 10
    kill $pid
    ip link set $interface down
    modprobe -rv $module

Scenario B:
    ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
        -v 7 -t 1 -q 1 &
    pid=$!
    sleep 10
    ip link set $interface down
    kill $pid
    modprobe -rv $module

Scenario C:
    ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
        -v 7 -t 1 -q 1 &
    pid=$!
    sleep 10
    modprobe -rv $module
    sleep 5
    kill $pid

Splat looks like:
Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
Tainted: [B]=BAD_PAGE, [W]=WARN
RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
 ...
 netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
 genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
 ...
 netlink_release (net/netlink/af_netlink.c:737)
 ...
 __sock_release (net/socket.c:647)
 sock_close (net/socket.c:1393)

Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---

v2:
 - Fix commit message.
 - Correct Fixes tag.
 - Invert the locking order.
 - Do not put the binding reference in
   mp_dmabuf_devmem_uninstall().

To test this patch, the driver-side implementation of devmem TCP [1]
needs to be applied.

[1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u

 net/core/devmem.c      |  6 ++++++
 net/core/devmem.h      |  3 +++
 net/core/netdev-genl.c | 27 ++++++++++++++++++---------
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 6e27a47d0493..636c1e82b8da 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -167,6 +167,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 
 struct net_devmem_dmabuf_binding *
 net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
+		       struct netdev_nl_sock *priv,
 		       struct netlink_ext_ack *extack)
 {
 	struct net_devmem_dmabuf_binding *binding;
@@ -189,6 +190,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 	}
 
 	binding->dev = dev;
+	binding->priv = priv;
 
 	err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id,
 			      binding, xa_limit_32b, &id_alloc_next,
@@ -376,12 +378,16 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
 	struct netdev_rx_queue *bound_rxq;
 	unsigned long xa_idx;
 
+	mutex_lock(&binding->priv->lock);
 	xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
 		if (bound_rxq == rxq) {
 			xa_erase(&binding->bound_rxqs, xa_idx);
+			if (xa_empty(&binding->bound_rxqs))
+				binding->dev = NULL;
 			break;
 		}
 	}
+	mutex_unlock(&binding->priv->lock);
 }
 
 static const struct memory_provider_ops dmabuf_devmem_ops = {
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 7fc158d52729..afd6320b2c9b 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -11,6 +11,7 @@
 #define _NET_DEVMEM_H
 
 #include <net/netmem.h>
+#include <net/netdev_netlink.h>
 
 struct netlink_ext_ack;
 
@@ -20,6 +21,7 @@ struct net_devmem_dmabuf_binding {
 	struct sg_table *sgt;
 	struct net_device *dev;
 	struct gen_pool *chunk_pool;
+	struct netdev_nl_sock *priv;
 
 	/* The user holds a ref (via the netlink API) for as long as they want
 	 * the binding to remain alive. Each page pool using this binding holds
@@ -63,6 +65,7 @@ struct dmabuf_genpool_chunk_owner {
 void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
 struct net_devmem_dmabuf_binding *
 net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
+		       struct netdev_nl_sock *priv,
 		       struct netlink_ext_ack *extack);
 void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
 int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 230743bdbb14..b8bb73574276 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -859,13 +859,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 		goto err_genlmsg_free;
 	}
 
-	mutex_lock(&priv->lock);
-
 	err = 0;
 	netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
 	if (!netdev) {
 		err = -ENODEV;
-		goto err_unlock_sock;
+		goto err_genlmsg_free;
 	}
 	if (!netif_device_present(netdev))
 		err = -ENODEV;
@@ -877,10 +875,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 		goto err_unlock;
 	}
 
-	binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, info->extack);
+	mutex_lock(&priv->lock);
+	binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
-		goto err_unlock;
+		goto err_unlock_sock;
 	}
 
 	nla_for_each_attr_type(attr, NETDEV_A_DMABUF_QUEUES,
@@ -921,18 +920,17 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	if (err)
 		goto err_unbind;
 
-	netdev_unlock(netdev);
-
 	mutex_unlock(&priv->lock);
+	netdev_unlock(netdev);
 
 	return 0;
 
 err_unbind:
 	net_devmem_unbind_dmabuf(binding);
-err_unlock:
-	netdev_unlock(netdev);
 err_unlock_sock:
 	mutex_unlock(&priv->lock);
+err_unlock:
+	netdev_unlock(netdev);
 err_genlmsg_free:
 	nlmsg_free(rsp);
 	return err;
@@ -948,14 +946,25 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
 {
 	struct net_devmem_dmabuf_binding *binding;
 	struct net_devmem_dmabuf_binding *temp;
+	netdevice_tracker dev_tracker;
 	struct net_device *dev;
 
 	mutex_lock(&priv->lock);
 	list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
 		dev = binding->dev;
+		if (!dev) {
+			net_devmem_unbind_dmabuf(binding);
+			continue;
+		}
+		netdev_hold(dev, &dev_tracker, GFP_KERNEL);
+		mutex_unlock(&priv->lock);
 		netdev_lock(dev);
+		mutex_lock(&priv->lock);
 		net_devmem_unbind_dmabuf(binding);
+		mutex_unlock(&priv->lock);
 		netdev_unlock(dev);
+		netdev_put(dev, &dev_tracker);
+		mutex_lock(&priv->lock);
 	}
 	mutex_unlock(&priv->lock);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-06 14:08 [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload Taehee Yoo
@ 2025-05-06 17:41 ` Stanislav Fomichev
  2025-05-07  2:42   ` Jakub Kicinski
  2025-05-07  4:22   ` Taehee Yoo
  2025-05-06 22:07 ` Jakub Kicinski
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Stanislav Fomichev @ 2025-05-06 17:41 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, almasrymina,
	sdf, netdev, asml.silence, dw, skhawaja, willemb, jdamato

On 05/06, Taehee Yoo wrote:
> Kernel panic occurs when a devmem TCP socket is closed after NIC module
> is unloaded.
> 
> This is Devmem TCP unregistration scenarios. number is an order.
> (a)socket close    (b)pp destroy    (c)uninstall    result
> 1                  2                3               OK
> 1                  3                2               (d)Impossible
> 2                  1                3               OK
> 3                  1                2               (e)Kernel panic
> 2                  3                1               (d)Impossible
> 3                  2                1               (d)Impossible
> 
> (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
>     closed.
> (b) page_pool_destroy() is called when the interface is down.
> (c) mp_ops->uninstall() is called when an interface is unregistered.
> (d) There is no scenario in mp_ops->uninstall() is called before
>     page_pool_destroy().
>     Because unregister_netdevice_many_notify() closes interfaces first
>     and then calls mp_ops->uninstall().
> (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
>     netdev_lock().
>     But if the interface module has already been removed, net_device
>     pointer is invalid, so it causes kernel panic.
> 
> In summary, there are only 3 possible scenarios.
>  A. sk close -> pp destroy -> uninstall.
>  B. pp destroy -> sk close -> uninstall.
>  C. pp destroy -> uninstall -> sk close.
> 
> Case C is a kernel panic scenario.
> 
> In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> binding->dev to NULL.
> It indicates an bound net_device was unregistered.
> 
> It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> if binding->dev is NULL.
> 
> It inverts socket/netdev lock order like below:
>     netdev_lock();
>     mutex_lock(&priv->lock);
>     mutex_unlock(&priv->lock);
>     netdev_unlock();
> 
> Because of inversion of locking ordering, mp_dmabuf_devmem_uninstall()
> acquires socket lock from now on.
> 
> Tests:
> Scenario A:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     kill $pid
>     ip link set $interface down
>     modprobe -rv $module
> 
> Scenario B:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     ip link set $interface down
>     kill $pid
>     modprobe -rv $module
> 
> Scenario C:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     modprobe -rv $module
>     sleep 5
>     kill $pid
> 
> Splat looks like:
> Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> Tainted: [B]=BAD_PAGE, [W]=WARN
> RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> PKRU: 55555554
> Call Trace:
> <TASK>
>  ...
>  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
>  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
>  ...
>  netlink_release (net/netlink/af_netlink.c:737)
>  ...
>  __sock_release (net/socket.c:647)
>  sock_close (net/socket.c:1393)
> 
> Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
> 
> v2:
>  - Fix commit message.
>  - Correct Fixes tag.
>  - Inverse locking order.
>  - Do not put a reference count of binding in
>    mp_dmabuf_devmem_uninstall().
> 
> In order to test this patch, driver side implementation of devmem TCP[1]
> is needed to be applied.
> 
> [1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u
> 
>  net/core/devmem.c      |  6 ++++++
>  net/core/devmem.h      |  3 +++
>  net/core/netdev-genl.c | 27 ++++++++++++++++++---------
>  3 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 6e27a47d0493..636c1e82b8da 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -167,6 +167,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
>  
>  struct net_devmem_dmabuf_binding *
>  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> +		       struct netdev_nl_sock *priv,
>  		       struct netlink_ext_ack *extack)
>  {
>  	struct net_devmem_dmabuf_binding *binding;
> @@ -189,6 +190,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
>  	}
>  
>  	binding->dev = dev;
> +	binding->priv = priv;
>  
>  	err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id,
>  			      binding, xa_limit_32b, &id_alloc_next,
> @@ -376,12 +378,16 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
>  	struct netdev_rx_queue *bound_rxq;
>  	unsigned long xa_idx;
>  
> +	mutex_lock(&binding->priv->lock);
>  	xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
>  		if (bound_rxq == rxq) {
>  			xa_erase(&binding->bound_rxqs, xa_idx);
> +			if (xa_empty(&binding->bound_rxqs))
> +				binding->dev = NULL;
>  			break;
>  		}
>  	}
> +	mutex_unlock(&binding->priv->lock);
>  }
>  
>  static const struct memory_provider_ops dmabuf_devmem_ops = {
> diff --git a/net/core/devmem.h b/net/core/devmem.h
> index 7fc158d52729..afd6320b2c9b 100644
> --- a/net/core/devmem.h
> +++ b/net/core/devmem.h
> @@ -11,6 +11,7 @@
>  #define _NET_DEVMEM_H
>  
>  #include <net/netmem.h>
> +#include <net/netdev_netlink.h>
>  
>  struct netlink_ext_ack;
>  
> @@ -20,6 +21,7 @@ struct net_devmem_dmabuf_binding {
>  	struct sg_table *sgt;
>  	struct net_device *dev;
>  	struct gen_pool *chunk_pool;
> +	struct netdev_nl_sock *priv;
>  
>  	/* The user holds a ref (via the netlink API) for as long as they want
>  	 * the binding to remain alive. Each page pool using this binding holds
> @@ -63,6 +65,7 @@ struct dmabuf_genpool_chunk_owner {
>  void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
>  struct net_devmem_dmabuf_binding *
>  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> +		       struct netdev_nl_sock *priv,
>  		       struct netlink_ext_ack *extack);
>  void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
>  int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index 230743bdbb14..b8bb73574276 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -859,13 +859,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
>  		goto err_genlmsg_free;
>  	}
>  
> -	mutex_lock(&priv->lock);
> -
>  	err = 0;
>  	netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
>  	if (!netdev) {
>  		err = -ENODEV;
> -		goto err_unlock_sock;
> +		goto err_genlmsg_free;
>  	}
>  	if (!netif_device_present(netdev))
>  		err = -ENODEV;
> @@ -877,10 +875,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
>  		goto err_unlock;
>  	}
>  
> -	binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, info->extack);
> +	mutex_lock(&priv->lock);
> +	binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
>  	if (IS_ERR(binding)) {
>  		err = PTR_ERR(binding);
> -		goto err_unlock;
> +		goto err_unlock_sock;
>  	}
>  
>  	nla_for_each_attr_type(attr, NETDEV_A_DMABUF_QUEUES,
> @@ -921,18 +920,17 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
>  	if (err)
>  		goto err_unbind;
>  
> -	netdev_unlock(netdev);
> -
>  	mutex_unlock(&priv->lock);
> +	netdev_unlock(netdev);
>  
>  	return 0;
>  
>  err_unbind:
>  	net_devmem_unbind_dmabuf(binding);
> -err_unlock:
> -	netdev_unlock(netdev);
>  err_unlock_sock:
>  	mutex_unlock(&priv->lock);
> +err_unlock:
> +	netdev_unlock(netdev);
>  err_genlmsg_free:
>  	nlmsg_free(rsp);
>  	return err;
> @@ -948,14 +946,25 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
>  {
>  	struct net_devmem_dmabuf_binding *binding;
>  	struct net_devmem_dmabuf_binding *temp;
> +	netdevice_tracker dev_tracker;
>  	struct net_device *dev;
>  
>  	mutex_lock(&priv->lock);
>  	list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
>  		dev = binding->dev;
> +		if (!dev) {
> +			net_devmem_unbind_dmabuf(binding);
> +			continue;
> +		}
> +		netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> +		mutex_unlock(&priv->lock);
>  		netdev_lock(dev);
> +		mutex_lock(&priv->lock);
>  		net_devmem_unbind_dmabuf(binding);
> +		mutex_unlock(&priv->lock);
>  		netdev_unlock(dev);
> +		netdev_put(dev, &dev_tracker);
> +		mutex_lock(&priv->lock);

nit: this feels like it deserves a comment on the lock ordering (and,
hence, why this dance is needed). The rest looks good!
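
Something along these lines perhaps (purely a sketch, exact wording is
of course up to you):

        /* Lock ordering is netdev instance lock first, then priv->lock
         * (see netdev_nl_bind_rx_doit() and mp_dmabuf_devmem_uninstall()),
         * so drop the socket lock before grabbing the instance lock and
         * re-take it for the unbind.
         */
        netdev_hold(dev, &dev_tracker, GFP_KERNEL);
        mutex_unlock(&priv->lock);
        netdev_lock(dev);
        mutex_lock(&priv->lock);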

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-06 14:08 [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload Taehee Yoo
  2025-05-06 17:41 ` Stanislav Fomichev
@ 2025-05-06 22:07 ` Jakub Kicinski
  2025-05-07  4:24   ` Taehee Yoo
  2025-05-07  2:55 ` Jakub Kicinski
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2025-05-06 22:07 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, almasrymina, sdf,
	netdev, asml.silence, dw, skhawaja, willemb, jdamato

On Tue,  6 May 2025 14:08:58 +0000 Taehee Yoo wrote:
> Kernel panic occurs when a devmem TCP socket is closed after NIC module
> is unloaded.
> 
> This is Devmem TCP unregistration scenarios. number is an order.
> (a)socket close    (b)pp destroy    (c)uninstall    result
> 1                  2                3               OK
> 1                  3                2               (d)Impossible
> 2                  1                3               OK
> 3                  1                2               (e)Kernel panic
> 2                  3                1               (d)Impossible
> 3                  2                1               (d)Impossible
> 
> (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
>     closed.
> (b) page_pool_destroy() is called when the interface is down.
> (c) mp_ops->uninstall() is called when an interface is unregistered.
> (d) There is no scenario in mp_ops->uninstall() is called before
>     page_pool_destroy().
>     Because unregister_netdevice_many_notify() closes interfaces first
>     and then calls mp_ops->uninstall().
> (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
>     netdev_lock().
>     But if the interface module has already been removed, net_device
>     pointer is invalid, so it causes kernel panic.
> 
> In summary, there are only 3 possible scenarios.
>  A. sk close -> pp destroy -> uninstall.
>  B. pp destroy -> sk close -> uninstall.
>  C. pp destroy -> uninstall -> sk close.
> 
> Case C is a kernel panic scenario.
> 
> In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> binding->dev to NULL.
> It indicates an bound net_device was unregistered.
> 
> It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> if binding->dev is NULL.
> 
> It inverts socket/netdev lock order like below:
>     netdev_lock();
>     mutex_lock(&priv->lock);
>     mutex_unlock(&priv->lock);
>     netdev_unlock();
> 
> Because of inversion of locking ordering, mp_dmabuf_devmem_uninstall()
> acquires socket lock from now on.
> 
> Tests:
> Scenario A:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     kill $pid
>     ip link set $interface down
>     modprobe -rv $module
> 
> Scenario B:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     ip link set $interface down
>     kill $pid
>     modprobe -rv $module
> 
> Scenario C:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     modprobe -rv $module
>     sleep 5
>     kill $pid
> 
> Splat looks like:
> Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> Tainted: [B]=BAD_PAGE, [W]=WARN
> RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> PKRU: 55555554
> Call Trace:
> <TASK>
>  ...
>  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
>  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
>  ...
>  netlink_release (net/netlink/af_netlink.c:737)
>  ...
>  __sock_release (net/socket.c:647)
>  sock_close (net/socket.c:1393)

I'll look at the code later today, but it will need a respin for sure:

net/core/netdev-genl.c: In function ‘netdev_nl_bind_rx_doit’:
net/core/netdev-genl.c:878:61: error: passing argument 3 of ‘net_devmem_bind_dmabuf’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  878 |         binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
      |                                                             ^~~~
      |                                                             |
      |                                                             struct netdev_nl_sock *
In file included from ../net/core/netdev-genl.c:16:
net/core/devmem.h:133:48: note: expected ‘struct netlink_ext_ack *’ but argument is of type ‘struct netdev_nl_sock *’
  133 |                        struct netlink_ext_ack *extack)
      |                        ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
net/core/netdev-genl.c:878:19: error: too many arguments to function ‘net_devmem_bind_dmabuf’
  878 |         binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
      |                   ^~~~~~~~~~~~~~~~~~~~~~
In file included from ../net/core/netdev-genl.c:16:
net/core/devmem.h:132:1: note: declared here
  132 | net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
      | ^~~~~~~~~~~~~~~~~~~~~~

This is on the kunit build so guessing some compat was missed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-06 17:41 ` Stanislav Fomichev
@ 2025-05-07  2:42   ` Jakub Kicinski
  2025-05-07  4:22   ` Taehee Yoo
  1 sibling, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2025-05-07  2:42 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Taehee Yoo, davem, pabeni, edumazet, andrew+netdev, horms,
	almasrymina, sdf, netdev, asml.silence, dw, skhawaja, willemb,
	jdamato

On Tue, 6 May 2025 10:41:52 -0700 Stanislav Fomichev wrote:
> > +		mutex_unlock(&priv->lock);
> >  		netdev_unlock(dev);
> > +		netdev_put(dev, &dev_tracker);
> > +		mutex_lock(&priv->lock);  
> 
> nit: this feels like it deserves a comment on the lock ordering (and,
> hence, why this dance is needed). The rest looks good!

Agreed, and maybe document the ordering and what is protected 
in struct netdev_nl_sock ?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-06 14:08 [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload Taehee Yoo
  2025-05-06 17:41 ` Stanislav Fomichev
  2025-05-06 22:07 ` Jakub Kicinski
@ 2025-05-07  2:55 ` Jakub Kicinski
  2025-05-07  4:55   ` Taehee Yoo
  2025-05-07 13:08 ` kernel test robot
  2025-05-07 18:28 ` Mina Almasry
  4 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2025-05-07  2:55 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, almasrymina, sdf,
	netdev, asml.silence, dw, skhawaja, willemb, jdamato

On Tue,  6 May 2025 14:08:58 +0000 Taehee Yoo wrote:
> +	mutex_lock(&binding->priv->lock);
>  	xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
>  		if (bound_rxq == rxq) {
>  			xa_erase(&binding->bound_rxqs, xa_idx);
> +			if (xa_empty(&binding->bound_rxqs))
> +				binding->dev = NULL;
>  			break;
>  		}
>  	}
> +	mutex_unlock(&binding->priv->lock);

Why do we need to lock the socket around the loop?
binding->bound_rxqs has its own lock, and add/del are also
protected by the netdev instance lock. The only thing we
must lock is the write to binding->dev, I think?

Would it be cleaner to move that write and locking to a helper
which would live in netdev-genl.c?

Similarly could we move:

	if (binding->list.next)
		list_del(&binding->list);

from net_devmem_unbind_dmabuf() to its callers?
The asymmetry of list_add() being directly in netdev_nl_bind_rx_doit()
not net_devmem_bind_dmabuf(), and list_del() being in
net_devmem_unbind_dmabuf() always confuses me.

>+	mutex_lock(&priv->lock);
>+	binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);

We shouldn't have to lock the net_devmem_bind_dmabuf(), we have the
instance lock so the device can't go away, and we haven't listed
the binding on the socket, yet. Locking around list_add() should
be enough?
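
Not tested, but I'm thinking of something roughly like this for the
helper (name purely illustrative):

        /* netdev-genl.c -- hypothetical helper: clear the binding's dev
         * pointer under the socket lock.
         */
        void netdev_nl_sock_detach_binding_dev(struct net_devmem_dmabuf_binding *binding)
        {
                mutex_lock(&binding->priv->lock);
                binding->dev = NULL;
                mutex_unlock(&binding->priv->lock);
        }

mp_dmabuf_devmem_uninstall() would call it once xa_empty() becomes true,
without taking the socket lock around the xa_for_each().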

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-06 17:41 ` Stanislav Fomichev
  2025-05-07  2:42   ` Jakub Kicinski
@ 2025-05-07  4:22   ` Taehee Yoo
  1 sibling, 0 replies; 14+ messages in thread
From: Taehee Yoo @ 2025-05-07  4:22 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, almasrymina,
	sdf, netdev, asml.silence, dw, skhawaja, willemb, jdamato

On Wed, May 7, 2025 at 2:41 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>

Hi Stanislav,
Thanks a lot for your review!

> On 05/06, Taehee Yoo wrote:
> > Kernel panic occurs when a devmem TCP socket is closed after NIC module
> > is unloaded.
> >
> > This is Devmem TCP unregistration scenarios. number is an order.
> > (a)socket close    (b)pp destroy    (c)uninstall    result
> > 1                  2                3               OK
> > 1                  3                2               (d)Impossible
> > 2                  1                3               OK
> > 3                  1                2               (e)Kernel panic
> > 2                  3                1               (d)Impossible
> > 3                  2                1               (d)Impossible
> >
> > (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
> >     closed.
> > (b) page_pool_destroy() is called when the interface is down.
> > (c) mp_ops->uninstall() is called when an interface is unregistered.
> > (d) There is no scenario in mp_ops->uninstall() is called before
> >     page_pool_destroy().
> >     Because unregister_netdevice_many_notify() closes interfaces first
> >     and then calls mp_ops->uninstall().
> > (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
> >     netdev_lock().
> >     But if the interface module has already been removed, net_device
> >     pointer is invalid, so it causes kernel panic.
> >
> > In summary, there are only 3 possible scenarios.
> >  A. sk close -> pp destroy -> uninstall.
> >  B. pp destroy -> sk close -> uninstall.
> >  C. pp destroy -> uninstall -> sk close.
> >
> > Case C is a kernel panic scenario.
> >
> > In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> > binding->dev to NULL.
> > It indicates an bound net_device was unregistered.
> >
> > It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> > if binding->dev is NULL.
> >
> > It inverts socket/netdev lock order like below:
> >     netdev_lock();
> >     mutex_lock(&priv->lock);
> >     mutex_unlock(&priv->lock);
> >     netdev_unlock();
> >
> > Because of inversion of locking ordering, mp_dmabuf_devmem_uninstall()
> > acquires socket lock from now on.
> >
> > Tests:
> > Scenario A:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     kill $pid
> >     ip link set $interface down
> >     modprobe -rv $module
> >
> > Scenario B:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     ip link set $interface down
> >     kill $pid
> >     modprobe -rv $module
> >
> > Scenario C:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     modprobe -rv $module
> >     sleep 5
> >     kill $pid
> >
> > Splat looks like:
> > Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> > CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> > Tainted: [B]=BAD_PAGE, [W]=WARN
> > RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> > Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> > RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> > RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> > RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> > RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> > R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> > R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> > FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> >  ...
> >  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
> >  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
> >  ...
> >  netlink_release (net/netlink/af_netlink.c:737)
> >  ...
> >  __sock_release (net/socket.c:647)
> >  sock_close (net/socket.c:1393)
> >
> > Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v2:
> >  - Fix commit message.
> >  - Correct Fixes tag.
> >  - Inverse locking order.
> >  - Do not put a reference count of binding in
> >    mp_dmabuf_devmem_uninstall().
> >
> > In order to test this patch, driver side implementation of devmem TCP[1]
> > is needed to be applied.
> >
> > [1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u
> >
> >  net/core/devmem.c      |  6 ++++++
> >  net/core/devmem.h      |  3 +++
> >  net/core/netdev-genl.c | 27 ++++++++++++++++++---------
> >  3 files changed, 27 insertions(+), 9 deletions(-)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 6e27a47d0493..636c1e82b8da 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -167,6 +167,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> >
> >  struct net_devmem_dmabuf_binding *
> >  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > +                    struct netdev_nl_sock *priv,
> >                      struct netlink_ext_ack *extack)
> >  {
> >       struct net_devmem_dmabuf_binding *binding;
> > @@ -189,6 +190,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> >       }
> >
> >       binding->dev = dev;
> > +     binding->priv = priv;
> >
> >       err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id,
> >                             binding, xa_limit_32b, &id_alloc_next,
> > @@ -376,12 +378,16 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
> >       struct netdev_rx_queue *bound_rxq;
> >       unsigned long xa_idx;
> >
> > +     mutex_lock(&binding->priv->lock);
> >       xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
> >               if (bound_rxq == rxq) {
> >                       xa_erase(&binding->bound_rxqs, xa_idx);
> > +                     if (xa_empty(&binding->bound_rxqs))
> > +                             binding->dev = NULL;
> >                       break;
> >               }
> >       }
> > +     mutex_unlock(&binding->priv->lock);
> >  }
> >
> >  static const struct memory_provider_ops dmabuf_devmem_ops = {
> > diff --git a/net/core/devmem.h b/net/core/devmem.h
> > index 7fc158d52729..afd6320b2c9b 100644
> > --- a/net/core/devmem.h
> > +++ b/net/core/devmem.h
> > @@ -11,6 +11,7 @@
> >  #define _NET_DEVMEM_H
> >
> >  #include <net/netmem.h>
> > +#include <net/netdev_netlink.h>
> >
> >  struct netlink_ext_ack;
> >
> > @@ -20,6 +21,7 @@ struct net_devmem_dmabuf_binding {
> >       struct sg_table *sgt;
> >       struct net_device *dev;
> >       struct gen_pool *chunk_pool;
> > +     struct netdev_nl_sock *priv;
> >
> >       /* The user holds a ref (via the netlink API) for as long as they want
> >        * the binding to remain alive. Each page pool using this binding holds
> > @@ -63,6 +65,7 @@ struct dmabuf_genpool_chunk_owner {
> >  void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
> >  struct net_devmem_dmabuf_binding *
> >  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > +                    struct netdev_nl_sock *priv,
> >                      struct netlink_ext_ack *extack);
> >  void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
> >  int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > index 230743bdbb14..b8bb73574276 100644
> > --- a/net/core/netdev-genl.c
> > +++ b/net/core/netdev-genl.c
> > @@ -859,13 +859,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> >               goto err_genlmsg_free;
> >       }
> >
> > -     mutex_lock(&priv->lock);
> > -
> >       err = 0;
> >       netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
> >       if (!netdev) {
> >               err = -ENODEV;
> > -             goto err_unlock_sock;
> > +             goto err_genlmsg_free;
> >       }
> >       if (!netif_device_present(netdev))
> >               err = -ENODEV;
> > @@ -877,10 +875,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> >               goto err_unlock;
> >       }
> >
> > -     binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, info->extack);
> > +     mutex_lock(&priv->lock);
> > +     binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
> >       if (IS_ERR(binding)) {
> >               err = PTR_ERR(binding);
> > -             goto err_unlock;
> > +             goto err_unlock_sock;
> >       }
> >
> >       nla_for_each_attr_type(attr, NETDEV_A_DMABUF_QUEUES,
> > @@ -921,18 +920,17 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> >       if (err)
> >               goto err_unbind;
> >
> > -     netdev_unlock(netdev);
> > -
> >       mutex_unlock(&priv->lock);
> > +     netdev_unlock(netdev);
> >
> >       return 0;
> >
> >  err_unbind:
> >       net_devmem_unbind_dmabuf(binding);
> > -err_unlock:
> > -     netdev_unlock(netdev);
> >  err_unlock_sock:
> >       mutex_unlock(&priv->lock);
> > +err_unlock:
> > +     netdev_unlock(netdev);
> >  err_genlmsg_free:
> >       nlmsg_free(rsp);
> >       return err;
> > @@ -948,14 +946,25 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
> >  {
> >       struct net_devmem_dmabuf_binding *binding;
> >       struct net_devmem_dmabuf_binding *temp;
> > +     netdevice_tracker dev_tracker;
> >       struct net_device *dev;
> >
> >       mutex_lock(&priv->lock);
> >       list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
> >               dev = binding->dev;
> > +             if (!dev) {
> > +                     net_devmem_unbind_dmabuf(binding);
> > +                     continue;
> > +             }
> > +             netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> > +             mutex_unlock(&priv->lock);
> >               netdev_lock(dev);
> > +             mutex_lock(&priv->lock);
> >               net_devmem_unbind_dmabuf(binding);
> > +             mutex_unlock(&priv->lock);
> >               netdev_unlock(dev);
> > +             netdev_put(dev, &dev_tracker);
> > +             mutex_lock(&priv->lock);
>
> nit: this feels like it deserves a comment on the lock ordering (and,
> hence, why this dance is needed). The rest looks good!

The device unreg path didn't acquire the socket lock before.
But with this patch it sets binding->dev to NULL, and binding->dev is
protected by the socket lock.
So the socket lock needs to be taken in the device unreg path as well.

The device unreg path acquires netdev_lock first, and the socket close
path acquires the socket lock first, so one of them had to change.
I modified the socket-close path and adjusted the locking order because,
in my view, the socket-close path in the code is more flexible than the
device unregistration path.

I will add a comment about it in the next version.

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-06 22:07 ` Jakub Kicinski
@ 2025-05-07  4:24   ` Taehee Yoo
  0 siblings, 0 replies; 14+ messages in thread
From: Taehee Yoo @ 2025-05-07  4:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, almasrymina, sdf,
	netdev, asml.silence, dw, skhawaja, willemb, jdamato

On Wed, May 7, 2025 at 7:07 AM Jakub Kicinski <kuba@kernel.org> wrote:
>

Hi Jakub,
Thanks a lot for the review!

> On Tue,  6 May 2025 14:08:58 +0000 Taehee Yoo wrote:
> > Kernel panic occurs when a devmem TCP socket is closed after NIC module
> > is unloaded.
> >
> > This is Devmem TCP unregistration scenarios. number is an order.
> > (a)socket close    (b)pp destroy    (c)uninstall    result
> > 1                  2                3               OK
> > 1                  3                2               (d)Impossible
> > 2                  1                3               OK
> > 3                  1                2               (e)Kernel panic
> > 2                  3                1               (d)Impossible
> > 3                  2                1               (d)Impossible
> >
> > (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
> >     closed.
> > (b) page_pool_destroy() is called when the interface is down.
> > (c) mp_ops->uninstall() is called when an interface is unregistered.
> > (d) There is no scenario in mp_ops->uninstall() is called before
> >     page_pool_destroy().
> >     Because unregister_netdevice_many_notify() closes interfaces first
> >     and then calls mp_ops->uninstall().
> > (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
> >     netdev_lock().
> >     But if the interface module has already been removed, net_device
> >     pointer is invalid, so it causes kernel panic.
> >
> > In summary, there are only 3 possible scenarios.
> >  A. sk close -> pp destroy -> uninstall.
> >  B. pp destroy -> sk close -> uninstall.
> >  C. pp destroy -> uninstall -> sk close.
> >
> > Case C is a kernel panic scenario.
> >
> > In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> > binding->dev to NULL.
> > It indicates an bound net_device was unregistered.
> >
> > It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> > if binding->dev is NULL.
> >
> > It inverts socket/netdev lock order like below:
> >     netdev_lock();
> >     mutex_lock(&priv->lock);
> >     mutex_unlock(&priv->lock);
> >     netdev_unlock();
> >
> > Because of inversion of locking ordering, mp_dmabuf_devmem_uninstall()
> > acquires socket lock from now on.
> >
> > Tests:
> > Scenario A:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     kill $pid
> >     ip link set $interface down
> >     modprobe -rv $module
> >
> > Scenario B:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     ip link set $interface down
> >     kill $pid
> >     modprobe -rv $module
> >
> > Scenario C:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     modprobe -rv $module
> >     sleep 5
> >     kill $pid
> >
> > Splat looks like:
> > Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> > CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> > Tainted: [B]=BAD_PAGE, [W]=WARN
> > RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> > Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> > RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> > RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> > RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> > RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> > R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> > R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> > FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> >  ...
> >  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
> >  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
> >  ...
> >  netlink_release (net/netlink/af_netlink.c:737)
> >  ...
> >  __sock_release (net/socket.c:647)
> >  sock_close (net/socket.c:1393)
>
> I'll look at the code later today, but it will need a respin for sure:
>
> net/core/netdev-genl.c: In function ‘netdev_nl_bind_rx_doit’:
> net/core/netdev-genl.c:878:61: error: passing argument 3 of ‘net_devmem_bind_dmabuf’ from incompatible pointer type [-Werror=incompatible-pointer-types]
>   878 |         binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
>       |                                                             ^~~~
>       |                                                             |
>       |                                                             struct netdev_nl_sock *
> In file included from ../net/core/netdev-genl.c:16:
> net/core/devmem.h:133:48: note: expected ‘struct netlink_ext_ack *’ but argument is of type ‘struct netdev_nl_sock *’
>   133 |                        struct netlink_ext_ack *extack)
>       |                        ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
> net/core/netdev-genl.c:878:19: error: too many arguments to function ‘net_devmem_bind_dmabuf’
>   878 |         binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
>       |                   ^~~~~~~~~~~~~~~~~~~~~~
> In file included from ../net/core/netdev-genl.c:16:
> net/core/devmem.h:132:1: note: declared here
>   132 | net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
>       | ^~~~~~~~~~~~~~~~~~~~~~
>
> This is on the kunit build so guessing some compat was missed.

Ah, yes, I missed updating the prototype for the netmem-disabled case
in devmem.h.
I will fix it in the next version!

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-07  2:55 ` Jakub Kicinski
@ 2025-05-07  4:55   ` Taehee Yoo
  2025-05-07 13:23     ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Taehee Yoo @ 2025-05-07  4:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, almasrymina, sdf,
	netdev, asml.silence, dw, skhawaja, willemb, jdamato

On Wed, May 7, 2025 at 11:55 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue,  6 May 2025 14:08:58 +0000 Taehee Yoo wrote:
> > +     mutex_lock(&binding->priv->lock);
> >       xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
> >               if (bound_rxq == rxq) {
> >                       xa_erase(&binding->bound_rxqs, xa_idx);
> > +                     if (xa_empty(&binding->bound_rxqs))
> > +                             binding->dev = NULL;
> >                       break;
> >               }
> >       }
> > +     mutex_unlock(&binding->priv->lock);
>
> Why do we need to lock the socket around the while loop?
> binding->bound_rxqs have its own lock, and add/del are also
> protected by the netdev instance lock. The only thing we
> must lock is the write to binding->dev I think ?

I intended to protect both binding->bound_rxqs and binding->dev.
But you're right, the xarray API internally acquires a lock.
Only binding->dev is protected by the socket lock here.

>
> Would it be cleaner to move that write and locking to a helper
> which would live in netdev-genl.c?

You mean that the socket lock is not required to cover the whole loop
because bound_rxqs is safe by itself.
So, it should acquire the socket lock only for setting binding->dev to
NULL, right? That makes sense to me.
Making a helper in netdev-genl.c would be good; I will make it.

>
> Similarly could we move:
>
>         if (binding->list.next)
>                 list_del(&binding->list);
>
> from net_devmem_unbind_dmabuf() to its callers?
> The asymmetry of list_add() being directly in netdev_nl_bind_rx_doit()
> not net_devmem_bind_dmabuf(), and list_del() being in
> net_devmem_unbind_dmabuf() always confuses me.

I agree with you. I will change it in the next version, too.

>
> >+      mutex_lock(&priv->lock);
> >+      binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
>
> We shouldn't have to lock the net_devmem_bind_dmabuf(), we have the
> instance lock so the device can't go away, and we haven't listed
> the binding on the socket, yet. Locking around list_add() should
> be enough?

I agree with it.
If the binding is not listed yet, it doesn't have to be protected by the
lock.
As you mentioned, I will try locking only around list_add()
in netdev_nl_bind_rx_doit().
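
i.e. roughly (just a sketch, I'll double-check the error paths when
respinning):

        binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
        if (IS_ERR(binding)) {
                err = PTR_ERR(binding);
                goto err_unlock;
        }
        ...
        mutex_lock(&priv->lock);
        list_add(&binding->list, &priv->bindings);
        mutex_unlock(&priv->lock);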

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-06 14:08 [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload Taehee Yoo
                   ` (2 preceding siblings ...)
  2025-05-07  2:55 ` Jakub Kicinski
@ 2025-05-07 13:08 ` kernel test robot
  2025-05-07 18:28 ` Mina Almasry
  4 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2025-05-07 13:08 UTC (permalink / raw)
  To: Taehee Yoo, davem, kuba, pabeni, edumazet, andrew+netdev, horms,
	almasrymina, sdf, netdev
  Cc: llvm, oe-kbuild-all, asml.silence, dw, skhawaja, willemb, jdamato,
	ap420073

Hi Taehee,

kernel test robot noticed the following build errors:

[auto build test ERROR on net/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Taehee-Yoo/net-devmem-fix-kernel-panic-when-socket-close-after-module-unload/20250506-221010
base:   net/main
patch link:    https://lore.kernel.org/r/20250506140858.2660441-1-ap420073%40gmail.com
patch subject: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
config: i386-defconfig (https://download.01.org/0day-ci/archive/20250507/202505072001.RHUTj8Jo-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250507/202505072001.RHUTj8Jo-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505072001.RHUTj8Jo-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/core/netdev-genl.c:879:60: error: too many arguments to function call, expected 3, have 4
     879 |         binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
         |                   ~~~~~~~~~~~~~~~~~~~~~~                          ^~~~~~~~~~~~
   net/core/devmem.h:132:1: note: 'net_devmem_bind_dmabuf' declared here
     132 | net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
         | ^                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     133 |                        struct netlink_ext_ack *extack)
         |                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1 error generated.


vim +879 net/core/netdev-genl.c

   827	
   828	int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
   829	{
   830		struct nlattr *tb[ARRAY_SIZE(netdev_queue_id_nl_policy)];
   831		struct net_devmem_dmabuf_binding *binding;
   832		u32 ifindex, dmabuf_fd, rxq_idx;
   833		struct netdev_nl_sock *priv;
   834		struct net_device *netdev;
   835		struct sk_buff *rsp;
   836		struct nlattr *attr;
   837		int rem, err = 0;
   838		void *hdr;
   839	
   840		if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_DEV_IFINDEX) ||
   841		    GENL_REQ_ATTR_CHECK(info, NETDEV_A_DMABUF_FD) ||
   842		    GENL_REQ_ATTR_CHECK(info, NETDEV_A_DMABUF_QUEUES))
   843			return -EINVAL;
   844	
   845		ifindex = nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]);
   846		dmabuf_fd = nla_get_u32(info->attrs[NETDEV_A_DMABUF_FD]);
   847	
   848		priv = genl_sk_priv_get(&netdev_nl_family, NETLINK_CB(skb).sk);
   849		if (IS_ERR(priv))
   850			return PTR_ERR(priv);
   851	
   852		rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
   853		if (!rsp)
   854			return -ENOMEM;
   855	
   856		hdr = genlmsg_iput(rsp, info);
   857		if (!hdr) {
   858			err = -EMSGSIZE;
   859			goto err_genlmsg_free;
   860		}
   861	
   862		err = 0;
   863		netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
   864		if (!netdev) {
   865			err = -ENODEV;
   866			goto err_genlmsg_free;
   867		}
   868		if (!netif_device_present(netdev))
   869			err = -ENODEV;
   870		else if (!netdev_need_ops_lock(netdev))
   871			err = -EOPNOTSUPP;
   872		if (err) {
   873			NL_SET_BAD_ATTR(info->extack,
   874					info->attrs[NETDEV_A_DEV_IFINDEX]);
   875			goto err_unlock;
   876		}
   877	
   878		mutex_lock(&priv->lock);
 > 879		binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
   880		if (IS_ERR(binding)) {
   881			err = PTR_ERR(binding);
   882			goto err_unlock_sock;
   883		}
   884	
   885		nla_for_each_attr_type(attr, NETDEV_A_DMABUF_QUEUES,
   886				       genlmsg_data(info->genlhdr),
   887				       genlmsg_len(info->genlhdr), rem) {
   888			err = nla_parse_nested(
   889				tb, ARRAY_SIZE(netdev_queue_id_nl_policy) - 1, attr,
   890				netdev_queue_id_nl_policy, info->extack);
   891			if (err < 0)
   892				goto err_unbind;
   893	
   894			if (NL_REQ_ATTR_CHECK(info->extack, attr, tb, NETDEV_A_QUEUE_ID) ||
   895			    NL_REQ_ATTR_CHECK(info->extack, attr, tb, NETDEV_A_QUEUE_TYPE)) {
   896				err = -EINVAL;
   897				goto err_unbind;
   898			}
   899	
   900			if (nla_get_u32(tb[NETDEV_A_QUEUE_TYPE]) != NETDEV_QUEUE_TYPE_RX) {
   901				NL_SET_BAD_ATTR(info->extack, tb[NETDEV_A_QUEUE_TYPE]);
   902				err = -EINVAL;
   903				goto err_unbind;
   904			}
   905	
   906			rxq_idx = nla_get_u32(tb[NETDEV_A_QUEUE_ID]);
   907	
   908			err = net_devmem_bind_dmabuf_to_queue(netdev, rxq_idx, binding,
   909							      info->extack);
   910			if (err)
   911				goto err_unbind;
   912		}
   913	
   914		list_add(&binding->list, &priv->bindings);
   915	
   916		nla_put_u32(rsp, NETDEV_A_DMABUF_ID, binding->id);
   917		genlmsg_end(rsp, hdr);
   918	
   919		err = genlmsg_reply(rsp, info);
   920		if (err)
   921			goto err_unbind;
   922	
   923		mutex_unlock(&priv->lock);
   924		netdev_unlock(netdev);
   925	
   926		return 0;
   927	
   928	err_unbind:
   929		net_devmem_unbind_dmabuf(binding);
   930	err_unlock_sock:
   931		mutex_unlock(&priv->lock);
   932	err_unlock:
   933		netdev_unlock(netdev);
   934	err_genlmsg_free:
   935		nlmsg_free(rsp);
   936		return err;
   937	}
   938	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-07  4:55   ` Taehee Yoo
@ 2025-05-07 13:23     ` Jakub Kicinski
  2025-05-07 15:54       ` Taehee Yoo
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2025-05-07 13:23 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, almasrymina, sdf,
	netdev, asml.silence, dw, skhawaja, willemb, jdamato

On Wed, 7 May 2025 13:55:44 +0900 Taehee Yoo wrote:
> So, it acquires a socket lock only for setting binding->dev to NULL,
> right? 

Yes.

BTW one more tiny nit pick:

 		net_devmem_unbind_dmabuf(binding);
+		mutex_unlock(&priv->lock);       << unlock
 		netdev_unlock(dev);
+		netdev_put(dev, &dev_tracker);
+		mutex_lock(&priv->lock);         << re-lock

The two marked ops are unnecessary. We only have to acquire the locks
in order. It's perfectly fine to release the netdev lock (netdev_unlock())
and keep holding the socket lock.
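
A minimal sketch of the simplified teardown loop this suggests, reusing the
helpers from the patch (netdev_hold/netdev_put, netdev_lock/netdev_unlock)
and omitting error handling; illustrative only, not the final code:

	mutex_lock(&priv->lock);
	list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
		dev = binding->dev;
		if (!dev) {
			net_devmem_unbind_dmabuf(binding);
			continue;
		}
		netdev_hold(dev, &dev_tracker, GFP_KERNEL);
		/* drop the socket lock so both locks are taken in
		 * netdev -> socket order
		 */
		mutex_unlock(&priv->lock);
		netdev_lock(dev);
		mutex_lock(&priv->lock);
		net_devmem_unbind_dmabuf(binding);
		/* only the netdev lock is released here; the socket lock
		 * stays held for the next iteration
		 */
		netdev_unlock(dev);
		netdev_put(dev, &dev_tracker);
	}
	mutex_unlock(&priv->lock);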

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-07 13:23     ` Jakub Kicinski
@ 2025-05-07 15:54       ` Taehee Yoo
  0 siblings, 0 replies; 14+ messages in thread
From: Taehee Yoo @ 2025-05-07 15:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, almasrymina, sdf,
	netdev, asml.silence, dw, skhawaja, willemb, jdamato

On Wed, May 7, 2025 at 10:23 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 7 May 2025 13:55:44 +0900 Taehee Yoo wrote:
> > So, it acquires a socket lock only for setting binding->dev to NULL,
> > right?
>
> Yes.
>
> BTW one more tiny nit pick:
>
>                 net_devmem_unbind_dmabuf(binding);
> +               mutex_unlock(&priv->lock);       << unlock
>                 netdev_unlock(dev);
> +               netdev_put(dev, &dev_tracker);
> +               mutex_lock(&priv->lock);         << re-lock
>
> The two marked ops are unnecessary. We only have to acquire the locks
> in order. It's perfectly fine to release the netdev lock (netdev_unlock())
> and keep holding the socket lock.

Wow, I didn't know that.
I only knew that releasing in reverse acquisition order (ABBA) is correct,
and I had never thought about whether releasing in acquisition order (ABAB)
is also allowed.
I will change it.
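
For reference, with two hypothetical mutexes A and B: only the acquisition
order has to be consistent for deadlock avoidance; the release order is free.
A minimal sketch:

	mutex_lock(&A);
	mutex_lock(&B);
	/* ... critical section ... */
	mutex_unlock(&A);	/* releasing A before B ("ABAB") is just as safe */
	mutex_unlock(&B);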

Thank you so much!
Taehee Yoo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-06 14:08 [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload Taehee Yoo
                   ` (3 preceding siblings ...)
  2025-05-07 13:08 ` kernel test robot
@ 2025-05-07 18:28 ` Mina Almasry
  2025-05-08  9:59   ` Taehee Yoo
  4 siblings, 1 reply; 14+ messages in thread
From: Mina Almasry @ 2025-05-07 18:28 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, sdf, netdev,
	asml.silence, dw, skhawaja, willemb, jdamato

On Tue, May 6, 2025 at 7:09 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> Kernel panic occurs when a devmem TCP socket is closed after NIC module
> is unloaded.
>
> This is Devmem TCP unregistration scenarios. number is an order.
> (a)socket close

Let's call this "netlink socket close", to differentiate between
netlink and data sockets.

>    (b)pp destroy
>    (c)uninstall    result
> 1                  2                3               OK
> 1                  3                2               (d)Impossible
> 2                  1                3               OK
> 3                  1                2               (e)Kernel panic
> 2                  3                1               (d)Impossible
> 3                  2                1               (d)Impossible
>
> (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
>     closed.
> (b) page_pool_destroy() is called when the interface is down.
> (c) mp_ops->uninstall() is called when an interface is unregistered.
> (d) There is no scenario in mp_ops->uninstall() is called before
>     page_pool_destroy().
>     Because unregister_netdevice_many_notify() closes interfaces first
>     and then calls mp_ops->uninstall().
> (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
>     netdev_lock().
>     But if the interface module has already been removed, net_device
>     pointer is invalid, so it causes kernel panic.
>
> In summary, there are only 3 possible scenarios.
>  A. sk close -> pp destroy -> uninstall.
>  B. pp destroy -> sk close -> uninstall.
>  C. pp destroy -> uninstall -> sk close.
>
> Case C is a kernel panic scenario.
>
> In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> binding->dev to NULL.
> It indicates an bound net_device was unregistered.
>
> It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> if binding->dev is NULL.
>
> It inverts socket/netdev lock order like below:
>     netdev_lock();
>     mutex_lock(&priv->lock);
>     mutex_unlock(&priv->lock);
>     netdev_unlock();
>
> Because of inversion of locking ordering, mp_dmabuf_devmem_uninstall()
> acquires socket lock from now on.
>
> Tests:
> Scenario A:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     kill $pid
>     ip link set $interface down
>     modprobe -rv $module
>
> Scenario B:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     ip link set $interface down
>     kill $pid
>     modprobe -rv $module
>
> Scenario C:
>     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
>         -v 7 -t 1 -q 1 &
>     pid=$!
>     sleep 10
>     modprobe -rv $module
>     sleep 5
>     kill $pid
>
> Splat looks like:
> Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> Tainted: [B]=BAD_PAGE, [W]=WARN
> RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> PKRU: 55555554
> Call Trace:
> <TASK>
>  ...
>  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
>  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
>  ...
>  netlink_release (net/netlink/af_netlink.c:737)
>  ...
>  __sock_release (net/socket.c:647)
>  sock_close (net/socket.c:1393)
>
> Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>
> v2:
>  - Fix commit message.
>  - Correct Fixes tag.
>  - Inverse locking order.
>  - Do not put a reference count of binding in
>    mp_dmabuf_devmem_uninstall().
>
> In order to test this patch, driver side implementation of devmem TCP[1]
> is needed to be applied.
>
> [1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u
>
>  net/core/devmem.c      |  6 ++++++
>  net/core/devmem.h      |  3 +++
>  net/core/netdev-genl.c | 27 ++++++++++++++++++---------
>  3 files changed, 27 insertions(+), 9 deletions(-)
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 6e27a47d0493..636c1e82b8da 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -167,6 +167,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
>
>  struct net_devmem_dmabuf_binding *
>  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> +                      struct netdev_nl_sock *priv,
>                        struct netlink_ext_ack *extack)
>  {
>         struct net_devmem_dmabuf_binding *binding;
> @@ -189,6 +190,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
>         }
>
>         binding->dev = dev;
> +       binding->priv = priv;
>
>         err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id,
>                               binding, xa_limit_32b, &id_alloc_next,
> @@ -376,12 +378,16 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
>         struct netdev_rx_queue *bound_rxq;
>         unsigned long xa_idx;
>
> +       mutex_lock(&binding->priv->lock);
>         xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
>                 if (bound_rxq == rxq) {
>                         xa_erase(&binding->bound_rxqs, xa_idx);
> +                       if (xa_empty(&binding->bound_rxqs))
> +                               binding->dev = NULL;

priv->lock is meant to protect priv->bindings only. To be honest, I
find priv->lock being used to protect both priv->bindings and
binding->dev extremely confusing. I think it may be contributing
to the convoluted lock ordering in netdev_nl_sock_priv_destroy. It
also means the binding needs to hold a reference to the netlink
socket that owns it, which is mentally confusing to me. The binding,
abstractly speaking, should not need to know anything about the
netlink socket that holds it. Also, AFAICT this may be buggy: the
binding may outlive the netlink socket. For example, when the netlink
socket is closed but the binding is kept alive because there are still
references to the netmems in it, a UAF may happen.

Instead of this, if you need to protect concurrent access to
binding->dev, either:

1. create new spin_lock, binding->dev_lock, which protects access to
binding->dev, or
2. Use RCU protection to locklessly set binding->dev, or
3. Use some cmpxchg logic to locklessly set/query binding->dev and
detect if it got modified in a racing thread.

But please no multiplexing priv->lock to protect both priv->bindings
and binding->dev.
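
A minimal sketch of option 1 above (illustrative only; the dev_lock member,
its placement, and its init site are assumptions, not part of the posted
patch):

	struct net_devmem_dmabuf_binding {
		/* ... existing members unchanged ... */
		struct net_device *dev;	/* NULL once the bound device is gone */
		spinlock_t dev_lock;	/* protects binding->dev only;
					 * spin_lock_init() where the binding
					 * is allocated
					 */
	};

	/* writer side, e.g. when the rx queue is uninstalled */
	spin_lock(&binding->dev_lock);
	binding->dev = NULL;
	spin_unlock(&binding->dev_lock);

	/* reader side, e.g. when the netlink socket is closed */
	spin_lock(&binding->dev_lock);
	dev = binding->dev;
	spin_unlock(&binding->dev_lock);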

>                         break;
>                 }
>         }
> +       mutex_unlock(&binding->priv->lock);
>  }
>
>  static const struct memory_provider_ops dmabuf_devmem_ops = {
> diff --git a/net/core/devmem.h b/net/core/devmem.h
> index 7fc158d52729..afd6320b2c9b 100644
> --- a/net/core/devmem.h
> +++ b/net/core/devmem.h
> @@ -11,6 +11,7 @@
>  #define _NET_DEVMEM_H
>
>  #include <net/netmem.h>
> +#include <net/netdev_netlink.h>
>
>  struct netlink_ext_ack;
>
> @@ -20,6 +21,7 @@ struct net_devmem_dmabuf_binding {
>         struct sg_table *sgt;
>         struct net_device *dev;
>         struct gen_pool *chunk_pool;
> +       struct netdev_nl_sock *priv;
>
>         /* The user holds a ref (via the netlink API) for as long as they want
>          * the binding to remain alive. Each page pool using this binding holds
> @@ -63,6 +65,7 @@ struct dmabuf_genpool_chunk_owner {
>  void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
>  struct net_devmem_dmabuf_binding *
>  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> +                      struct netdev_nl_sock *priv,
>                        struct netlink_ext_ack *extack);
>  void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
>  int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index 230743bdbb14..b8bb73574276 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -859,13 +859,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
>                 goto err_genlmsg_free;
>         }
>
> -       mutex_lock(&priv->lock);
> -
>         err = 0;
>         netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
>         if (!netdev) {
>                 err = -ENODEV;
> -               goto err_unlock_sock;
> +               goto err_genlmsg_free;
>         }
>         if (!netif_device_present(netdev))
>                 err = -ENODEV;
> @@ -877,10 +875,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
>                 goto err_unlock;
>         }
>
> -       binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, info->extack);
> +       mutex_lock(&priv->lock);
> +       binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);

I'm not so sure about this either. priv->lock should protect access
to priv->bindings only. I'm not sure why it is taken before
net_devmem_bind_dmabuf(). Can it be locked around the access to
priv->bindings only?

>         if (IS_ERR(binding)) {
>                 err = PTR_ERR(binding);
> -               goto err_unlock;
> +               goto err_unlock_sock;
>         }
>
>         nla_for_each_attr_type(attr, NETDEV_A_DMABUF_QUEUES,
> @@ -921,18 +920,17 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
>         if (err)
>                 goto err_unbind;
>
> -       netdev_unlock(netdev);
> -
>         mutex_unlock(&priv->lock);
> +       netdev_unlock(netdev);
>
>         return 0;
>
>  err_unbind:
>         net_devmem_unbind_dmabuf(binding);
> -err_unlock:
> -       netdev_unlock(netdev);
>  err_unlock_sock:
>         mutex_unlock(&priv->lock);
> +err_unlock:
> +       netdev_unlock(netdev);
>  err_genlmsg_free:
>         nlmsg_free(rsp);
>         return err;
> @@ -948,14 +946,25 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
>  {
>         struct net_devmem_dmabuf_binding *binding;
>         struct net_devmem_dmabuf_binding *temp;
> +       netdevice_tracker dev_tracker;
>         struct net_device *dev;
>
>         mutex_lock(&priv->lock);
>         list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
>                 dev = binding->dev;
> +               if (!dev) {
> +                       net_devmem_unbind_dmabuf(binding);
> +                       continue;
> +               }
> +               netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> +               mutex_unlock(&priv->lock);
>                 netdev_lock(dev);
> +               mutex_lock(&priv->lock);
>                 net_devmem_unbind_dmabuf(binding);
> +               mutex_unlock(&priv->lock);
>                 netdev_unlock(dev);
> +               netdev_put(dev, &dev_tracker);
> +               mutex_lock(&priv->lock);
>         }
>         mutex_unlock(&priv->lock);
>  }
> --
> 2.34.1
>


-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-07 18:28 ` Mina Almasry
@ 2025-05-08  9:59   ` Taehee Yoo
  2025-05-08 20:45     ` Mina Almasry
  0 siblings, 1 reply; 14+ messages in thread
From: Taehee Yoo @ 2025-05-08  9:59 UTC (permalink / raw)
  To: Mina Almasry
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, sdf, netdev,
	asml.silence, dw, skhawaja, willemb, jdamato

On Thu, May 8, 2025 at 3:28 AM Mina Almasry <almasrymina@google.com> wrote:
>

Hi Mina,
Thanks a lot for your review!

> On Tue, May 6, 2025 at 7:09 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > Kernel panic occurs when a devmem TCP socket is closed after NIC module
> > is unloaded.
> >
> > This is Devmem TCP unregistration scenarios. number is an order.
> > (a)socket close
>
> Let's call this "netlink socket close", to differentiate between
> netlink and data sockets.

Thanks, I will change it in the next patch.

>
> >    (b)pp destroy
> >    (c)uninstall    result
> > 1                  2                3               OK
> > 1                  3                2               (d)Impossible
> > 2                  1                3               OK
> > 3                  1                2               (e)Kernel panic
> > 2                  3                1               (d)Impossible
> > 3                  2                1               (d)Impossible
> >
> > (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
> >     closed.
> > (b) page_pool_destroy() is called when the interface is down.
> > (c) mp_ops->uninstall() is called when an interface is unregistered.
> > (d) There is no scenario in mp_ops->uninstall() is called before
> >     page_pool_destroy().
> >     Because unregister_netdevice_many_notify() closes interfaces first
> >     and then calls mp_ops->uninstall().
> > (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
> >     netdev_lock().
> >     But if the interface module has already been removed, net_device
> >     pointer is invalid, so it causes kernel panic.
> >
> > In summary, there are only 3 possible scenarios.
> >  A. sk close -> pp destroy -> uninstall.
> >  B. pp destroy -> sk close -> uninstall.
> >  C. pp destroy -> uninstall -> sk close.
> >
> > Case C is a kernel panic scenario.
> >
> > In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
> > binding->dev to NULL.
> > It indicates an bound net_device was unregistered.
> >
> > It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
> > if binding->dev is NULL.
> >
> > It inverts socket/netdev lock order like below:
> >     netdev_lock();
> >     mutex_lock(&priv->lock);
> >     mutex_unlock(&priv->lock);
> >     netdev_unlock();
> >
> > Because of inversion of locking ordering, mp_dmabuf_devmem_uninstall()
> > acquires socket lock from now on.
> >
> > Tests:
> > Scenario A:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     kill $pid
> >     ip link set $interface down
> >     modprobe -rv $module
> >
> > Scenario B:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     ip link set $interface down
> >     kill $pid
> >     modprobe -rv $module
> >
> > Scenario C:
> >     ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
> >         -v 7 -t 1 -q 1 &
> >     pid=$!
> >     sleep 10
> >     modprobe -rv $module
> >     sleep 5
> >     kill $pid
> >
> > Splat looks like:
> > Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
> > CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
> > Tainted: [B]=BAD_PAGE, [W]=WARN
> > RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
> > Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
> > RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
> > RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
> > RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
> > RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
> > R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
> > R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
> > FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> >  ...
> >  netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
> >  genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
> >  ...
> >  netlink_release (net/netlink/af_netlink.c:737)
> >  ...
> >  __sock_release (net/socket.c:647)
> >  sock_close (net/socket.c:1393)
> >
> > Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> > ---
> >
> > v2:
> >  - Fix commit message.
> >  - Correct Fixes tag.
> >  - Inverse locking order.
> >  - Do not put a reference count of binding in
> >    mp_dmabuf_devmem_uninstall().
> >
> > In order to test this patch, driver side implementation of devmem TCP[1]
> > is needed to be applied.
> >
> > [1] https://lore.kernel.org/netdev/20250415052458.1260575-1-ap420073@gmail.com/T/#u
> >
> >  net/core/devmem.c      |  6 ++++++
> >  net/core/devmem.h      |  3 +++
> >  net/core/netdev-genl.c | 27 ++++++++++++++++++---------
> >  3 files changed, 27 insertions(+), 9 deletions(-)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 6e27a47d0493..636c1e82b8da 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -167,6 +167,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> >
> >  struct net_devmem_dmabuf_binding *
> >  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > +                      struct netdev_nl_sock *priv,
> >                        struct netlink_ext_ack *extack)
> >  {
> >         struct net_devmem_dmabuf_binding *binding;
> > @@ -189,6 +190,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> >         }
> >
> >         binding->dev = dev;
> > +       binding->priv = priv;
> >
> >         err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id,
> >                               binding, xa_limit_32b, &id_alloc_next,
> > @@ -376,12 +378,16 @@ static void mp_dmabuf_devmem_uninstall(void *mp_priv,
> >         struct netdev_rx_queue *bound_rxq;
> >         unsigned long xa_idx;
> >
> > +       mutex_lock(&binding->priv->lock);
> >         xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
> >                 if (bound_rxq == rxq) {
> >                         xa_erase(&binding->bound_rxqs, xa_idx);
> > +                       if (xa_empty(&binding->bound_rxqs))
> > +                               binding->dev = NULL;
>
> priv->lock is meant to protect priv->bindings only. To be honest, I
> find priv->lock being used to protect both priv->bindings and
> binding->dev extremely confusing. I think it may be contributing
> to the convoluted lock ordering in netdev_nl_sock_priv_destroy. It
> also means the binding needs to hold a reference to the netlink
> socket that owns it, which is mentally confusing to me. The binding,
> abstractly speaking, should not need to know anything about the
> netlink socket that holds it. Also, AFAICT this may be buggy: the
> binding may outlive the netlink socket. For example, when the netlink
> socket is closed but the binding is kept alive because there are still
> references to the netmems in it, a UAF may happen.

Thanks for sharing your concerns.

>
> Instead of this, if you need to protect concurrent access to
> binding->dev, either:
>
> 1. create new spin_lock, binding->dev_lock, which protects access to
> binding->dev, or

Would a mutex be sufficient, or is a spinlock required?

> 2. Use RCU protection to locklessly set binding->dev, or
> 3. Use some cmpxchg logic to locklessly set/query binding->dev and
> detect if it got modified in a racing thread.
>
> But please no multiplexing priv->lock to protect both priv->bindings
> and binding->dev.

Okay, I think introducing a new lock for protecting a binding would
be enough, covering not only binding->dev but all members of a binding.
So, binding->lock would be a better fit as a name.

So far, no members other than dev actually need protection, because
they are initialized when the binding is initialized and do not change
while it is live.
So, while the binding is alive, all members are valid except for dev.
However, introducing binding->lock makes the code obvious, and it will
not confuse us if someday we want to protect other members of a binding.

>
> >                         break;
> >                 }
> >         }
> > +       mutex_unlock(&binding->priv->lock);
> >  }
> >
> >  static const struct memory_provider_ops dmabuf_devmem_ops = {
> > diff --git a/net/core/devmem.h b/net/core/devmem.h
> > index 7fc158d52729..afd6320b2c9b 100644
> > --- a/net/core/devmem.h
> > +++ b/net/core/devmem.h
> > @@ -11,6 +11,7 @@
> >  #define _NET_DEVMEM_H
> >
> >  #include <net/netmem.h>
> > +#include <net/netdev_netlink.h>
> >
> >  struct netlink_ext_ack;
> >
> > @@ -20,6 +21,7 @@ struct net_devmem_dmabuf_binding {
> >         struct sg_table *sgt;
> >         struct net_device *dev;
> >         struct gen_pool *chunk_pool;
> > +       struct netdev_nl_sock *priv;
> >
> >         /* The user holds a ref (via the netlink API) for as long as they want
> >          * the binding to remain alive. Each page pool using this binding holds
> > @@ -63,6 +65,7 @@ struct dmabuf_genpool_chunk_owner {
> >  void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
> >  struct net_devmem_dmabuf_binding *
> >  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > +                      struct netdev_nl_sock *priv,
> >                        struct netlink_ext_ack *extack);
> >  void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
> >  int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > index 230743bdbb14..b8bb73574276 100644
> > --- a/net/core/netdev-genl.c
> > +++ b/net/core/netdev-genl.c
> > @@ -859,13 +859,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> >                 goto err_genlmsg_free;
> >         }
> >
> > -       mutex_lock(&priv->lock);
> > -
> >         err = 0;
> >         netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
> >         if (!netdev) {
> >                 err = -ENODEV;
> > -               goto err_unlock_sock;
> > +               goto err_genlmsg_free;
> >         }
> >         if (!netif_device_present(netdev))
> >                 err = -ENODEV;
> > @@ -877,10 +875,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> >                 goto err_unlock;
> >         }
> >
> > -       binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, info->extack);
> > +       mutex_lock(&priv->lock);
> > +       binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
>
> I'm not so sure about this either. priv->lock should protect access
> to priv->bindings only. I'm not sure why it is taken before
> net_devmem_bind_dmabuf(). Can it be locked around the access to
> priv->bindings only?

As you mentioned, this lock is unnecessary for protecting
net_devmem_bind_dmabuf().
It is required to protect only priv->bindings.
So, I will move priv->lock so that it is held only around priv->bindings
in the next patch.
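
A rough sketch of that narrowing, based on the bind path quoted above
(illustrative; the final patch may differ):

	binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
	if (IS_ERR(binding)) {
		err = PTR_ERR(binding);
		goto err_unlock;
	}

	/* ... queue binding loop unchanged ... */

	/* priv->lock held only around the list it actually protects */
	mutex_lock(&priv->lock);
	list_add(&binding->list, &priv->bindings);
	mutex_unlock(&priv->lock);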

>
> >         if (IS_ERR(binding)) {
> >                 err = PTR_ERR(binding);
> > -               goto err_unlock;
> > +               goto err_unlock_sock;
> >         }
> >
> >         nla_for_each_attr_type(attr, NETDEV_A_DMABUF_QUEUES,
> > @@ -921,18 +920,17 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> >         if (err)
> >                 goto err_unbind;
> >
> > -       netdev_unlock(netdev);
> > -
> >         mutex_unlock(&priv->lock);
> > +       netdev_unlock(netdev);
> >
> >         return 0;
> >
> >  err_unbind:
> >         net_devmem_unbind_dmabuf(binding);
> > -err_unlock:
> > -       netdev_unlock(netdev);
> >  err_unlock_sock:
> >         mutex_unlock(&priv->lock);
> > +err_unlock:
> > +       netdev_unlock(netdev);
> >  err_genlmsg_free:
> >         nlmsg_free(rsp);
> >         return err;
> > @@ -948,14 +946,25 @@ void netdev_nl_sock_priv_destroy(struct netdev_nl_sock *priv)
> >  {
> >         struct net_devmem_dmabuf_binding *binding;
> >         struct net_devmem_dmabuf_binding *temp;
> > +       netdevice_tracker dev_tracker;
> >         struct net_device *dev;
> >
> >         mutex_lock(&priv->lock);
> >         list_for_each_entry_safe(binding, temp, &priv->bindings, list) {
> >                 dev = binding->dev;
> > +               if (!dev) {
> > +                       net_devmem_unbind_dmabuf(binding);
> > +                       continue;
> > +               }
> > +               netdev_hold(dev, &dev_tracker, GFP_KERNEL);
> > +               mutex_unlock(&priv->lock);
> >                 netdev_lock(dev);
> > +               mutex_lock(&priv->lock);
> >                 net_devmem_unbind_dmabuf(binding);
> > +               mutex_unlock(&priv->lock);
> >                 netdev_unlock(dev);
> > +               netdev_put(dev, &dev_tracker);
> > +               mutex_lock(&priv->lock);
> >         }
> >         mutex_unlock(&priv->lock);
> >  }
> > --
> > 2.34.1
> >
>
>
> --
> Thanks,
> Mina

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload
  2025-05-08  9:59   ` Taehee Yoo
@ 2025-05-08 20:45     ` Mina Almasry
  0 siblings, 0 replies; 14+ messages in thread
From: Mina Almasry @ 2025-05-08 20:45 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, sdf, netdev,
	asml.silence, dw, skhawaja, willemb, jdamato

On Thu, May 8, 2025 at 3:00 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > Instead of this, if you need to protect concurrent access to
> > binding->dev, either:
> >
> > 1. create new spin_lock, binding->dev_lock, which protects access to
> > binding->dev, or
>
> Would a mutex be sufficient, or is a spinlock required?
>

AFAICT a mutex would be fine.

> > 2. Use RCU protection to locklessly set binding->dev, or
> > 3. Use some cmpxchg logic to locklessly set/query binding->dev and
> > detect if it got modified in a racing thread.
> >
> > But please no multiplexing priv->lock to protect both priv->bindings
> > and binding->dev.
>
> Okay, I think introducing a new lock for protecting a binding would
> be enough, covering not only binding->dev but all members of a binding.
> So, binding->lock would be a better fit as a name.
>
> So far, no members other than dev actually need protection, because
> they are initialized when the binding is initialized and do not change
> while it is live.
> So, while the binding is alive, all members are valid except for dev.
> However, introducing binding->lock makes the code obvious, and it will
> not confuse us if someday we want to protect other members of a binding.
>

Thanks, yes, AFAICT we don't really need to synchronize access to any
other members of the binding in this patch; it's just binding->dev
that can be NULL'd while another thread is reading it. Maybe for now
we can have binding->lock protect binding->dev, and in the future, if
more synchronization is needed, binding->lock can be re-used. A comment
above binding->lock can explain what it protects and what it doesn't.
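
A sketch of how that could look (member placement, naming, and the comment
wording are assumptions here, not the final fix):

	struct net_devmem_dmabuf_binding {
		/* ... existing members unchanged ... */
		struct net_device *dev;

		/* Protects binding->dev, which is set to NULL when the bound
		 * device is unregistered; may be reused for other members if
		 * more synchronization is needed later.
		 */
		struct mutex lock;
	};

	/* uninstall path: mark the device as gone */
	mutex_lock(&binding->lock);
	binding->dev = NULL;
	mutex_unlock(&binding->lock);

The netlink-socket close path would then read binding->dev under the same
binding->lock and skip netdev_lock() entirely when it is already NULL.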

> >
> > >                         break;
> > >                 }
> > >         }
> > > +       mutex_unlock(&binding->priv->lock);
> > >  }
> > >
> > >  static const struct memory_provider_ops dmabuf_devmem_ops = {
> > > diff --git a/net/core/devmem.h b/net/core/devmem.h
> > > index 7fc158d52729..afd6320b2c9b 100644
> > > --- a/net/core/devmem.h
> > > +++ b/net/core/devmem.h
> > > @@ -11,6 +11,7 @@
> > >  #define _NET_DEVMEM_H
> > >
> > >  #include <net/netmem.h>
> > > +#include <net/netdev_netlink.h>
> > >
> > >  struct netlink_ext_ack;
> > >
> > > @@ -20,6 +21,7 @@ struct net_devmem_dmabuf_binding {
> > >         struct sg_table *sgt;
> > >         struct net_device *dev;
> > >         struct gen_pool *chunk_pool;
> > > +       struct netdev_nl_sock *priv;
> > >
> > >         /* The user holds a ref (via the netlink API) for as long as they want
> > >          * the binding to remain alive. Each page pool using this binding holds
> > > @@ -63,6 +65,7 @@ struct dmabuf_genpool_chunk_owner {
> > >  void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
> > >  struct net_devmem_dmabuf_binding *
> > >  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > > +                      struct netdev_nl_sock *priv,
> > >                        struct netlink_ext_ack *extack);
> > >  void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
> > >  int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > > index 230743bdbb14..b8bb73574276 100644
> > > --- a/net/core/netdev-genl.c
> > > +++ b/net/core/netdev-genl.c
> > > @@ -859,13 +859,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> > >                 goto err_genlmsg_free;
> > >         }
> > >
> > > -       mutex_lock(&priv->lock);
> > > -
> > >         err = 0;
> > >         netdev = netdev_get_by_index_lock(genl_info_net(info), ifindex);
> > >         if (!netdev) {
> > >                 err = -ENODEV;
> > > -               goto err_unlock_sock;
> > > +               goto err_genlmsg_free;
> > >         }
> > >         if (!netif_device_present(netdev))
> > >                 err = -ENODEV;
> > > @@ -877,10 +875,11 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
> > >                 goto err_unlock;
> > >         }
> > >
> > > -       binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, info->extack);
> > > +       mutex_lock(&priv->lock);
> > > +       binding = net_devmem_bind_dmabuf(netdev, dmabuf_fd, priv, info->extack);
> >
> > I'm not so sure about this either. priv->lock should protect access
> > to priv->bindings only. I'm not sure why it is taken before
> > net_devmem_bind_dmabuf(). Can it be locked around the access to
> > priv->bindings only?
>
> As you mentioned, this lock is unnecessary for protecting
> net_devmem_bind_dmabuf().
> It is required to protect only priv->bindings.
> So, I will move priv->lock so that it is held only around priv->bindings
> in the next patch.
>

Thanks!

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-05-08 20:46 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-06 14:08 [PATCH net v2] net: devmem: fix kernel panic when socket close after module unload Taehee Yoo
2025-05-06 17:41 ` Stanislav Fomichev
2025-05-07  2:42   ` Jakub Kicinski
2025-05-07  4:22   ` Taehee Yoo
2025-05-06 22:07 ` Jakub Kicinski
2025-05-07  4:24   ` Taehee Yoo
2025-05-07  2:55 ` Jakub Kicinski
2025-05-07  4:55   ` Taehee Yoo
2025-05-07 13:23     ` Jakub Kicinski
2025-05-07 15:54       ` Taehee Yoo
2025-05-07 13:08 ` kernel test robot
2025-05-07 18:28 ` Mina Almasry
2025-05-08  9:59   ` Taehee Yoo
2025-05-08 20:45     ` Mina Almasry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).