netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net V5 0/3] Fixes for IPsec over bonding
@ 2024-08-21  9:04 Jianbo Liu
  2024-08-21  9:04 ` [PATCH net V5 1/3] bonding: implement xdo_dev_state_free and call it after deletion Jianbo Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jianbo Liu @ 2024-08-21  9:04 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet, jv, andy
  Cc: saeedm, gal, leonro, liuhangbin, tariqt, Jianbo Liu

Hi,

This patchset provides bug fixes for IPsec over bonding driver.

It adds the missing xdo_dev_state_free API, and fixes "scheduling while
atomic" by using mutex lock instead.

Series generated against:
commit c07ff8592d57 ("netem: fix return value if duplicate enqueue fails")

Thanks!
Jianbo

V5:
- Rebased.
- Removed state deletion/free in bond_ipsec_add_sa_all() added before,
  as real_dev is not set to NULL in Nikolay's patch. 

V4:
- Add to all patches: Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>.
- Update commit message in patch 1 (Jakub).

V3:
- Add RCU read lock/unlock for bond_ipsec_add_sa, bond_ipsec_del_sa and bond_ipsec_free_sa.

V2:
- Rebased on top of latest net branch.
- Squashed patch #2 into #1 per Hangbin comment.
- Addressed Hangbin's comments.
- Patch #3 (was #4): Addressed comments by Paolo.

Jianbo Liu (3):
  bonding: implement xdo_dev_state_free and call it after deletion
  bonding: extract the use of real_device into local variable
  bonding: change ipsec_lock from spin lock to mutex

 drivers/net/bonding/bond_main.c | 143 ++++++++++++++++++++------------
 include/net/bonding.h           |   2 +-
 2 files changed, 90 insertions(+), 55 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net V5 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-21  9:04 [PATCH net V5 0/3] Fixes for IPsec over bonding Jianbo Liu
@ 2024-08-21  9:04 ` Jianbo Liu
  2024-08-21  9:04 ` [PATCH net V5 2/3] bonding: extract the use of real_device into local variable Jianbo Liu
  2024-08-21  9:04 ` [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex Jianbo Liu
  2 siblings, 0 replies; 12+ messages in thread
From: Jianbo Liu @ 2024-08-21  9:04 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet, jv, andy
  Cc: saeedm, gal, leonro, liuhangbin, tariqt, Jianbo Liu

Add this implementation for bonding, so hardware resources can be
freed from the active slave after xfrm state is deleted. The netdev
used to invoke xdo_dev_state_free callback, is saved in the xfrm state
(xs->xso.real_dev), which is also the bond's active slave.

And call it when deleting all SAs from old active real interface while
switching current active slave.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/bonding/bond_main.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f74bacf071fc..f191a48c7766 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -581,12 +581,43 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 				   __func__);
 		} else {
 			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
+			if (slave->dev->xfrmdev_ops->xdo_dev_state_free)
+				slave->dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
 		}
 	}
 	spin_unlock_bh(&bond->ipsec_lock);
 	rcu_read_unlock();
 }
 
+static void bond_ipsec_free_sa(struct xfrm_state *xs)
+{
+	struct net_device *bond_dev = xs->xso.dev;
+	struct net_device *real_dev;
+	struct bonding *bond;
+	struct slave *slave;
+
+	if (!bond_dev)
+		return;
+
+	rcu_read_lock();
+	bond = netdev_priv(bond_dev);
+	slave = rcu_dereference(bond->curr_active_slave);
+	real_dev = slave ? slave->dev : NULL;
+	rcu_read_unlock();
+
+	if (!slave)
+		return;
+
+	if (!xs->xso.real_dev)
+		return;
+
+	WARN_ON(xs->xso.real_dev != real_dev);
+
+	if (real_dev && real_dev->xfrmdev_ops &&
+	    real_dev->xfrmdev_ops->xdo_dev_state_free)
+		real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
+}
+
 /**
  * bond_ipsec_offload_ok - can this packet use the xfrm hw offload
  * @skb: current data packet
@@ -627,6 +658,7 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
 static const struct xfrmdev_ops bond_xfrmdev_ops = {
 	.xdo_dev_state_add = bond_ipsec_add_sa,
 	.xdo_dev_state_delete = bond_ipsec_del_sa,
+	.xdo_dev_state_free = bond_ipsec_free_sa,
 	.xdo_dev_offload_ok = bond_ipsec_offload_ok,
 };
 #endif /* CONFIG_XFRM_OFFLOAD */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net V5 2/3] bonding: extract the use of real_device into local variable
  2024-08-21  9:04 [PATCH net V5 0/3] Fixes for IPsec over bonding Jianbo Liu
  2024-08-21  9:04 ` [PATCH net V5 1/3] bonding: implement xdo_dev_state_free and call it after deletion Jianbo Liu
@ 2024-08-21  9:04 ` Jianbo Liu
  2024-08-21  9:04 ` [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex Jianbo Liu
  2 siblings, 0 replies; 12+ messages in thread
From: Jianbo Liu @ 2024-08-21  9:04 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet, jv, andy
  Cc: saeedm, gal, leonro, liuhangbin, tariqt, Jianbo Liu, Cosmin Ratiu

Add a local variable for slave->dev, to prepare for the lock change in
the next patch. There is no functionality change.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f191a48c7766..0d1129eaf47b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -427,6 +427,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 			     struct netlink_ext_ack *extack)
 {
 	struct net_device *bond_dev = xs->xso.dev;
+	struct net_device *real_dev;
 	struct bond_ipsec *ipsec;
 	struct bonding *bond;
 	struct slave *slave;
@@ -443,9 +444,10 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 		return -ENODEV;
 	}
 
-	if (!slave->dev->xfrmdev_ops ||
-	    !slave->dev->xfrmdev_ops->xdo_dev_state_add ||
-	    netif_is_bond_master(slave->dev)) {
+	real_dev = slave->dev;
+	if (!real_dev->xfrmdev_ops ||
+	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
+	    netif_is_bond_master(real_dev)) {
 		NL_SET_ERR_MSG_MOD(extack, "Slave does not support ipsec offload");
 		rcu_read_unlock();
 		return -EINVAL;
@@ -456,9 +458,9 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 		rcu_read_unlock();
 		return -ENOMEM;
 	}
-	xs->xso.real_dev = slave->dev;
 
-	err = slave->dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
+	xs->xso.real_dev = real_dev;
+	err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
 	if (!err) {
 		ipsec->xs = xs;
 		INIT_LIST_HEAD(&ipsec->list);
@@ -475,6 +477,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 static void bond_ipsec_add_sa_all(struct bonding *bond)
 {
 	struct net_device *bond_dev = bond->dev;
+	struct net_device *real_dev;
 	struct bond_ipsec *ipsec;
 	struct slave *slave;
 
@@ -483,12 +486,13 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 	if (!slave)
 		goto out;
 
-	if (!slave->dev->xfrmdev_ops ||
-	    !slave->dev->xfrmdev_ops->xdo_dev_state_add ||
-	    netif_is_bond_master(slave->dev)) {
+	real_dev = slave->dev;
+	if (!real_dev->xfrmdev_ops ||
+	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
+	    netif_is_bond_master(real_dev)) {
 		spin_lock_bh(&bond->ipsec_lock);
 		if (!list_empty(&bond->ipsec_list))
-			slave_warn(bond_dev, slave->dev,
+			slave_warn(bond_dev, real_dev,
 				   "%s: no slave xdo_dev_state_add\n",
 				   __func__);
 		spin_unlock_bh(&bond->ipsec_lock);
@@ -497,9 +501,9 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 
 	spin_lock_bh(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
-		ipsec->xs->xso.real_dev = slave->dev;
-		if (slave->dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
-			slave_warn(bond_dev, slave->dev, "%s: failed to add SA\n", __func__);
+		ipsec->xs->xso.real_dev = real_dev;
+		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
+			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
 			ipsec->xs->xso.real_dev = NULL;
 		}
 	}
@@ -515,6 +519,7 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 static void bond_ipsec_del_sa(struct xfrm_state *xs)
 {
 	struct net_device *bond_dev = xs->xso.dev;
+	struct net_device *real_dev;
 	struct bond_ipsec *ipsec;
 	struct bonding *bond;
 	struct slave *slave;
@@ -532,16 +537,17 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 	if (!xs->xso.real_dev)
 		goto out;
 
-	WARN_ON(xs->xso.real_dev != slave->dev);
+	real_dev = slave->dev;
+	WARN_ON(xs->xso.real_dev != real_dev);
 
-	if (!slave->dev->xfrmdev_ops ||
-	    !slave->dev->xfrmdev_ops->xdo_dev_state_delete ||
-	    netif_is_bond_master(slave->dev)) {
-		slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__);
+	if (!real_dev->xfrmdev_ops ||
+	    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
+	    netif_is_bond_master(real_dev)) {
+		slave_warn(bond_dev, real_dev, "%s: no slave xdo_dev_state_delete\n", __func__);
 		goto out;
 	}
 
-	slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
 out:
 	spin_lock_bh(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
@@ -558,6 +564,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 static void bond_ipsec_del_sa_all(struct bonding *bond)
 {
 	struct net_device *bond_dev = bond->dev;
+	struct net_device *real_dev;
 	struct bond_ipsec *ipsec;
 	struct slave *slave;
 
@@ -568,21 +575,22 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 		return;
 	}
 
+	real_dev = slave->dev;
 	spin_lock_bh(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
 		if (!ipsec->xs->xso.real_dev)
 			continue;
 
-		if (!slave->dev->xfrmdev_ops ||
-		    !slave->dev->xfrmdev_ops->xdo_dev_state_delete ||
-		    netif_is_bond_master(slave->dev)) {
-			slave_warn(bond_dev, slave->dev,
+		if (!real_dev->xfrmdev_ops ||
+		    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
+		    netif_is_bond_master(real_dev)) {
+			slave_warn(bond_dev, real_dev,
 				   "%s: no slave xdo_dev_state_delete\n",
 				   __func__);
 		} else {
-			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
-			if (slave->dev->xfrmdev_ops->xdo_dev_state_free)
-				slave->dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
+			real_dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
+			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
+				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
 		}
 	}
 	spin_unlock_bh(&bond->ipsec_lock);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-21  9:04 [PATCH net V5 0/3] Fixes for IPsec over bonding Jianbo Liu
  2024-08-21  9:04 ` [PATCH net V5 1/3] bonding: implement xdo_dev_state_free and call it after deletion Jianbo Liu
  2024-08-21  9:04 ` [PATCH net V5 2/3] bonding: extract the use of real_device into local variable Jianbo Liu
@ 2024-08-21  9:04 ` Jianbo Liu
  2024-08-21 16:00   ` Jay Vosburgh
  2 siblings, 1 reply; 12+ messages in thread
From: Jianbo Liu @ 2024-08-21  9:04 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet, jv, andy
  Cc: saeedm, gal, leonro, liuhangbin, tariqt, Jianbo Liu

In the cited commit, bond->ipsec_lock is added to protect ipsec_list,
hence xdo_dev_state_add and xdo_dev_state_delete are called inside
this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep,
"scheduling while atomic" will be triggered when changing bond's
active slave.

[  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
[  101.055726] Modules linked in:
[  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
[  101.058760] Hardware name:
[  101.059434] Call Trace:
[  101.059436]  <TASK>
[  101.060873]  dump_stack_lvl+0x51/0x60
[  101.061275]  __schedule_bug+0x4e/0x60
[  101.061682]  __schedule+0x612/0x7c0
[  101.062078]  ? __mod_timer+0x25c/0x370
[  101.062486]  schedule+0x25/0xd0
[  101.062845]  schedule_timeout+0x77/0xf0
[  101.063265]  ? asm_common_interrupt+0x22/0x40
[  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
[  101.064215]  __wait_for_common+0x87/0x190
[  101.064648]  ? usleep_range_state+0x90/0x90
[  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
[  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
[  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
[  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
[  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
[  101.067738]  ? kmalloc_trace+0x4d/0x350
[  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
[  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
[  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
[  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
[  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
[  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
[  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
[  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
[  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
[  101.073033]  vfs_write+0x2d8/0x400
[  101.073416]  ? alloc_fd+0x48/0x180
[  101.073798]  ksys_write+0x5f/0xe0
[  101.074175]  do_syscall_64+0x52/0x110
[  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
from bond_change_active_slave, which requires holding the RTNL lock.
And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
context. So ipsec_lock doesn't have to be spin lock, change it to
mutex, and thus the above issue can be resolved.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/bonding/bond_main.c | 67 +++++++++++++++------------------
 include/net/bonding.h           |  2 +-
 2 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0d1129eaf47b..f20f6d83ad54 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -439,38 +439,33 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 	rcu_read_lock();
 	bond = netdev_priv(bond_dev);
 	slave = rcu_dereference(bond->curr_active_slave);
-	if (!slave) {
-		rcu_read_unlock();
+	real_dev = slave ? slave->dev : NULL;
+	rcu_read_unlock();
+	if (!real_dev)
 		return -ENODEV;
-	}
 
-	real_dev = slave->dev;
 	if (!real_dev->xfrmdev_ops ||
 	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
 	    netif_is_bond_master(real_dev)) {
 		NL_SET_ERR_MSG_MOD(extack, "Slave does not support ipsec offload");
-		rcu_read_unlock();
 		return -EINVAL;
 	}
 
-	ipsec = kmalloc(sizeof(*ipsec), GFP_ATOMIC);
-	if (!ipsec) {
-		rcu_read_unlock();
+	ipsec = kmalloc(sizeof(*ipsec), GFP_KERNEL);
+	if (!ipsec)
 		return -ENOMEM;
-	}
 
 	xs->xso.real_dev = real_dev;
 	err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
 	if (!err) {
 		ipsec->xs = xs;
 		INIT_LIST_HEAD(&ipsec->list);
-		spin_lock_bh(&bond->ipsec_lock);
+		mutex_lock(&bond->ipsec_lock);
 		list_add(&ipsec->list, &bond->ipsec_list);
-		spin_unlock_bh(&bond->ipsec_lock);
+		mutex_unlock(&bond->ipsec_lock);
 	} else {
 		kfree(ipsec);
 	}
-	rcu_read_unlock();
 	return err;
 }
 
@@ -481,35 +476,35 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 	struct bond_ipsec *ipsec;
 	struct slave *slave;
 
-	rcu_read_lock();
-	slave = rcu_dereference(bond->curr_active_slave);
-	if (!slave)
-		goto out;
+	slave = rtnl_dereference(bond->curr_active_slave);
+	real_dev = slave ? slave->dev : NULL;
+	if (!real_dev)
+		return;
 
-	real_dev = slave->dev;
+	mutex_lock(&bond->ipsec_lock);
 	if (!real_dev->xfrmdev_ops ||
 	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
 	    netif_is_bond_master(real_dev)) {
-		spin_lock_bh(&bond->ipsec_lock);
 		if (!list_empty(&bond->ipsec_list))
 			slave_warn(bond_dev, real_dev,
 				   "%s: no slave xdo_dev_state_add\n",
 				   __func__);
-		spin_unlock_bh(&bond->ipsec_lock);
 		goto out;
 	}
 
-	spin_lock_bh(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+		/* If new state is added before ipsec_lock acquired */
+		if (ipsec->xs->xso.real_dev == real_dev)
+			continue;
+
 		ipsec->xs->xso.real_dev = real_dev;
 		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
 			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
 			ipsec->xs->xso.real_dev = NULL;
 		}
 	}
-	spin_unlock_bh(&bond->ipsec_lock);
 out:
-	rcu_read_unlock();
+	mutex_unlock(&bond->ipsec_lock);
 }
 
 /**
@@ -530,6 +525,8 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 	rcu_read_lock();
 	bond = netdev_priv(bond_dev);
 	slave = rcu_dereference(bond->curr_active_slave);
+	real_dev = slave ? slave->dev : NULL;
+	rcu_read_unlock();
 
 	if (!slave)
 		goto out;
@@ -537,7 +534,6 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 	if (!xs->xso.real_dev)
 		goto out;
 
-	real_dev = slave->dev;
 	WARN_ON(xs->xso.real_dev != real_dev);
 
 	if (!real_dev->xfrmdev_ops ||
@@ -549,7 +545,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 
 	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
 out:
-	spin_lock_bh(&bond->ipsec_lock);
+	mutex_lock(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
 		if (ipsec->xs == xs) {
 			list_del(&ipsec->list);
@@ -557,8 +553,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 			break;
 		}
 	}
-	spin_unlock_bh(&bond->ipsec_lock);
-	rcu_read_unlock();
+	mutex_unlock(&bond->ipsec_lock);
 }
 
 static void bond_ipsec_del_sa_all(struct bonding *bond)
@@ -568,15 +563,12 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 	struct bond_ipsec *ipsec;
 	struct slave *slave;
 
-	rcu_read_lock();
-	slave = rcu_dereference(bond->curr_active_slave);
-	if (!slave) {
-		rcu_read_unlock();
+	slave = rtnl_dereference(bond->curr_active_slave);
+	real_dev = slave ? slave->dev : NULL;
+	if (!real_dev)
 		return;
-	}
 
-	real_dev = slave->dev;
-	spin_lock_bh(&bond->ipsec_lock);
+	mutex_lock(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
 		if (!ipsec->xs->xso.real_dev)
 			continue;
@@ -593,8 +585,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
 		}
 	}
-	spin_unlock_bh(&bond->ipsec_lock);
-	rcu_read_unlock();
+	mutex_unlock(&bond->ipsec_lock);
 }
 
 static void bond_ipsec_free_sa(struct xfrm_state *xs)
@@ -5917,7 +5908,7 @@ void bond_setup(struct net_device *bond_dev)
 	/* set up xfrm device ops (only supported in active-backup right now) */
 	bond_dev->xfrmdev_ops = &bond_xfrmdev_ops;
 	INIT_LIST_HEAD(&bond->ipsec_list);
-	spin_lock_init(&bond->ipsec_lock);
+	mutex_init(&bond->ipsec_lock);
 #endif /* CONFIG_XFRM_OFFLOAD */
 
 	/* don't acquire bond device's netif_tx_lock when transmitting */
@@ -5966,6 +5957,10 @@ static void bond_uninit(struct net_device *bond_dev)
 		__bond_release_one(bond_dev, slave->dev, true, true);
 	netdev_info(bond_dev, "Released all slaves\n");
 
+#ifdef CONFIG_XFRM_OFFLOAD
+	mutex_destroy(&bond->ipsec_lock);
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 	bond_set_slave_arr(bond, NULL, NULL);
 
 	list_del_rcu(&bond->bond_list);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index b61fb1aa3a56..8bb5f016969f 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -260,7 +260,7 @@ struct bonding {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct list_head ipsec_list;
 	/* protecting ipsec_list */
-	spinlock_t ipsec_lock;
+	struct mutex ipsec_lock;
 #endif /* CONFIG_XFRM_OFFLOAD */
 	struct bpf_prog *xdp_prog;
 };
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-21  9:04 ` [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex Jianbo Liu
@ 2024-08-21 16:00   ` Jay Vosburgh
  2024-08-22  0:11     ` Jakub Kicinski
  2024-08-22  1:53     ` Jianbo Liu
  0 siblings, 2 replies; 12+ messages in thread
From: Jay Vosburgh @ 2024-08-21 16:00 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: netdev, davem, kuba, pabeni, edumazet, andy, saeedm, gal, leonro,
	liuhangbin, tariqt

Jianbo Liu <jianbol@nvidia.com> wrote:

>In the cited commit, bond->ipsec_lock is added to protect ipsec_list,
>hence xdo_dev_state_add and xdo_dev_state_delete are called inside
>this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep,
>"scheduling while atomic" will be triggered when changing bond's
>active slave.
>
>[  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
>[  101.055726] Modules linked in:
>[  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
>[  101.058760] Hardware name:
>[  101.059434] Call Trace:
>[  101.059436]  <TASK>
>[  101.060873]  dump_stack_lvl+0x51/0x60
>[  101.061275]  __schedule_bug+0x4e/0x60
>[  101.061682]  __schedule+0x612/0x7c0
>[  101.062078]  ? __mod_timer+0x25c/0x370
>[  101.062486]  schedule+0x25/0xd0
>[  101.062845]  schedule_timeout+0x77/0xf0
>[  101.063265]  ? asm_common_interrupt+0x22/0x40
>[  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
>[  101.064215]  __wait_for_common+0x87/0x190
>[  101.064648]  ? usleep_range_state+0x90/0x90
>[  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
>[  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
>[  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
>[  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
>[  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
>[  101.067738]  ? kmalloc_trace+0x4d/0x350
>[  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
>[  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
>[  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
>[  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
>[  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
>[  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
>[  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
>[  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
>[  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
>[  101.073033]  vfs_write+0x2d8/0x400
>[  101.073416]  ? alloc_fd+0x48/0x180
>[  101.073798]  ksys_write+0x5f/0xe0
>[  101.074175]  do_syscall_64+0x52/0x110
>[  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
>As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
>from bond_change_active_slave, which requires holding the RTNL lock.
>And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
>xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
>context. So ipsec_lock doesn't have to be spin lock, change it to
>mutex, and thus the above issue can be resolved.
>
>Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
>Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
>Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 67 +++++++++++++++------------------
> include/net/bonding.h           |  2 +-
> 2 files changed, 32 insertions(+), 37 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 0d1129eaf47b..f20f6d83ad54 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -439,38 +439,33 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
> 	rcu_read_lock();
> 	bond = netdev_priv(bond_dev);
> 	slave = rcu_dereference(bond->curr_active_slave);
>-	if (!slave) {
>-		rcu_read_unlock();
>+	real_dev = slave ? slave->dev : NULL;
>+	rcu_read_unlock();
>+	if (!real_dev)
> 		return -ENODEV;

	In reading these, I was confused as to why some changes use
rcu_read_lock(), rcu_dereference() and others use rtnl_dereference(); I
think it's because bond_ipsec_{add,del}_sa_all() are guaranteed to be
called under RTNL, while the bond_ipsec_{add,del}_sa() functions are do
not have that guarantee.  Am I understanding correctly?

>-	}
> 
>-	real_dev = slave->dev;
> 	if (!real_dev->xfrmdev_ops ||
> 	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> 	    netif_is_bond_master(real_dev)) {
> 		NL_SET_ERR_MSG_MOD(extack, "Slave does not support ipsec offload");
>-		rcu_read_unlock();
> 		return -EINVAL;
> 	}
> 
>-	ipsec = kmalloc(sizeof(*ipsec), GFP_ATOMIC);
>-	if (!ipsec) {
>-		rcu_read_unlock();
>+	ipsec = kmalloc(sizeof(*ipsec), GFP_KERNEL);
>+	if (!ipsec)
> 		return -ENOMEM;

	Presumably the switch from ATOMIC to KERNEL is safe because this
is only called under RTNL (and therefore always has a process context),
i.e., this change is independent of any other changes in the patch.
Correct?

>-	}
> 
> 	xs->xso.real_dev = real_dev;
> 	err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
> 	if (!err) {
> 		ipsec->xs = xs;
> 		INIT_LIST_HEAD(&ipsec->list);
>-		spin_lock_bh(&bond->ipsec_lock);
>+		mutex_lock(&bond->ipsec_lock);
> 		list_add(&ipsec->list, &bond->ipsec_list);
>-		spin_unlock_bh(&bond->ipsec_lock);
>+		mutex_unlock(&bond->ipsec_lock);
> 	} else {
> 		kfree(ipsec);
> 	}
>-	rcu_read_unlock();
> 	return err;
> }
> 
>@@ -481,35 +476,35 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
> 	struct bond_ipsec *ipsec;
> 	struct slave *slave;
> 
>-	rcu_read_lock();
>-	slave = rcu_dereference(bond->curr_active_slave);
>-	if (!slave)
>-		goto out;
>+	slave = rtnl_dereference(bond->curr_active_slave);
>+	real_dev = slave ? slave->dev : NULL;
>+	if (!real_dev)
>+		return;
> 
>-	real_dev = slave->dev;
>+	mutex_lock(&bond->ipsec_lock);
> 	if (!real_dev->xfrmdev_ops ||
> 	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> 	    netif_is_bond_master(real_dev)) {
>-		spin_lock_bh(&bond->ipsec_lock);
> 		if (!list_empty(&bond->ipsec_list))
> 			slave_warn(bond_dev, real_dev,
> 				   "%s: no slave xdo_dev_state_add\n",
> 				   __func__);
>-		spin_unlock_bh(&bond->ipsec_lock);
> 		goto out;
> 	}
> 
>-	spin_lock_bh(&bond->ipsec_lock);
> 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>+		/* If new state is added before ipsec_lock acquired */
>+		if (ipsec->xs->xso.real_dev == real_dev)
>+			continue;
>+
> 		ipsec->xs->xso.real_dev = real_dev;
> 		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
> 			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
> 			ipsec->xs->xso.real_dev = NULL;
> 		}
> 	}
>-	spin_unlock_bh(&bond->ipsec_lock);
> out:
>-	rcu_read_unlock();
>+	mutex_unlock(&bond->ipsec_lock);
> }
> 
> /**
>@@ -530,6 +525,8 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> 	rcu_read_lock();
> 	bond = netdev_priv(bond_dev);
> 	slave = rcu_dereference(bond->curr_active_slave);
>+	real_dev = slave ? slave->dev : NULL;
>+	rcu_read_unlock();

	Is it really safe to access real_dev once we've left the rcu
critical section?  What prevents the device referenced by real_dev from
being deleted as soon as rcu_read_unlock() completes?

	-J
	
> 
> 	if (!slave)
> 		goto out;
>@@ -537,7 +534,6 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> 	if (!xs->xso.real_dev)
> 		goto out;
> 
>-	real_dev = slave->dev;
> 	WARN_ON(xs->xso.real_dev != real_dev);
> 
> 	if (!real_dev->xfrmdev_ops ||
>@@ -549,7 +545,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> 
> 	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
> out:
>-	spin_lock_bh(&bond->ipsec_lock);
>+	mutex_lock(&bond->ipsec_lock);
> 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> 		if (ipsec->xs == xs) {
> 			list_del(&ipsec->list);
>@@ -557,8 +553,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> 			break;
> 		}
> 	}
>-	spin_unlock_bh(&bond->ipsec_lock);
>-	rcu_read_unlock();
>+	mutex_unlock(&bond->ipsec_lock);
> }
> 
> static void bond_ipsec_del_sa_all(struct bonding *bond)
>@@ -568,15 +563,12 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> 	struct bond_ipsec *ipsec;
> 	struct slave *slave;
> 
>-	rcu_read_lock();
>-	slave = rcu_dereference(bond->curr_active_slave);
>-	if (!slave) {
>-		rcu_read_unlock();
>+	slave = rtnl_dereference(bond->curr_active_slave);
>+	real_dev = slave ? slave->dev : NULL;
>+	if (!real_dev)
> 		return;
>-	}
> 
>-	real_dev = slave->dev;
>-	spin_lock_bh(&bond->ipsec_lock);
>+	mutex_lock(&bond->ipsec_lock);
> 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> 		if (!ipsec->xs->xso.real_dev)
> 			continue;
>@@ -593,8 +585,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> 				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
> 		}
> 	}
>-	spin_unlock_bh(&bond->ipsec_lock);
>-	rcu_read_unlock();
>+	mutex_unlock(&bond->ipsec_lock);
> }
> 
> static void bond_ipsec_free_sa(struct xfrm_state *xs)
>@@ -5917,7 +5908,7 @@ void bond_setup(struct net_device *bond_dev)
> 	/* set up xfrm device ops (only supported in active-backup right now) */
> 	bond_dev->xfrmdev_ops = &bond_xfrmdev_ops;
> 	INIT_LIST_HEAD(&bond->ipsec_list);
>-	spin_lock_init(&bond->ipsec_lock);
>+	mutex_init(&bond->ipsec_lock);
> #endif /* CONFIG_XFRM_OFFLOAD */
> 
> 	/* don't acquire bond device's netif_tx_lock when transmitting */
>@@ -5966,6 +5957,10 @@ static void bond_uninit(struct net_device *bond_dev)
> 		__bond_release_one(bond_dev, slave->dev, true, true);
> 	netdev_info(bond_dev, "Released all slaves\n");
> 
>+#ifdef CONFIG_XFRM_OFFLOAD
>+	mutex_destroy(&bond->ipsec_lock);
>+#endif /* CONFIG_XFRM_OFFLOAD */
>+
> 	bond_set_slave_arr(bond, NULL, NULL);
> 
> 	list_del_rcu(&bond->bond_list);
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index b61fb1aa3a56..8bb5f016969f 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -260,7 +260,7 @@ struct bonding {
> #ifdef CONFIG_XFRM_OFFLOAD
> 	struct list_head ipsec_list;
> 	/* protecting ipsec_list */
>-	spinlock_t ipsec_lock;
>+	struct mutex ipsec_lock;
> #endif /* CONFIG_XFRM_OFFLOAD */
> 	struct bpf_prog *xdp_prog;
> };
>-- 
>2.21.0
>

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-21 16:00   ` Jay Vosburgh
@ 2024-08-22  0:11     ` Jakub Kicinski
  2024-08-22  2:07       ` Jianbo Liu
  2024-08-22  1:53     ` Jianbo Liu
  1 sibling, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2024-08-22  0:11 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Jianbo Liu, netdev, davem, pabeni, edumazet, andy, saeedm, gal,
	leonro, liuhangbin, tariqt

On Wed, 21 Aug 2024 09:00:30 -0700 Jay Vosburgh wrote:
> 	Is it really safe to access real_dev once we've left the rcu
> critical section?  What prevents the device referenced by real_dev from
> being deleted as soon as rcu_read_unlock() completes?

Hah, I asked them this question at least 2 times.
Let's see if your superior communication skills help :)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-21 16:00   ` Jay Vosburgh
  2024-08-22  0:11     ` Jakub Kicinski
@ 2024-08-22  1:53     ` Jianbo Liu
  2024-08-22  6:05       ` Jay Vosburgh
  1 sibling, 1 reply; 12+ messages in thread
From: Jianbo Liu @ 2024-08-22  1:53 UTC (permalink / raw)
  To: jv@jvosburgh.net
  Cc: liuhangbin@gmail.com, davem@davemloft.net, Leon Romanovsky,
	Gal Pressman, andy@greyhouse.net, Tariq Toukan,
	netdev@vger.kernel.org, pabeni@redhat.com, edumazet@google.com,
	Saeed Mahameed, kuba@kernel.org

On Wed, 2024-08-21 at 09:00 -0700, Jay Vosburgh wrote:
> Jianbo Liu <jianbol@nvidia.com> wrote:
> 
> > In the cited commit, bond->ipsec_lock is added to protect
> > ipsec_list,
> > hence xdo_dev_state_add and xdo_dev_state_delete are called inside
> > this lock. As ipsec_lock is a spin lock and such xfrmdev ops may
> > sleep,
> > "scheduling while atomic" will be triggered when changing bond's
> > active slave.
> > 
> > [  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
> > [  101.055726] Modules linked in:
> > [  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
> > [  101.058760] Hardware name:
> > [  101.059434] Call Trace:
> > [  101.059436]  <TASK>
> > [  101.060873]  dump_stack_lvl+0x51/0x60
> > [  101.061275]  __schedule_bug+0x4e/0x60
> > [  101.061682]  __schedule+0x612/0x7c0
> > [  101.062078]  ? __mod_timer+0x25c/0x370
> > [  101.062486]  schedule+0x25/0xd0
> > [  101.062845]  schedule_timeout+0x77/0xf0
> > [  101.063265]  ? asm_common_interrupt+0x22/0x40
> > [  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
> > [  101.064215]  __wait_for_common+0x87/0x190
> > [  101.064648]  ? usleep_range_state+0x90/0x90
> > [  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
> > [  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
> > [  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
> > [  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
> > [  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
> > [  101.067738]  ? kmalloc_trace+0x4d/0x350
> > [  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
> > [  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
> > [  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
> > [  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
> > [  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
> > [  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
> > [  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
> > [  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
> > [  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
> > [  101.073033]  vfs_write+0x2d8/0x400
> > [  101.073416]  ? alloc_fd+0x48/0x180
> > [  101.073798]  ksys_write+0x5f/0xe0
> > [  101.074175]  do_syscall_64+0x52/0x110
> > [  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > 
> > As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
> > from bond_change_active_slave, which requires holding the RTNL
> > lock.
> > And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
> > xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
> > context. So ipsec_lock doesn't have to be spin lock, change it to
> > mutex, and thus the above issue can be resolved.
> > 
> > Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
> > Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> > Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
> > ---
> > drivers/net/bonding/bond_main.c | 67 +++++++++++++++---------------
> > ---
> > include/net/bonding.h           |  2 +-
> > 2 files changed, 32 insertions(+), 37 deletions(-)
> > 
> > diff --git a/drivers/net/bonding/bond_main.c
> > b/drivers/net/bonding/bond_main.c
> > index 0d1129eaf47b..f20f6d83ad54 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -439,38 +439,33 @@ static int bond_ipsec_add_sa(struct
> > xfrm_state *xs,
> >         rcu_read_lock();
> >         bond = netdev_priv(bond_dev);
> >         slave = rcu_dereference(bond->curr_active_slave);
> > -       if (!slave) {
> > -               rcu_read_unlock();
> > +       real_dev = slave ? slave->dev : NULL;
> > +       rcu_read_unlock();
> > +       if (!real_dev)
> >                 return -ENODEV;
> 
>         In reading these, I was confused as to why some changes use
> rcu_read_lock(), rcu_dereference() and others use rtnl_dereference();
> I
> think it's because bond_ipsec_{add,del}_sa_all() are guaranteed to be
> called under RTNL, while the bond_ipsec_{add,del}_sa() functions are
> do
> not have that guarantee.  Am I understanding correctly?
> 

Right. bond_ipsec_{add,del}_sa_all() are called by
bond_change_active_slave() which has ASSERT_RTNL(), so I think they are
under RTNL.

> > -       }
> > 
> > -       real_dev = slave->dev;
> >         if (!real_dev->xfrmdev_ops ||
> >             !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> >             netif_is_bond_master(real_dev)) {
> >                 NL_SET_ERR_MSG_MOD(extack, "Slave does not support
> > ipsec offload");
> > -               rcu_read_unlock();
> >                 return -EINVAL;
> >         }
> > 
> > -       ipsec = kmalloc(sizeof(*ipsec), GFP_ATOMIC);
> > -       if (!ipsec) {
> > -               rcu_read_unlock();
> > +       ipsec = kmalloc(sizeof(*ipsec), GFP_KERNEL);
> > +       if (!ipsec)
> >                 return -ENOMEM;
> 
>         Presumably the switch from ATOMIC to KERNEL is safe because
> this
> is only called under RTNL (and therefore always has a process
> context),
> i.e., this change is independent of any other changes in the patch.
> Correct?
> 

No. And it's RCU here, not RTNL. We are safe to use KERNEL after it's
out of the RCU context, right? And this was suggested by Paolo after he
reviewd the first version.      

> > -       }
> > 
> >         xs->xso.real_dev = real_dev;
> >         err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
> >         if (!err) {
> >                 ipsec->xs = xs;
> >                 INIT_LIST_HEAD(&ipsec->list);
> > -               spin_lock_bh(&bond->ipsec_lock);
> > +               mutex_lock(&bond->ipsec_lock);
> >                 list_add(&ipsec->list, &bond->ipsec_list);
> > -               spin_unlock_bh(&bond->ipsec_lock);
> > +               mutex_unlock(&bond->ipsec_lock);
> >         } else {
> >                 kfree(ipsec);
> >         }
> > -       rcu_read_unlock();
> >         return err;
> > }
> > 
> > @@ -481,35 +476,35 @@ static void bond_ipsec_add_sa_all(struct
> > bonding *bond)
> >         struct bond_ipsec *ipsec;
> >         struct slave *slave;
> > 
> > -       rcu_read_lock();
> > -       slave = rcu_dereference(bond->curr_active_slave);
> > -       if (!slave)
> > -               goto out;
> > +       slave = rtnl_dereference(bond->curr_active_slave);
> > +       real_dev = slave ? slave->dev : NULL;
> > +       if (!real_dev)
> > +               return;
> > 
> > -       real_dev = slave->dev;
> > +       mutex_lock(&bond->ipsec_lock);
> >         if (!real_dev->xfrmdev_ops ||
> >             !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> >             netif_is_bond_master(real_dev)) {
> > -               spin_lock_bh(&bond->ipsec_lock);
> >                 if (!list_empty(&bond->ipsec_list))
> >                         slave_warn(bond_dev, real_dev,
> >                                    "%s: no slave
> > xdo_dev_state_add\n",
> >                                    __func__);
> > -               spin_unlock_bh(&bond->ipsec_lock);
> >                 goto out;
> >         }
> > 
> > -       spin_lock_bh(&bond->ipsec_lock);
> >         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > +               /* If new state is added before ipsec_lock acquired
> > */
> > +               if (ipsec->xs->xso.real_dev == real_dev)
> > +                       continue;
> > +
> >                 ipsec->xs->xso.real_dev = real_dev;
> >                 if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec-
> > >xs, NULL)) {
> >                         slave_warn(bond_dev, real_dev, "%s: failed
> > to add SA\n", __func__);
> >                         ipsec->xs->xso.real_dev = NULL;
> >                 }
> >         }
> > -       spin_unlock_bh(&bond->ipsec_lock);
> > out:
> > -       rcu_read_unlock();
> > +       mutex_unlock(&bond->ipsec_lock);
> > }
> > 
> > /**
> > @@ -530,6 +525,8 @@ static void bond_ipsec_del_sa(struct xfrm_state
> > *xs)
> >         rcu_read_lock();
> >         bond = netdev_priv(bond_dev);
> >         slave = rcu_dereference(bond->curr_active_slave);
> > +       real_dev = slave ? slave->dev : NULL;
> > +       rcu_read_unlock();
> 
>         Is it really safe to access real_dev once we've left the rcu
> critical section?  What prevents the device referenced by real_dev
> from
> being deleted as soon as rcu_read_unlock() completes?
> 

I am not sure. But RCU protects accessing of the context pointed by
curr_active_slave, not slave->dev itself. I wrong about this?
I can move rcu_read_unlock after xdo_dev_state_delete(). And do the
same change for bond_ipsec_add_sa and bond_ipsec_free_sa.
What do you think?

Thanks!
Jianbo 

>         -J
>         
> > 
> >         if (!slave)
> >                 goto out;
> > @@ -537,7 +534,6 @@ static void bond_ipsec_del_sa(struct xfrm_state
> > *xs)
> >         if (!xs->xso.real_dev)
> >                 goto out;
> > 
> > -       real_dev = slave->dev;
> >         WARN_ON(xs->xso.real_dev != real_dev);
> > 
> >         if (!real_dev->xfrmdev_ops ||
> > @@ -549,7 +545,7 @@ static void bond_ipsec_del_sa(struct xfrm_state
> > *xs)
> > 
> >         real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
> > out:
> > -       spin_lock_bh(&bond->ipsec_lock);
> > +       mutex_lock(&bond->ipsec_lock);
> >         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> >                 if (ipsec->xs == xs) {
> >                         list_del(&ipsec->list);
> > @@ -557,8 +553,7 @@ static void bond_ipsec_del_sa(struct xfrm_state
> > *xs)
> >                         break;
> >                 }
> >         }
> > -       spin_unlock_bh(&bond->ipsec_lock);
> > -       rcu_read_unlock();
> > +       mutex_unlock(&bond->ipsec_lock);
> > }
> > 
> > static void bond_ipsec_del_sa_all(struct bonding *bond)
> > @@ -568,15 +563,12 @@ static void bond_ipsec_del_sa_all(struct
> > bonding *bond)
> >         struct bond_ipsec *ipsec;
> >         struct slave *slave;
> > 
> > -       rcu_read_lock();
> > -       slave = rcu_dereference(bond->curr_active_slave);
> > -       if (!slave) {
> > -               rcu_read_unlock();
> > +       slave = rtnl_dereference(bond->curr_active_slave);
> > +       real_dev = slave ? slave->dev : NULL;
> > +       if (!real_dev)
> >                 return;
> > -       }
> > 
> > -       real_dev = slave->dev;
> > -       spin_lock_bh(&bond->ipsec_lock);
> > +       mutex_lock(&bond->ipsec_lock);
> >         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> >                 if (!ipsec->xs->xso.real_dev)
> >                         continue;
> > @@ -593,8 +585,7 @@ static void bond_ipsec_del_sa_all(struct
> > bonding *bond)
> >                                 real_dev->xfrmdev_ops-
> > >xdo_dev_state_free(ipsec->xs);
> >                 }
> >         }
> > -       spin_unlock_bh(&bond->ipsec_lock);
> > -       rcu_read_unlock();
> > +       mutex_unlock(&bond->ipsec_lock);
> > }
> > 
> > static void bond_ipsec_free_sa(struct xfrm_state *xs)
> > @@ -5917,7 +5908,7 @@ void bond_setup(struct net_device *bond_dev)
> >         /* set up xfrm device ops (only supported in active-backup
> > right now) */
> >         bond_dev->xfrmdev_ops = &bond_xfrmdev_ops;
> >         INIT_LIST_HEAD(&bond->ipsec_list);
> > -       spin_lock_init(&bond->ipsec_lock);
> > +       mutex_init(&bond->ipsec_lock);
> > #endif /* CONFIG_XFRM_OFFLOAD */
> > 
> >         /* don't acquire bond device's netif_tx_lock when
> > transmitting */
> > @@ -5966,6 +5957,10 @@ static void bond_uninit(struct net_device
> > *bond_dev)
> >                 __bond_release_one(bond_dev, slave->dev, true,
> > true);
> >         netdev_info(bond_dev, "Released all slaves\n");
> > 
> > +#ifdef CONFIG_XFRM_OFFLOAD
> > +       mutex_destroy(&bond->ipsec_lock);
> > +#endif /* CONFIG_XFRM_OFFLOAD */
> > +
> >         bond_set_slave_arr(bond, NULL, NULL);
> > 
> >         list_del_rcu(&bond->bond_list);
> > diff --git a/include/net/bonding.h b/include/net/bonding.h
> > index b61fb1aa3a56..8bb5f016969f 100644
> > --- a/include/net/bonding.h
> > +++ b/include/net/bonding.h
> > @@ -260,7 +260,7 @@ struct bonding {
> > #ifdef CONFIG_XFRM_OFFLOAD
> >         struct list_head ipsec_list;
> >         /* protecting ipsec_list */
> > -       spinlock_t ipsec_lock;
> > +       struct mutex ipsec_lock;
> > #endif /* CONFIG_XFRM_OFFLOAD */
> >         struct bpf_prog *xdp_prog;
> > };
> > -- 
> > 2.21.0
> > 
> 
> ---
>         -Jay Vosburgh, jv@jvosburgh.net


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-22  0:11     ` Jakub Kicinski
@ 2024-08-22  2:07       ` Jianbo Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Jianbo Liu @ 2024-08-22  2:07 UTC (permalink / raw)
  To: jv@jvosburgh.net, kuba@kernel.org
  Cc: liuhangbin@gmail.com, davem@davemloft.net, Tariq Toukan,
	Gal Pressman, andy@greyhouse.net, netdev@vger.kernel.org,
	pabeni@redhat.com, edumazet@google.com, Saeed Mahameed,
	Leon Romanovsky

On Wed, 2024-08-21 at 17:11 -0700, Jakub Kicinski wrote:
> On Wed, 21 Aug 2024 09:00:30 -0700 Jay Vosburgh wrote:
> >         Is it really safe to access real_dev once we've left the
> > rcu
> > critical section?  What prevents the device referenced by real_dev
> > from
> > being deleted as soon as rcu_read_unlock() completes?
> 
> Hah, I asked them this question at least 2 times.
> Let's see if your superior communication skills help :)


Sorry, I maybe misunderstood. I really need your suggestion about how
to change it. Always waiting for you, hangbin or anyother's confirm
before do any update. V5 is the rebase. If you and Jay want to move
rcu_read_unlock after xfrm callbacks, I will send new version soon. 

Thanks!
Jianbo 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-22  1:53     ` Jianbo Liu
@ 2024-08-22  6:05       ` Jay Vosburgh
  2024-08-22 11:15         ` Jianbo Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Jay Vosburgh @ 2024-08-22  6:05 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: liuhangbin@gmail.com, davem@davemloft.net, Leon Romanovsky,
	Gal Pressman, andy@greyhouse.net, Tariq Toukan,
	netdev@vger.kernel.org, pabeni@redhat.com, edumazet@google.com,
	Saeed Mahameed, kuba@kernel.org

Jianbo Liu <jianbol@nvidia.com> wrote:

>On Wed, 2024-08-21 at 09:00 -0700, Jay Vosburgh wrote:
>> Jianbo Liu <jianbol@nvidia.com> wrote:
>> 
>> > In the cited commit, bond->ipsec_lock is added to protect
>> > ipsec_list,
>> > hence xdo_dev_state_add and xdo_dev_state_delete are called inside
>> > this lock. As ipsec_lock is a spin lock and such xfrmdev ops may
>> > sleep,
>> > "scheduling while atomic" will be triggered when changing bond's
>> > active slave.
>> > 
>> > [  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
>> > [  101.055726] Modules linked in:
>> > [  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
>> > [  101.058760] Hardware name:
>> > [  101.059434] Call Trace:
>> > [  101.059436]  <TASK>
>> > [  101.060873]  dump_stack_lvl+0x51/0x60
>> > [  101.061275]  __schedule_bug+0x4e/0x60
>> > [  101.061682]  __schedule+0x612/0x7c0
>> > [  101.062078]  ? __mod_timer+0x25c/0x370
 >> > [  101.062486]  schedule+0x25/0xd0
>> > [  101.062845]  schedule_timeout+0x77/0xf0
>> > [  101.063265]  ? asm_common_interrupt+0x22/0x40
>> > [  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
>> > [  101.064215]  __wait_for_common+0x87/0x190
>> > [  101.064648]  ? usleep_range_state+0x90/0x90
>> > [  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
>> > [  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
>> > [  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
>> > [  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
>> > [  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
>> > [  101.067738]  ? kmalloc_trace+0x4d/0x350
>> > [  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
>> > [  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
>> > [  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
>> > [  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
>> > [  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
>> > [  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
>> > [  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
>> > [  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
>> > [  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
>> > [  101.073033]  vfs_write+0x2d8/0x400
>> > [  101.073416]  ? alloc_fd+0x48/0x180
>> > [  101.073798]  ksys_write+0x5f/0xe0
>> > [  101.074175]  do_syscall_64+0x52/0x110
>> > [  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>> > 
>> > As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
>> > from bond_change_active_slave, which requires holding the RTNL
>> > lock.
>> > And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
>> > xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
>> > context. So ipsec_lock doesn't have to be spin lock, change it to
>> > mutex, and thus the above issue can be resolved.
>> > 
>> > Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
>> > Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
>> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> > Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
>> > ---
>> > drivers/net/bonding/bond_main.c | 67 +++++++++++++++---------------
>> > ---
>> > include/net/bonding.h           |  2 +-
>> > 2 files changed, 32 insertions(+), 37 deletions(-)
>> > 
>> > diff --git a/drivers/net/bonding/bond_main.c
>> > b/drivers/net/bonding/bond_main.c
>> > index 0d1129eaf47b..f20f6d83ad54 100644
>> > --- a/drivers/net/bonding/bond_main.c
>> > +++ b/drivers/net/bonding/bond_main.c
>> > @@ -439,38 +439,33 @@ static int bond_ipsec_add_sa(struct
>> > xfrm_state *xs,
>> >         rcu_read_lock();
>> >         bond = netdev_priv(bond_dev);
>> >         slave = rcu_dereference(bond->curr_active_slave);
>> > -       if (!slave) {
>> > -               rcu_read_unlock();
>> > +       real_dev = slave ? slave->dev : NULL;
>> > +       rcu_read_unlock();
>> > +       if (!real_dev)
>> >                 return -ENODEV;
>> 
>>         In reading these, I was confused as to why some changes use
>> rcu_read_lock(), rcu_dereference() and others use rtnl_dereference();
>> I
>> think it's because bond_ipsec_{add,del}_sa_all() are guaranteed to be
>> called under RTNL, while the bond_ipsec_{add,del}_sa() functions are
>> do
>> not have that guarantee.  Am I understanding correctly?
>> 
>
>Right. bond_ipsec_{add,del}_sa_all() are called by
>bond_change_active_slave() which has ASSERT_RTNL(), so I think they are
>under RTNL.

	Yes.

>> > -       }
>> > 
>> > -       real_dev = slave->dev;
>> >         if (!real_dev->xfrmdev_ops ||
>> >             !real_dev->xfrmdev_ops->xdo_dev_state_add ||
>> >             netif_is_bond_master(real_dev)) {
>> >                 NL_SET_ERR_MSG_MOD(extack, "Slave does not support
>> > ipsec offload");
>> > -               rcu_read_unlock();
>> >                 return -EINVAL;
>> >         }
>> > 
>> > -       ipsec = kmalloc(sizeof(*ipsec), GFP_ATOMIC);
>> > -       if (!ipsec) {
>> > -               rcu_read_unlock();
>> > +       ipsec = kmalloc(sizeof(*ipsec), GFP_KERNEL);
>> > +       if (!ipsec)
>> >                 return -ENOMEM;
>> 
>>         Presumably the switch from ATOMIC to KERNEL is safe because
>> this
>> is only called under RTNL (and therefore always has a process
>> context),
>> i.e., this change is independent of any other changes in the patch.
>> Correct?
>> 
>
>No. And it's RCU here, not RTNL. We are safe to use KERNEL after it's
>out of the RCU context, right? And this was suggested by Paolo after he
>reviewd the first version.      

	Ok, I think I follow now.  And, yes, KERNEL is ok when outside
the RCU critical section, but not inside of it (because sleeping is not
permitted within the critical section).

>> > -       }
>> > 
>> >         xs->xso.real_dev = real_dev;
>> >         err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
>> >         if (!err) {
>> >                 ipsec->xs = xs;
>> >                 INIT_LIST_HEAD(&ipsec->list);
>> > -               spin_lock_bh(&bond->ipsec_lock);
>> > +               mutex_lock(&bond->ipsec_lock);
>> >                 list_add(&ipsec->list, &bond->ipsec_list);
>> > -               spin_unlock_bh(&bond->ipsec_lock);
>> > +               mutex_unlock(&bond->ipsec_lock);
>> >         } else {
>> >                 kfree(ipsec);
>> >         }
>> > -       rcu_read_unlock();
>> >         return err;
>> > }
>> > 
>> > @@ -481,35 +476,35 @@ static void bond_ipsec_add_sa_all(struct
>> > bonding *bond)
>> >         struct bond_ipsec *ipsec;
>> >         struct slave *slave;
>> > 
>> > -       rcu_read_lock();
>> > -       slave = rcu_dereference(bond->curr_active_slave);
>> > -       if (!slave)
>> > -               goto out;
>> > +       slave = rtnl_dereference(bond->curr_active_slave);
>> > +       real_dev = slave ? slave->dev : NULL;
>> > +       if (!real_dev)
>> > +               return;
>> > 
>> > -       real_dev = slave->dev;
>> > +       mutex_lock(&bond->ipsec_lock);
>> >         if (!real_dev->xfrmdev_ops ||
>> >             !real_dev->xfrmdev_ops->xdo_dev_state_add ||
>> >             netif_is_bond_master(real_dev)) {
>> > -               spin_lock_bh(&bond->ipsec_lock);
>> >                 if (!list_empty(&bond->ipsec_list))
>> >                         slave_warn(bond_dev, real_dev,
>> >                                    "%s: no slave
>> > xdo_dev_state_add\n",
>> >                                    __func__);
>> > -               spin_unlock_bh(&bond->ipsec_lock);
>> >                 goto out;
>> >         }
>> > 
>> > -       spin_lock_bh(&bond->ipsec_lock);
>> >         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>> > +               /* If new state is added before ipsec_lock acquired
>> > */
>> > +               if (ipsec->xs->xso.real_dev == real_dev)
>> > +                       continue;
>> > +
>> >                 ipsec->xs->xso.real_dev = real_dev;
>> >                 if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec-
>> > >xs, NULL)) {
>> >                         slave_warn(bond_dev, real_dev, "%s: failed
>> > to add SA\n", __func__);
>> >                         ipsec->xs->xso.real_dev = NULL;
>> >                 }
>> >         }
>> > -       spin_unlock_bh(&bond->ipsec_lock);
>> > out:
>> > -       rcu_read_unlock();
>> > +       mutex_unlock(&bond->ipsec_lock);
>> > }
>> > 
>> > /**
>> > @@ -530,6 +525,8 @@ static void bond_ipsec_del_sa(struct xfrm_state
>> > *xs)
>> >         rcu_read_lock();
>> >         bond = netdev_priv(bond_dev);
>> >         slave = rcu_dereference(bond->curr_active_slave);
>> > +       real_dev = slave ? slave->dev : NULL;
>> > +       rcu_read_unlock();
>> 
>>         Is it really safe to access real_dev once we've left the rcu
>> critical section?  What prevents the device referenced by real_dev
>> from
>> being deleted as soon as rcu_read_unlock() completes?
>> 
>
>I am not sure. But RCU protects accessing of the context pointed by
>curr_active_slave, not slave->dev itself. I wrong about this?

	No, you're not wrong: RCU does indeed protect the "slave"
pointer in the above code while inside the RCU read-side critical
section.

	However, we also know that as long as we're within that critical
section, whatever the "slave" pointer points to will remain valid,
because what curr_active_slave points to is also RCU protected (not only
the pointer itself).

	For the interface behind slave->dev specifically, any attempt to
delete an interface that is a member of a bond must pass through
__bond_release_one() first, and it calls synchronize_rcu() as part of
its processing (which will wait for active read-side critical sections
to complete).  Therefore, the bond member interface behind slave->dev
here cannot simply vanish while execution is within this critical
section.

>I can move rcu_read_unlock after xdo_dev_state_delete(). And do the
>same change for bond_ipsec_add_sa and bond_ipsec_free_sa.
>What do you think?

	The original issue was that the xfrm callback within mlx5 could
sleep while a spin lock was held.  However, sleeping is not permitted
within an RCU read-side critical section, either, so would this simply
reintroduce the original problem from a different angle?

	Assuming that's correct, I think one way around that is to
acquire a reference (via dev_hold or netdev_hold) to the interface
(i.e., real_dev) within the minimal rcu_read_lock / rcu_read_unlock, do
the xfrm magic, and then release the reference when finished.  That
won't prevent the interface from being removed from the bond and the
struct slave being freed outside of the RCU critical section, so the
code would also need to use only real_dev after rcu_read_unlock() is
called.

	-J

>Thanks!
>Jianbo 
>
>>         -J
>>         
>> > 
>> >         if (!slave)
>> >                 goto out;
>> > @@ -537,7 +534,6 @@ static void bond_ipsec_del_sa(struct xfrm_state
>> > *xs)
>> >         if (!xs->xso.real_dev)
>> >                 goto out;
>> > 
>> > -       real_dev = slave->dev;
>> >         WARN_ON(xs->xso.real_dev != real_dev);
>> > 
>> >         if (!real_dev->xfrmdev_ops ||
>> > @@ -549,7 +545,7 @@ static void bond_ipsec_del_sa(struct xfrm_state
>> > *xs)
>> > 
>> >         real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
>> > out:
>> > -       spin_lock_bh(&bond->ipsec_lock);
>> > +       mutex_lock(&bond->ipsec_lock);
>> >         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>> >                 if (ipsec->xs == xs) {
>> >                         list_del(&ipsec->list);
>> > @@ -557,8 +553,7 @@ static void bond_ipsec_del_sa(struct xfrm_state
>> > *xs)
>> >                         break;
>> >                 }
>> >         }
>> > -       spin_unlock_bh(&bond->ipsec_lock);
>> > -       rcu_read_unlock();
>> > +       mutex_unlock(&bond->ipsec_lock);
>> > }
>> > 
>> > static void bond_ipsec_del_sa_all(struct bonding *bond)
>> > @@ -568,15 +563,12 @@ static void bond_ipsec_del_sa_all(struct
>> > bonding *bond)
>> >         struct bond_ipsec *ipsec;
>> >         struct slave *slave;
>> > 
>> > -       rcu_read_lock();
>> > -       slave = rcu_dereference(bond->curr_active_slave);
>> > -       if (!slave) {
>> > -               rcu_read_unlock();
>> > +       slave = rtnl_dereference(bond->curr_active_slave);
>> > +       real_dev = slave ? slave->dev : NULL;
>> > +       if (!real_dev)
>> >                 return;
>> > -       }
>> > 
>> > -       real_dev = slave->dev;
>> > -       spin_lock_bh(&bond->ipsec_lock);
>> > +       mutex_lock(&bond->ipsec_lock);
>> >         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>> >                 if (!ipsec->xs->xso.real_dev)
>> >                         continue;
>> > @@ -593,8 +585,7 @@ static void bond_ipsec_del_sa_all(struct
>> > bonding *bond)
>> >                                 real_dev->xfrmdev_ops-
>> > >xdo_dev_state_free(ipsec->xs);
>> >                 }
>> >         }
>> > -       spin_unlock_bh(&bond->ipsec_lock);
>> > -       rcu_read_unlock();
>> > +       mutex_unlock(&bond->ipsec_lock);
>> > }
>> > 
>> > static void bond_ipsec_free_sa(struct xfrm_state *xs)
>> > @@ -5917,7 +5908,7 @@ void bond_setup(struct net_device *bond_dev)
>> >         /* set up xfrm device ops (only supported in active-backup
>> > right now) */
>> >         bond_dev->xfrmdev_ops = &bond_xfrmdev_ops;
>> >         INIT_LIST_HEAD(&bond->ipsec_list);
>> > -       spin_lock_init(&bond->ipsec_lock);
>> > +       mutex_init(&bond->ipsec_lock);
>> > #endif /* CONFIG_XFRM_OFFLOAD */
>> > 
>> >         /* don't acquire bond device's netif_tx_lock when
>> > transmitting */
>> > @@ -5966,6 +5957,10 @@ static void bond_uninit(struct net_device
>> > *bond_dev)
>> >                 __bond_release_one(bond_dev, slave->dev, true,
>> > true);
>> >         netdev_info(bond_dev, "Released all slaves\n");
>> > 
>> > +#ifdef CONFIG_XFRM_OFFLOAD
>> > +       mutex_destroy(&bond->ipsec_lock);
>> > +#endif /* CONFIG_XFRM_OFFLOAD */
>> > +
>> >         bond_set_slave_arr(bond, NULL, NULL);
>> > 
>> >         list_del_rcu(&bond->bond_list);
>> > diff --git a/include/net/bonding.h b/include/net/bonding.h
>> > index b61fb1aa3a56..8bb5f016969f 100644
>> > --- a/include/net/bonding.h
>> > +++ b/include/net/bonding.h
>> > @@ -260,7 +260,7 @@ struct bonding {
>> > #ifdef CONFIG_XFRM_OFFLOAD
>> >         struct list_head ipsec_list;
>> >         /* protecting ipsec_list */
>> > -       spinlock_t ipsec_lock;
>> > +       struct mutex ipsec_lock;
>> > #endif /* CONFIG_XFRM_OFFLOAD */
>> >         struct bpf_prog *xdp_prog;
>> > };
>> > -- 
>> > 2.21.0

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-22  6:05       ` Jay Vosburgh
@ 2024-08-22 11:15         ` Jianbo Liu
  2024-08-22 23:19           ` Jakub Kicinski
  2024-08-23  0:17           ` Jay Vosburgh
  0 siblings, 2 replies; 12+ messages in thread
From: Jianbo Liu @ 2024-08-22 11:15 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: liuhangbin@gmail.com, davem@davemloft.net, Leon Romanovsky,
	Gal Pressman, andy@greyhouse.net, Tariq Toukan,
	netdev@vger.kernel.org, pabeni@redhat.com, edumazet@google.com,
	Saeed Mahameed, kuba@kernel.org



On 8/22/2024 2:05 PM, Jay Vosburgh wrote:
> Jianbo Liu <jianbol@nvidia.com> wrote:
> 
>> On Wed, 2024-08-21 at 09:00 -0700, Jay Vosburgh wrote:
>>> Jianbo Liu <jianbol@nvidia.com> wrote:
>>>
>>>> In the cited commit, bond->ipsec_lock is added to protect
>>>> ipsec_list,
>>>> hence xdo_dev_state_add and xdo_dev_state_delete are called inside
>>>> this lock. As ipsec_lock is a spin lock and such xfrmdev ops may
>>>> sleep,
>>>> "scheduling while atomic" will be triggered when changing bond's
>>>> active slave.
>>>>
>>>> [  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
>>>> [  101.055726] Modules linked in:
>>>> [  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
>>>> [  101.058760] Hardware name:
>>>> [  101.059434] Call Trace:
>>>> [  101.059436]  <TASK>
>>>> [  101.060873]  dump_stack_lvl+0x51/0x60
>>>> [  101.061275]  __schedule_bug+0x4e/0x60
>>>> [  101.061682]  __schedule+0x612/0x7c0
>>>> [  101.062078]  ? __mod_timer+0x25c/0x370
>   >> > [  101.062486]  schedule+0x25/0xd0
>>>> [  101.062845]  schedule_timeout+0x77/0xf0
>>>> [  101.063265]  ? asm_common_interrupt+0x22/0x40
>>>> [  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
>>>> [  101.064215]  __wait_for_common+0x87/0x190
>>>> [  101.064648]  ? usleep_range_state+0x90/0x90
>>>> [  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
>>>> [  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
>>>> [  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
>>>> [  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
>>>> [  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
>>>> [  101.067738]  ? kmalloc_trace+0x4d/0x350
>>>> [  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
>>>> [  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
>>>> [  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
>>>> [  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
>>>> [  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
>>>> [  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
>>>> [  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
>>>> [  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
>>>> [  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
>>>> [  101.073033]  vfs_write+0x2d8/0x400
>>>> [  101.073416]  ? alloc_fd+0x48/0x180
>>>> [  101.073798]  ksys_write+0x5f/0xe0
>>>> [  101.074175]  do_syscall_64+0x52/0x110
>>>> [  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>
>>>> As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
>>>> from bond_change_active_slave, which requires holding the RTNL
>>>> lock.
>>>> And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
>>>> xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
>>>> context. So ipsec_lock doesn't have to be spin lock, change it to
>>>> mutex, and thus the above issue can be resolved.
>>>>
>>>> Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
>>>> Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
>>>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>>>> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
>>>> ---
>>>> drivers/net/bonding/bond_main.c | 67 +++++++++++++++---------------
>>>> ---
>>>> include/net/bonding.h           |  2 +-
>>>> 2 files changed, 32 insertions(+), 37 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/bond_main.c
>>>> b/drivers/net/bonding/bond_main.c
>>>> index 0d1129eaf47b..f20f6d83ad54 100644
>>>> --- a/drivers/net/bonding/bond_main.c
>>>> +++ b/drivers/net/bonding/bond_main.c
>>>> @@ -439,38 +439,33 @@ static int bond_ipsec_add_sa(struct
>>>> xfrm_state *xs,
>>>>          rcu_read_lock();
>>>>          bond = netdev_priv(bond_dev);
>>>>          slave = rcu_dereference(bond->curr_active_slave);
>>>> -       if (!slave) {
>>>> -               rcu_read_unlock();
>>>> +       real_dev = slave ? slave->dev : NULL;
>>>> +       rcu_read_unlock();
>>>> +       if (!real_dev)
>>>>                  return -ENODEV;
>>>
>>>          In reading these, I was confused as to why some changes use
>>> rcu_read_lock(), rcu_dereference() and others use rtnl_dereference();
>>> I
>>> think it's because bond_ipsec_{add,del}_sa_all() are guaranteed to be
>>> called under RTNL, while the bond_ipsec_{add,del}_sa() functions are
>>> do
>>> not have that guarantee.  Am I understanding correctly?
>>>
>>
>> Right. bond_ipsec_{add,del}_sa_all() are called by
>> bond_change_active_slave() which has ASSERT_RTNL(), so I think they are
>> under RTNL.
> 
> 	Yes.
> 
>>>> -       }
>>>>
>>>> -       real_dev = slave->dev;
>>>>          if (!real_dev->xfrmdev_ops ||
>>>>              !real_dev->xfrmdev_ops->xdo_dev_state_add ||
>>>>              netif_is_bond_master(real_dev)) {
>>>>                  NL_SET_ERR_MSG_MOD(extack, "Slave does not support
>>>> ipsec offload");
>>>> -               rcu_read_unlock();
>>>>                  return -EINVAL;
>>>>          }
>>>>
>>>> -       ipsec = kmalloc(sizeof(*ipsec), GFP_ATOMIC);
>>>> -       if (!ipsec) {
>>>> -               rcu_read_unlock();
>>>> +       ipsec = kmalloc(sizeof(*ipsec), GFP_KERNEL);
>>>> +       if (!ipsec)
>>>>                  return -ENOMEM;
>>>
>>>          Presumably the switch from ATOMIC to KERNEL is safe because
>>> this
>>> is only called under RTNL (and therefore always has a process
>>> context),
>>> i.e., this change is independent of any other changes in the patch.
>>> Correct?
>>>
>>
>> No. And it's RCU here, not RTNL. We are safe to use KERNEL after it's
>> out of the RCU context, right? And this was suggested by Paolo after he
>> reviewd the first version.
> 
> 	Ok, I think I follow now.  And, yes, KERNEL is ok when outside
> the RCU critical section, but not inside of it (because sleeping is not
> permitted within the critical section).
> 

Yes.

>>>> -       }
>>>>
>>>>          xs->xso.real_dev = real_dev;
>>>>          err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
>>>>          if (!err) {
>>>>                  ipsec->xs = xs;
>>>>                  INIT_LIST_HEAD(&ipsec->list);
>>>> -               spin_lock_bh(&bond->ipsec_lock);
>>>> +               mutex_lock(&bond->ipsec_lock);
>>>>                  list_add(&ipsec->list, &bond->ipsec_list);
>>>> -               spin_unlock_bh(&bond->ipsec_lock);
>>>> +               mutex_unlock(&bond->ipsec_lock);
>>>>          } else {
>>>>                  kfree(ipsec);
>>>>          }
>>>> -       rcu_read_unlock();
>>>>          return err;
>>>> }
>>>>
>>>> @@ -481,35 +476,35 @@ static void bond_ipsec_add_sa_all(struct
>>>> bonding *bond)
>>>>          struct bond_ipsec *ipsec;
>>>>          struct slave *slave;
>>>>
>>>> -       rcu_read_lock();
>>>> -       slave = rcu_dereference(bond->curr_active_slave);
>>>> -       if (!slave)
>>>> -               goto out;
>>>> +       slave = rtnl_dereference(bond->curr_active_slave);
>>>> +       real_dev = slave ? slave->dev : NULL;
>>>> +       if (!real_dev)
>>>> +               return;
>>>>
>>>> -       real_dev = slave->dev;
>>>> +       mutex_lock(&bond->ipsec_lock);
>>>>          if (!real_dev->xfrmdev_ops ||
>>>>              !real_dev->xfrmdev_ops->xdo_dev_state_add ||
>>>>              netif_is_bond_master(real_dev)) {
>>>> -               spin_lock_bh(&bond->ipsec_lock);
>>>>                  if (!list_empty(&bond->ipsec_list))
>>>>                          slave_warn(bond_dev, real_dev,
>>>>                                     "%s: no slave
>>>> xdo_dev_state_add\n",
>>>>                                     __func__);
>>>> -               spin_unlock_bh(&bond->ipsec_lock);
>>>>                  goto out;
>>>>          }
>>>>
>>>> -       spin_lock_bh(&bond->ipsec_lock);
>>>>          list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>>>> +               /* If new state is added before ipsec_lock acquired
>>>> */
>>>> +               if (ipsec->xs->xso.real_dev == real_dev)
>>>> +                       continue;
>>>> +
>>>>                  ipsec->xs->xso.real_dev = real_dev;
>>>>                  if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec-
>>>>> xs, NULL)) {
>>>>                          slave_warn(bond_dev, real_dev, "%s: failed
>>>> to add SA\n", __func__);
>>>>                          ipsec->xs->xso.real_dev = NULL;
>>>>                  }
>>>>          }
>>>> -       spin_unlock_bh(&bond->ipsec_lock);
>>>> out:
>>>> -       rcu_read_unlock();
>>>> +       mutex_unlock(&bond->ipsec_lock);
>>>> }
>>>>
>>>> /**
>>>> @@ -530,6 +525,8 @@ static void bond_ipsec_del_sa(struct xfrm_state
>>>> *xs)
>>>>          rcu_read_lock();
>>>>          bond = netdev_priv(bond_dev);
>>>>          slave = rcu_dereference(bond->curr_active_slave);
>>>> +       real_dev = slave ? slave->dev : NULL;
>>>> +       rcu_read_unlock();
>>>
>>>          Is it really safe to access real_dev once we've left the rcu
>>> critical section?  What prevents the device referenced by real_dev
>>> from
>>> being deleted as soon as rcu_read_unlock() completes?
>>>
>>
>> I am not sure. But RCU protects accessing of the context pointed by
>> curr_active_slave, not slave->dev itself. I wrong about this?
> 
> 	No, you're not wrong: RCU does indeed protect the "slave"
> pointer in the above code while inside the RCU read-side critical
> section.
> 
> 	However, we also know that as long as we're within that critical
> section, whatever the "slave" pointer points to will remain valid,
> because what curr_active_slave points to is also RCU protected (not only
> the pointer itself).
> 
> 	For the interface behind slave->dev specifically, any attempt to
> delete an interface that is a member of a bond must pass through
> __bond_release_one() first, and it calls synchronize_rcu() as part of
> its processing (which will wait for active read-side critical sections
> to complete).  Therefore, the bond member interface behind slave->dev
> here cannot simply vanish while execution is within this critical
> section.
> 

Many thanks for you clarification. It helps me a lot.

>> I can move rcu_read_unlock after xdo_dev_state_delete(). And do the
>> same change for bond_ipsec_add_sa and bond_ipsec_free_sa.
>> What do you think?
> 
> 	The original issue was that the xfrm callback within mlx5 could
> sleep while a spin lock was held.  However, sleeping is not permitted

But, can we assume no sleep in xfrm callbacks, considering drivers need 
to interact with hardware? It's not related to mlx5 only, but others 
from different vendors.

> within an RCU read-side critical section, either, so would this simply
> reintroduce the original problem from a different angle?
> 

I moved the callbacks out of RCU critical section, because we never know 
what drivers will do. I thought bond may (should) hold the netdev, but I 
was wrong as you explained.

> 	Assuming that's correct, I think one way around that is to
> acquire a reference (via dev_hold or netdev_hold) to the interface
> (i.e., real_dev) within the minimal rcu_read_lock / rcu_read_unlock, do

That's what I did, trying to reduce critical section, unlock immediately 
after get real_dev.

> the xfrm magic, and then release the reference when finished.  That
> won't prevent the interface from being removed from the bond and the
> struct slave being freed outside of the RCU critical section, so the
> code would also need to use only real_dev after rcu_read_unlock() is
> called.
> 

I think it's good solution.
So I need to add the dev_hold/dev_put as following, for example, for 
bond_ipsec_del_sa, right?

@@ -526,6 +534,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
         bond = netdev_priv(bond_dev);
         slave = rcu_dereference(bond->curr_active_slave);
         real_dev = slave ? slave->dev : NULL;
+       dev_hold(real_dev);
         rcu_read_unlock();

         if (!slave)
@@ -545,6 +554,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)

         real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
  out:
+       dev_put(real_dev);
         mutex_lock(&bond->ipsec_lock);
         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
                 if (ipsec->xs == xs) {

If you are ok with that, I will add the same for 
bond_ipsec_add_sa/bond_ipsec_free_sa, and send new version.

Please confirm.

Thanks!
Jianbo


> 	-J
> 
>> Thanks!
>> Jianbo
>>
>>>          -J
>>>          
>>>>
>>>>          if (!slave)
>>>>                  goto out;
>>>> @@ -537,7 +534,6 @@ static void bond_ipsec_del_sa(struct xfrm_state
>>>> *xs)
>>>>          if (!xs->xso.real_dev)
>>>>                  goto out;
>>>>
>>>> -       real_dev = slave->dev;
>>>>          WARN_ON(xs->xso.real_dev != real_dev);
>>>>
>>>>          if (!real_dev->xfrmdev_ops ||
>>>> @@ -549,7 +545,7 @@ static void bond_ipsec_del_sa(struct xfrm_state
>>>> *xs)
>>>>
>>>>          real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
>>>> out:
>>>> -       spin_lock_bh(&bond->ipsec_lock);
>>>> +       mutex_lock(&bond->ipsec_lock);
>>>>          list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>>>>                  if (ipsec->xs == xs) {
>>>>                          list_del(&ipsec->list);
>>>> @@ -557,8 +553,7 @@ static void bond_ipsec_del_sa(struct xfrm_state
>>>> *xs)
>>>>                          break;
>>>>                  }
>>>>          }
>>>> -       spin_unlock_bh(&bond->ipsec_lock);
>>>> -       rcu_read_unlock();
>>>> +       mutex_unlock(&bond->ipsec_lock);
>>>> }
>>>>
>>>> static void bond_ipsec_del_sa_all(struct bonding *bond)
>>>> @@ -568,15 +563,12 @@ static void bond_ipsec_del_sa_all(struct
>>>> bonding *bond)
>>>>          struct bond_ipsec *ipsec;
>>>>          struct slave *slave;
>>>>
>>>> -       rcu_read_lock();
>>>> -       slave = rcu_dereference(bond->curr_active_slave);
>>>> -       if (!slave) {
>>>> -               rcu_read_unlock();
>>>> +       slave = rtnl_dereference(bond->curr_active_slave);
>>>> +       real_dev = slave ? slave->dev : NULL;
>>>> +       if (!real_dev)
>>>>                  return;
>>>> -       }
>>>>
>>>> -       real_dev = slave->dev;
>>>> -       spin_lock_bh(&bond->ipsec_lock);
>>>> +       mutex_lock(&bond->ipsec_lock);
>>>>          list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>>>>                  if (!ipsec->xs->xso.real_dev)
>>>>                          continue;
>>>> @@ -593,8 +585,7 @@ static void bond_ipsec_del_sa_all(struct
>>>> bonding *bond)
>>>>                                  real_dev->xfrmdev_ops-
>>>>> xdo_dev_state_free(ipsec->xs);
>>>>                  }
>>>>          }
>>>> -       spin_unlock_bh(&bond->ipsec_lock);
>>>> -       rcu_read_unlock();
>>>> +       mutex_unlock(&bond->ipsec_lock);
>>>> }
>>>>
>>>> static void bond_ipsec_free_sa(struct xfrm_state *xs)
>>>> @@ -5917,7 +5908,7 @@ void bond_setup(struct net_device *bond_dev)
>>>>          /* set up xfrm device ops (only supported in active-backup
>>>> right now) */
>>>>          bond_dev->xfrmdev_ops = &bond_xfrmdev_ops;
>>>>          INIT_LIST_HEAD(&bond->ipsec_list);
>>>> -       spin_lock_init(&bond->ipsec_lock);
>>>> +       mutex_init(&bond->ipsec_lock);
>>>> #endif /* CONFIG_XFRM_OFFLOAD */
>>>>
>>>>          /* don't acquire bond device's netif_tx_lock when
>>>> transmitting */
>>>> @@ -5966,6 +5957,10 @@ static void bond_uninit(struct net_device
>>>> *bond_dev)
>>>>                  __bond_release_one(bond_dev, slave->dev, true,
>>>> true);
>>>>          netdev_info(bond_dev, "Released all slaves\n");
>>>>
>>>> +#ifdef CONFIG_XFRM_OFFLOAD
>>>> +       mutex_destroy(&bond->ipsec_lock);
>>>> +#endif /* CONFIG_XFRM_OFFLOAD */
>>>> +
>>>>          bond_set_slave_arr(bond, NULL, NULL);
>>>>
>>>>          list_del_rcu(&bond->bond_list);
>>>> diff --git a/include/net/bonding.h b/include/net/bonding.h
>>>> index b61fb1aa3a56..8bb5f016969f 100644
>>>> --- a/include/net/bonding.h
>>>> +++ b/include/net/bonding.h
>>>> @@ -260,7 +260,7 @@ struct bonding {
>>>> #ifdef CONFIG_XFRM_OFFLOAD
>>>>          struct list_head ipsec_list;
>>>>          /* protecting ipsec_list */
>>>> -       spinlock_t ipsec_lock;
>>>> +       struct mutex ipsec_lock;
>>>> #endif /* CONFIG_XFRM_OFFLOAD */
>>>>          struct bpf_prog *xdp_prog;
>>>> };
>>>> -- 
>>>> 2.21.0
> 
> ---
> 	-Jay Vosburgh, jv@jvosburgh.net


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-22 11:15         ` Jianbo Liu
@ 2024-08-22 23:19           ` Jakub Kicinski
  2024-08-23  0:17           ` Jay Vosburgh
  1 sibling, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2024-08-22 23:19 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: Jay Vosburgh, liuhangbin@gmail.com, davem@davemloft.net,
	Leon Romanovsky, Gal Pressman, andy@greyhouse.net, Tariq Toukan,
	netdev@vger.kernel.org, pabeni@redhat.com, edumazet@google.com,
	Saeed Mahameed

On Thu, 22 Aug 2024 19:15:24 +0800 Jianbo Liu wrote:
> +       dev_hold(real_dev);

netdev_hold(), please, so that we can find the location of the
reference leak if a bug sneaks in.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-22 11:15         ` Jianbo Liu
  2024-08-22 23:19           ` Jakub Kicinski
@ 2024-08-23  0:17           ` Jay Vosburgh
  1 sibling, 0 replies; 12+ messages in thread
From: Jay Vosburgh @ 2024-08-23  0:17 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: liuhangbin@gmail.com, davem@davemloft.net, Leon Romanovsky,
	Gal Pressman, andy@greyhouse.net, Tariq Toukan,
	netdev@vger.kernel.org, pabeni@redhat.com, edumazet@google.com,
	Saeed Mahameed, kuba@kernel.org

Jianbo Liu <jianbol@nvidia.com> wrote:

[...]
>I think it's good solution.
>So I need to add the dev_hold/dev_put as following, for example, for
>bond_ipsec_del_sa, right?
>
>@@ -526,6 +534,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>        bond = netdev_priv(bond_dev);
>        slave = rcu_dereference(bond->curr_active_slave);
>        real_dev = slave ? slave->dev : NULL;
>+       dev_hold(real_dev);
>        rcu_read_unlock();
>
>        if (!slave)
>@@ -545,6 +554,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>
>        real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
> out:
>+       dev_put(real_dev);
>        mutex_lock(&bond->ipsec_lock);
>        list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>                if (ipsec->xs == xs) {
>
>If you are ok with that, I will add the same for
>bond_ipsec_add_sa/bond_ipsec_free_sa, and send new version.

	Yes, I think that will work, but please use netdev_hold() as
Jakub requested.

	-J

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-08-23  0:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-21  9:04 [PATCH net V5 0/3] Fixes for IPsec over bonding Jianbo Liu
2024-08-21  9:04 ` [PATCH net V5 1/3] bonding: implement xdo_dev_state_free and call it after deletion Jianbo Liu
2024-08-21  9:04 ` [PATCH net V5 2/3] bonding: extract the use of real_device into local variable Jianbo Liu
2024-08-21  9:04 ` [PATCH net V5 3/3] bonding: change ipsec_lock from spin lock to mutex Jianbo Liu
2024-08-21 16:00   ` Jay Vosburgh
2024-08-22  0:11     ` Jakub Kicinski
2024-08-22  2:07       ` Jianbo Liu
2024-08-22  1:53     ` Jianbo Liu
2024-08-22  6:05       ` Jay Vosburgh
2024-08-22 11:15         ` Jianbo Liu
2024-08-22 23:19           ` Jakub Kicinski
2024-08-23  0:17           ` Jay Vosburgh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).