netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net V3 0/3] Fixes for IPsec over bonding
@ 2024-08-05  5:03 Tariq Toukan
  2024-08-05  5:03 ` [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion Tariq Toukan
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Tariq Toukan @ 2024-08-05  5:03 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Jay Vosburgh, Andy Gospodarek
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Hangbin Liu, Jianbo Liu, Tariq Toukan

Hi,

This patchset by Jianbo provides bug fixes for IPsec over bonding
driver.

It adds the missing xdo_dev_state_free API, and fixes "scheduling while
atomic" by using mutex lock instead.

Series generated against:
commit 14ab4792ee12 ("net/tcp: Disable TCP-AO static key after RCU grace period")

Regards,
Tariq

V3:
- Add RCU read lock/unlock for bond_ipsec_add_sa, bond_ipsec_del_sa and bond_ipsec_free_sa.

V2:
- Rebased on top of latest net branch.
- Squashed patch #2 into #1 per Hangbin comment.
- Addressed Hangbin's comments.
- Patch #3 (was #4): Addressed comments by Paolo.

Jianbo Liu (3):
  bonding: implement xdo_dev_state_free and call it after deletion
  bonding: extract the use of real_device into local variable
  bonding: change ipsec_lock from spin lock to mutex

 drivers/net/bonding/bond_main.c | 151 ++++++++++++++++++++------------
 include/net/bonding.h           |   2 +-
 2 files changed, 98 insertions(+), 55 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-05  5:03 [PATCH net V3 0/3] Fixes for IPsec over bonding Tariq Toukan
@ 2024-08-05  5:03 ` Tariq Toukan
  2024-08-06  8:45   ` Hangbin Liu
                     ` (2 more replies)
  2024-08-05  5:03 ` [PATCH net V3 2/3] bonding: extract the use of real_device into local variable Tariq Toukan
  2024-08-05  5:03 ` [PATCH net V3 3/3] bonding: change ipsec_lock from spin lock to mutex Tariq Toukan
  2 siblings, 3 replies; 19+ messages in thread
From: Tariq Toukan @ 2024-08-05  5:03 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Jay Vosburgh, Andy Gospodarek
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Hangbin Liu, Jianbo Liu, Tariq Toukan

From: Jianbo Liu <jianbol@nvidia.com>

Add this implementation for bonding, so hardware resources can be
freed after xfrm state is deleted.

And call it when deleting all SAs from old active real interface.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/bonding/bond_main.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1cd92c12e782..eb5e43860670 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -581,6 +581,8 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 				   __func__);
 		} else {
 			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
+			if (slave->dev->xfrmdev_ops->xdo_dev_state_free)
+				slave->dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
 		}
 		ipsec->xs->xso.real_dev = NULL;
 	}
@@ -588,6 +590,35 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 	rcu_read_unlock();
 }
 
+static void bond_ipsec_free_sa(struct xfrm_state *xs)
+{
+	struct net_device *bond_dev = xs->xso.dev;
+	struct net_device *real_dev;
+	struct bonding *bond;
+	struct slave *slave;
+
+	if (!bond_dev)
+		return;
+
+	rcu_read_lock();
+	bond = netdev_priv(bond_dev);
+	slave = rcu_dereference(bond->curr_active_slave);
+	real_dev = slave ? slave->dev : NULL;
+	rcu_read_unlock();
+
+	if (!slave)
+		return;
+
+	if (!xs->xso.real_dev)
+		return;
+
+	WARN_ON(xs->xso.real_dev != real_dev);
+
+	if (real_dev && real_dev->xfrmdev_ops &&
+	    real_dev->xfrmdev_ops->xdo_dev_state_free)
+		real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
+}
+
 /**
  * bond_ipsec_offload_ok - can this packet use the xfrm hw offload
  * @skb: current data packet
@@ -632,6 +663,7 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
 static const struct xfrmdev_ops bond_xfrmdev_ops = {
 	.xdo_dev_state_add = bond_ipsec_add_sa,
 	.xdo_dev_state_delete = bond_ipsec_del_sa,
+	.xdo_dev_state_free = bond_ipsec_free_sa,
 	.xdo_dev_offload_ok = bond_ipsec_offload_ok,
 };
 #endif /* CONFIG_XFRM_OFFLOAD */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net V3 2/3] bonding: extract the use of real_device into local variable
  2024-08-05  5:03 [PATCH net V3 0/3] Fixes for IPsec over bonding Tariq Toukan
  2024-08-05  5:03 ` [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion Tariq Toukan
@ 2024-08-05  5:03 ` Tariq Toukan
  2024-08-07 10:13   ` Hangbin Liu
  2024-08-05  5:03 ` [PATCH net V3 3/3] bonding: change ipsec_lock from spin lock to mutex Tariq Toukan
  2 siblings, 1 reply; 19+ messages in thread
From: Tariq Toukan @ 2024-08-05  5:03 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Jay Vosburgh, Andy Gospodarek
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Hangbin Liu, Jianbo Liu, Cosmin Ratiu, Tariq Toukan

From: Jianbo Liu <jianbol@nvidia.com>

Add a local variable for slave->dev, to prepare for the lock change in
the next patch. There is no functionality change.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index eb5e43860670..e550b1c08fdb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -427,6 +427,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 			     struct netlink_ext_ack *extack)
 {
 	struct net_device *bond_dev = xs->xso.dev;
+	struct net_device *real_dev;
 	struct bond_ipsec *ipsec;
 	struct bonding *bond;
 	struct slave *slave;
@@ -443,9 +444,10 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 		return -ENODEV;
 	}
 
-	if (!slave->dev->xfrmdev_ops ||
-	    !slave->dev->xfrmdev_ops->xdo_dev_state_add ||
-	    netif_is_bond_master(slave->dev)) {
+	real_dev = slave->dev;
+	if (!real_dev->xfrmdev_ops ||
+	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
+	    netif_is_bond_master(real_dev)) {
 		NL_SET_ERR_MSG_MOD(extack, "Slave does not support ipsec offload");
 		rcu_read_unlock();
 		return -EINVAL;
@@ -456,9 +458,9 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 		rcu_read_unlock();
 		return -ENOMEM;
 	}
-	xs->xso.real_dev = slave->dev;
 
-	err = slave->dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
+	xs->xso.real_dev = real_dev;
+	err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
 	if (!err) {
 		ipsec->xs = xs;
 		INIT_LIST_HEAD(&ipsec->list);
@@ -475,6 +477,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 static void bond_ipsec_add_sa_all(struct bonding *bond)
 {
 	struct net_device *bond_dev = bond->dev;
+	struct net_device *real_dev;
 	struct bond_ipsec *ipsec;
 	struct slave *slave;
 
@@ -483,12 +486,13 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 	if (!slave)
 		goto out;
 
-	if (!slave->dev->xfrmdev_ops ||
-	    !slave->dev->xfrmdev_ops->xdo_dev_state_add ||
-	    netif_is_bond_master(slave->dev)) {
+	real_dev = slave->dev;
+	if (!real_dev->xfrmdev_ops ||
+	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
+	    netif_is_bond_master(real_dev)) {
 		spin_lock_bh(&bond->ipsec_lock);
 		if (!list_empty(&bond->ipsec_list))
-			slave_warn(bond_dev, slave->dev,
+			slave_warn(bond_dev, real_dev,
 				   "%s: no slave xdo_dev_state_add\n",
 				   __func__);
 		spin_unlock_bh(&bond->ipsec_lock);
@@ -497,9 +501,9 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 
 	spin_lock_bh(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
-		ipsec->xs->xso.real_dev = slave->dev;
-		if (slave->dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
-			slave_warn(bond_dev, slave->dev, "%s: failed to add SA\n", __func__);
+		ipsec->xs->xso.real_dev = real_dev;
+		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
+			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
 			ipsec->xs->xso.real_dev = NULL;
 		}
 	}
@@ -515,6 +519,7 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 static void bond_ipsec_del_sa(struct xfrm_state *xs)
 {
 	struct net_device *bond_dev = xs->xso.dev;
+	struct net_device *real_dev;
 	struct bond_ipsec *ipsec;
 	struct bonding *bond;
 	struct slave *slave;
@@ -532,16 +537,17 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 	if (!xs->xso.real_dev)
 		goto out;
 
-	WARN_ON(xs->xso.real_dev != slave->dev);
+	real_dev = slave->dev;
+	WARN_ON(xs->xso.real_dev != real_dev);
 
-	if (!slave->dev->xfrmdev_ops ||
-	    !slave->dev->xfrmdev_ops->xdo_dev_state_delete ||
-	    netif_is_bond_master(slave->dev)) {
-		slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__);
+	if (!real_dev->xfrmdev_ops ||
+	    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
+	    netif_is_bond_master(real_dev)) {
+		slave_warn(bond_dev, real_dev, "%s: no slave xdo_dev_state_delete\n", __func__);
 		goto out;
 	}
 
-	slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
 out:
 	spin_lock_bh(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
@@ -558,6 +564,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 static void bond_ipsec_del_sa_all(struct bonding *bond)
 {
 	struct net_device *bond_dev = bond->dev;
+	struct net_device *real_dev;
 	struct bond_ipsec *ipsec;
 	struct slave *slave;
 
@@ -568,21 +575,22 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 		return;
 	}
 
+	real_dev = slave->dev;
 	spin_lock_bh(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
 		if (!ipsec->xs->xso.real_dev)
 			continue;
 
-		if (!slave->dev->xfrmdev_ops ||
-		    !slave->dev->xfrmdev_ops->xdo_dev_state_delete ||
-		    netif_is_bond_master(slave->dev)) {
-			slave_warn(bond_dev, slave->dev,
+		if (!real_dev->xfrmdev_ops ||
+		    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
+		    netif_is_bond_master(real_dev)) {
+			slave_warn(bond_dev, real_dev,
 				   "%s: no slave xdo_dev_state_delete\n",
 				   __func__);
 		} else {
-			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
-			if (slave->dev->xfrmdev_ops->xdo_dev_state_free)
-				slave->dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
+			real_dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
+			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
+				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
 		}
 		ipsec->xs->xso.real_dev = NULL;
 	}
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net V3 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-05  5:03 [PATCH net V3 0/3] Fixes for IPsec over bonding Tariq Toukan
  2024-08-05  5:03 ` [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion Tariq Toukan
  2024-08-05  5:03 ` [PATCH net V3 2/3] bonding: extract the use of real_device into local variable Tariq Toukan
@ 2024-08-05  5:03 ` Tariq Toukan
  2024-08-08  9:34   ` Hangbin Liu
  2 siblings, 1 reply; 19+ messages in thread
From: Tariq Toukan @ 2024-08-05  5:03 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Jay Vosburgh, Andy Gospodarek
  Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
	Hangbin Liu, Jianbo Liu, Tariq Toukan

From: Jianbo Liu <jianbol@nvidia.com>

In the cited commit, bond->ipsec_lock is added to protect ipsec_list,
hence xdo_dev_state_add and xdo_dev_state_delete are called inside
this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep,
"scheduling while atomic" will be triggered when changing bond's
active slave.

[  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
[  101.055726] Modules linked in:
[  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
[  101.058760] Hardware name:
[  101.059434] Call Trace:
[  101.059436]  <TASK>
[  101.060873]  dump_stack_lvl+0x51/0x60
[  101.061275]  __schedule_bug+0x4e/0x60
[  101.061682]  __schedule+0x612/0x7c0
[  101.062078]  ? __mod_timer+0x25c/0x370
[  101.062486]  schedule+0x25/0xd0
[  101.062845]  schedule_timeout+0x77/0xf0
[  101.063265]  ? asm_common_interrupt+0x22/0x40
[  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
[  101.064215]  __wait_for_common+0x87/0x190
[  101.064648]  ? usleep_range_state+0x90/0x90
[  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
[  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
[  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
[  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
[  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
[  101.067738]  ? kmalloc_trace+0x4d/0x350
[  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
[  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
[  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
[  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
[  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
[  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
[  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
[  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
[  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
[  101.073033]  vfs_write+0x2d8/0x400
[  101.073416]  ? alloc_fd+0x48/0x180
[  101.073798]  ksys_write+0x5f/0xe0
[  101.074175]  do_syscall_64+0x52/0x110
[  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
from bond_change_active_slave, which requires holding the RTNL lock.
And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
context. So ipsec_lock doesn't have to be spin lock, change it to
mutex, and thus the above issue can be resolved.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/bonding/bond_main.c | 75 +++++++++++++++++----------------
 include/net/bonding.h           |  2 +-
 2 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e550b1c08fdb..56764f1c39b8 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -439,38 +439,33 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 	rcu_read_lock();
 	bond = netdev_priv(bond_dev);
 	slave = rcu_dereference(bond->curr_active_slave);
-	if (!slave) {
-		rcu_read_unlock();
+	real_dev = slave ? slave->dev : NULL;
+	rcu_read_unlock();
+	if (!real_dev)
 		return -ENODEV;
-	}
 
-	real_dev = slave->dev;
 	if (!real_dev->xfrmdev_ops ||
 	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
 	    netif_is_bond_master(real_dev)) {
 		NL_SET_ERR_MSG_MOD(extack, "Slave does not support ipsec offload");
-		rcu_read_unlock();
 		return -EINVAL;
 	}
 
-	ipsec = kmalloc(sizeof(*ipsec), GFP_ATOMIC);
-	if (!ipsec) {
-		rcu_read_unlock();
+	ipsec = kmalloc(sizeof(*ipsec), GFP_KERNEL);
+	if (!ipsec)
 		return -ENOMEM;
-	}
 
 	xs->xso.real_dev = real_dev;
 	err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
 	if (!err) {
 		ipsec->xs = xs;
 		INIT_LIST_HEAD(&ipsec->list);
-		spin_lock_bh(&bond->ipsec_lock);
+		mutex_lock(&bond->ipsec_lock);
 		list_add(&ipsec->list, &bond->ipsec_list);
-		spin_unlock_bh(&bond->ipsec_lock);
+		mutex_unlock(&bond->ipsec_lock);
 	} else {
 		kfree(ipsec);
 	}
-	rcu_read_unlock();
 	return err;
 }
 
@@ -481,35 +476,43 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 	struct bond_ipsec *ipsec;
 	struct slave *slave;
 
-	rcu_read_lock();
-	slave = rcu_dereference(bond->curr_active_slave);
-	if (!slave)
-		goto out;
+	slave = rtnl_dereference(bond->curr_active_slave);
+	real_dev = slave ? slave->dev : NULL;
+	if (!real_dev)
+		return;
 
-	real_dev = slave->dev;
+	mutex_lock(&bond->ipsec_lock);
 	if (!real_dev->xfrmdev_ops ||
 	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
 	    netif_is_bond_master(real_dev)) {
-		spin_lock_bh(&bond->ipsec_lock);
 		if (!list_empty(&bond->ipsec_list))
 			slave_warn(bond_dev, real_dev,
 				   "%s: no slave xdo_dev_state_add\n",
 				   __func__);
-		spin_unlock_bh(&bond->ipsec_lock);
 		goto out;
 	}
 
-	spin_lock_bh(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+		struct net_device *dev = ipsec->xs->xso.real_dev;
+
+		/* If new state is added before ipsec_lock acquired */
+		if (dev) {
+			if (dev == real_dev)
+				continue;
+
+			dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
+			if (dev->xfrmdev_ops->xdo_dev_state_free)
+				dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
+		}
+
 		ipsec->xs->xso.real_dev = real_dev;
 		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
 			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
 			ipsec->xs->xso.real_dev = NULL;
 		}
 	}
-	spin_unlock_bh(&bond->ipsec_lock);
 out:
-	rcu_read_unlock();
+	mutex_unlock(&bond->ipsec_lock);
 }
 
 /**
@@ -530,6 +533,8 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 	rcu_read_lock();
 	bond = netdev_priv(bond_dev);
 	slave = rcu_dereference(bond->curr_active_slave);
+	real_dev = slave ? slave->dev : NULL;
+	rcu_read_unlock();
 
 	if (!slave)
 		goto out;
@@ -537,7 +542,6 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 	if (!xs->xso.real_dev)
 		goto out;
 
-	real_dev = slave->dev;
 	WARN_ON(xs->xso.real_dev != real_dev);
 
 	if (!real_dev->xfrmdev_ops ||
@@ -549,7 +553,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 
 	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
 out:
-	spin_lock_bh(&bond->ipsec_lock);
+	mutex_lock(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
 		if (ipsec->xs == xs) {
 			list_del(&ipsec->list);
@@ -557,8 +561,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 			break;
 		}
 	}
-	spin_unlock_bh(&bond->ipsec_lock);
-	rcu_read_unlock();
+	mutex_unlock(&bond->ipsec_lock);
 }
 
 static void bond_ipsec_del_sa_all(struct bonding *bond)
@@ -568,15 +571,12 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 	struct bond_ipsec *ipsec;
 	struct slave *slave;
 
-	rcu_read_lock();
-	slave = rcu_dereference(bond->curr_active_slave);
-	if (!slave) {
-		rcu_read_unlock();
+	slave = rtnl_dereference(bond->curr_active_slave);
+	real_dev = slave ? slave->dev : NULL;
+	if (!real_dev)
 		return;
-	}
 
-	real_dev = slave->dev;
-	spin_lock_bh(&bond->ipsec_lock);
+	mutex_lock(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
 		if (!ipsec->xs->xso.real_dev)
 			continue;
@@ -594,8 +594,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 		}
 		ipsec->xs->xso.real_dev = NULL;
 	}
-	spin_unlock_bh(&bond->ipsec_lock);
-	rcu_read_unlock();
+	mutex_unlock(&bond->ipsec_lock);
 }
 
 static void bond_ipsec_free_sa(struct xfrm_state *xs)
@@ -5922,7 +5921,7 @@ void bond_setup(struct net_device *bond_dev)
 	/* set up xfrm device ops (only supported in active-backup right now) */
 	bond_dev->xfrmdev_ops = &bond_xfrmdev_ops;
 	INIT_LIST_HEAD(&bond->ipsec_list);
-	spin_lock_init(&bond->ipsec_lock);
+	mutex_init(&bond->ipsec_lock);
 #endif /* CONFIG_XFRM_OFFLOAD */
 
 	/* don't acquire bond device's netif_tx_lock when transmitting */
@@ -5971,6 +5970,10 @@ static void bond_uninit(struct net_device *bond_dev)
 		__bond_release_one(bond_dev, slave->dev, true, true);
 	netdev_info(bond_dev, "Released all slaves\n");
 
+#ifdef CONFIG_XFRM_OFFLOAD
+	mutex_destroy(&bond->ipsec_lock);
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 	bond_set_slave_arr(bond, NULL, NULL);
 
 	list_del_rcu(&bond->bond_list);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index b61fb1aa3a56..8bb5f016969f 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -260,7 +260,7 @@ struct bonding {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct list_head ipsec_list;
 	/* protecting ipsec_list */
-	spinlock_t ipsec_lock;
+	struct mutex ipsec_lock;
 #endif /* CONFIG_XFRM_OFFLOAD */
 	struct bpf_prog *xdp_prog;
 };
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-05  5:03 ` [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion Tariq Toukan
@ 2024-08-06  8:45   ` Hangbin Liu
  2024-08-06  9:09     ` Jianbo Liu
  2024-08-07 10:11   ` Hangbin Liu
  2024-08-13  0:48   ` Jakub Kicinski
  2 siblings, 1 reply; 19+ messages in thread
From: Hangbin Liu @ 2024-08-06  8:45 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Jay Vosburgh, Andy Gospodarek, netdev, Saeed Mahameed,
	Gal Pressman, Leon Romanovsky, Jianbo Liu

On Mon, Aug 05, 2024 at 08:03:55AM +0300, Tariq Toukan wrote:
> From: Jianbo Liu <jianbol@nvidia.com>
> 
> Add this implementation for bonding, so hardware resources can be
> freed after xfrm state is deleted.
> 
> And call it when deleting all SAs from old active real interface.
> 
> Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
> Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/bonding/bond_main.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 1cd92c12e782..eb5e43860670 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -581,6 +581,8 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>  				   __func__);
>  		} else {
>  			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
> +			if (slave->dev->xfrmdev_ops->xdo_dev_state_free)
> +				slave->dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
>  		}
>  		ipsec->xs->xso.real_dev = NULL;
>  	}
> @@ -588,6 +590,35 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>  	rcu_read_unlock();
>  }
>  
> +static void bond_ipsec_free_sa(struct xfrm_state *xs)
> +{
> +	struct net_device *bond_dev = xs->xso.dev;
> +	struct net_device *real_dev;
> +	struct bonding *bond;
> +	struct slave *slave;
> +
> +	if (!bond_dev)
> +		return;
> +
> +	rcu_read_lock();
> +	bond = netdev_priv(bond_dev);
> +	slave = rcu_dereference(bond->curr_active_slave);
> +	real_dev = slave ? slave->dev : NULL;
> +	rcu_read_unlock();
> +
> +	if (!slave)
> +		return;
> +
> +	if (!xs->xso.real_dev)
> +		return;
> +
> +	WARN_ON(xs->xso.real_dev != real_dev);
> +
> +	if (real_dev && real_dev->xfrmdev_ops &&
> +	    real_dev->xfrmdev_ops->xdo_dev_state_free)
> +		real_dev->xfrmdev_ops->xdo_dev_state_free(xs);

Do we need to check netif_is_bond_master(slave->dev) here?

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-06  8:45   ` Hangbin Liu
@ 2024-08-06  9:09     ` Jianbo Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Jianbo Liu @ 2024-08-06  9:09 UTC (permalink / raw)
  To: Tariq Toukan, liuhangbin@gmail.com
  Cc: davem@davemloft.net, Leon Romanovsky, andy@greyhouse.net,
	Gal Pressman, jv@jvosburgh.net, kuba@kernel.org,
	pabeni@redhat.com, edumazet@google.com, Saeed Mahameed,
	netdev@vger.kernel.org

On Tue, 2024-08-06 at 16:45 +0800, Hangbin Liu wrote:
> On Mon, Aug 05, 2024 at 08:03:55AM +0300, Tariq Toukan wrote:
> > From: Jianbo Liu <jianbol@nvidia.com>
> > 
> > Add this implementation for bonding, so hardware resources can be
> > freed after xfrm state is deleted.
> > 
> > And call it when deleting all SAs from old active real interface.
> > 
> > Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
> > Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> > ---
> >  drivers/net/bonding/bond_main.c | 32
> > ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> > 
> > diff --git a/drivers/net/bonding/bond_main.c
> > b/drivers/net/bonding/bond_main.c
> > index 1cd92c12e782..eb5e43860670 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -581,6 +581,8 @@ static void bond_ipsec_del_sa_all(struct
> > bonding *bond)
> >                                    __func__);
> >                 } else {
> >                         slave->dev->xfrmdev_ops-
> > >xdo_dev_state_delete(ipsec->xs);
> > +                       if (slave->dev->xfrmdev_ops-
> > >xdo_dev_state_free)
> > +                               slave->dev->xfrmdev_ops-
> > >xdo_dev_state_free(ipsec->xs);
> >                 }
> >                 ipsec->xs->xso.real_dev = NULL;
> >         }
> > @@ -588,6 +590,35 @@ static void bond_ipsec_del_sa_all(struct
> > bonding *bond)
> >         rcu_read_unlock();
> >  }
> >  
> > +static void bond_ipsec_free_sa(struct xfrm_state *xs)
> > +{
> > +       struct net_device *bond_dev = xs->xso.dev;
> > +       struct net_device *real_dev;
> > +       struct bonding *bond;
> > +       struct slave *slave;
> > +
> > +       if (!bond_dev)
> > +               return;
> > +
> > +       rcu_read_lock();
> > +       bond = netdev_priv(bond_dev);
> > +       slave = rcu_dereference(bond->curr_active_slave);
> > +       real_dev = slave ? slave->dev : NULL;
> > +       rcu_read_unlock();
> > +
> > +       if (!slave)
> > +               return;
> > +
> > +       if (!xs->xso.real_dev)
> > +               return;
> > +
> > +       WARN_ON(xs->xso.real_dev != real_dev);
> > +
> > +       if (real_dev && real_dev->xfrmdev_ops &&
> > +           real_dev->xfrmdev_ops->xdo_dev_state_free)
> > +               real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
> 
> Do we need to check netif_is_bond_master(slave->dev) here?

Probably no need, because we will not call xdo_dev_state_free only. It
is called after xdo_dev_state_delete.

> 
> Thanks
> Hangbin


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-05  5:03 ` [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion Tariq Toukan
  2024-08-06  8:45   ` Hangbin Liu
@ 2024-08-07 10:11   ` Hangbin Liu
  2024-08-13  0:48   ` Jakub Kicinski
  2 siblings, 0 replies; 19+ messages in thread
From: Hangbin Liu @ 2024-08-07 10:11 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Jay Vosburgh, Andy Gospodarek, netdev, Saeed Mahameed,
	Gal Pressman, Leon Romanovsky, Jianbo Liu

On Mon, Aug 05, 2024 at 08:03:55AM +0300, Tariq Toukan wrote:
> From: Jianbo Liu <jianbol@nvidia.com>
> 
> Add this implementation for bonding, so hardware resources can be
> freed after xfrm state is deleted.
> 
> And call it when deleting all SAs from old active real interface.
> 
> Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
> Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/bonding/bond_main.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 1cd92c12e782..eb5e43860670 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -581,6 +581,8 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>  				   __func__);
>  		} else {
>  			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
> +			if (slave->dev->xfrmdev_ops->xdo_dev_state_free)
> +				slave->dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
>  		}
>  		ipsec->xs->xso.real_dev = NULL;
>  	}
> @@ -588,6 +590,35 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>  	rcu_read_unlock();
>  }
>  
> +static void bond_ipsec_free_sa(struct xfrm_state *xs)
> +{
> +	struct net_device *bond_dev = xs->xso.dev;
> +	struct net_device *real_dev;
> +	struct bonding *bond;
> +	struct slave *slave;
> +
> +	if (!bond_dev)
> +		return;
> +
> +	rcu_read_lock();
> +	bond = netdev_priv(bond_dev);
> +	slave = rcu_dereference(bond->curr_active_slave);
> +	real_dev = slave ? slave->dev : NULL;
> +	rcu_read_unlock();
> +
> +	if (!slave)
> +		return;
> +
> +	if (!xs->xso.real_dev)
> +		return;
> +
> +	WARN_ON(xs->xso.real_dev != real_dev);
> +
> +	if (real_dev && real_dev->xfrmdev_ops &&
> +	    real_dev->xfrmdev_ops->xdo_dev_state_free)
> +		real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
> +}
> +
>  /**
>   * bond_ipsec_offload_ok - can this packet use the xfrm hw offload
>   * @skb: current data packet
> @@ -632,6 +663,7 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
>  static const struct xfrmdev_ops bond_xfrmdev_ops = {
>  	.xdo_dev_state_add = bond_ipsec_add_sa,
>  	.xdo_dev_state_delete = bond_ipsec_del_sa,
> +	.xdo_dev_state_free = bond_ipsec_free_sa,
>  	.xdo_dev_offload_ok = bond_ipsec_offload_ok,
>  };
>  #endif /* CONFIG_XFRM_OFFLOAD */
> -- 
> 2.44.0
> 

Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 2/3] bonding: extract the use of real_device into local variable
  2024-08-05  5:03 ` [PATCH net V3 2/3] bonding: extract the use of real_device into local variable Tariq Toukan
@ 2024-08-07 10:13   ` Hangbin Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Hangbin Liu @ 2024-08-07 10:13 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Jay Vosburgh, Andy Gospodarek, netdev, Saeed Mahameed,
	Gal Pressman, Leon Romanovsky, Jianbo Liu, Cosmin Ratiu

On Mon, Aug 05, 2024 at 08:03:56AM +0300, Tariq Toukan wrote:
> From: Jianbo Liu <jianbol@nvidia.com>
> 
> Add a local variable for slave->dev, to prepare for the lock change in
> the next patch. There is no functionality change.
> 
> Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
> Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++--------------
>  1 file changed, 33 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index eb5e43860670..e550b1c08fdb 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -427,6 +427,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
>  			     struct netlink_ext_ack *extack)
>  {
>  	struct net_device *bond_dev = xs->xso.dev;
> +	struct net_device *real_dev;
>  	struct bond_ipsec *ipsec;
>  	struct bonding *bond;
>  	struct slave *slave;
> @@ -443,9 +444,10 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
>  		return -ENODEV;
>  	}
>  
> -	if (!slave->dev->xfrmdev_ops ||
> -	    !slave->dev->xfrmdev_ops->xdo_dev_state_add ||
> -	    netif_is_bond_master(slave->dev)) {
> +	real_dev = slave->dev;
> +	if (!real_dev->xfrmdev_ops ||
> +	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> +	    netif_is_bond_master(real_dev)) {
>  		NL_SET_ERR_MSG_MOD(extack, "Slave does not support ipsec offload");
>  		rcu_read_unlock();
>  		return -EINVAL;
> @@ -456,9 +458,9 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
>  		rcu_read_unlock();
>  		return -ENOMEM;
>  	}
> -	xs->xso.real_dev = slave->dev;
>  
> -	err = slave->dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
> +	xs->xso.real_dev = real_dev;
> +	err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
>  	if (!err) {
>  		ipsec->xs = xs;
>  		INIT_LIST_HEAD(&ipsec->list);
> @@ -475,6 +477,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
>  static void bond_ipsec_add_sa_all(struct bonding *bond)
>  {
>  	struct net_device *bond_dev = bond->dev;
> +	struct net_device *real_dev;
>  	struct bond_ipsec *ipsec;
>  	struct slave *slave;
>  
> @@ -483,12 +486,13 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
>  	if (!slave)
>  		goto out;
>  
> -	if (!slave->dev->xfrmdev_ops ||
> -	    !slave->dev->xfrmdev_ops->xdo_dev_state_add ||
> -	    netif_is_bond_master(slave->dev)) {
> +	real_dev = slave->dev;
> +	if (!real_dev->xfrmdev_ops ||
> +	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> +	    netif_is_bond_master(real_dev)) {
>  		spin_lock_bh(&bond->ipsec_lock);
>  		if (!list_empty(&bond->ipsec_list))
> -			slave_warn(bond_dev, slave->dev,
> +			slave_warn(bond_dev, real_dev,
>  				   "%s: no slave xdo_dev_state_add\n",
>  				   __func__);
>  		spin_unlock_bh(&bond->ipsec_lock);
> @@ -497,9 +501,9 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
>  
>  	spin_lock_bh(&bond->ipsec_lock);
>  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> -		ipsec->xs->xso.real_dev = slave->dev;
> -		if (slave->dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
> -			slave_warn(bond_dev, slave->dev, "%s: failed to add SA\n", __func__);
> +		ipsec->xs->xso.real_dev = real_dev;
> +		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
> +			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
>  			ipsec->xs->xso.real_dev = NULL;
>  		}
>  	}
> @@ -515,6 +519,7 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
>  static void bond_ipsec_del_sa(struct xfrm_state *xs)
>  {
>  	struct net_device *bond_dev = xs->xso.dev;
> +	struct net_device *real_dev;
>  	struct bond_ipsec *ipsec;
>  	struct bonding *bond;
>  	struct slave *slave;
> @@ -532,16 +537,17 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>  	if (!xs->xso.real_dev)
>  		goto out;
>  
> -	WARN_ON(xs->xso.real_dev != slave->dev);
> +	real_dev = slave->dev;
> +	WARN_ON(xs->xso.real_dev != real_dev);
>  
> -	if (!slave->dev->xfrmdev_ops ||
> -	    !slave->dev->xfrmdev_ops->xdo_dev_state_delete ||
> -	    netif_is_bond_master(slave->dev)) {
> -		slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__);
> +	if (!real_dev->xfrmdev_ops ||
> +	    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
> +	    netif_is_bond_master(real_dev)) {
> +		slave_warn(bond_dev, real_dev, "%s: no slave xdo_dev_state_delete\n", __func__);
>  		goto out;
>  	}
>  
> -	slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs);
> +	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
>  out:
>  	spin_lock_bh(&bond->ipsec_lock);
>  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> @@ -558,6 +564,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>  static void bond_ipsec_del_sa_all(struct bonding *bond)
>  {
>  	struct net_device *bond_dev = bond->dev;
> +	struct net_device *real_dev;
>  	struct bond_ipsec *ipsec;
>  	struct slave *slave;
>  
> @@ -568,21 +575,22 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>  		return;
>  	}
>  
> +	real_dev = slave->dev;
>  	spin_lock_bh(&bond->ipsec_lock);
>  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>  		if (!ipsec->xs->xso.real_dev)
>  			continue;
>  
> -		if (!slave->dev->xfrmdev_ops ||
> -		    !slave->dev->xfrmdev_ops->xdo_dev_state_delete ||
> -		    netif_is_bond_master(slave->dev)) {
> -			slave_warn(bond_dev, slave->dev,
> +		if (!real_dev->xfrmdev_ops ||
> +		    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
> +		    netif_is_bond_master(real_dev)) {
> +			slave_warn(bond_dev, real_dev,
>  				   "%s: no slave xdo_dev_state_delete\n",
>  				   __func__);
>  		} else {
> -			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
> -			if (slave->dev->xfrmdev_ops->xdo_dev_state_free)
> -				slave->dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
> +			real_dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
> +			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> +				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
>  		}
>  		ipsec->xs->xso.real_dev = NULL;
>  	}
> -- 
> 2.44.0
> 

Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-05  5:03 ` [PATCH net V3 3/3] bonding: change ipsec_lock from spin lock to mutex Tariq Toukan
@ 2024-08-08  9:34   ` Hangbin Liu
  2024-08-08 10:05     ` Jianbo Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Hangbin Liu @ 2024-08-08  9:34 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Jay Vosburgh, Andy Gospodarek, netdev, Saeed Mahameed,
	Gal Pressman, Leon Romanovsky, Jianbo Liu

On Mon, Aug 05, 2024 at 08:03:57AM +0300, Tariq Toukan wrote:
> From: Jianbo Liu <jianbol@nvidia.com>
> 
> In the cited commit, bond->ipsec_lock is added to protect ipsec_list,
> hence xdo_dev_state_add and xdo_dev_state_delete are called inside
> this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep,
> "scheduling while atomic" will be triggered when changing bond's
> active slave.
> 
> [  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
> [  101.055726] Modules linked in:
> [  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
> [  101.058760] Hardware name:
> [  101.059434] Call Trace:
> [  101.059436]  <TASK>
> [  101.060873]  dump_stack_lvl+0x51/0x60
> [  101.061275]  __schedule_bug+0x4e/0x60
> [  101.061682]  __schedule+0x612/0x7c0
> [  101.062078]  ? __mod_timer+0x25c/0x370
> [  101.062486]  schedule+0x25/0xd0
> [  101.062845]  schedule_timeout+0x77/0xf0
> [  101.063265]  ? asm_common_interrupt+0x22/0x40
> [  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
> [  101.064215]  __wait_for_common+0x87/0x190
> [  101.064648]  ? usleep_range_state+0x90/0x90
> [  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
> [  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
> [  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
> [  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
> [  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
> [  101.067738]  ? kmalloc_trace+0x4d/0x350
> [  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
> [  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
> [  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
> [  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
> [  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
> [  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
> [  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
> [  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
> [  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
> [  101.073033]  vfs_write+0x2d8/0x400
> [  101.073416]  ? alloc_fd+0x48/0x180
> [  101.073798]  ksys_write+0x5f/0xe0
> [  101.074175]  do_syscall_64+0x52/0x110
> [  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
> from bond_change_active_slave, which requires holding the RTNL lock.
> And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
> xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
> context. So ipsec_lock doesn't have to be spin lock, change it to
> mutex, and thus the above issue can be resolved.
> 
> Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
> Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/bonding/bond_main.c | 75 +++++++++++++++++----------------
>  include/net/bonding.h           |  2 +-
>  2 files changed, 40 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index e550b1c08fdb..56764f1c39b8 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -481,35 +476,43 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
>  	struct bond_ipsec *ipsec;
>  	struct slave *slave;
>  
> -	rcu_read_lock();
> -	slave = rcu_dereference(bond->curr_active_slave);
> -	if (!slave)
> -		goto out;
> +	slave = rtnl_dereference(bond->curr_active_slave);
> +	real_dev = slave ? slave->dev : NULL;
> +	if (!real_dev)
> +		return;
>  
> -	real_dev = slave->dev;
> +	mutex_lock(&bond->ipsec_lock);
>  	if (!real_dev->xfrmdev_ops ||
>  	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
>  	    netif_is_bond_master(real_dev)) {
> -		spin_lock_bh(&bond->ipsec_lock);
>  		if (!list_empty(&bond->ipsec_list))
>  			slave_warn(bond_dev, real_dev,
>  				   "%s: no slave xdo_dev_state_add\n",
>  				   __func__);
> -		spin_unlock_bh(&bond->ipsec_lock);
>  		goto out;
>  	}
>  
> -	spin_lock_bh(&bond->ipsec_lock);
>  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> +		struct net_device *dev = ipsec->xs->xso.real_dev;
> +
> +		/* If new state is added before ipsec_lock acquired */
> +		if (dev) {
> +			if (dev == real_dev)
> +				continue;
Hi Jianbo,

Why we skip the deleting here if dev == real_dev? What if the state
is added again on the same slave? From the previous logic it looks we
don't check and do over write for the same device.

Thanks
Hangbin
> +			dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
> +			if (dev->xfrmdev_ops->xdo_dev_state_free)
> +				dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
> +		}
> +
>  		ipsec->xs->xso.real_dev = real_dev;
>  		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
>  			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
>  			ipsec->xs->xso.real_dev = NULL;
>  		}
>  	}
> -	spin_unlock_bh(&bond->ipsec_lock);
>  out:
> -	rcu_read_unlock();
> +	mutex_unlock(&bond->ipsec_lock);
>  }

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-08  9:34   ` Hangbin Liu
@ 2024-08-08 10:05     ` Jianbo Liu
  2024-08-09  2:31       ` Hangbin Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Jianbo Liu @ 2024-08-08 10:05 UTC (permalink / raw)
  To: Tariq Toukan, liuhangbin@gmail.com
  Cc: davem@davemloft.net, Leon Romanovsky, andy@greyhouse.net,
	Gal Pressman, jv@jvosburgh.net, kuba@kernel.org,
	pabeni@redhat.com, edumazet@google.com, Saeed Mahameed,
	netdev@vger.kernel.org

On Thu, 2024-08-08 at 17:34 +0800, Hangbin Liu wrote:
> On Mon, Aug 05, 2024 at 08:03:57AM +0300, Tariq Toukan wrote:
> > From: Jianbo Liu <jianbol@nvidia.com>
> > 
> > In the cited commit, bond->ipsec_lock is added to protect
> > ipsec_list,
> > hence xdo_dev_state_add and xdo_dev_state_delete are called inside
> > this lock. As ipsec_lock is a spin lock and such xfrmdev ops may
> > sleep,
> > "scheduling while atomic" will be triggered when changing bond's
> > active slave.
> > 
> > [  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
> > [  101.055726] Modules linked in:
> > [  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
> > [  101.058760] Hardware name:
> > [  101.059434] Call Trace:
> > [  101.059436]  <TASK>
> > [  101.060873]  dump_stack_lvl+0x51/0x60
> > [  101.061275]  __schedule_bug+0x4e/0x60
> > [  101.061682]  __schedule+0x612/0x7c0
> > [  101.062078]  ? __mod_timer+0x25c/0x370
> > [  101.062486]  schedule+0x25/0xd0
> > [  101.062845]  schedule_timeout+0x77/0xf0
> > [  101.063265]  ? asm_common_interrupt+0x22/0x40
> > [  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
> > [  101.064215]  __wait_for_common+0x87/0x190
> > [  101.064648]  ? usleep_range_state+0x90/0x90
> > [  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
> > [  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
> > [  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
> > [  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
> > [  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
> > [  101.067738]  ? kmalloc_trace+0x4d/0x350
> > [  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
> > [  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
> > [  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
> > [  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
> > [  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
> > [  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
> > [  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
> > [  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
> > [  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
> > [  101.073033]  vfs_write+0x2d8/0x400
> > [  101.073416]  ? alloc_fd+0x48/0x180
> > [  101.073798]  ksys_write+0x5f/0xe0
> > [  101.074175]  do_syscall_64+0x52/0x110
> > [  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > 
> > As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
> > from bond_change_active_slave, which requires holding the RTNL
> > lock.
> > And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
> > xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
> > context. So ipsec_lock doesn't have to be spin lock, change it to
> > mutex, and thus the above issue can be resolved.
> > 
> > Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
> > Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> > ---
> >  drivers/net/bonding/bond_main.c | 75 +++++++++++++++++------------
> > ----
> >  include/net/bonding.h           |  2 +-
> >  2 files changed, 40 insertions(+), 37 deletions(-)
> > 
> > diff --git a/drivers/net/bonding/bond_main.c
> > b/drivers/net/bonding/bond_main.c
> > index e550b1c08fdb..56764f1c39b8 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -481,35 +476,43 @@ static void bond_ipsec_add_sa_all(struct
> > bonding *bond)
> >         struct bond_ipsec *ipsec;
> >         struct slave *slave;
> >  
> > -       rcu_read_lock();
> > -       slave = rcu_dereference(bond->curr_active_slave);
> > -       if (!slave)
> > -               goto out;
> > +       slave = rtnl_dereference(bond->curr_active_slave);
> > +       real_dev = slave ? slave->dev : NULL;
> > +       if (!real_dev)
> > +               return;
> >  
> > -       real_dev = slave->dev;
> > +       mutex_lock(&bond->ipsec_lock);
> >         if (!real_dev->xfrmdev_ops ||
> >             !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> >             netif_is_bond_master(real_dev)) {
> > -               spin_lock_bh(&bond->ipsec_lock);
> >                 if (!list_empty(&bond->ipsec_list))
> >                         slave_warn(bond_dev, real_dev,
> >                                    "%s: no slave
> > xdo_dev_state_add\n",
> >                                    __func__);
> > -               spin_unlock_bh(&bond->ipsec_lock);
> >                 goto out;
> >         }
> >  
> > -       spin_lock_bh(&bond->ipsec_lock);
> >         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > +               struct net_device *dev = ipsec->xs->xso.real_dev;
> > +
> > +               /* If new state is added before ipsec_lock acquired
> > */
> > +               if (dev) {
> > +                       if (dev == real_dev)
> > +                               continue;
> Hi Jianbo,
> 
> Why we skip the deleting here if dev == real_dev? What if the state

Here the bond active slave is updated. If dev == real_dev, the state
(should be newly added) is offloaded to new active, so no need to
delete and add back again.  

> is added again on the same slave? From the previous logic it looks we

Why is it added to the same slave? It's not the active one.

> don't check and do over write for the same device.
> 
> Thanks
> Hangbin
> > +                       dev->xfrmdev_ops-
> > >xdo_dev_state_delete(ipsec->xs);
> > +                       if (dev->xfrmdev_ops->xdo_dev_state_free)
> > +                               dev->xfrmdev_ops-
> > >xdo_dev_state_free(ipsec->xs);
> > +               }
> > +
> >                 ipsec->xs->xso.real_dev = real_dev;
> >                 if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec-
> > >xs, NULL)) {
> >                         slave_warn(bond_dev, real_dev, "%s: failed
> > to add SA\n", __func__);
> >                         ipsec->xs->xso.real_dev = NULL;
> >                 }
> >         }
> > -       spin_unlock_bh(&bond->ipsec_lock);
> >  out:
> > -       rcu_read_unlock();
> > +       mutex_unlock(&bond->ipsec_lock);
> >  }


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 3/3] bonding: change ipsec_lock from spin lock to mutex
  2024-08-08 10:05     ` Jianbo Liu
@ 2024-08-09  2:31       ` Hangbin Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Hangbin Liu @ 2024-08-09  2:31 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: Tariq Toukan, davem@davemloft.net, Leon Romanovsky,
	andy@greyhouse.net, Gal Pressman, jv@jvosburgh.net,
	kuba@kernel.org, pabeni@redhat.com, edumazet@google.com,
	Saeed Mahameed, netdev@vger.kernel.org

On Thu, Aug 08, 2024 at 10:05:26AM +0000, Jianbo Liu wrote:
> On Thu, 2024-08-08 at 17:34 +0800, Hangbin Liu wrote:
> > On Mon, Aug 05, 2024 at 08:03:57AM +0300, Tariq Toukan wrote:
> > >  drivers/net/bonding/bond_main.c | 75 +++++++++++++++++------------
> > > ----
> > >  include/net/bonding.h           |  2 +-
> > >  2 files changed, 40 insertions(+), 37 deletions(-)
> > > 
> > > diff --git a/drivers/net/bonding/bond_main.c
> > > b/drivers/net/bonding/bond_main.c
> > > index e550b1c08fdb..56764f1c39b8 100644
> > > --- a/drivers/net/bonding/bond_main.c
> > > +++ b/drivers/net/bonding/bond_main.c
> > > @@ -481,35 +476,43 @@ static void bond_ipsec_add_sa_all(struct
> > > bonding *bond)
> > >         struct bond_ipsec *ipsec;
> > >         struct slave *slave;
> > >  
> > > -       rcu_read_lock();
> > > -       slave = rcu_dereference(bond->curr_active_slave);
> > > -       if (!slave)
> > > -               goto out;
> > > +       slave = rtnl_dereference(bond->curr_active_slave);
> > > +       real_dev = slave ? slave->dev : NULL;
> > > +       if (!real_dev)
> > > +               return;
> > >  
> > > -       real_dev = slave->dev;
> > > +       mutex_lock(&bond->ipsec_lock);
> > >         if (!real_dev->xfrmdev_ops ||
> > >             !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> > >             netif_is_bond_master(real_dev)) {
> > > -               spin_lock_bh(&bond->ipsec_lock);
> > >                 if (!list_empty(&bond->ipsec_list))
> > >                         slave_warn(bond_dev, real_dev,
> > >                                    "%s: no slave
> > > xdo_dev_state_add\n",
> > >                                    __func__);
> > > -               spin_unlock_bh(&bond->ipsec_lock);
> > >                 goto out;
> > >         }
> > >  
> > > -       spin_lock_bh(&bond->ipsec_lock);
> > >         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > > +               struct net_device *dev = ipsec->xs->xso.real_dev;
> > > +
> > > +               /* If new state is added before ipsec_lock acquired
> > > */
> > > +               if (dev) {
> > > +                       if (dev == real_dev)
> > > +                               continue;
> > Hi Jianbo,
> > 
> > Why we skip the deleting here if dev == real_dev? What if the state
> 
> Here the bond active slave is updated. If dev == real_dev, the state
> (should be newly added) is offloaded to new active, so no need to
> delete and add back again.  
> 
> > is added again on the same slave? From the previous logic it looks we
> 
> Why is it added to the same slave? It's not the active one.

OK, I got what you mean now. Thanks for the explaination.

Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-05  5:03 ` [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion Tariq Toukan
  2024-08-06  8:45   ` Hangbin Liu
  2024-08-07 10:11   ` Hangbin Liu
@ 2024-08-13  0:48   ` Jakub Kicinski
  2024-08-13  2:58     ` Jianbo Liu
  2 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2024-08-13  0:48 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Jay Vosburgh,
	Andy Gospodarek, netdev, Saeed Mahameed, Gal Pressman,
	Leon Romanovsky, Hangbin Liu, Jianbo Liu

On Mon, 5 Aug 2024 08:03:55 +0300 Tariq Toukan wrote:
> +static void bond_ipsec_free_sa(struct xfrm_state *xs)
> +{
> +	struct net_device *bond_dev = xs->xso.dev;
> +	struct net_device *real_dev;
> +	struct bonding *bond;
> +	struct slave *slave;
> +
> +	if (!bond_dev)
> +		return;

can xs->xso.dev be NULL during the dev_free_state callback?

> +	rcu_read_lock();
> +	bond = netdev_priv(bond_dev);
> +	slave = rcu_dereference(bond->curr_active_slave);
> +	real_dev = slave ? slave->dev : NULL;
> +	rcu_read_unlock();

What's holding onto real_dev once you drop the rcu lock here?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-13  0:48   ` Jakub Kicinski
@ 2024-08-13  2:58     ` Jianbo Liu
  2024-08-13 14:14       ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Jianbo Liu @ 2024-08-13  2:58 UTC (permalink / raw)
  To: Tariq Toukan, kuba@kernel.org
  Cc: liuhangbin@gmail.com, davem@davemloft.net, andy@greyhouse.net,
	Gal Pressman, jv@jvosburgh.net, netdev@vger.kernel.org,
	pabeni@redhat.com, edumazet@google.com, Saeed Mahameed,
	Leon Romanovsky

On Mon, 2024-08-12 at 17:48 -0700, Jakub Kicinski wrote:
> On Mon, 5 Aug 2024 08:03:55 +0300 Tariq Toukan wrote:
> > +static void bond_ipsec_free_sa(struct xfrm_state *xs)
> > +{
> > +       struct net_device *bond_dev = xs->xso.dev;
> > +       struct net_device *real_dev;
> > +       struct bonding *bond;
> > +       struct slave *slave;
> > +
> > +       if (!bond_dev)
> > +               return;
> 
> can xs->xso.dev be NULL during the dev_free_state callback?

Shouldn't be NULL because bond_ipsec_del_sa is called before and xs is
removed from bond->ipsec_list, so xs->xso.dev is kept unless it's
cleared in dev's xdo_dev_state_delete callback.

> 
> > +       rcu_read_lock();
> > +       bond = netdev_priv(bond_dev);
> > +       slave = rcu_dereference(bond->curr_active_slave);
> > +       real_dev = slave ? slave->dev : NULL;
> > +       rcu_read_unlock();
> 
> What's holding onto real_dev once you drop the rcu lock here?

I think it should be xfrm state (and bond device).

Thanks!
Jianbo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-13  2:58     ` Jianbo Liu
@ 2024-08-13 14:14       ` Jakub Kicinski
  2024-08-14  2:03         ` Jianbo Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2024-08-13 14:14 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: Tariq Toukan, liuhangbin@gmail.com, davem@davemloft.net,
	andy@greyhouse.net, Gal Pressman, jv@jvosburgh.net,
	netdev@vger.kernel.org, pabeni@redhat.com, edumazet@google.com,
	Saeed Mahameed, Leon Romanovsky

On Tue, 13 Aug 2024 02:58:12 +0000 Jianbo Liu wrote:
> > > +       rcu_read_lock();
> > > +       bond = netdev_priv(bond_dev);
> > > +       slave = rcu_dereference(bond->curr_active_slave);
> > > +       real_dev = slave ? slave->dev : NULL;
> > > +       rcu_read_unlock();  
> > 
> > What's holding onto real_dev once you drop the rcu lock here?  
> 
> I think it should be xfrm state (and bond device).

Please explain it in the commit message in more certain terms.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-13 14:14       ` Jakub Kicinski
@ 2024-08-14  2:03         ` Jianbo Liu
  2024-08-14  3:11           ` Hangbin Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Jianbo Liu @ 2024-08-14  2:03 UTC (permalink / raw)
  To: kuba@kernel.org
  Cc: liuhangbin@gmail.com, davem@davemloft.net, Tariq Toukan,
	andy@greyhouse.net, Gal Pressman, jv@jvosburgh.net,
	pabeni@redhat.com, netdev@vger.kernel.org, edumazet@google.com,
	Saeed Mahameed, Leon Romanovsky

On Tue, 2024-08-13 at 07:14 -0700, Jakub Kicinski wrote:
> On Tue, 13 Aug 2024 02:58:12 +0000 Jianbo Liu wrote:
> > > > +       rcu_read_lock();
> > > > +       bond = netdev_priv(bond_dev);
> > > > +       slave = rcu_dereference(bond->curr_active_slave);
> > > > +       real_dev = slave ? slave->dev : NULL;
> > > > +       rcu_read_unlock();  
> > > 
> > > What's holding onto real_dev once you drop the rcu lock here?  
> > 
> > I think it should be xfrm state (and bond device).
> 
> Please explain it in the commit message in more certain terms.

Sorry, I don't understand. The real_dev is saved in xs->xso.real_dev,
and also bond's slave. It's straightforward. What else do I need to
explain?

Thanks!
Jianbo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-14  2:03         ` Jianbo Liu
@ 2024-08-14  3:11           ` Hangbin Liu
  2024-08-15  1:23             ` Jianbo Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Hangbin Liu @ 2024-08-14  3:11 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: kuba@kernel.org, davem@davemloft.net, Tariq Toukan,
	andy@greyhouse.net, Gal Pressman, jv@jvosburgh.net,
	pabeni@redhat.com, netdev@vger.kernel.org, edumazet@google.com,
	Saeed Mahameed, Leon Romanovsky

On Wed, Aug 14, 2024 at 02:03:58AM +0000, Jianbo Liu wrote:
> On Tue, 2024-08-13 at 07:14 -0700, Jakub Kicinski wrote:
> > On Tue, 13 Aug 2024 02:58:12 +0000 Jianbo Liu wrote:
> > > > > +       rcu_read_lock();
> > > > > +       bond = netdev_priv(bond_dev);
> > > > > +       slave = rcu_dereference(bond->curr_active_slave);
> > > > > +       real_dev = slave ? slave->dev : NULL;
> > > > > +       rcu_read_unlock();  
> > > > 
> > > > What's holding onto real_dev once you drop the rcu lock here?  
> > > 
> > > I think it should be xfrm state (and bond device).
> > 
> > Please explain it in the commit message in more certain terms.
> 
> Sorry, I don't understand. The real_dev is saved in xs->xso.real_dev,
> and also bond's slave. It's straightforward. What else do I need to
> explain?

I think Jakub means you need to make sure the real_dev is not freed during
xfrmdev_ops. See bond_ipsec_add_sa(). You unlock it too early and later
xfrmdev_ops is not protected.

Hangbin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-14  3:11           ` Hangbin Liu
@ 2024-08-15  1:23             ` Jianbo Liu
  2024-08-15  3:34               ` Hangbin Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Jianbo Liu @ 2024-08-15  1:23 UTC (permalink / raw)
  To: liuhangbin@gmail.com
  Cc: davem@davemloft.net, Tariq Toukan, andy@greyhouse.net,
	Gal Pressman, jv@jvosburgh.net, netdev@vger.kernel.org,
	kuba@kernel.org, edumazet@google.com, pabeni@redhat.com,
	Saeed Mahameed, Leon Romanovsky

On Wed, 2024-08-14 at 11:11 +0800, Hangbin Liu wrote:
> On Wed, Aug 14, 2024 at 02:03:58AM +0000, Jianbo Liu wrote:
> > On Tue, 2024-08-13 at 07:14 -0700, Jakub Kicinski wrote:
> > > On Tue, 13 Aug 2024 02:58:12 +0000 Jianbo Liu wrote:
> > > > > > +       rcu_read_lock();
> > > > > > +       bond = netdev_priv(bond_dev);
> > > > > > +       slave = rcu_dereference(bond->curr_active_slave);
> > > > > > +       real_dev = slave ? slave->dev : NULL;
> > > > > > +       rcu_read_unlock();  
> > > > > 
> > > > > What's holding onto real_dev once you drop the rcu lock
> > > > > here?  
> > > > 
> > > > I think it should be xfrm state (and bond device).
> > > 
> > > Please explain it in the commit message in more certain terms.
> > 
> > Sorry, I don't understand. The real_dev is saved in xs-
> > >xso.real_dev,
> > and also bond's slave. It's straightforward. What else do I need to
> > explain?
> 
> I think Jakub means you need to make sure the real_dev is not freed
> during
> xfrmdev_ops. See bond_ipsec_add_sa(). You unlock it too early and
> later
> xfrmdev_ops is not protected.

This RCU lock is to protect the reading of curr_active_slave, which is
pointing to a big stuct - slave struct, so there is no error to get
real_dev from slave->dev.

Thanks!
Jianbo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-15  1:23             ` Jianbo Liu
@ 2024-08-15  3:34               ` Hangbin Liu
  2024-08-15  3:57                 ` Jianbo Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Hangbin Liu @ 2024-08-15  3:34 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: davem@davemloft.net, Tariq Toukan, andy@greyhouse.net,
	Gal Pressman, jv@jvosburgh.net, netdev@vger.kernel.org,
	kuba@kernel.org, edumazet@google.com, pabeni@redhat.com,
	Saeed Mahameed, Leon Romanovsky

On Thu, Aug 15, 2024 at 01:23:05AM +0000, Jianbo Liu wrote:
> On Wed, 2024-08-14 at 11:11 +0800, Hangbin Liu wrote:
> > On Wed, Aug 14, 2024 at 02:03:58AM +0000, Jianbo Liu wrote:
> > > On Tue, 2024-08-13 at 07:14 -0700, Jakub Kicinski wrote:
> > > > On Tue, 13 Aug 2024 02:58:12 +0000 Jianbo Liu wrote:
> > > > > > > +       rcu_read_lock();
> > > > > > > +       bond = netdev_priv(bond_dev);
> > > > > > > +       slave = rcu_dereference(bond->curr_active_slave);
> > > > > > > +       real_dev = slave ? slave->dev : NULL;
> > > > > > > +       rcu_read_unlock();  
> > > > > > 
> > > > > > What's holding onto real_dev once you drop the rcu lock
> > > > > > here?  
> > > > > 
> > > > > I think it should be xfrm state (and bond device).
> > > > 
> > > > Please explain it in the commit message in more certain terms.
> > > 
> > > Sorry, I don't understand. The real_dev is saved in xs-
> > > >xso.real_dev,
> > > and also bond's slave. It's straightforward. What else do I need to
> > > explain?
> > 
> > I think Jakub means you need to make sure the real_dev is not freed
> > during
> > xfrmdev_ops. See bond_ipsec_add_sa(). You unlock it too early and
> > later
> > xfrmdev_ops is not protected.
> 
> This RCU lock is to protect the reading of curr_active_slave, which is
> pointing to a big stuct - slave struct, so there is no error to get
> real_dev from slave->dev.

It's not about getting real_dev from slave->dev. As Jakub said, What's holding
on real_dev once you drop the rcu lock?

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion
  2024-08-15  3:34               ` Hangbin Liu
@ 2024-08-15  3:57                 ` Jianbo Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Jianbo Liu @ 2024-08-15  3:57 UTC (permalink / raw)
  To: liuhangbin@gmail.com
  Cc: davem@davemloft.net, Tariq Toukan, andy@greyhouse.net,
	Gal Pressman, jv@jvosburgh.net, netdev@vger.kernel.org,
	kuba@kernel.org, edumazet@google.com, pabeni@redhat.com,
	Saeed Mahameed, Leon Romanovsky

On Thu, 2024-08-15 at 11:34 +0800, Hangbin Liu wrote:
> On Thu, Aug 15, 2024 at 01:23:05AM +0000, Jianbo Liu wrote:
> > On Wed, 2024-08-14 at 11:11 +0800, Hangbin Liu wrote:
> > > On Wed, Aug 14, 2024 at 02:03:58AM +0000, Jianbo Liu wrote:
> > > > On Tue, 2024-08-13 at 07:14 -0700, Jakub Kicinski wrote:
> > > > > On Tue, 13 Aug 2024 02:58:12 +0000 Jianbo Liu wrote:
> > > > > > > > +       rcu_read_lock();
> > > > > > > > +       bond = netdev_priv(bond_dev);
> > > > > > > > +       slave = rcu_dereference(bond-
> > > > > > > > >curr_active_slave);
> > > > > > > > +       real_dev = slave ? slave->dev : NULL;
> > > > > > > > +       rcu_read_unlock();  
> > > > > > > 
> > > > > > > What's holding onto real_dev once you drop the rcu lock
> > > > > > > here?  
> > > > > > 
> > > > > > I think it should be xfrm state (and bond device).
> > > > > 
> > > > > Please explain it in the commit message in more certain
> > > > > terms.
> > > > 
> > > > Sorry, I don't understand. The real_dev is saved in xs-
> > > > > xso.real_dev,
> > > > and also bond's slave. It's straightforward. What else do I
> > > > need to
> > > > explain?
> > > 
> > > I think Jakub means you need to make sure the real_dev is not
> > > freed
> > > during
> > > xfrmdev_ops. See bond_ipsec_add_sa(). You unlock it too early and
> > > later
> > > xfrmdev_ops is not protected.
> > 
> > This RCU lock is to protect the reading of curr_active_slave, which
> > is
> > pointing to a big stuct - slave struct, so there is no error to get
> > real_dev from slave->dev.
> 
> It's not about getting real_dev from slave->dev. As Jakub said,
> What's holding
> on real_dev once you drop the rcu lock?
> 

As you mentioned the lock, I explained what's it used for, so we will
not mix basic concepts and make things complicated. 
As for Jakub's question, I already answered. And I'm waiting for his
reply so I can better undestand how to modify if there is any. 

Thanks!
Jianbo

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-08-15  3:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-05  5:03 [PATCH net V3 0/3] Fixes for IPsec over bonding Tariq Toukan
2024-08-05  5:03 ` [PATCH net V3 1/3] bonding: implement xdo_dev_state_free and call it after deletion Tariq Toukan
2024-08-06  8:45   ` Hangbin Liu
2024-08-06  9:09     ` Jianbo Liu
2024-08-07 10:11   ` Hangbin Liu
2024-08-13  0:48   ` Jakub Kicinski
2024-08-13  2:58     ` Jianbo Liu
2024-08-13 14:14       ` Jakub Kicinski
2024-08-14  2:03         ` Jianbo Liu
2024-08-14  3:11           ` Hangbin Liu
2024-08-15  1:23             ` Jianbo Liu
2024-08-15  3:34               ` Hangbin Liu
2024-08-15  3:57                 ` Jianbo Liu
2024-08-05  5:03 ` [PATCH net V3 2/3] bonding: extract the use of real_device into local variable Tariq Toukan
2024-08-07 10:13   ` Hangbin Liu
2024-08-05  5:03 ` [PATCH net V3 3/3] bonding: change ipsec_lock from spin lock to mutex Tariq Toukan
2024-08-08  9:34   ` Hangbin Liu
2024-08-08 10:05     ` Jianbo Liu
2024-08-09  2:31       ` Hangbin Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).