* [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
2025-03-04 13:11 [PATCHv4 net 0/3] bond: fix xfrm offload issues Hangbin Liu
@ 2025-03-04 13:11 ` Hangbin Liu
2025-03-05 8:38 ` Nikolay Aleksandrov
2025-03-04 13:11 ` [PATCHv4 net 2/3] bonding: fix xfrm offload feature setup on active-backup mode Hangbin Liu
2025-03-04 13:11 ` [PATCHv4 net 3/3] selftests: bonding: add ipsec offload test Hangbin Liu
2 siblings, 1 reply; 14+ messages in thread
From: Hangbin Liu @ 2025-03-04 13:11 UTC (permalink / raw)
To: netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Shuah Khan, Tariq Toukan, Jianbo Liu, Jarod Wilson,
Steffen Klassert, Cosmin Ratiu, Petr Machata, linux-kselftest,
linux-kernel, Hangbin Liu
The fixed commit placed mutex_lock() inside spin_lock_bh(), which triggers
a warning:
BUG: sleeping function called from invalid context at...
Fix this by moving the IPsec deletion operation to bond_ipsec_free_sa(),
which is not called with spin_lock_bh() held.
Additionally, delete the IPsec list entry in bond_ipsec_del_sa_all() when
the XFRM state is DEAD to prevent xdo_dev_state_free() from being triggered
again in bond_ipsec_free_sa().
For bond_ipsec_free_sa(), there are now three conditions:
1. if (!slave): When no active device exists.
2. if (!xs->xso.real_dev): When xdo_dev_state_add() fails.
3. if (xs->xso.real_dev != real_dev): When an xs has already been freed
by bond_ipsec_del_sa_all() due to migration, and the active slave has
changed to a new device. At the same time, the xs is marked as DEAD
because the XFRM entry is removed, triggering xfrm_state_gc_task() and
bond_ipsec_free_sa().
In all three cases, xdo_dev_state_free() should not be called; xs should
only be removed from bond->ipsec_list.
At the same time, protect bond_ipsec_del_sa_all and bond_ipsec_add_sa_all
with x->lock for each xs being processed. This prevents XFRM from
concurrently initiating add/delete operations on the managed states.
Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241212062734.182a0164@kernel.org
Suggested-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
drivers/net/bonding/bond_main.c | 53 +++++++++++++++++++++++----------
1 file changed, 37 insertions(+), 16 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e45bba240cbc..06b060d9b031 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -537,15 +537,22 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
}
list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+ spin_lock_bh(&ipsec->xs->lock);
+ /* Skip dead xfrm states, they'll be freed later. */
+ if (ipsec->xs->km.state == XFRM_STATE_DEAD)
+ goto next;
+
/* If new state is added before ipsec_lock acquired */
if (ipsec->xs->xso.real_dev == real_dev)
- continue;
+ goto next;
ipsec->xs->xso.real_dev = real_dev;
if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
ipsec->xs->xso.real_dev = NULL;
}
+next:
+ spin_unlock_bh(&ipsec->xs->lock);
}
out:
mutex_unlock(&bond->ipsec_lock);
@@ -560,7 +567,6 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
struct net_device *bond_dev = xs->xso.dev;
struct net_device *real_dev;
netdevice_tracker tracker;
- struct bond_ipsec *ipsec;
struct bonding *bond;
struct slave *slave;
@@ -592,15 +598,6 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
out:
netdev_put(real_dev, &tracker);
- mutex_lock(&bond->ipsec_lock);
- list_for_each_entry(ipsec, &bond->ipsec_list, list) {
- if (ipsec->xs == xs) {
- list_del(&ipsec->list);
- kfree(ipsec);
- break;
- }
- }
- mutex_unlock(&bond->ipsec_lock);
}
static void bond_ipsec_del_sa_all(struct bonding *bond)
@@ -617,8 +614,18 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
mutex_lock(&bond->ipsec_lock);
list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+ spin_lock_bh(&ipsec->xs->lock);
if (!ipsec->xs->xso.real_dev)
- continue;
+ goto next;
+
+ if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
+ /* already dead no need to delete again */
+ if (real_dev->xfrmdev_ops->xdo_dev_state_free)
+ real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
+ list_del(&ipsec->list);
+ kfree(ipsec);
+ goto next;
+ }
if (!real_dev->xfrmdev_ops ||
!real_dev->xfrmdev_ops->xdo_dev_state_delete ||
@@ -631,6 +638,8 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
if (real_dev->xfrmdev_ops->xdo_dev_state_free)
real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
}
+next:
+ spin_unlock_bh(&ipsec->xs->lock);
}
mutex_unlock(&bond->ipsec_lock);
}
@@ -640,6 +649,7 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs)
struct net_device *bond_dev = xs->xso.dev;
struct net_device *real_dev;
netdevice_tracker tracker;
+ struct bond_ipsec *ipsec;
struct bonding *bond;
struct slave *slave;
@@ -659,11 +669,22 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs)
if (!xs->xso.real_dev)
goto out;
- WARN_ON(xs->xso.real_dev != real_dev);
+ mutex_lock(&bond->ipsec_lock);
+ list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+ if (ipsec->xs == xs) {
+ /* do xdo_dev_state_free if real_dev matches,
+ * otherwise only remove the list
+ */
+ if (real_dev && real_dev->xfrmdev_ops &&
+ real_dev->xfrmdev_ops->xdo_dev_state_free)
+ real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
+ list_del(&ipsec->list);
+ kfree(ipsec);
+ break;
+ }
+ }
+ mutex_unlock(&bond->ipsec_lock);
- if (real_dev && real_dev->xfrmdev_ops &&
- real_dev->xfrmdev_ops->xdo_dev_state_free)
- real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
out:
netdev_put(real_dev, &tracker);
}
--
2.46.0
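The three bond_ipsec_free_sa() conditions listed in the commit message can be read as one predicate. The following is only a userspace sketch of that reading, not code from the patch: the helper name should_free_on_dev() and its plain-pointer parameters are hypothetical stand-ins for the slave, xs->xso.real_dev and real_dev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): returns true only when the
 * device-level free hook should run for this state, following the three
 * conditions described in the commit message.
 */
static bool should_free_on_dev(const void *active_slave,
			       const void *xs_real_dev,
			       const void *real_dev)
{
	if (!active_slave)		/* 1. no active device exists */
		return false;
	if (!xs_real_dev)		/* 2. xdo_dev_state_add() failed earlier */
		return false;
	if (xs_real_dev != real_dev)	/* 3. already freed on the old device */
		return false;
	return true;
}
```

In every case, including the ones where this returns false, the commit message says the entry is still removed from bond->ipsec_list.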
* Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
From: Nikolay Aleksandrov @ 2025-03-05 8:38 UTC (permalink / raw)
To: Hangbin Liu, netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan,
Tariq Toukan, Jianbo Liu, Jarod Wilson, Steffen Klassert,
Cosmin Ratiu, Petr Machata, linux-kselftest, linux-kernel
On 3/4/25 15:11, Hangbin Liu wrote:
> The fixed commit placed mutex_lock() inside spin_lock_bh(), which triggers
> a warning:
>
> BUG: sleeping function called from invalid context at...
>
> Fix this by moving the IPsec deletion operation to bond_ipsec_free_sa,
> which is not held by spin_lock_bh().
>
> Additionally, delete the IPsec list in bond_ipsec_del_sa_all() when the
> XFRM state is DEAD to prevent xdo_dev_state_free() from being triggered
> again in bond_ipsec_free_sa().
>
> For bond_ipsec_free_sa(), there are now three conditions:
>
> 1. if (!slave): When no active device exists.
> 2. if (!xs->xso.real_dev): When xdo_dev_state_add() fails.
> 3. if (xs->xso.real_dev != real_dev): When an xs has already been freed
> by bond_ipsec_del_sa_all() due to migration, and the active slave has
> changed to a new device. At the same time, the xs is marked as DEAD
> due to the XFRM entry is removed, triggering xfrm_state_gc_task() and
> bond_ipsec_free_sa().
>
> In all three cases, xdo_dev_state_free() should not be called, only xs
> should be removed from bond->ipsec list.
>
> At the same time, protect bond_ipsec_del_sa_all and bond_ipsec_add_sa_all
> with x->lock for each xs being processed. This prevents XFRM from
> concurrently initiating add/delete operations on the managed states.
>
> Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex")
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Closes: https://lore.kernel.org/netdev/20241212062734.182a0164@kernel.org
> Suggested-by: Cosmin Ratiu <cratiu@nvidia.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
> drivers/net/bonding/bond_main.c | 53 +++++++++++++++++++++++----------
> 1 file changed, 37 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index e45bba240cbc..06b060d9b031 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -537,15 +537,22 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
> }
>
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> + spin_lock_bh(&ipsec->xs->lock);
> + /* Skip dead xfrm states, they'll be freed later. */
> + if (ipsec->xs->km.state == XFRM_STATE_DEAD)
> + goto next;
> +
> /* If new state is added before ipsec_lock acquired */
> if (ipsec->xs->xso.real_dev == real_dev)
> - continue;
> + goto next;
>
> ipsec->xs->xso.real_dev = real_dev;
> if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
> slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
> ipsec->xs->xso.real_dev = NULL;
> }
> +next:
> + spin_unlock_bh(&ipsec->xs->lock);
> }
> out:
> mutex_unlock(&bond->ipsec_lock);
> @@ -560,7 +567,6 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> struct net_device *bond_dev = xs->xso.dev;
> struct net_device *real_dev;
> netdevice_tracker tracker;
> - struct bond_ipsec *ipsec;
> struct bonding *bond;
> struct slave *slave;
>
> @@ -592,15 +598,6 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
> out:
> netdev_put(real_dev, &tracker);
> - mutex_lock(&bond->ipsec_lock);
> - list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> - if (ipsec->xs == xs) {
> - list_del(&ipsec->list);
> - kfree(ipsec);
> - break;
> - }
> - }
> - mutex_unlock(&bond->ipsec_lock);
> }
>
> static void bond_ipsec_del_sa_all(struct bonding *bond)
> @@ -617,8 +614,18 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>
> mutex_lock(&bond->ipsec_lock);
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
Second time - you should use list_for_each_entry_safe if you're walking and deleting
elements from the list.
> + spin_lock_bh(&ipsec->xs->lock);
> if (!ipsec->xs->xso.real_dev)
> - continue;
> + goto next;
> +
> + if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
> + /* already dead no need to delete again */
> + if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> + real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
Have you checked if .xdo_dev_state_free can sleep?
I see at least one that can: mlx5e_xfrm_free_state().
> + list_del(&ipsec->list);
> + kfree(ipsec);
> + goto next;
> + }
>
> if (!real_dev->xfrmdev_ops ||
> !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
> @@ -631,6 +638,8 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
> }
> +next:
> + spin_unlock_bh(&ipsec->xs->lock);
> }
> mutex_unlock(&bond->ipsec_lock);
> }
> @@ -640,6 +649,7 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs)
> struct net_device *bond_dev = xs->xso.dev;
> struct net_device *real_dev;
> netdevice_tracker tracker;
> + struct bond_ipsec *ipsec;
> struct bonding *bond;
> struct slave *slave;
>
> @@ -659,11 +669,22 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs)
> if (!xs->xso.real_dev)
> goto out;
>
> - WARN_ON(xs->xso.real_dev != real_dev);
> + mutex_lock(&bond->ipsec_lock);
> + list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> + if (ipsec->xs == xs) {
> + /* do xdo_dev_state_free if real_dev matches,
> + * otherwise only remove the list
> + */
> + if (real_dev && real_dev->xfrmdev_ops &&
> + real_dev->xfrmdev_ops->xdo_dev_state_free)
> + real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
> + list_del(&ipsec->list);
> + kfree(ipsec);
> + break;
> + }
> + }
> + mutex_unlock(&bond->ipsec_lock);
>
> - if (real_dev && real_dev->xfrmdev_ops &&
> - real_dev->xfrmdev_ops->xdo_dev_state_free)
> - real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
> out:
> netdev_put(real_dev, &tracker);
> }
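The list_for_each_entry_safe() comment above comes down to saving the successor before the current entry is freed. The following is a minimal userspace sketch of that pattern, assuming a hand-rolled circular list; struct entry, list_add_entry(), remove_dead() and list_count() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a bond->ipsec_list entry. */
struct entry {
	struct entry *prev, *next;	/* circular list links */
	int dead;			/* stands in for km.state == XFRM_STATE_DEAD */
};

/* Initialize an empty circular list whose head is a sentinel entry. */
static void list_init(struct entry *head)
{
	head->prev = head->next = head;
	head->dead = 0;
}

/* Append a heap-allocated entry at the tail; returns 0 on success. */
static int list_add_entry(struct entry *head, int dead)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->dead = dead;
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
	return 0;
}

/*
 * Free every dead entry. The successor is saved in tmp before the current
 * entry can be freed, which is the job list_for_each_entry_safe() does in
 * the kernel. Returns the number of entries removed.
 */
static int remove_dead(struct entry *head)
{
	struct entry *e, *tmp;
	int removed = 0;

	for (e = head->next, tmp = e->next; e != head; e = tmp, tmp = e->next) {
		if (!e->dead)
			continue;
		e->prev->next = e->next;
		e->next->prev = e->prev;
		free(e);
		removed++;
	}
	return removed;
}

/* Count the entries currently on the list (the sentinel is not counted). */
static int list_count(const struct entry *head)
{
	const struct entry *e;
	int n = 0;

	for (e = head->next; e != head; e = e->next)
		n++;
	return n;
}
```

With the non-safe iterator, the loop would read the next pointer from the entry it has just freed.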
* Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
From: Hangbin Liu @ 2025-03-05 14:13 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan,
Tariq Toukan, Jianbo Liu, Jarod Wilson, Steffen Klassert,
Cosmin Ratiu, Petr Machata, linux-kselftest, linux-kernel
On Wed, Mar 05, 2025 at 10:38:36AM +0200, Nikolay Aleksandrov wrote:
> > @@ -617,8 +614,18 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> >
> > mutex_lock(&bond->ipsec_lock);
> > list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>
> Second time - you should use list_for_each_entry_safe if you're walking and deleting
> elements from the list.
Sorry, I missed this comment. I will update it in the next version.
>
> > + spin_lock_bh(&ipsec->xs->lock);
> > if (!ipsec->xs->xso.real_dev)
> > - continue;
> > + goto next;
> > +
> > + if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
> > + /* already dead no need to delete again */
> > + if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> > + real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
>
> Have you checked if .xdo_dev_state_free can sleep?
> I see at least one that can: mlx5e_xfrm_free_state().
Hmm, this brings us back to the initial problem. We tried to avoid sleeping
while holding a spin lock (in bond_ipsec_del_sa), but the new code runs into
the same issue again.
Prompted by your reply, I also checked xdo_dev_state_add() in
bond_ipsec_add_sa_all(), which may also sleep, e.g. mlx5e_xfrm_add_state().
If we drop the spin lock, the race comes back.
Any ideas about this?
thanks
Hangbin
* Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
From: Cosmin Ratiu @ 2025-03-05 16:12 UTC (permalink / raw)
To: razor@blackwall.org, liuhangbin@gmail.com
Cc: Petr Machata, shuah@kernel.org, andrew+netdev@lunn.ch,
davem@davemloft.net, jv@jvosburgh.net, jarod@redhat.com,
Jianbo Liu, linux-kernel@vger.kernel.org, edumazet@google.com,
pabeni@redhat.com, horms@kernel.org, kuba@kernel.org,
Tariq Toukan, netdev@vger.kernel.org,
steffen.klassert@secunet.com, linux-kselftest@vger.kernel.org
On Wed, 2025-03-05 at 14:13 +0000, Hangbin Liu wrote:
> On Wed, Mar 05, 2025 at 10:38:36AM +0200, Nikolay Aleksandrov wrote:
> > > @@ -617,8 +614,18 @@ static void bond_ipsec_del_sa_all(struct
> > > bonding *bond)
> > >
> > > mutex_lock(&bond->ipsec_lock);
> > > list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> >
> > Second time - you should use list_for_each_entry_safe if you're
> > walking and deleting
> > elements from the list.
>
> Sorry, I missed this comment. I will update in next version.
>
> >
> > > + spin_lock_bh(&ipsec->xs->lock);
> > > if (!ipsec->xs->xso.real_dev)
> > > - continue;
> > > + goto next;
> > > +
> > > + if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
> > > + /* already dead no need to delete again
> > > */
> > > + if (real_dev->xfrmdev_ops-
> > > >xdo_dev_state_free)
> > > + real_dev->xfrmdev_ops-
> > > >xdo_dev_state_free(ipsec->xs);
> >
> > Have you checked if .xdo_dev_state_free can sleep?
> > I see at least one that can: mlx5e_xfrm_free_state().
>
> Hmm, This brings us back to the initial problem. We tried to avoid
> calling
> a spin lock in a sleep context (bond_ipsec_del_sa), but now the new
> code
> encounters this issue again.
The reason the mutex was added (instead of the spinlock used before)
was exactly because the add and free offload operations could sleep.
> With your reply, I also checked the xdo_dev_state_add() in
> bond_ipsec_add_sa_all(), which may also sleep, e.g.
> mlx5e_xfrm_add_state(),
>
> If we unlock the spin lock, then the race came back again.
>
> Any idea about this?
The race is between bond_ipsec_del_sa_all and bond_ipsec_del_sa (plus
bond_ipsec_free_sa). The issue is that when bond_ipsec_del_sa_all
releases x->lock, bond_ipsec_del_sa can immediately be called, followed
by bond_ipsec_free_sa.
Maybe drop x->lock after setting real_dev to NULL? As far as I can tell,
real_dev is not used anywhere in the free calls. I have another series
refactoring things around real_dev, which I hope to be able to send soon.
Here's a sketch of this idea:
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -613,8 +613,11 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
mutex_lock(&bond->ipsec_lock);
list_for_each_entry(ipsec, &bond->ipsec_list, list) {
- if (!ipsec->xs->xso.real_dev)
+ spin_lock(&ipsec->x->lock);
+ if (!ipsec->xs->xso.real_dev) {
+ spin_unlock(&ipsec->x->lock);
continue;
+ }
if (!real_dev->xfrmdev_ops ||
!real_dev->xfrmdev_ops->xdo_dev_state_delete ||
@@ -622,12 +625,16 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
slave_warn(bond_dev, real_dev,
"%s: no slave xdo_dev_state_delete\n",
__func__);
- } else {
- real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs);
- if (real_dev->xfrmdev_ops->xdo_dev_state_free)
- real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
- ipsec->xs->xso.real_dev = NULL;
+ spin_unlock(&ipsec->x->lock);
+ continue;
}
+
+ real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs);
+ ipsec->xs->xso.real_dev = NULL;
+ /* Unlock before freeing device state, it could sleep. */
+ spin_unlock(&ipsec->x->lock);
+ if (real_dev->xfrmdev_ops->xdo_dev_state_free)
+ real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
Cosmin.
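The ordering in the sketch above (clear real_dev under the lock, unlock, then call the free hook that may sleep) can be modeled in userspace. The following is only a single-threaded illustration under stated assumptions: struct fake_state, fake_lock() and sleepable_free() are hypothetical stand-ins, and a plain flag models "x->lock is held" so that a would-be "sleeping function called from invalid context" can be counted rather than triggered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one offloaded state; not a kernel structure. */
struct fake_state {
	int lock_held;		/* 1 while the modeled x->lock is held */
	void *real_dev;		/* stands in for xs->xso.real_dev */
	int sleep_in_atomic;	/* times a sleeping call ran under the lock */
	int freed;		/* times the device free hook ran */
};

static void fake_lock(struct fake_state *s)   { s->lock_held = 1; }
static void fake_unlock(struct fake_state *s) { s->lock_held = 0; }

/* Models a driver free hook that may sleep (for example, one taking a mutex). */
static void sleepable_free(struct fake_state *s)
{
	if (s->lock_held)
		s->sleep_in_atomic++;
	s->freed++;
}

/* Ordering that frees while the lock is still held. */
static int free_under_lock(struct fake_state *s)
{
	fake_lock(s);
	if (!s->real_dev) {
		fake_unlock(s);
		return 0;
	}
	sleepable_free(s);
	s->real_dev = NULL;
	fake_unlock(s);
	return 1;
}

/* Ordering from the sketch: clear real_dev, unlock, then free. */
static int detach_then_free(struct fake_state *s)
{
	fake_lock(s);
	if (!s->real_dev) {
		fake_unlock(s);
		return 0;
	}
	s->real_dev = NULL;
	fake_unlock(s);
	sleepable_free(s);	/* the lock is no longer held here */
	return 1;
}
```

A second caller that takes the lock after real_dev was cleared returns 0 without touching the device, which is the property the sketch relies on.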
* Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
From: Hangbin Liu @ 2025-03-06 9:37 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: razor@blackwall.org, Petr Machata, shuah@kernel.org,
andrew+netdev@lunn.ch, davem@davemloft.net, jv@jvosburgh.net,
jarod@redhat.com, Jianbo Liu, linux-kernel@vger.kernel.org,
edumazet@google.com, pabeni@redhat.com, horms@kernel.org,
kuba@kernel.org, Tariq Toukan, netdev@vger.kernel.org,
steffen.klassert@secunet.com, linux-kselftest@vger.kernel.org
On Wed, Mar 05, 2025 at 04:12:18PM +0000, Cosmin Ratiu wrote:
> On Wed, 2025-03-05 at 14:13 +0000, Hangbin Liu wrote:
> > On Wed, Mar 05, 2025 at 10:38:36AM +0200, Nikolay Aleksandrov wrote:
> > > > @@ -617,8 +614,18 @@ static void bond_ipsec_del_sa_all(struct
> > > > bonding *bond)
> > > >
> > > > mutex_lock(&bond->ipsec_lock);
> > > > list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > >
> > > Second time - you should use list_for_each_entry_safe if you're
> > > walking and deleting
> > > elements from the list.
> >
> > Sorry, I missed this comment. I will update in next version.
> >
> > >
> > > > + spin_lock_bh(&ipsec->xs->lock);
> > > > if (!ipsec->xs->xso.real_dev)
> > > > - continue;
> > > > + goto next;
> > > > +
> > > > + if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
> > > > + /* already dead no need to delete again
> > > > */
> > > > + if (real_dev->xfrmdev_ops-
> > > > >xdo_dev_state_free)
> > > > + real_dev->xfrmdev_ops-
> > > > >xdo_dev_state_free(ipsec->xs);
> > >
> > > Have you checked if .xdo_dev_state_free can sleep?
> > > I see at least one that can: mlx5e_xfrm_free_state().
> >
> > Hmm, This brings us back to the initial problem. We tried to avoid
> > calling
> > a spin lock in a sleep context (bond_ipsec_del_sa), but now the new
> > code
> > encounters this issue again.
>
> The reason the mutex was added (instead of the spinlock used before)
> was exactly because the add and free offload operations could sleep.
>
> > With your reply, I also checked the xdo_dev_state_add() in
> > bond_ipsec_add_sa_all(), which may also sleep, e.g.
> > mlx5e_xfrm_add_state(),
> >
> > If we unlock the spin lock, then the race came back again.
> >
> > Any idea about this?
>
> The race is between bond_ipsec_del_sa_all and bond_ipsec_del_sa (plus
> bond_ipsec_free_sa). The issue is that when bond_ipsec_del_sa_all
> releases x->lock, bond_ipsec_del_sa can immediately be called, followed
> by bond_ipsec_free_sa.
> Maybe dropping x->lock after setting real_dev to NULL? I checked,
> real_dev is not used anywhere on the free calls, I think. I have
> another series refactoring things around real_dev, I hope to be able to
> send it soon.
>
> Here's a sketch of this idea:
>
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -613,8 +613,11 @@ static void bond_ipsec_del_sa_all(struct bonding
> *bond)
>
> mutex_lock(&bond->ipsec_lock);
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> - if (!ipsec->xs->xso.real_dev)
> + spin_lock(&ipsec->x->lock);
> + if (!ipsec->xs->xso.real_dev) {
> + spin_unlock(&ipsec->x->lock);
> continue;
> + }
>
> if (!real_dev->xfrmdev_ops ||
> !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
> @@ -622,12 +625,16 @@ static void bond_ipsec_del_sa_all(struct bonding
> *bond)
> slave_warn(bond_dev, real_dev,
> "%s: no slave
> xdo_dev_state_delete\n",
> __func__);
> - } else {
> - real_dev->xfrmdev_ops-
> >xdo_dev_state_delete(real_dev, ipsec->xs);
> - if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> - real_dev->xfrmdev_ops-
> >xdo_dev_state_free(ipsec->xs);
> - ipsec->xs->xso.real_dev = NULL;
> + spin_unlock(&ipsec->x->lock);
> + continue;
> }
> +
> + real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
> ipsec->xs);
> + ipsec->xs->xso.real_dev = NULL;
Setting xs->xso.real_dev = NULL is a good idea, since
bond_ipsec_del_sa()/bond_ipsec_free_sa() bail out when xs->xso.real_dev
is not set.
For bond_ipsec_add_sa_all(), I will move the xso.real_dev = real_dev
assignment after .xdo_dev_state_add() to handle the following situation:
bond_ipsec_add_sa_all()
spin_unlock(&ipsec->x->lock);
ipsec->xs->xso.real_dev = real_dev;
__xfrm_state_delete x->state = DEAD
- bond_ipsec_del_sa()
- .xdo_dev_state_delete()
.xdo_dev_state_add()
Thanks
Hangbin
> + /* Unlock before freeing device state, it could sleep.
> */
> + spin_unlock(&ipsec->x->lock);
> + if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> + real_dev->xfrmdev_ops-
> >xdo_dev_state_free(ipsec->xs);
>
> Cosmin.
* Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
From: Hangbin Liu @ 2025-03-06 10:02 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: razor@blackwall.org, Petr Machata, shuah@kernel.org,
andrew+netdev@lunn.ch, davem@davemloft.net, jv@jvosburgh.net,
jarod@redhat.com, Jianbo Liu, linux-kernel@vger.kernel.org,
edumazet@google.com, pabeni@redhat.com, horms@kernel.org,
kuba@kernel.org, Tariq Toukan, netdev@vger.kernel.org,
steffen.klassert@secunet.com, linux-kselftest@vger.kernel.org
On Thu, Mar 06, 2025 at 09:37:53AM +0000, Hangbin Liu wrote:
> >
> > The reason the mutex was added (instead of the spinlock used before)
> > was exactly because the add and free offload operations could sleep.
> >
> > > With your reply, I also checked the xdo_dev_state_add() in
> > > bond_ipsec_add_sa_all(), which may also sleep, e.g.
> > > mlx5e_xfrm_add_state(),
> > >
> > > If we unlock the spin lock, then the race came back again.
> > >
> > > Any idea about this?
> >
> > The race is between bond_ipsec_del_sa_all and bond_ipsec_del_sa (plus
> > bond_ipsec_free_sa). The issue is that when bond_ipsec_del_sa_all
> > releases x->lock, bond_ipsec_del_sa can immediately be called, followed
> > by bond_ipsec_free_sa.
> > Maybe dropping x->lock after setting real_dev to NULL? I checked,
> > real_dev is not used anywhere on the free calls, I think. I have
> > another series refactoring things around real_dev, I hope to be able to
> > send it soon.
> >
> > Here's a sketch of this idea:
> >
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -613,8 +613,11 @@ static void bond_ipsec_del_sa_all(struct bonding
> > *bond)
> >
> > mutex_lock(&bond->ipsec_lock);
> > list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > - if (!ipsec->xs->xso.real_dev)
> > + spin_lock(&ipsec->x->lock);
> > + if (!ipsec->xs->xso.real_dev) {
> > + spin_unlock(&ipsec->x->lock);
> > continue;
> > + }
> >
> > if (!real_dev->xfrmdev_ops ||
> > !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
> > @@ -622,12 +625,16 @@ static void bond_ipsec_del_sa_all(struct bonding
> > *bond)
> > slave_warn(bond_dev, real_dev,
> > "%s: no slave
> > xdo_dev_state_delete\n",
> > __func__);
> > - } else {
> > - real_dev->xfrmdev_ops-
> > >xdo_dev_state_delete(real_dev, ipsec->xs);
> > - if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> > - real_dev->xfrmdev_ops-
> > >xdo_dev_state_free(ipsec->xs);
> > - ipsec->xs->xso.real_dev = NULL;
> > + spin_unlock(&ipsec->x->lock);
> > + continue;
> > }
> > +
> > + real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
> > ipsec->xs);
> > + ipsec->xs->xso.real_dev = NULL;
>
> Set xs->xso.real_dev = NULL is a good idea. As we will break
> in bond_ipsec_del_sa()/bond_ipsec_free_sa() when there is no
> xs->xso.real_dev.
>
> For bond_ipsec_add_sa_all(), I will move the xso.real_dev = real_dev
> after .xdo_dev_state_add() in case the following situation.
>
> bond_ipsec_add_sa_all()
> spin_unlock(&ipsec->x->lock);
> ipsec->xs->xso.real_dev = real_dev;
> __xfrm_state_delete x->state = DEAD
> - bond_ipsec_del_sa()
> - .xdo_dev_state_delete()
> .xdo_dev_state_add()
Hmm, do we still need the spin_lock in bond_ipsec_add_sa_all()? With
xs->xso.real_dev set to NULL by bond_ipsec_del_sa_all(), it looks like the
spin_lock in bond_ipsec_add_sa_all() is no longer needed, e.g.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 04b677d0c45b..3ada51c63207 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -537,15 +537,27 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
}
list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+ spin_lock_bh(&ipsec->xs->lock);
+ /* Skip dead xfrm states, they'll be freed later. */
+ if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
+ spin_unlock_bh(&ipsec->xs->lock);
+ continue;
+ }
+
/* If new state is added before ipsec_lock acquired */
- if (ipsec->xs->xso.real_dev == real_dev)
+ if (ipsec->xs->xso.real_dev == real_dev) {
+ spin_unlock_bh(&ipsec->xs->lock);
continue;
+ }
- ipsec->xs->xso.real_dev = real_dev;
if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
ipsec->xs->xso.real_dev = NULL;
}
+ /* Set real_dev after .xdo_dev_state_add in case
+ * __xfrm_state_delete() is called in parallel
+ */
+ ipsec->xs->xso.real_dev = real_dev;
}
The spin_lock here seems useless now. What do you think?
Thanks
Hangbin
* Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
From: Hangbin Liu @ 2025-03-06 13:29 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: razor@blackwall.org, Petr Machata, shuah@kernel.org,
andrew+netdev@lunn.ch, davem@davemloft.net, jv@jvosburgh.net,
jarod@redhat.com, Jianbo Liu, linux-kernel@vger.kernel.org,
edumazet@google.com, pabeni@redhat.com, horms@kernel.org,
kuba@kernel.org, Tariq Toukan, netdev@vger.kernel.org,
steffen.klassert@secunet.com, linux-kselftest@vger.kernel.org
On Thu, Mar 06, 2025 at 10:02:34AM +0000, Hangbin Liu wrote:
> > Set xs->xso.real_dev = NULL is a good idea. As we will break
> > in bond_ipsec_del_sa()/bond_ipsec_free_sa() when there is no
> > xs->xso.real_dev.
> >
> > For bond_ipsec_add_sa_all(), I will move the xso.real_dev = real_dev
> > after .xdo_dev_state_add() in case the following situation.
> >
> Hmm, do we still need to the spin_lock in bond_ipsec_add_sa_all()? With
> xs->xso.real_dev = NULL after bond_ipsec_del_sa_all(), it looks there is
> no need the spin_lock in bond_ipsec_add_sa_all(). e.g.
>
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 04b677d0c45b..3ada51c63207 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -537,15 +537,27 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
> }
>
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> + spin_lock_bh(&ipsec->xs->lock);
> + /* Skip dead xfrm states, they'll be freed later. */
> + if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
> + spin_unlock_bh(&ipsec->xs->lock);
> + continue;
> + }
> +
> /* If new state is added before ipsec_lock acquired */
> - if (ipsec->xs->xso.real_dev == real_dev)
> + if (ipsec->xs->xso.real_dev == real_dev) {
> + spin_unlock_bh(&ipsec->xs->lock);
> continue;
> + }
>
> - ipsec->xs->xso.real_dev = real_dev;
> if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
> slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
> ipsec->xs->xso.real_dev = NULL;
> }
> + /* Set real_dev after .xdo_dev_state_add in case
> + * __xfrm_state_delete() is called in parallel
> + */
> + ipsec->xs->xso.real_dev = real_dev;
> }
OK, please ignore this; .xdo_dev_state_add() needs xso.real_dev to
be set first. So I still wonder how to avoid the race before
.xdo_dev_state_add() is called, e.g.
bond_ipsec_add_sa_all()
spin_lock_bh(&ipsec->xs->lock);
ipsec->xs->xso.real_dev = real_dev;
spin_unlock(&ipsec->x->lock);
__xfrm_state_delete
- bond_ipsec_del_sa()
- .xdo_dev_state_delete()
- bond_ipsec_free_sa()
- .xdo_dev_state_free()
.xdo_dev_state_add()
Thanks
Hangbin
* Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
From: Cosmin Ratiu @ 2025-03-06 13:37 UTC (permalink / raw)
To: liuhangbin@gmail.com
Cc: Petr Machata, shuah@kernel.org, andrew+netdev@lunn.ch,
davem@davemloft.net, Jianbo Liu, jarod@redhat.com,
razor@blackwall.org, linux-kernel@vger.kernel.org,
pabeni@redhat.com, edumazet@google.com, jv@jvosburgh.net,
horms@kernel.org, kuba@kernel.org, Tariq Toukan,
netdev@vger.kernel.org, steffen.klassert@secunet.com,
linux-kselftest@vger.kernel.org
On Thu, 2025-03-06 at 10:02 +0000, Hangbin Liu wrote:
> > For bond_ipsec_add_sa_all(), I will move the xso.real_dev =
> > real_dev
> > after .xdo_dev_state_add() in case the following situation.
xso.real_dev needs to be initialized before the call to
xdo_dev_state_add, since many of the implementations look in
xso.real_dev to determine which device to operate on.
So the ordering should be:
- get the lock
- set xso.real_dev to real_dev
- release the lock
- call xdo_dev_state_add
- if it fails, reacquire the lock and set the device to NULL.
Unfortunately, this doesn't seem to protect against the scenario below,
as after dropping the spinlock from bond_ipsec_add_sa_all,
bond_ipsec_del_sa can freely call xdo_dev_state_delete() on real_dev
before xdo_dev_state_add happens.
I don't know what to do in this case...
> >
> > bond_ipsec_add_sa_all()
> > spin_unlock(&ipsec->x->lock);
> > ipsec->xs->xso.real_dev = real_dev;
> > __xfrm_state_delete x->state = DEAD
> > - bond_ipsec_del_sa()
> > - .xdo_dev_state_delete()
> > .xdo_dev_state_add()
Cosmin.
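The ordering listed above, and the window it leaves open, can be replayed deterministically in userspace. The following is a single-threaded sketch under stated assumptions, not kernel code: struct sim_state, sim_del_sa() and sim_add_one() are hypothetical names, locking is shown only as comments, and a callback stands in for another CPU running the delete path inside the unlocked window.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { EV_MAX = 8 };

/* Hypothetical model of one xfrm state tracked by the bond. */
struct sim_state {
	void *real_dev;		/* stands in for xs->xso.real_dev */
	int dead;		/* stands in for km.state == XFRM_STATE_DEAD */
	char log[EV_MAX];	/* device calls in order: 'A' add, 'D' delete */
	int nlog;
};

/* Record one device call; the last byte is kept as the string terminator. */
static void sim_log(struct sim_state *s, char ev)
{
	if (s->nlog < EV_MAX - 1)
		s->log[s->nlog++] = ev;
}

/* Delete path: only reaches the device when real_dev is already set. */
static void sim_del_sa(struct sim_state *s)
{
	s->dead = 1;
	if (!s->real_dev)
		return;
	sim_log(s, 'D');	/* models xdo_dev_state_delete() */
}

/*
 * One iteration of the add path with the ordering from the list above.
 * The window callback runs where the lock has been dropped.
 */
static int sim_add_one(struct sim_state *s, void *dev,
		       void (*window)(struct sim_state *), int add_fails)
{
	/* get the lock */
	if (s->dead)
		return 0;	/* release the lock, skip dead states */
	s->real_dev = dev;
	/* release the lock */
	if (window)
		window(s);	/* another context may run here */
	sim_log(s, 'A');	/* models xdo_dev_state_add(), may sleep */
	if (add_fails) {
		/* reacquire the lock */
		s->real_dev = NULL;
		/* release the lock */
		return -1;
	}
	return 1;
}
```

Passing sim_del_sa as the window callback leaves "DA" in the log, meaning the delete reaches the device before the add, which is the unresolved case described above.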
* Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
From: Hangbin Liu @ 2025-03-07 2:39 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: Petr Machata, shuah@kernel.org, andrew+netdev@lunn.ch,
davem@davemloft.net, Jianbo Liu, jarod@redhat.com,
razor@blackwall.org, linux-kernel@vger.kernel.org,
pabeni@redhat.com, edumazet@google.com, jv@jvosburgh.net,
horms@kernel.org, kuba@kernel.org, Tariq Toukan,
netdev@vger.kernel.org, steffen.klassert@secunet.com,
linux-kselftest@vger.kernel.org
On Thu, Mar 06, 2025 at 01:37:15PM +0000, Cosmin Ratiu wrote:
> On Thu, 2025-03-06 at 10:02 +0000, Hangbin Liu wrote:
> > > For bond_ipsec_add_sa_all(), I will move the xso.real_dev =
> > > real_dev
> > > after .xdo_dev_state_add() in case the following situation.
>
> xso.real_dev needs to be initialized before the call to
> xdo_dev_state_add, since many of the implementations look at
> xso.real_dev to determine which device to operate on.
> So the ordering should be:
> - get the lock
> - set xso.real_dev to real_dev
> - release the lock
> - call xdo_dev_state_add
> - if it fails, reacquire the lock and set the device to NULL.
>
> Unfortunately, this doesn't seem to protect against the scenario below,
> as after dropping the spinlock from bond_ipsec_add_sa_all,
> bond_ipsec_del_sa can freely call xdo_dev_state_delete() on real_dev
> before xdo_dev_state_add happens.
>
> I don't know what to do in this case...
Yes, me neither. How about adding a note and leaving it as is until we
have a solution?
Regards
Hangbin
>
> > >
> > > bond_ipsec_add_sa_all()
> > > spin_unlock(&ipsec->x->lock);
> > > ipsec->xs->xso.real_dev = real_dev;
> > > __xfrm_state_delete x->state = DEAD
> > > - bond_ipsec_del_sa()
> > > - .xdo_dev_state_delete()
> > > .xdo_dev_state_add()
>
> Cosmin.
* Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
2025-03-05 16:12 ` Cosmin Ratiu
2025-03-06 9:37 ` Hangbin Liu
@ 2025-03-06 13:04 ` Hangbin Liu
1 sibling, 0 replies; 14+ messages in thread
From: Hangbin Liu @ 2025-03-06 13:04 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: razor@blackwall.org, Petr Machata, shuah@kernel.org,
andrew+netdev@lunn.ch, davem@davemloft.net, jv@jvosburgh.net,
jarod@redhat.com, Jianbo Liu, linux-kernel@vger.kernel.org,
edumazet@google.com, pabeni@redhat.com, horms@kernel.org,
kuba@kernel.org, Tariq Toukan, netdev@vger.kernel.org,
steffen.klassert@secunet.com, linux-kselftest@vger.kernel.org
On Wed, Mar 05, 2025 at 04:12:18PM +0000, Cosmin Ratiu wrote:
> +++ b/drivers/net/bonding/bond_main.c
> @@ -613,8 +613,11 @@ static void bond_ipsec_del_sa_all(struct bonding
> *bond)
>
> mutex_lock(&bond->ipsec_lock);
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> - if (!ipsec->xs->xso.real_dev)
> + spin_lock(&ipsec->x->lock);
> + if (!ipsec->xs->xso.real_dev) {
> + spin_unlock(&ipsec->x->lock);
> continue;
> + }
>
> if (!real_dev->xfrmdev_ops ||
> !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
> @@ -622,12 +625,16 @@ static void bond_ipsec_del_sa_all(struct bonding
> *bond)
> slave_warn(bond_dev, real_dev,
> "%s: no slave
> xdo_dev_state_delete\n",
> __func__);
> - } else {
> - real_dev->xfrmdev_ops-
> >xdo_dev_state_delete(real_dev, ipsec->xs);
> - if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> - real_dev->xfrmdev_ops-
> >xdo_dev_state_free(ipsec->xs);
> - ipsec->xs->xso.real_dev = NULL;
> + spin_unlock(&ipsec->x->lock);
> + continue;
> }
> +
> + real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
> ipsec->xs);
> + ipsec->xs->xso.real_dev = NULL;
> + /* Unlock before freeing device state, it could sleep.
> */
> + spin_unlock(&ipsec->x->lock);
> + if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> + real_dev->xfrmdev_ops-
> >xdo_dev_state_free(ipsec->xs);
BTW, with setting real_dev = NULL here, I think
> To fix that, these entries should be freed here and the WARN_ON in
> bond_ipsec_free_sa() should be converted to an if...goto out, so that
> bond_ipsec_free_sa() calls would hit one of these conditions:
> 1. "if (!slave)", when no active device exists.
> 2. "if (!xs->xso.real_dev)", when xdo_dev_state_add() failed.
> 3. "if (xs->xso.real_dev != real_dev)", when a DEAD xs was already
> freed by bond_ipsec_del_sa_all() migration to a new device.
> In all 3 cases, xdo_dev_state_free() shouldn't be called, only xs
> removed from the bond->ipsec list.
The "if (xs->xso.real_dev != real_dev)" case should never happen again.
Since real_dev is NULL, it will hit condition 2, "if (!xs->xso.real_dev)",
directly.
And bond_ipsec_add_sa_all() sets ipsec->xs->xso.real_dev = real_dev,
by which point the active slave has already finished changing.
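As a rough illustration of the three conditions quoted above, the decision can be modelled as a small pure function. This is a userspace sketch, not the actual bond_ipsec_free_sa() code; `fsa_dev`, `fsa_action` and `fsa_decide()` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct fsa_dev { int id; };

enum fsa_action {
	FSA_LIST_REMOVE_ONLY,      /* skip xdo_dev_state_free(), just unlink xs */
	FSA_DEV_FREE_THEN_REMOVE,  /* call xdo_dev_state_free(), then unlink xs */
};

/*
 * active_dev models the device of the current active slave (NULL if none);
 * xs_real_dev models xs->xso.real_dev.
 */
static enum fsa_action fsa_decide(const struct fsa_dev *active_dev,
				  const struct fsa_dev *xs_real_dev)
{
	if (!active_dev)                /* 1. no active device exists */
		return FSA_LIST_REMOVE_ONLY;
	if (!xs_real_dev)               /* 2. add failed, or already cleared */
		return FSA_LIST_REMOVE_ONLY;
	if (xs_real_dev != active_dev)  /* 3. state belongs to another device */
		return FSA_LIST_REMOVE_ONLY;
	return FSA_DEV_FREE_THEN_REMOVE;
}
```

If real_dev is always reset to NULL in bond_ipsec_del_sa_all(), as argued above, the third branch would be unreachable in practice and the second would catch the migration case instead.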
Thanks
Hangbin
* [PATCHv4 net 2/3] bonding: fix xfrm offload feature setup on active-backup mode
2025-03-04 13:11 [PATCHv4 net 0/3] bond: fix xfrm offload issues Hangbin Liu
2025-03-04 13:11 ` [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa Hangbin Liu
@ 2025-03-04 13:11 ` Hangbin Liu
2025-03-04 13:11 ` [PATCHv4 net 3/3] selftests: bonding: add ipsec offload test Hangbin Liu
2 siblings, 0 replies; 14+ messages in thread
From: Hangbin Liu @ 2025-03-04 13:11 UTC (permalink / raw)
To: netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Shuah Khan, Tariq Toukan, Jianbo Liu, Jarod Wilson,
Steffen Klassert, Cosmin Ratiu, Petr Machata, linux-kselftest,
linux-kernel, Hangbin Liu
The active-backup bonding mode supports XFRM ESP offload. However, when
a bond is added using a command like `ip link add bond0 type bond mode 1
miimon 100`, the `ethtool -k` command shows that the XFRM ESP offload is
disabled. This occurs because bond_newlink() changes the bond link
first and registers the bond device later. So the XFRM feature update in
bond_option_mode_set() is not called, as the bond device is not yet
registered, and the offload feature is never set.
To resolve this issue, we can modify the code order in bond_newlink() to
ensure that the bond device is registered first before changing the bond
link parameters. This change will allow the XFRM ESP offload feature to be
correctly enabled.
Fixes: 007ab5345545 ("bonding: fix feature flag setting at init time")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/bonding/bond_netlink.c | 16 +++++++++-------
include/net/bonding.h | 1 +
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 06b060d9b031..1fd2c0a5b13d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4410,7 +4410,7 @@ void bond_work_init_all(struct bonding *bond)
INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
}
-static void bond_work_cancel_all(struct bonding *bond)
+void bond_work_cancel_all(struct bonding *bond)
{
cancel_delayed_work_sync(&bond->mii_work);
cancel_delayed_work_sync(&bond->arp_work);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 2a6a424806aa..ed16af6db557 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -568,18 +568,20 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct bonding *bond = netdev_priv(bond_dev);
int err;
- err = bond_changelink(bond_dev, tb, data, extack);
- if (err < 0)
+ err = register_netdevice(bond_dev);
+ if (err)
return err;
- err = register_netdevice(bond_dev);
- if (!err) {
- struct bonding *bond = netdev_priv(bond_dev);
+ netif_carrier_off(bond_dev);
+ bond_work_init_all(bond);
- netif_carrier_off(bond_dev);
- bond_work_init_all(bond);
+ err = bond_changelink(bond_dev, tb, data, extack);
+ if (err) {
+ bond_work_cancel_all(bond);
+ unregister_netdevice(bond_dev);
}
return err;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 8bb5f016969f..e5e005cd2e17 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -707,6 +707,7 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave);
void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
void bond_work_init_all(struct bonding *bond);
+void bond_work_cancel_all(struct bonding *bond);
#ifdef CONFIG_PROC_FS
void bond_create_proc_entry(struct bonding *bond);
--
2.46.0
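The effect of the reordering in bond_newlink() described in the commit message above can be modelled in userspace as follows. This is only a sketch under stated assumptions: `nl_bond`, `nl_changelink()` and the two `nl_newlink_*()` functions are hypothetical names, and the feature update is reduced to a single flag that is only applied once the device is registered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the bond device state relevant to this patch. */
struct nl_bond {
	bool registered;        /* models register_netdevice() having run */
	bool work_initialized;  /* models bond_work_init_all() */
	bool esp_offload;       /* models the XFRM ESP offload feature */
};

/* Models bond_changelink(): the feature update is skipped if unregistered. */
static int nl_changelink(struct nl_bond *b, int mode)
{
	if (mode < 0)
		return -1;  /* hypothetical invalid parameter */
	if (b->registered && mode == 1)
		b->esp_offload = true;
	return 0;
}

/* Old order: change link parameters first, register afterwards. */
static int nl_newlink_old(struct nl_bond *b, int mode)
{
	int err = nl_changelink(b, mode);

	if (err)
		return err;
	b->registered = true;
	b->work_initialized = true;
	return 0;
}

/* New order: register first, then change link; roll back on failure. */
static int nl_newlink_new(struct nl_bond *b, int mode)
{
	int err;

	b->registered = true;
	b->work_initialized = true;

	err = nl_changelink(b, mode);
	if (err) {
		b->work_initialized = false;  /* models bond_work_cancel_all() */
		b->registered = false;        /* models unregister_netdevice() */
	}
	return err;
}
```

In this model the old order leaves `esp_offload` unset for mode 1, while the new order sets it, and a failing link change leaves the device unregistered again.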
* [PATCHv4 net 3/3] selftests: bonding: add ipsec offload test
2025-03-04 13:11 [PATCHv4 net 0/3] bond: fix xfrm offload issues Hangbin Liu
2025-03-04 13:11 ` [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa Hangbin Liu
2025-03-04 13:11 ` [PATCHv4 net 2/3] bonding: fix xfrm offload feature setup on active-backup mode Hangbin Liu
@ 2025-03-04 13:11 ` Hangbin Liu
2025-03-05 10:13 ` Petr Machata
2 siblings, 1 reply; 14+ messages in thread
From: Hangbin Liu @ 2025-03-04 13:11 UTC (permalink / raw)
To: netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Shuah Khan, Tariq Toukan, Jianbo Liu, Jarod Wilson,
Steffen Klassert, Cosmin Ratiu, Petr Machata, linux-kselftest,
linux-kernel, Hangbin Liu
This introduces a test for IPSec offload over bonding, utilizing netdevsim
for the testing process, as veth interfaces do not support IPSec offload.
The test will ensure that the IPSec offload functionality remains operational
even after a failover event occurs in the bonding configuration.
Here is the test result:
TEST: bond_ipsec_offload (active_slave eth0) [ OK ]
TEST: bond_ipsec_offload (active_slave eth1) [ OK ]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_ipsec_offload.sh | 154 ++++++++++++++++++
.../selftests/drivers/net/bonding/config | 4 +
3 files changed, 160 insertions(+), 1 deletion(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 2b10854e4b1e..d5a7de16d33a 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -10,7 +10,8 @@ TEST_PROGS := \
mode-2-recovery-updelay.sh \
bond_options.sh \
bond-eth-type-change.sh \
- bond_macvlan_ipvlan.sh
+ bond_macvlan_ipvlan.sh \
+ bond_ipsec_offload.sh
TEST_FILES := \
lag_lib.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh b/tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
new file mode 100755
index 000000000000..4b19949a4c33
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
@@ -0,0 +1,154 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# IPsec over bonding offload test:
+#
+# +----------------+
+# | bond0 |
+# | | |
+# | eth0 eth1 |
+# +---+-------+----+
+#
+# We use netdevsim instead of physical interfaces
+#-------------------------------------------------------------------
+# Example commands
+# ip x s add proto esp src 192.0.2.1 dst 192.0.2.2 \
+# spi 0x07 mode transport reqid 0x07 replay-window 32 \
+# aead 'rfc4106(gcm(aes))' 1234567890123456dcba 128 \
+# sel src 192.0.2.1/24 dst 192.0.2.2/24 \
+# offload dev bond0 dir out
+# ip x p add dir out src 192.0.2.1/24 dst 192.0.2.2/24 \
+# tmpl proto esp src 192.0.2.1 dst 192.0.2.2 \
+# spi 0x07 mode transport reqid 0x07
+#
+#-------------------------------------------------------------------
+
+lib_dir=$(dirname "$0")
+source "$lib_dir"/../../../net/lib.sh
+algo="aead rfc4106(gcm(aes)) 0x3132333435363738393031323334353664636261 128"
+srcip=192.0.2.1
+dstip=192.0.2.2
+ipsec0=/sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
+ipsec1=/sys/kernel/debug/netdevsim/netdevsim0/ports/1/ipsec
+active_slave=""
+
+active_slave_changed()
+{
+ local old_active_slave=$1
+ local new_active_slave=$(ip -n ${ns} -d -j link show bond0 | \
+ jq -r ".[].linkinfo.info_data.active_slave")
+ [ "$new_active_slave" != "$old_active_slave" -a "$new_active_slave" != "null" ]
+}
+
+test_offload()
+{
+ # use ping to exercise the Tx path
+ ip netns exec $ns ping -I bond0 -c 3 -W 1 -i 0 $dstip >/dev/null
+
+ active_slave=$(ip -n ${ns} -d -j link show bond0 | \
+ jq -r ".[].linkinfo.info_data.active_slave")
+
+ if [ $active_slave = $nic0 ]; then
+ sysfs=$ipsec0
+ elif [ $active_slave = $nic1 ]; then
+ sysfs=$ipsec1
+ else
+ check_err 1 "bond_ipsec_offload invalid active_slave $active_slave"
+ fi
+
+ # The tx/rx order in sysfs may change after failover
+ grep -q "SA count=2 tx=3" $sysfs && grep -q "tx ipaddr=$dstip" $sysfs
+ check_err $? "incorrect tx count with link ${active_slave}"
+
+ log_test bond_ipsec_offload "active_slave ${active_slave}"
+}
+
+setup_env()
+{
+ if ! mount | grep -q debugfs; then
+ mount -t debugfs none /sys/kernel/debug/ &> /dev/null
+ defer umount /sys/kernel/debug/
+
+ fi
+
+ # setup netdevsim since dummy/veth dev doesn't have offload support
+ if [ ! -w /sys/bus/netdevsim/new_device ] ; then
+ modprobe -q netdevsim
+ if [ $? -ne 0 ]; then
+ echo "SKIP: can't load netdevsim for ipsec offload"
+ exit $ksft_skip
+ fi
+ defer modprobe -r netdevsim
+ fi
+
+ setup_ns ns
+ defer cleanup_ns $ns
+}
+
+setup_bond()
+{
+ ip -n $ns link add bond0 type bond mode active-backup miimon 100
+ ip -n $ns addr add $srcip/24 dev bond0
+ ip -n $ns link set bond0 up
+
+ ifaces=$(ip netns exec $ns bash -c '
+ sysfsnet=/sys/bus/netdevsim/devices/netdevsim0/net/
+ echo "0 2" > /sys/bus/netdevsim/new_device
+ while [ ! -d $sysfsnet ] ; do :; done
+ udevadm settle
+ ls $sysfsnet
+ ')
+ nic0=$(echo $ifaces | cut -f1 -d ' ')
+ nic1=$(echo $ifaces | cut -f2 -d ' ')
+ ip -n $ns link set $nic0 master bond0
+ ip -n $ns link set $nic1 master bond0
+
+ # we didn't create a peer, so make sure we can Tx by adding a permanent
+ # neighbour; this needs to be added after enslaving
+ ip -n $ns neigh add $dstip dev bond0 lladdr 00:11:22:33:44:55
+
+ # create offloaded SAs, both in and out
+ ip -n $ns x p add dir out src $srcip/24 dst $dstip/24 \
+ tmpl proto esp src $srcip dst $dstip spi 9 \
+ mode transport reqid 42
+
+ ip -n $ns x p add dir in src $dstip/24 dst $srcip/24 \
+ tmpl proto esp src $dstip dst $srcip spi 9 \
+ mode transport reqid 42
+
+ ip -n $ns x s add proto esp src $srcip dst $dstip spi 9 \
+ mode transport reqid 42 $algo sel src $srcip/24 dst $dstip/24 \
+ offload dev bond0 dir out
+
+ ip -n $ns x s add proto esp src $dstip dst $srcip spi 9 \
+ mode transport reqid 42 $algo sel src $dstip/24 dst $srcip/24 \
+ offload dev bond0 dir in
+
+ # does offload show up in ip output
+ lines=`ip -n $ns x s list | grep -c "crypto offload parameters: dev bond0 dir"`
+ if [ $lines -ne 2 ] ; then
+ check_err 1 "bond_ipsec_offload SA offload missing from list output"
+ fi
+}
+
+trap defer_scopes_cleanup EXIT
+setup_env
+setup_bond
+
+# start Offload testing
+test_offload
+
+# do failover and re-test
+ip -n $ns link set $active_slave down
+slowwait 5 active_slave_changed $active_slave
+test_offload
+
+# make sure offload gets removed from driver
+ip -n $ns x s flush
+ip -n $ns x p flush
+line0=$(grep -c "SA count=0" $ipsec0)
+line1=$(grep -c "SA count=0" $ipsec1)
+[ $line0 -ne 1 -o $line1 -ne 1 ]
+check_fail $? "bond_ipsec_offload SA not removed from driver"
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config
index dad4e5fda4db..054fb772846f 100644
--- a/tools/testing/selftests/drivers/net/bonding/config
+++ b/tools/testing/selftests/drivers/net/bonding/config
@@ -9,3 +9,7 @@ CONFIG_NET_CLS_FLOWER=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NLMON=y
CONFIG_VETH=y
+CONFIG_INET_ESP=y
+CONFIG_INET_ESP_OFFLOAD=y
+CONFIG_XFRM_USER=m
+CONFIG_NETDEVSIM=m
--
2.46.0
* Re: [PATCHv4 net 3/3] selftests: bonding: add ipsec offload test
2025-03-04 13:11 ` [PATCHv4 net 3/3] selftests: bonding: add ipsec offload test Hangbin Liu
@ 2025-03-05 10:13 ` Petr Machata
0 siblings, 0 replies; 14+ messages in thread
From: Petr Machata @ 2025-03-05 10:13 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Shuah Khan, Tariq Toukan, Jianbo Liu, Jarod Wilson,
Steffen Klassert, Cosmin Ratiu, Petr Machata, linux-kselftest,
linux-kernel
Hangbin Liu <liuhangbin@gmail.com> writes:
> This introduces a test for IPSec offload over bonding, utilizing netdevsim
> for the testing process, as veth interfaces do not support IPSec offload.
> The test will ensure that the IPSec offload functionality remains operational
> even after a failover event occurs in the bonding configuration.
>
> Here is the test result:
>
> TEST: bond_ipsec_offload (active_slave eth0) [ OK ]
> TEST: bond_ipsec_offload (active_slave eth1) [ OK ]
>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>