From: Hangbin Liu <liuhangbin@gmail.com>
To: Cosmin Ratiu <cratiu@nvidia.com>
Cc: "razor@blackwall.org" <razor@blackwall.org>,
	Petr Machata <petrm@nvidia.com>,
	"shuah@kernel.org" <shuah@kernel.org>,
	"andrew+netdev@lunn.ch" <andrew+netdev@lunn.ch>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"jv@jvosburgh.net" <jv@jvosburgh.net>,
	"jarod@redhat.com" <jarod@redhat.com>,
	Jianbo Liu <jianbol@nvidia.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"edumazet@google.com" <edumazet@google.com>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	"horms@kernel.org" <horms@kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"steffen.klassert@secunet.com" <steffen.klassert@secunet.com>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>
Subject: Re: [PATCHv4 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
Date: Thu, 6 Mar 2025 10:02:24 +0000
Message-ID: <Z8lysOLMnYoknLsW@fedora>
In-Reply-To: <Z8ls6fAwBtiV_C9b@fedora>

On Thu, Mar 06, 2025 at 09:37:53AM +0000, Hangbin Liu wrote:
> > 
> > The reason the mutex was added (instead of the spinlock used before)
> > was exactly because the add and free offload operations could sleep.
> > 
> > > With your reply, I also checked the xdo_dev_state_add() in
> > > bond_ipsec_add_sa_all(), which may also sleep, e.g.
> > > mlx5e_xfrm_add_state(),
> > > 
> > > If we unlock the spin lock, then the race comes back again.
> > > 
> > > Any idea about this?
> > 
> > The race is between bond_ipsec_del_sa_all and bond_ipsec_del_sa (plus
> > bond_ipsec_free_sa). The issue is that when bond_ipsec_del_sa_all
> > releases x->lock, bond_ipsec_del_sa can immediately be called, followed
> > by bond_ipsec_free_sa.
> > Maybe dropping x->lock after setting real_dev to NULL? I checked,
> > real_dev is not used anywhere on the free calls, I think. I have
> > another series refactoring things around real_dev, I hope to be able to
> > send it soon.
> > 
> > Here's a sketch of this idea:
> > 
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -613,8 +613,11 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> >  
> >         mutex_lock(&bond->ipsec_lock);
> >         list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > -               if (!ipsec->xs->xso.real_dev)
> > +               spin_lock(&ipsec->x->lock);
> > +               if (!ipsec->xs->xso.real_dev) {
> > +                       spin_unlock(&ipsec->x->lock);
> >                         continue;
> > +               }
> >  
> >                 if (!real_dev->xfrmdev_ops ||
> >                     !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
> > @@ -622,12 +625,16 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> >                         slave_warn(bond_dev, real_dev,
> >                                    "%s: no slave xdo_dev_state_delete\n",
> >                                    __func__);
> > -               } else {
> > -                       real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs);
> > -                       if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> > -                               real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
> > -                       ipsec->xs->xso.real_dev = NULL;
> > +                       spin_unlock(&ipsec->x->lock);
> > +                       continue;
> >                 }
> > +
> > +               real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs);
> > +               ipsec->xs->xso.real_dev = NULL;
> 
> Setting xs->xso.real_dev = NULL is a good idea, as we will return
> early in bond_ipsec_del_sa()/bond_ipsec_free_sa() when there is no
> xs->xso.real_dev.
> 
> For bond_ipsec_add_sa_all(), I will move the xso.real_dev = real_dev
> assignment after .xdo_dev_state_add() to avoid the following situation:
> 
> bond_ipsec_add_sa_all()
> spin_unlock(&ipsec->x->lock);
> ipsec->xs->xso.real_dev = real_dev;
>                                            __xfrm_state_delete x->state = DEAD
>                                               - bond_ipsec_del_sa()
>                                                 - .xdo_dev_state_delete()
> .xdo_dev_state_add()
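
To make the interleaving above concrete, here is a small userspace model of
the two orderings. It is only a sketch: struct model_dev, struct model_state
and the model_*() helpers are hypothetical stand-ins, not the bonding or xfrm
code, and the "concurrent" delete is simply called in the window where the
lock would be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the slave device and the xfrm state. */
struct model_dev {
	int added;		/* driver currently holds the SA */
	int stray_deletes;	/* deletes that reached the driver before the add */
};

struct model_state {
	struct model_dev *real_dev;	/* models xs->xso.real_dev */
	bool dead;			/* models km.state == XFRM_STATE_DEAD */
};

/* Models bond_ipsec_del_sa(): return early when no real_dev is recorded. */
void model_del_sa(struct model_state *xs)
{
	xs->dead = true;
	if (!xs->real_dev)
		return;
	if (xs->real_dev->added)
		xs->real_dev->added = 0;
	else
		xs->real_dev->stray_deletes++;
}

/* Models ->xdo_dev_state_add() succeeding on the slave. */
void model_driver_add(struct model_state *xs, struct model_dev *dev)
{
	(void)xs;
	dev->added = 1;
}

/*
 * Models one loop iteration of bond_ipsec_add_sa_all() with a delete
 * racing in between. publish_first selects the old ordering (real_dev
 * set before the driver add); otherwise real_dev is set afterwards.
 */
void model_add_sa_all(struct model_state *xs, struct model_dev *dev,
		      bool publish_first)
{
	if (publish_first)
		xs->real_dev = dev;

	model_del_sa(xs);	/* stands in for __xfrm_state_delete() */

	model_driver_add(xs, dev);
	if (!publish_first)
		xs->real_dev = dev;
}
```

With publish_first set, the delete reaches the driver before the add, which
is the situation in the diagram. With real_dev assigned afterwards, the
delete sees NULL and returns early. The model says nothing about what should
then happen to a state that was marked dead in the meantime.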


Hmm, do we still need the spin_lock in bond_ipsec_add_sa_all()? With
xs->xso.real_dev set to NULL after bond_ipsec_del_sa_all(), it looks like
there is no need for the spin_lock in bond_ipsec_add_sa_all(), e.g.:


diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 04b677d0c45b..3ada51c63207 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -537,15 +537,27 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 	}
 
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+		spin_lock_bh(&ipsec->xs->lock);
+		/* Skip dead xfrm states, they'll be freed later. */
+		if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
+			spin_unlock_bh(&ipsec->xs->lock);
+			continue;
+		}
+
 		/* If new state is added before ipsec_lock acquired */
-		if (ipsec->xs->xso.real_dev == real_dev)
+		if (ipsec->xs->xso.real_dev == real_dev) {
+			spin_unlock_bh(&ipsec->xs->lock);
 			continue;
+		}
 
-		ipsec->xs->xso.real_dev = real_dev;
 		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
 			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
 			ipsec->xs->xso.real_dev = NULL;
 		}
+		/* Set real_dev after .xdo_dev_state_add in case
+		 * __xfrm_state_delete() is called in parallel
+		 */
+		ipsec->xs->xso.real_dev = real_dev;
 	}

The spin_lock here seems useless now. What do you think?
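
One more thing worth checking when reading the hunk above as posted: the
final xso.real_dev = real_dev assignment also runs after a failed
.xdo_dev_state_add(), so it overwrites the NULL set in the error branch. The
userspace sketch below models just that control flow. All sketch_* names are
hypothetical, and the second function is only one possible way to publish
real_dev on success alone, not the final patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; fail_add forces the driver add to fail. */
struct sketch_dev { int fail_add; };
struct sketch_state { struct sketch_dev *real_dev; };

/* Stand-in for ->xdo_dev_state_add(): returns nonzero on failure. */
int sketch_driver_add(struct sketch_state *xs, struct sketch_dev *dev)
{
	(void)xs;
	return dev->fail_add ? -1 : 0;
}

/* Control flow of the loop body as it appears in the hunk. */
void sketch_add_as_posted(struct sketch_state *xs, struct sketch_dev *dev)
{
	if (sketch_driver_add(xs, dev))
		xs->real_dev = NULL;
	/* Runs on both paths, so the NULL above does not survive. */
	xs->real_dev = dev;
}

/* Variant that only records real_dev when the add succeeded. */
void sketch_add_on_success(struct sketch_state *xs, struct sketch_dev *dev)
{
	if (sketch_driver_add(xs, dev)) {
		xs->real_dev = NULL;
		return;
	}
	xs->real_dev = dev;
}
```

With a device whose add fails, the first function leaves real_dev pointing at
that device, while the second leaves it NULL.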

Thanks
Hangbin
