All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hangbin Liu <liuhangbin@gmail.com>
To: Cosmin Ratiu <cratiu@nvidia.com>
Cc: "razor@blackwall.org" <razor@blackwall.org>,
	"andrew+netdev@lunn.ch" <andrew+netdev@lunn.ch>,
	"jarod@redhat.com" <jarod@redhat.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	Tariq Toukan <tariqt@nvidia.com>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	"shuah@kernel.org" <shuah@kernel.org>,
	"steffen.klassert@secunet.com" <steffen.klassert@secunet.com>,
	"jv@jvosburgh.net" <jv@jvosburgh.net>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"horms@kernel.org" <horms@kernel.org>,
	"edumazet@google.com" <edumazet@google.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Jianbo Liu <jianbol@nvidia.com>,
	"pabeni@redhat.com" <pabeni@redhat.com>
Subject: Re: [PATCHv3 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa
Date: Tue, 4 Mar 2025 09:18:33 +0000	[thread overview]
Message-ID: <Z8bFaXOH_Iwi-Nuj@fedora> (raw)
In-Reply-To: <76ed1d018596b81548d095aa2d4a9b31b360479c.camel@nvidia.com>

Hi Cosmin,
On Fri, Feb 28, 2025 at 10:31:58AM +0000, Cosmin Ratiu wrote:
> On Fri, 2025-02-28 at 02:20 +0000, Hangbin Liu wrote:
> > On Thu, Feb 27, 2025 at 03:31:01PM +0200, Nikolay Aleksandrov wrote:
> > > > > One more thing - note I'm not an xfrm expert by far but it
> > > > > seems to me here you have
> > > > > to also call  xdo_dev_state_free() with the old active slave
> > > > > dev otherwise that will
> > > > > never get called with the original real_dev after the switch to
> > > > > a new
> > > > > active slave (or more accurately it might if the GC runs
> > > > > between the switching
> > > > > but it is a race), care must be taken wrt sequence of events
> > > > > because the XFRM
> > > > 
> > > > Can we just call xs->xso.real_dev->xfrmdev_ops-
> > > > >xdo_dev_state_free(xs)
> > > > no matter xs->xso.real_dev == real_dev or not? I'm afraid calling
> > > > xdo_dev_state_free() every where may make us lot more easily.
> > > > 
> > > 
> > > You'd have to check all drivers that implement the callback to
> > > answer that and even then
> > > I'd stick to the canonical way of how it's done in xfrm and make
> > > the bond just passthrough.
> > > Any other games become dangerous and new code will have to be
> > > carefully reviewed every
> > > time, calling another device's free_sa when it wasn't added before
> > > doesn't sound good.
> > > 
> > > > > GC may be running in parallel which probably means that in
> > > > > bond_ipsec_free_sa()
> > > > > you'll have to take the mutex before calling
> > > > > xdo_dev_state_free() and check
> > > > > if the entry is still linked in the bond's ipsec list before
> > > > > calling the free_sa
> > > > > callback, if it isn't then del_sa_all got to it before the GC
> > > > > and there's nothing
> > > > > to do if it also called the dev's free_sa callback. The check
> > > > > for real_dev doesn't
> > > > > seem enough to protect against this race.
> > > > 
> > > > I agree that we need to take the mutex before calling
> > > > xdo_dev_state_free()
> > > > in bond_ipsec_free_sa(). Do you think if this is enough? I'm a
> > > > bit lot here.
> > > > 
> > > > Thanks
> > > > Hangbin
> > > 
> > > Well, the race is between the xfrm GC and del_sa_all, in bond's
> > > free_sa if you
> > > walk the list under the mutex before calling real_dev's free
> > > callback and
> > > don't find the current element that's being freed in free_sa then
> > > it was
> > > cleaned up by del_sa_all, otherwise del_sa_all is waiting to walk
> > > that
> > > list and clean the entries. I think it should be fine as long as
> > > free_sa
> > > was called once with the proper device.
> > 
> > OK, so the free will be called either in del_sa_all() or free_sa().
> > Something like this?
> > 
> [...]
> 
> Unfortunately, after applying these changes and reasoning about them
> for a bit, I don't think this will work. There are still races left.
> For example:
> 1. An xs is marked DEAD (in __xfrm_state_delete, with x->lock held) and
> before .xdo_dev_state_delete() is called on it, bond_ipsec_del_sa_all
> is called in parallel, doesn't call delete on xs (because it's dead),
> then calls free (incorrect without delete first), then removes the list
> entry. Later, xdo_dev_state_delete( == bond_ipsec_del_sa) is called,
> and calls delete (incorrect, out of order with free). Finally,
> bond_ipsec_free_sa is called, which fortunately doesn't do anything
> silly in the new proposed form because xs is no longer in the list.
> 
> 2. A more sinister form of the above race can happen when 
> bond_ipsec_del_sa_all() calls delete on real_dev, then in parallel and
> immediately after __xfrm_state_delete marks xs as DEAD and calls
> bond_ipsec_del_sa() which happily calls delete on real_dev again.
> 
> In order to fix these races (and others like it), I think
> bond_ipsec_del_sa_all and bond_ipsec_add_sa_all *need* to acquire x-
> >lock for each xs being processed. This would prevent xfrm from
> concurrently initiating add/delete operations on the managed states.
> 

Just to make sure I added the lock in correct place, would you please help
confirm.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e85878b12376..c59ad3a5cf43 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -537,19 +537,25 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 	}
 
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+		spin_lock_bh(&ipsec->xs->lock);
 		/* Skip dead xfrm states, they'll be freed later. */
-		if (ipsec->xs->km.state == XFRM_STATE_DEAD)
+		if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
+			spin_unlock_bh(&ipsec->xs->lock);
 			continue;
+		}
 
 		/* If new state is added before ipsec_lock acquired */
-		if (ipsec->xs->xso.real_dev == real_dev)
+		if (ipsec->xs->xso.real_dev == real_dev) {
+			spin_unlock_bh(&ipsec->xs->lock);
 			continue;
+		}
 
 		ipsec->xs->xso.real_dev = real_dev;
 		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
 			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
 			ipsec->xs->xso.real_dev = NULL;
 		}
+		spin_unlock_bh(&ipsec->xs->lock);
 	}
 out:
 	mutex_unlock(&bond->ipsec_lock);
@@ -614,6 +620,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 		if (!ipsec->xs->xso.real_dev)
 			continue;
 
+		spin_lock_bh(&ipsec->xs->lock);
 		if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
 			/* already dead no need to delete again */
 			if (ipsec->xs->xso.real_dev == real_dev &&
@@ -621,6 +628,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
 			list_del(&ipsec->list);
 			kfree(ipsec);
+			spin_unlock_bh(&ipsec->xs->lock);
 			continue;
 		}
 
@@ -635,6 +643,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
 				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
 		}
+		spin_unlock_bh(&ipsec->xs->lock);
 	}
 	mutex_unlock(&bond->ipsec_lock);
 }

Thanks
Hangbin

  parent reply	other threads:[~2025-03-04  9:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-27  8:37 [PATCHv3 net 0/3] bond: fix xfrm offload issues Hangbin Liu
2025-02-27  8:37 ` [PATCHv3 net 1/3] bonding: move IPsec deletion to bond_ipsec_free_sa Hangbin Liu
2025-02-27  8:50   ` Nikolay Aleksandrov
2025-02-27  9:21     ` Nikolay Aleksandrov
2025-02-27 13:21       ` Hangbin Liu
2025-02-27 13:31         ` Nikolay Aleksandrov
2025-02-28  2:20           ` Hangbin Liu
2025-02-28 10:31             ` Cosmin Ratiu
2025-02-28 11:07               ` Nikolay Aleksandrov
2025-02-28 11:10                 ` Nikolay Aleksandrov
2025-02-28 12:59               ` Hangbin Liu
2025-03-04  9:18               ` Hangbin Liu [this message]
2025-03-04 10:25                 ` Cosmin Ratiu
2025-02-27  8:37 ` [PATCHv3 net 2/3] bonding: fix xfrm offload feature setup on active-backup mode Hangbin Liu
2025-02-27  8:37 ` [PATCHv3 net 3/3] selftests: bonding: add ipsec offload test Hangbin Liu
2025-02-27 13:59   ` Petr Machata

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z8bFaXOH_Iwi-Nuj@fedora \
    --to=liuhangbin@gmail.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=cratiu@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jarod@redhat.com \
    --cc=jianbol@nvidia.com \
    --cc=jv@jvosburgh.net \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=razor@blackwall.org \
    --cc=shuah@kernel.org \
    --cc=steffen.klassert@secunet.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.