From: Hangbin Liu <liuhangbin@gmail.com>
To: Cosmin Ratiu <cratiu@nvidia.com>
Cc: netdev@vger.kernel.org, Jay Vosburgh <jv@jvosburgh.net>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>, Taehee Yoo <ap420073@gmail.com>,
Jianbo Liu <jianbol@nvidia.com>,
Sabrina Dubroca <sd@queasysnail.net>,
Steffen Klassert <steffen.klassert@secunet.com>,
Herbert Xu <herbert@gondor.apana.org.au>,
Leon Romanovsky <leonro@nvidia.com>
Subject: Re: [RFC PATCH ipsec 2/2] bonding: Maintain offloaded xfrm on all devices
Date: Fri, 5 Dec 2025 03:48:51 +0000 [thread overview]
Message-ID: <aTJWI3aybYO-NHg5@fedora> (raw)
In-Reply-To: <20251121151644.1797728-3-cratiu@nvidia.com>
On Fri, Nov 21, 2025 at 05:16:44PM +0200, Cosmin Ratiu wrote:
> The bonding driver manages offloaded SAs using the following strategy:
>
> An xfrm_state offloaded on the bond device with bond_ipsec_add_sa() uses
> 'real_dev' on the xfrm_state xs to redirect the offload to the current
> active slave. The corresponding bond_ipsec_del_sa() (called with the xs
> spinlock held) redirects the unoffload call to real_dev. Finally,
> cleanup happens in bond_ipsec_free_sa(), which removes the offload from
> the device. Since the last call happens without the xs spinlock held,
> that is where the real work to unoffload actually happens.
>
> When the active slave changes to a new device a 3-step process is used
> to migrate all xfrm states to the new device:
> 1. bond_ipsec_del_sa_all() unoffloads all states in bond->ipsec_list
> from the previously active device.
> 2. The active slave is flipped to the new device.
> 3. bond_ipsec_add_sa_all() offloads all states in bond->ipsec_list to
> the new device.
>
> There can be two races which result in unencrypted IPSec packets being
> transmitted on the wire:
>
> 1. Unencrypted IPSec packet on old_dev:
> CPU1 (xfrm_output) CPU2 (bond_change_active_slave)
> bond_ipsec_offload_ok -> true
> bond_ipsec_del_sa_all
> bond_xmit_activebackup
> bond_dev_queue_xmit
> dev_queue_xmit on old_dev
> bond->curr_active_slave = new_dev
> bond_ipsec_add_sa_all
>
> 2. Unencrypted IPSec packet on new_dev:
> CPU1 (xfrm_output) CPU2 (bond_change_active_slave)
> bond_ipsec_offload_ok -> true
> bond->curr_active_slave = new_dev
> bond_ipsec_migrate_sa_all
> bond_xmit_activebackup
> bond_dev_queue_xmit
> dev_queue_xmit on new_dev
> bond_ipsec_migrate_sa_all finishes
>
> This patch fixes both these issues. Bonding now maintain SAs on all
> devices by making use of the previous patch that allows the same xfrm
> state to be offloaded on multiple devices. This consists of:
>
> 1. Maintaining two linked lists:
> - bond->ipsec_list is the list of xfrm states offloaded to the bonding
> device.
> - Each slave has its own bond->ipsec_offloads list holding offloads of
> bond->ipsec_list on that slave.
> These lists are protected by the existing bond->ipsec_lock mutex.
>
> 2. When a slave is added (bond_enslave), bond_ipsec_add_sa_all now
> offloads all xfrm states to the new device.
>
> 3. When a slave is removed (__bond_release_one), bond_ipsec_del_sa_all
> now removes all xfrm state offloads from that device.
>
> 4. When the active slave is changed (bond_change_active_slave), a new
> bond_ipsec_migrate_sa_all function switches xs->xso.real_dev and
> xs->xso.offload handle for all offloaded xfrm states.
> xdo_dev_state_advance_esn is also called on the new device to update
> the esn state.
>
> 5. Adding an offloaded xfrm state to the bond device must now iterate
> through active slaves. To make that nice, RTNL is grabbed there. The
> alternative is repeatedly grabbing each slave under the RCU lock,
> holding it, releasing the lock to be able to offload a state, then
> re-grabbing the RCU lock and releasing the slave. RTNL seems cleaner.
>
> 6. bond_ipsec_del_sa (.xdo_dev_state_delete for bond) is unchanged, it
> now only deletes the state from the active device and leaves the rest
> for the xdo_dev_state_free callback, which can grab the required
> locks.
>
> Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/net/bonding/bond_main.c | 283 +++++++++++++++++---------------
> include/net/bonding.h | 22 ++-
> 2 files changed, 164 insertions(+), 141 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 4c5b73786877..979e5aabf8d2 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -452,6 +452,61 @@ static struct net_device *bond_ipsec_dev(struct xfrm_state *xs)
> return slave->dev;
> }
>
> +static struct bond_ipsec_offload*
> +bond_ipsec_dev_add_sa(struct net_device *dev, struct bond_ipsec *ipsec,
> + struct netlink_ext_ack *extack)
> +{
> + struct bond_ipsec_offload *offload;
> + int err;
> +
> + if (!dev->xfrmdev_ops ||
> + !dev->xfrmdev_ops->xdo_dev_state_add ||
> + netif_is_bond_master(dev)) {
> + NL_SET_ERR_MSG_MOD(extack,
> + "Slave does not support ipsec offload");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + offload = kzalloc(sizeof(*offload), GFP_KERNEL);
> + if (!offload)
> + return ERR_PTR(-ENOMEM);
> +
> + offload->ipsec = ipsec;
> + offload->dev = dev;
> + err = dev->xfrmdev_ops->xdo_dev_state_add(dev, ipsec->xs,
> + &offload->handle, extack);
> + if (err)
Here we need to free the offload.
> + return ERR_PTR(err);
> + return offload;
> +}
> +
> +static void bond_ipsec_dev_del_sa(struct bond_ipsec_offload *offload)
> +{
> + struct xfrm_state *xs = offload->ipsec->xs;
> + struct net_device *dev = offload->dev;
> +
> + if (dev->xfrmdev_ops->xdo_dev_state_delete) {
> + spin_lock_bh(&xs->lock);
> + /* Don't double delete states killed by the user
> + * from xs->xso.real_dev.
> + */
> + if (dev != xs->xso.real_dev ||
> + xs->km.state != XFRM_STATE_DEAD)
> + dev->xfrmdev_ops->xdo_dev_state_delete(dev, xs,
> + offload->handle);
> + if (xs->xso.real_dev == dev)
> + xs->xso.real_dev = NULL;
> + spin_unlock_bh(&xs->lock);
> + }
> +
> + if (dev->xfrmdev_ops->xdo_dev_state_free)
> + dev->xfrmdev_ops->xdo_dev_state_free(dev, xs, offload->handle);
> +
> + list_del(&offload->list);
> + list_del(&offload->ipsec_list);
> + kfree(offload);
> +}
> +
[...]
> -static void bond_ipsec_add_sa_all(struct bonding *bond)
> +static void bond_ipsec_add_sa_all(struct bonding *bond, struct slave *new_slave,
> + struct netlink_ext_ack *extack)
> {
> + struct net_device *real_dev = new_slave->dev;
> struct net_device *bond_dev = bond->dev;
> - struct net_device *real_dev;
> struct bond_ipsec *ipsec;
> struct slave *slave;
>
> - slave = rtnl_dereference(bond->curr_active_slave);
> - real_dev = slave ? slave->dev : NULL;
> - if (!real_dev)
> - return;
> + INIT_LIST_HEAD(&new_slave->ipsec_offloads);
>
> mutex_lock(&bond->ipsec_lock);
> - if (!real_dev->xfrmdev_ops ||
> - !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> - netif_is_bond_master(real_dev)) {
> - if (!list_empty(&bond->ipsec_list))
> - slave_warn(bond_dev, real_dev,
> - "%s: no slave xdo_dev_state_add\n",
> - __func__);
> - goto out;
> - }
> -
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> + struct bond_ipsec_offload *offload;
> struct xfrm_state *xs = ipsec->xs;
>
> - /* If new state is added before ipsec_lock acquired */
> - if (xs->xso.real_dev == real_dev)
> - continue;
> -
> - if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs,
> - &xs->xso.offload_handle,
> - NULL)) {
> + offload = bond_ipsec_dev_add_sa(slave->dev, ipsec, extack);
Here should be real_dev.
> + if (IS_ERR(offload)) {
> slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
> continue;
> }
If we add offload failed on the slave, what would happen during migrate?
Thanks
Hangbin
next prev parent reply other threads:[~2025-12-05 3:49 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-21 15:16 [RFC PATCH ipsec 0/2] Fix bonding IPSec races Cosmin Ratiu
2025-11-21 15:16 ` [RFC PATCH ipsec 1/2] xfrm: Add explicit offload_handle to some xfrm callbacks Cosmin Ratiu
2025-11-21 15:16 ` [RFC PATCH ipsec 2/2] bonding: Maintain offloaded xfrm on all devices Cosmin Ratiu
2025-12-05 3:48 ` Hangbin Liu [this message]
2025-12-01 9:01 ` [RFC PATCH ipsec 0/2] Fix bonding IPSec races Steffen Klassert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aTJWI3aybYO-NHg5@fedora \
--to=liuhangbin@gmail.com \
--cc=andrew+netdev@lunn.ch \
--cc=ap420073@gmail.com \
--cc=cratiu@nvidia.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=herbert@gondor.apana.org.au \
--cc=horms@kernel.org \
--cc=jianbol@nvidia.com \
--cc=jv@jvosburgh.net \
--cc=kuba@kernel.org \
--cc=leonro@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sd@queasysnail.net \
--cc=steffen.klassert@secunet.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).