From: Hangbin Liu <liuhangbin@gmail.com>
To: Cosmin Ratiu <cratiu@nvidia.com>
Cc: netdev@vger.kernel.org, Jay Vosburgh <jv@jvosburgh.net>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Nikolay Aleksandrov <razor@blackwall.org>,
Simon Horman <horms@kernel.org>,
Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>, Jianbo Liu <jianbol@nvidia.com>,
Steffen Klassert <steffen.klassert@secunet.com>,
Herbert Xu <herbert@gondor.apana.org.au>,
Ayush Sawal <ayush.sawal@chelsio.com>,
Tony Nguyen <anthony.l.nguyen@intel.com>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
Sunil Goutham <sgoutham@marvell.com>,
Geetha sowjanya <gakula@marvell.com>,
Subbaraya Sundeep <sbhatta@marvell.com>,
hariprasad <hkelam@marvell.com>,
Bharat Bhushan <bbhushan2@marvell.com>,
Louis Peens <louis.peens@corigine.com>,
Leon Romanovsky <leonro@nvidia.com>,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH net-next v2 5/6] bonding: Mark active offloaded xfrm_states
Date: Thu, 10 Apr 2025 02:54:40 +0000 [thread overview]
Message-ID: <Z_cy8OLV4ZABRSrA@fedora> (raw)
In-Reply-To: <20250409144133.2833606-6-cratiu@nvidia.com>
On Wed, Apr 09, 2025 at 05:41:32PM +0300, Cosmin Ratiu wrote:
> When the active link is changed for a bond device, the existing xfrm
> states need to be migrated over to the new link. This is done with:
> - bond_ipsec_del_sa_all() goes through the offloaded states list and
> removes all of them from hw.
> - bond_ipsec_add_sa_all() re-offloads all states to the new device.
>
> But because the offload status of xfrm states isn't marked in any way,
> there can be bugs.
>
> When all bond links are down, bond_ipsec_del_sa_all() unoffloads
> everything from the previous active link. If the same link then comes
> back up, nothing gets reoffloaded by bond_ipsec_add_sa_all().
> This results in a stack trace like this a bit later when user space
> removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to
> remove a rule that's no longer offloaded:
>
> [] Call Trace:
> [] <TASK>
> [] ? __warn+0x7d/0x110
> [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
> [] ? report_bug+0x16d/0x180
> [] ? handle_bug+0x4f/0x90
> [] ? exc_invalid_op+0x14/0x70
> [] ? asm_exc_invalid_op+0x16/0x20
> [] ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core]
> [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
> [] bond_ipsec_del_sa+0x1ab/0x200 [bonding]
> [] xfrm_dev_state_delete+0x1f/0x60
> [] __xfrm_state_delete+0x196/0x200
> [] xfrm_state_delete+0x21/0x40
> [] xfrm_del_sa+0x69/0x110
> [] xfrm_user_rcv_msg+0x11d/0x300
> [] ? release_pages+0xca/0x140
> [] ? copy_to_user_tmpl.part.0+0x110/0x110
> [] netlink_rcv_skb+0x54/0x100
> [] xfrm_netlink_rcv+0x31/0x40
> [] netlink_unicast+0x1fc/0x2d0
> [] netlink_sendmsg+0x1e4/0x410
> [] __sock_sendmsg+0x38/0x60
> [] sock_write_iter+0x94/0xf0
> [] vfs_write+0x338/0x3f0
> [] ksys_write+0xba/0xd0
> [] do_syscall_64+0x4c/0x100
> [] entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
> There's also another theoretical bug:
> Calling bond_ipsec_del_sa_all() multiple times can result in corruption
> in the driver implementation if the double-free isn't tolerated. This
> isn't nice.
>
> Before the "Fixes" commit, xs->xso.real_dev was set to NULL when an xfrm
> state was unoffloaded from a device, but a race with netdevsim's
> .xdo_dev_offload_ok() accessing real_dev was considered a sufficient
> reason to not set real_dev to NULL anymore. This unfortunately
> introduced the new bugs.
>
> Since .xdo_dev_offload_ok() was significantly refactored by [1] and
> there are no more users in the stack of xso.real_dev, that
> race is now gone and xs->xso.real_dev can now once again be used to
> represent which device (if any) currently holds the offloaded rule.
>
> Go one step further and set real_dev after add/before delete calls, to
> catch any future driver misuses of real_dev.
>
> [1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/
> Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference")
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/net/bonding/bond_main.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 4ba525a564c5..14f7c9712ad4 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -496,9 +496,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev,
> goto out;
> }
>
> - xs->xso.real_dev = real_dev;
> err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack);
> if (!err) {
> + xs->xso.real_dev = real_dev;
> ipsec->xs = xs;
> INIT_LIST_HEAD(&ipsec->list);
> mutex_lock(&bond->ipsec_lock);
> @@ -540,12 +540,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
> if (ipsec->xs->xso.real_dev == real_dev)
> continue;
>
> - ipsec->xs->xso.real_dev = real_dev;
> if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev,
> ipsec->xs, NULL)) {
> slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
> - ipsec->xs->xso.real_dev = NULL;
> + continue;
> }
> + ipsec->xs->xso.real_dev = real_dev;
> }
> out:
> mutex_unlock(&bond->ipsec_lock);
> @@ -629,6 +629,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> __func__);
> continue;
> }
> + ipsec->xs->xso.real_dev = NULL;
> real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
> ipsec->xs);
> if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> @@ -664,6 +665,7 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev,
>
> WARN_ON(xs->xso.real_dev != real_dev);
>
> + xs->xso.real_dev = NULL;
> if (real_dev && real_dev->xfrmdev_ops &&
> real_dev->xfrmdev_ops->xdo_dev_state_free)
> real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
> --
> 2.45.0
>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Thanks
Hangbin
next prev parent reply other threads:[~2025-04-10 2:54 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-09 14:41 [PATCH net-next v2 0/6] xfrm & bonding: Correct use of xso.real_dev Cosmin Ratiu
2025-04-09 14:41 ` [PATCH net-next v2 1/6] net/mlx5: Avoid using xso.real_dev unnecessarily Cosmin Ratiu
2025-04-09 14:41 ` [PATCH net-next v2 2/6] xfrm: Use xdo.dev instead of xdo.real_dev Cosmin Ratiu
2025-04-09 14:41 ` [PATCH net-next v2 3/6] xfrm: Remove unneeded device check from validate_xmit_xfrm Cosmin Ratiu
2025-04-09 14:41 ` [PATCH net-next v2 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Cosmin Ratiu
2025-04-09 14:41 ` [PATCH net-next v2 5/6] bonding: Mark active offloaded xfrm_states Cosmin Ratiu
2025-04-10 2:54 ` Hangbin Liu [this message]
2025-04-09 14:41 ` [PATCH net-next v2 6/6] bonding: Fix multiple long standing offload races Cosmin Ratiu
2025-04-10 3:07 ` Hangbin Liu
2025-04-11 7:06 ` Cosmin Ratiu
2025-04-10 11:10 ` [PATCH net-next v2 0/6] xfrm & bonding: Correct use of xso.real_dev Nikolay Aleksandrov
2025-04-11 7:46 ` Cosmin Ratiu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z_cy8OLV4ZABRSrA@fedora \
--to=liuhangbin@gmail.com \
--cc=andrew+netdev@lunn.ch \
--cc=anthony.l.nguyen@intel.com \
--cc=ayush.sawal@chelsio.com \
--cc=bbhushan2@marvell.com \
--cc=cratiu@nvidia.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=gakula@marvell.com \
--cc=herbert@gondor.apana.org.au \
--cc=hkelam@marvell.com \
--cc=horms@kernel.org \
--cc=jianbol@nvidia.com \
--cc=jv@jvosburgh.net \
--cc=kuba@kernel.org \
--cc=leonro@nvidia.com \
--cc=linux-kselftest@vger.kernel.org \
--cc=louis.peens@corigine.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=przemyslaw.kitszel@intel.com \
--cc=razor@blackwall.org \
--cc=saeedm@nvidia.com \
--cc=sbhatta@marvell.com \
--cc=sgoutham@marvell.com \
--cc=steffen.klassert@secunet.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.