From: Hangbin Liu <liuhangbin@gmail.com>
To: Cosmin Ratiu <cratiu@nvidia.com>
Cc: netdev@vger.kernel.org, Jay Vosburgh <jv@jvosburgh.net>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Nikolay Aleksandrov <razor@blackwall.org>,
Simon Horman <horms@kernel.org>,
Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>, Jianbo Liu <jianbol@nvidia.com>,
Steffen Klassert <steffen.klassert@secunet.com>,
Herbert Xu <herbert@gondor.apana.org.au>,
Ayush Sawal <ayush.sawal@chelsio.com>,
Tony Nguyen <anthony.l.nguyen@intel.com>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
Sunil Goutham <sgoutham@marvell.com>,
Geetha sowjanya <gakula@marvell.com>,
Subbaraya Sundeep <sbhatta@marvell.com>,
hariprasad <hkelam@marvell.com>,
Bharat Bhushan <bbhushan2@marvell.com>,
Louis Peens <louis.peens@corigine.com>,
Leon Romanovsky <leonro@nvidia.com>,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH net-next v3 5/6] bonding: Mark active offloaded xfrm_states
Date: Fri, 11 Apr 2025 09:08:58 +0000 [thread overview]
Message-ID: <Z_jcKjrrX1rsq-tA@fedora> (raw)
In-Reply-To: <20250411074958.2858496-6-cratiu@nvidia.com>
On Fri, Apr 11, 2025 at 10:49:57AM +0300, Cosmin Ratiu wrote:
> When the active link is changed for a bond device, the existing xfrm
> states need to be migrated over to the new link. This is done with:
> - bond_ipsec_del_sa_all() goes through the offloaded states list and
> removes all of them from hw.
> - bond_ipsec_add_sa_all() re-offloads all states to the new device.
>
> But because the offload status of xfrm states isn't marked in any way,
> there can be bugs.
>
> When all bond links are down, bond_ipsec_del_sa_all() unoffloads
> everything from the previous active link. If the same link then comes
> back up, nothing gets reoffloaded by bond_ipsec_add_sa_all().
> This results in a stack trace like this a bit later when user space
> removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to
> remove a rule that's no longer offloaded:
>
> [] Call Trace:
> [] <TASK>
> [] ? __warn+0x7d/0x110
> [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
> [] ? report_bug+0x16d/0x180
> [] ? handle_bug+0x4f/0x90
> [] ? exc_invalid_op+0x14/0x70
> [] ? asm_exc_invalid_op+0x16/0x20
> [] ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core]
> [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
> [] bond_ipsec_del_sa+0x1ab/0x200 [bonding]
> [] xfrm_dev_state_delete+0x1f/0x60
> [] __xfrm_state_delete+0x196/0x200
> [] xfrm_state_delete+0x21/0x40
> [] xfrm_del_sa+0x69/0x110
> [] xfrm_user_rcv_msg+0x11d/0x300
> [] ? release_pages+0xca/0x140
> [] ? copy_to_user_tmpl.part.0+0x110/0x110
> [] netlink_rcv_skb+0x54/0x100
> [] xfrm_netlink_rcv+0x31/0x40
> [] netlink_unicast+0x1fc/0x2d0
> [] netlink_sendmsg+0x1e4/0x410
> [] __sock_sendmsg+0x38/0x60
> [] sock_write_iter+0x94/0xf0
> [] vfs_write+0x338/0x3f0
> [] ksys_write+0xba/0xd0
> [] do_syscall_64+0x4c/0x100
> [] entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
> There's also another theoretical bug:
> Calling bond_ipsec_del_sa_all() multiple times can result in corruption
> in the driver implementation if the double-free isn't tolerated. This
> isn't nice.
>
> Before the "Fixes" commit, xs->xso.real_dev was set to NULL when an xfrm
> state was unoffloaded from a device, but a race with netdevsim's
> .xdo_dev_offload_ok() accessing real_dev was considered a sufficient
> reason to not set real_dev to NULL anymore. This unfortunately
> introduced the new bugs.
>
> Since .xdo_dev_offload_ok() was significantly refactored by [1] and
> there are no more users in the stack of xso.real_dev, that
> race is now gone and xs->xso.real_dev can now once again be used to
> represent which device (if any) currently holds the offloaded rule.
>
> Go one step further and set real_dev after add/before delete calls, to
> catch any future driver misuses of real_dev.
>
> [1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/
> Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference")
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/net/bonding/bond_main.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 4ba525a564c5..14f7c9712ad4 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -496,9 +496,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev,
> goto out;
> }
>
> - xs->xso.real_dev = real_dev;
> err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack);
> if (!err) {
> + xs->xso.real_dev = real_dev;
> ipsec->xs = xs;
> INIT_LIST_HEAD(&ipsec->list);
> mutex_lock(&bond->ipsec_lock);
> @@ -540,12 +540,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
> if (ipsec->xs->xso.real_dev == real_dev)
> continue;
>
> - ipsec->xs->xso.real_dev = real_dev;
> if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev,
> ipsec->xs, NULL)) {
> slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
> - ipsec->xs->xso.real_dev = NULL;
> + continue;
> }
> + ipsec->xs->xso.real_dev = real_dev;
> }
> out:
> mutex_unlock(&bond->ipsec_lock);
> @@ -629,6 +629,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> __func__);
> continue;
> }
> + ipsec->xs->xso.real_dev = NULL;
> real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
> ipsec->xs);
> if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> @@ -664,6 +665,7 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev,
>
> WARN_ON(xs->xso.real_dev != real_dev);
>
> + xs->xso.real_dev = NULL;
> if (real_dev && real_dev->xfrmdev_ops &&
> real_dev->xfrmdev_ops->xdo_dev_state_free)
> real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
> --
> 2.45.0
>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
next prev parent reply other threads:[~2025-04-11 9:09 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-11 7:49 [PATCH net-next v3 0/6] xfrm & bonding: Correct use of xso.real_dev Cosmin Ratiu
2025-04-11 7:49 ` [PATCH net-next v3 1/6] net/mlx5: Avoid using xso.real_dev unnecessarily Cosmin Ratiu
2025-04-11 7:49 ` [PATCH net-next v3 2/6] xfrm: Use xdo.dev instead of xdo.real_dev Cosmin Ratiu
2025-04-11 7:49 ` [PATCH net-next v3 3/6] xfrm: Remove unneeded device check from validate_xmit_xfrm Cosmin Ratiu
2025-04-11 7:49 ` [PATCH net-next v3 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Cosmin Ratiu
2025-04-11 7:49 ` [PATCH net-next v3 5/6] bonding: Mark active offloaded xfrm_states Cosmin Ratiu
2025-04-11 9:08 ` Hangbin Liu [this message]
2025-04-11 7:49 ` [PATCH net-next v3 6/6] bonding: Fix multiple long standing offload races Cosmin Ratiu
2025-04-11 9:09 ` Hangbin Liu
2025-04-11 8:21 ` [PATCH net-next v3 0/6] xfrm & bonding: Correct use of xso.real_dev Nikolay Aleksandrov
2025-04-14 10:11 ` Steffen Klassert
2025-04-14 16:51 ` Jakub Kicinski
2025-04-16 9:05 ` Steffen Klassert
2025-04-17 8:57 ` Steffen Klassert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z_jcKjrrX1rsq-tA@fedora \
--to=liuhangbin@gmail.com \
--cc=andrew+netdev@lunn.ch \
--cc=anthony.l.nguyen@intel.com \
--cc=ayush.sawal@chelsio.com \
--cc=bbhushan2@marvell.com \
--cc=cratiu@nvidia.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=gakula@marvell.com \
--cc=herbert@gondor.apana.org.au \
--cc=hkelam@marvell.com \
--cc=horms@kernel.org \
--cc=jianbol@nvidia.com \
--cc=jv@jvosburgh.net \
--cc=kuba@kernel.org \
--cc=leonro@nvidia.com \
--cc=linux-kselftest@vger.kernel.org \
--cc=louis.peens@corigine.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=przemyslaw.kitszel@intel.com \
--cc=razor@blackwall.org \
--cc=saeedm@nvidia.com \
--cc=sbhatta@marvell.com \
--cc=sgoutham@marvell.com \
--cc=steffen.klassert@secunet.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).