netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/6] xfrm & bonding: Correct use of xso.real_dev
@ 2025-04-07 13:35 Cosmin Ratiu
  2025-04-07 13:35 ` [PATCH net-next 1/6] net/mlx5: Avoid using xso.real_dev unnecessarily Cosmin Ratiu
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Cosmin Ratiu @ 2025-04-07 13:35 UTC (permalink / raw)
  To: netdev, cratiu
  Cc: Hangbin Liu, Jay Vosburgh, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov,
	Simon Horman, Saeed Mahameed, Tariq Toukan, Jianbo Liu,
	Steffen Klassert, Herbert Xu, Ayush Sawal, Tony Nguyen,
	Przemek Kitszel, Sunil Goutham, Geetha sowjanya,
	Subbaraya Sundeep, hariprasad, Bharat Bhushan, Louis Peens,
	Leon Romanovsky, linux-kselftest

This patch series was motivated by fixing a few bugs in the bonding
driver related to xfrm state migration on device failover.

struct xfrm_dev_offload has two net_device pointers: dev and real_dev.
The first one is the device the xfrm_state is offloaded on and the
second one is used by the bonding driver to manage the underlying device
xfrm_states are actually offloaded on. When bonding isn't used, the two
pointers are the same.

This causes confusion in drivers: Which device pointer should they use?
If they want to support bonding, they need to only use real_dev and
never look at dev.

Furthermore, real_dev is used without proper locking from multiple code
paths and changing it is dangerous. See commit [1] for example.

This patch series clears things out by removing all uses of real_dev
from outside the bonding driver.
Then, the bonding driver is refactored to fix a couple of long standing
races and the original bug which motivated this patch series.

[1] commit f8cde9805981 ("bonding: fix xfrm real_dev null pointer
dereference")

Cosmin Ratiu (6):
Cleaning up unnecessary uses of xso.real_dev:
  net/mlx5: Avoid using xso.real_dev unnecessarily
  xfrm: Use xdo.dev instead of xdo.real_dev
  xfrm: Remove unneeded device check from validate_xmit_xfrm
Refactoring device operations to get an explicit device pointer:
  xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
Fixing a bonding xfrm state migration bug:
  bonding: Mark active offloaded xfrm_states
Fixing long standing races in bonding:
  bonding: Fix multiple long standing offload races

 Documentation/networking/xfrm_device.rst      | 10 +-
 drivers/net/bonding/bond_main.c               | 93 +++++++++++--------
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   | 20 ++--
 .../inline_crypto/ch_ipsec/chcr_ipsec.c       | 18 ++--
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    | 40 ++++----
 drivers/net/ethernet/intel/ixgbevf/ipsec.c    | 20 ++--
 .../marvell/octeontx2/nic/cn10k_ipsec.c       | 18 ++--
 .../mellanox/mlx5/core/en_accel/ipsec.c       | 28 +++---
 .../mellanox/mlx5/core/en_accel/ipsec.h       |  1 +
 .../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 +--
 drivers/net/netdevsim/ipsec.c                 | 15 ++-
 include/linux/netdevice.h                     | 10 +-
 include/net/xfrm.h                            |  8 ++
 net/xfrm/xfrm_device.c                        | 13 +--
 net/xfrm/xfrm_state.c                         | 16 ++--
 15 files changed, 175 insertions(+), 146 deletions(-)

-- 
2.45.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next 1/6] net/mlx5: Avoid using xso.real_dev unnecessarily
  2025-04-07 13:35 [PATCH net-next 0/6] xfrm & bonding: Correct use of xso.real_dev Cosmin Ratiu
@ 2025-04-07 13:35 ` Cosmin Ratiu
  2025-04-08 17:06   ` Tariq Toukan
  2025-04-07 13:35 ` [PATCH net-next 2/6] xfrm: Use xdo.dev instead of xdo.real_dev Cosmin Ratiu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Cosmin Ratiu @ 2025-04-07 13:35 UTC (permalink / raw)
  To: netdev, cratiu
  Cc: Hangbin Liu, Jay Vosburgh, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov,
	Simon Horman, Saeed Mahameed, Tariq Toukan, Jianbo Liu,
	Steffen Klassert, Herbert Xu, Ayush Sawal, Tony Nguyen,
	Przemek Kitszel, Sunil Goutham, Geetha sowjanya,
	Subbaraya Sundeep, hariprasad, Bharat Bhushan, Louis Peens,
	Leon Romanovsky, linux-kselftest

xso.real_dev is the active device of an offloaded xfrm state and is
managed by bonding. As such, it's subject to change when states are
migrated to a new device. Using it in places other than
offloading/unoffloading the states is risky.

This commit saves the device into the driver-specific struct
mlx5e_ipsec_sa_entry and switches mlx5e_ipsec_init_macs() and
mlx5e_ipsec_netevent_event() to make use of it.

Additionally, mlx5e_xfrm_update_stats() used xso.real_dev to validate
that correct net locks are held. But in a bonding config, the net of the
master device is the same as the underlying devices, and the net is
already a local var, so use that instead.

The only remaining references to xso.real_dev are now in the
.xdo_dev_state_add() / .xdo_dev_state_delete() path.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 16 +++++-----------
 .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h |  1 +
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 2dd842aac6fc..626e525c0f0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -259,8 +259,7 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry,
 				  struct mlx5_accel_esp_xfrm_attrs *attrs)
 {
 	struct mlx5_core_dev *mdev = mlx5e_ipsec_sa2dev(sa_entry);
-	struct xfrm_state *x = sa_entry->x;
-	struct net_device *netdev;
+	struct net_device *netdev = sa_entry->dev;
 	struct neighbour *n;
 	u8 addr[ETH_ALEN];
 	const void *pkey;
@@ -270,8 +269,6 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry,
 	    attrs->type != XFRM_DEV_OFFLOAD_PACKET)
 		return;
 
-	netdev = x->xso.real_dev;
-
 	mlx5_query_mac_address(mdev, addr);
 	switch (attrs->dir) {
 	case XFRM_DEV_OFFLOAD_IN:
@@ -713,6 +710,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
 		return -ENOMEM;
 
 	sa_entry->x = x;
+	sa_entry->dev = netdev;
 	sa_entry->ipsec = ipsec;
 	/* Check if this SA is originated from acquire flow temporary SA */
 	if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ)
@@ -855,8 +853,6 @@ static int mlx5e_ipsec_netevent_event(struct notifier_block *nb,
 	struct mlx5e_ipsec_sa_entry *sa_entry;
 	struct mlx5e_ipsec *ipsec;
 	struct neighbour *n = ptr;
-	struct net_device *netdev;
-	struct xfrm_state *x;
 	unsigned long idx;
 
 	if (event != NETEVENT_NEIGH_UPDATE || !(n->nud_state & NUD_VALID))
@@ -876,11 +872,9 @@ static int mlx5e_ipsec_netevent_event(struct notifier_block *nb,
 				continue;
 		}
 
-		x = sa_entry->x;
-		netdev = x->xso.real_dev;
 		data = sa_entry->work->data;
 
-		neigh_ha_snapshot(data->addr, n, netdev);
+		neigh_ha_snapshot(data->addr, n, sa_entry->dev);
 		queue_work(ipsec->wq, &sa_entry->work->work);
 	}
 
@@ -996,8 +990,8 @@ static void mlx5e_xfrm_update_stats(struct xfrm_state *x)
 	size_t headers;
 
 	lockdep_assert(lockdep_is_held(&x->lock) ||
-		       lockdep_is_held(&dev_net(x->xso.real_dev)->xfrm.xfrm_cfg_mutex) ||
-		       lockdep_is_held(&dev_net(x->xso.real_dev)->xfrm.xfrm_state_lock));
+		       lockdep_is_held(&net->xfrm.xfrm_cfg_mutex) ||
+		       lockdep_is_held(&net->xfrm.xfrm_state_lock));
 
 	if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ)
 		return;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index a63c2289f8af..ffcd0cdeb775 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -274,6 +274,7 @@ struct mlx5e_ipsec_limits {
 struct mlx5e_ipsec_sa_entry {
 	struct mlx5e_ipsec_esn_state esn_state;
 	struct xfrm_state *x;
+	struct net_device *dev;
 	struct mlx5e_ipsec *ipsec;
 	struct mlx5_accel_esp_xfrm_attrs attrs;
 	void (*set_iv_op)(struct sk_buff *skb, struct xfrm_state *x,
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 2/6] xfrm: Use xdo.dev instead of xdo.real_dev
  2025-04-07 13:35 [PATCH net-next 0/6] xfrm & bonding: Correct use of xso.real_dev Cosmin Ratiu
  2025-04-07 13:35 ` [PATCH net-next 1/6] net/mlx5: Avoid using xso.real_dev unnecessarily Cosmin Ratiu
@ 2025-04-07 13:35 ` Cosmin Ratiu
  2025-04-07 13:35 ` [PATCH net-next 3/6] xfrm: Remove unneeded device check from validate_xmit_xfrm Cosmin Ratiu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Cosmin Ratiu @ 2025-04-07 13:35 UTC (permalink / raw)
  To: netdev, cratiu
  Cc: Hangbin Liu, Jay Vosburgh, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov,
	Simon Horman, Saeed Mahameed, Tariq Toukan, Jianbo Liu,
	Steffen Klassert, Herbert Xu, Ayush Sawal, Tony Nguyen,
	Przemek Kitszel, Sunil Goutham, Geetha sowjanya,
	Subbaraya Sundeep, hariprasad, Bharat Bhushan, Louis Peens,
	Leon Romanovsky, linux-kselftest

The policy offload struct was reused from the state offload and
real_dev was copied from dev, but it was never set to anything else.
Simplify the code by always using xdo.dev for policies.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 2 +-
 net/xfrm/xfrm_device.c                                   | 2 --
 net/xfrm/xfrm_state.c                                    | 2 --
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 626e525c0f0d..0dfbbe21936f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -1164,7 +1164,7 @@ mlx5e_ipsec_build_accel_pol_attrs(struct mlx5e_ipsec_pol_entry *pol_entry,
 static int mlx5e_xfrm_add_policy(struct xfrm_policy *x,
 				 struct netlink_ext_ack *extack)
 {
-	struct net_device *netdev = x->xdo.real_dev;
+	struct net_device *netdev = x->xdo.dev;
 	struct mlx5e_ipsec_pol_entry *pol_entry;
 	struct mlx5e_priv *priv;
 	int err;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index d62f76161d83..4f4165ff738d 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -378,7 +378,6 @@ int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
 
 	xdo->dev = dev;
 	netdev_tracker_alloc(dev, &xdo->dev_tracker, GFP_ATOMIC);
-	xdo->real_dev = dev;
 	xdo->type = XFRM_DEV_OFFLOAD_PACKET;
 	switch (dir) {
 	case XFRM_POLICY_IN:
@@ -400,7 +399,6 @@ int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
 	err = dev->xfrmdev_ops->xdo_dev_policy_add(xp, extack);
 	if (err) {
 		xdo->dev = NULL;
-		xdo->real_dev = NULL;
 		xdo->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 		xdo->dir = 0;
 		netdev_put(dev, &xdo->dev_tracker);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index d896c3fefb07..33d1f5679e8b 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1555,7 +1555,6 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 			xso->type = XFRM_DEV_OFFLOAD_PACKET;
 			xso->dir = xdo->dir;
 			xso->dev = xdo->dev;
-			xso->real_dev = xdo->real_dev;
 			xso->flags = XFRM_DEV_OFFLOAD_FLAG_ACQ;
 			netdev_hold(xso->dev, &xso->dev_tracker, GFP_ATOMIC);
 			error = xso->dev->xfrmdev_ops->xdo_dev_state_add(x, NULL);
@@ -1563,7 +1562,6 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 				xso->dir = 0;
 				netdev_put(xso->dev, &xso->dev_tracker);
 				xso->dev = NULL;
-				xso->real_dev = NULL;
 				xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 				x->km.state = XFRM_STATE_DEAD;
 				to_put = x;
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 3/6] xfrm: Remove unneeded device check from validate_xmit_xfrm
  2025-04-07 13:35 [PATCH net-next 0/6] xfrm & bonding: Correct use of xso.real_dev Cosmin Ratiu
  2025-04-07 13:35 ` [PATCH net-next 1/6] net/mlx5: Avoid using xso.real_dev unnecessarily Cosmin Ratiu
  2025-04-07 13:35 ` [PATCH net-next 2/6] xfrm: Use xdo.dev instead of xdo.real_dev Cosmin Ratiu
@ 2025-04-07 13:35 ` Cosmin Ratiu
  2025-04-07 13:35 ` [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Cosmin Ratiu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Cosmin Ratiu @ 2025-04-07 13:35 UTC (permalink / raw)
  To: netdev, cratiu
  Cc: Hangbin Liu, Jay Vosburgh, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov,
	Simon Horman, Saeed Mahameed, Tariq Toukan, Jianbo Liu,
	Steffen Klassert, Herbert Xu, Ayush Sawal, Tony Nguyen,
	Przemek Kitszel, Sunil Goutham, Geetha sowjanya,
	Subbaraya Sundeep, hariprasad, Bharat Bhushan, Louis Peens,
	Leon Romanovsky, linux-kselftest

validate_xmit_xfrm checks whether a packet already passed through it on
the master device (xso.dev) and skips processing the skb again on the
slave device (xso.real_dev).

This check was added in commit [1] to avoid tx packets on a bond device
pass through xfrm twice and get two sets of headers, but the check was
soon obsoleted by commit [2], which was added around the same time to
fix a similar but unrelated problem. Commit [3] set XFRM_XMIT only when
packets are hw offloaded.

xso.dev is usually equal to xso.real_dev, unless bonding is used, in
which case the bonding driver uses xso.real_dev to manage offloaded xfrm
states.

Since commit [3], the check added in commit [1] is unused on all cases,
since packets going through validate_xmit_xfrm twice bail out on the
check added in commit [2]. Here's a breakdown of relevant scenarios:

1. ESP offload off: validate_xmit_xfrm returns early on !xo.
2. ESP offload on, no bond: skb->dev == xso.real_dev == xso.dev.
3. ESP offload on, bond, xs on bond dev: 1st pass adds XFRM_XMIT, 2nd
   pass returns early on XFRM_XMIT.
3. ESP offload on, bond, xs on slave dev: 1st pass returns early on
   !xo, 2nd pass adds XFRM_XMIT.
4. ESP offload on, bond, xs on both bond AND slave dev: only 1 offload
   possible in secpath. Either 1st pass adds XFRM_XMIT and 2nd pass returns
   early on XFRM_XMIT, or 1st pass is sw and returns early on !xo.
6. ESP offload on, crypto fallback triggered in esp_xmit/esp6_xmit: 1st
   pass does sw crypto & secpath_reset, 2nd pass returns on !xo.

This commit removes the unnecessary check, so xso.real_dev becomes what
it is in practice: a private field managed by bonding driver.
The check immediately below that can be simplified as well.

[1] commit 272c2330adc9 ("xfrm: bail early on slave pass over skb")
[2] commit 94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in
IPsec crypto offload.")
[3] commit c7dbf4c08868 ("xfrm: Provide private skb extensions for
segmented and hw offloaded ESP packets")

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 net/xfrm/xfrm_device.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 4f4165ff738d..0be5f7ffd019 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -145,10 +145,6 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
 		return NULL;
 	}
 
-	/* This skb was already validated on the upper/virtual dev */
-	if ((x->xso.dev != dev) && (x->xso.real_dev == dev))
-		return skb;
-
 	local_irq_save(flags);
 	sd = this_cpu_ptr(&softnet_data);
 	err = !skb_queue_empty(&sd->xfrm_backlog);
@@ -159,8 +155,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
 		return skb;
 	}
 
-	if (skb_is_gso(skb) && (unlikely(x->xso.dev != dev) ||
-				unlikely(xmit_xfrm_check_overflow(skb)))) {
+	if (skb_is_gso(skb) && unlikely(xmit_xfrm_check_overflow(skb))) {
 		struct sk_buff *segs;
 
 		/* Packet got rerouted, fixup features and segment it. */
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
  2025-04-07 13:35 [PATCH net-next 0/6] xfrm & bonding: Correct use of xso.real_dev Cosmin Ratiu
                   ` (2 preceding siblings ...)
  2025-04-07 13:35 ` [PATCH net-next 3/6] xfrm: Remove unneeded device check from validate_xmit_xfrm Cosmin Ratiu
@ 2025-04-07 13:35 ` Cosmin Ratiu
  2025-04-08 23:20   ` Jakub Kicinski
  2025-04-09 12:30   ` kernel test robot
  2025-04-07 13:35 ` [PATCH net-next 5/6] bonding: Mark active offloaded xfrm_states Cosmin Ratiu
  2025-04-07 13:35 ` [PATCH net-next 6/6] bonding: Fix multiple long standing offload races Cosmin Ratiu
  5 siblings, 2 replies; 13+ messages in thread
From: Cosmin Ratiu @ 2025-04-07 13:35 UTC (permalink / raw)
  To: netdev, cratiu
  Cc: Hangbin Liu, Jay Vosburgh, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov,
	Simon Horman, Saeed Mahameed, Tariq Toukan, Jianbo Liu,
	Steffen Klassert, Herbert Xu, Ayush Sawal, Tony Nguyen,
	Przemek Kitszel, Sunil Goutham, Geetha sowjanya,
	Subbaraya Sundeep, hariprasad, Bharat Bhushan, Louis Peens,
	Leon Romanovsky, linux-kselftest

Previously, device driver IPSec offload implementations would fall into
two categories:
1. Those that used xso.dev to determine the offload device.
2. Those that used xso.real_dev to determine the offload device.

The first category didn't work with bonding while the second did.
In a non-bonding setup the two pointers are the same.

This commit adds explicit pointers for the offload netdevice to
.xdo_dev_state_add() / .xdo_dev_state_delete() / .xdo_dev_state_free()
which eliminates the confusion and allows drivers from the first
category to work with bonding.

xso.real_dev now becomes a private pointer managed by the bonding
driver.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 Documentation/networking/xfrm_device.rst      | 10 +++--
 drivers/net/bonding/bond_main.c               | 31 +++++++-------
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   | 20 +++++-----
 .../inline_crypto/ch_ipsec/chcr_ipsec.c       | 18 ++++++---
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    | 40 ++++++++++---------
 drivers/net/ethernet/intel/ixgbevf/ipsec.c    | 20 ++++++----
 .../marvell/octeontx2/nic/cn10k_ipsec.c       | 18 ++++-----
 .../mellanox/mlx5/core/en_accel/ipsec.c       | 12 +++---
 .../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 +++--
 drivers/net/netdevsim/ipsec.c                 | 15 ++++---
 include/linux/netdevice.h                     | 10 +++--
 include/net/xfrm.h                            |  8 ++++
 net/xfrm/xfrm_device.c                        |  4 +-
 net/xfrm/xfrm_state.c                         | 14 ++++---
 14 files changed, 132 insertions(+), 99 deletions(-)

diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst
index 7f24c09f2694..122204da0fff 100644
--- a/Documentation/networking/xfrm_device.rst
+++ b/Documentation/networking/xfrm_device.rst
@@ -65,9 +65,13 @@ Callbacks to implement
   /* from include/linux/netdevice.h */
   struct xfrmdev_ops {
         /* Crypto and Packet offload callbacks */
-	int	(*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack);
-	void	(*xdo_dev_state_delete) (struct xfrm_state *x);
-	void	(*xdo_dev_state_free) (struct xfrm_state *x);
+	int	(*xdo_dev_state_add)(struct net_device *dev,
+                                     struct xfrm_state *x,
+                                     struct netlink_ext_ack *extack);
+	void	(*xdo_dev_state_delete)(struct net_device *dev,
+                                        struct xfrm_state *x);
+	void	(*xdo_dev_state_free)(struct net_device *dev,
+                                      struct xfrm_state *x);
 	bool	(*xdo_dev_offload_ok) (struct sk_buff *skb,
 				       struct xfrm_state *x);
 	void    (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 950d8e4d86f8..2ffde8f672c2 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -456,10 +456,10 @@ static struct net_device *bond_ipsec_dev(struct xfrm_state *xs)
  * @xs: pointer to transformer state struct
  * @extack: extack point to fill failure reason
  **/
-static int bond_ipsec_add_sa(struct xfrm_state *xs,
+static int bond_ipsec_add_sa(struct net_device *bond_dev,
+			     struct xfrm_state *xs,
 			     struct netlink_ext_ack *extack)
 {
-	struct net_device *bond_dev = xs->xso.dev;
 	struct net_device *real_dev;
 	netdevice_tracker tracker;
 	struct bond_ipsec *ipsec;
@@ -496,7 +496,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
 	}
 
 	xs->xso.real_dev = real_dev;
-	err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
+	err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack);
 	if (!err) {
 		ipsec->xs = xs;
 		INIT_LIST_HEAD(&ipsec->list);
@@ -540,7 +540,8 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 			continue;
 
 		ipsec->xs->xso.real_dev = real_dev;
-		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
+		if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev,
+							     ipsec->xs, NULL)) {
 			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
 			ipsec->xs->xso.real_dev = NULL;
 		}
@@ -553,9 +554,9 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
  * bond_ipsec_del_sa - clear out this specific SA
  * @xs: pointer to transformer state struct
  **/
-static void bond_ipsec_del_sa(struct xfrm_state *xs)
+static void bond_ipsec_del_sa(struct net_device *bond_dev,
+			      struct xfrm_state *xs)
 {
-	struct net_device *bond_dev = xs->xso.dev;
 	struct net_device *real_dev;
 	netdevice_tracker tracker;
 	struct bond_ipsec *ipsec;
@@ -587,7 +588,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 		goto out;
 	}
 
-	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+	real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs);
 out:
 	netdev_put(real_dev, &tracker);
 	mutex_lock(&bond->ipsec_lock);
@@ -624,18 +625,20 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 			slave_warn(bond_dev, real_dev,
 				   "%s: no slave xdo_dev_state_delete\n",
 				   __func__);
-		} else {
-			real_dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
-			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
-				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
+			continue;
 		}
+		real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
+							    ipsec->xs);
+		if (real_dev->xfrmdev_ops->xdo_dev_state_free)
+			real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev,
+								  ipsec->xs);
 	}
 	mutex_unlock(&bond->ipsec_lock);
 }
 
-static void bond_ipsec_free_sa(struct xfrm_state *xs)
+static void bond_ipsec_free_sa(struct net_device *bond_dev,
+			       struct xfrm_state *xs)
 {
-	struct net_device *bond_dev = xs->xso.dev;
 	struct net_device *real_dev;
 	netdevice_tracker tracker;
 	struct bonding *bond;
@@ -661,7 +664,7 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs)
 
 	if (real_dev && real_dev->xfrmdev_ops &&
 	    real_dev->xfrmdev_ops->xdo_dev_state_free)
-		real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
+		real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
 out:
 	netdev_put(real_dev, &tracker);
 }
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 551c279dc14b..51395c96b2e9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -6480,10 +6480,11 @@ static const struct tlsdev_ops cxgb4_ktls_ops = {
 
 #if IS_ENABLED(CONFIG_CHELSIO_IPSEC_INLINE)
 
-static int cxgb4_xfrm_add_state(struct xfrm_state *x,
+static int cxgb4_xfrm_add_state(struct net_device *dev,
+				struct xfrm_state *x,
 				struct netlink_ext_ack *extack)
 {
-	struct adapter *adap = netdev2adap(x->xso.dev);
+	struct adapter *adap = netdev2adap(dev);
 	int ret;
 
 	if (!mutex_trylock(&uld_mutex)) {
@@ -6494,7 +6495,8 @@ static int cxgb4_xfrm_add_state(struct xfrm_state *x,
 	if (ret)
 		goto out_unlock;
 
-	ret = adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_add(x, extack);
+	ret = adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_add(dev, x,
+									extack);
 
 out_unlock:
 	mutex_unlock(&uld_mutex);
@@ -6502,9 +6504,9 @@ static int cxgb4_xfrm_add_state(struct xfrm_state *x,
 	return ret;
 }
 
-static void cxgb4_xfrm_del_state(struct xfrm_state *x)
+static void cxgb4_xfrm_del_state(struct net_device *dev, struct xfrm_state *x)
 {
-	struct adapter *adap = netdev2adap(x->xso.dev);
+	struct adapter *adap = netdev2adap(dev);
 
 	if (!mutex_trylock(&uld_mutex)) {
 		dev_dbg(adap->pdev_dev,
@@ -6514,15 +6516,15 @@ static void cxgb4_xfrm_del_state(struct xfrm_state *x)
 	if (chcr_offload_state(adap, CXGB4_XFRMDEV_OPS))
 		goto out_unlock;
 
-	adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_delete(x);
+	adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_delete(dev, x);
 
 out_unlock:
 	mutex_unlock(&uld_mutex);
 }
 
-static void cxgb4_xfrm_free_state(struct xfrm_state *x)
+static void cxgb4_xfrm_free_state(struct net_device *dev, struct xfrm_state *x)
 {
-	struct adapter *adap = netdev2adap(x->xso.dev);
+	struct adapter *adap = netdev2adap(dev);
 
 	if (!mutex_trylock(&uld_mutex)) {
 		dev_dbg(adap->pdev_dev,
@@ -6532,7 +6534,7 @@ static void cxgb4_xfrm_free_state(struct xfrm_state *x)
 	if (chcr_offload_state(adap, CXGB4_XFRMDEV_OPS))
 		goto out_unlock;
 
-	adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_free(x);
+	adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_free(dev, x);
 
 out_unlock:
 	mutex_unlock(&uld_mutex);
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
index baba96883f48..ecd9a0bd5e18 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
@@ -75,9 +75,12 @@ static int ch_ipsec_uld_state_change(void *handle, enum cxgb4_state new_state);
 static int ch_ipsec_xmit(struct sk_buff *skb, struct net_device *dev);
 static void *ch_ipsec_uld_add(const struct cxgb4_lld_info *infop);
 static void ch_ipsec_advance_esn_state(struct xfrm_state *x);
-static void ch_ipsec_xfrm_free_state(struct xfrm_state *x);
-static void ch_ipsec_xfrm_del_state(struct xfrm_state *x);
-static int ch_ipsec_xfrm_add_state(struct xfrm_state *x,
+static void ch_ipsec_xfrm_free_state(struct net_device *dev,
+				     struct xfrm_state *x);
+static void ch_ipsec_xfrm_del_state(struct net_device *dev,
+				    struct xfrm_state *x);
+static int ch_ipsec_xfrm_add_state(struct net_device *dev,
+				   struct xfrm_state *x,
 				   struct netlink_ext_ack *extack);
 
 static const struct xfrmdev_ops ch_ipsec_xfrmdev_ops = {
@@ -223,7 +226,8 @@ static int ch_ipsec_setkey(struct xfrm_state *x,
  * returns 0 on success, negative error if failed to send message to FPGA
  * positive error if FPGA returned a bad response
  */
-static int ch_ipsec_xfrm_add_state(struct xfrm_state *x,
+static int ch_ipsec_xfrm_add_state(struct net_device *dev,
+				   struct xfrm_state *x,
 				   struct netlink_ext_ack *extack)
 {
 	struct ipsec_sa_entry *sa_entry;
@@ -302,14 +306,16 @@ static int ch_ipsec_xfrm_add_state(struct xfrm_state *x,
 	return res;
 }
 
-static void ch_ipsec_xfrm_del_state(struct xfrm_state *x)
+static void ch_ipsec_xfrm_del_state(struct net_device *dev,
+				    struct xfrm_state *x)
 {
 	/* do nothing */
 	if (!x->xso.offload_handle)
 		return;
 }
 
-static void ch_ipsec_xfrm_free_state(struct xfrm_state *x)
+static void ch_ipsec_xfrm_free_state(struct net_device *dev,
+				     struct xfrm_state *x)
 {
 	struct ipsec_sa_entry *sa_entry;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 07ea1954a276..bfb26a78da85 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -9,7 +9,7 @@
 #define IXGBE_IPSEC_KEY_BITS  160
 static const char aes_gcm_name[] = "rfc4106(gcm(aes))";
 
-static void ixgbe_ipsec_del_sa(struct xfrm_state *xs);
+static void ixgbe_ipsec_del_sa(struct net_device *dev, struct xfrm_state *xs);
 
 /**
  * ixgbe_ipsec_set_tx_sa - set the Tx SA registers
@@ -321,7 +321,7 @@ void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
 
 		if (r->used) {
 			if (r->mode & IXGBE_RXTXMOD_VF)
-				ixgbe_ipsec_del_sa(r->xs);
+				ixgbe_ipsec_del_sa(adapter->netdev, r->xs);
 			else
 				ixgbe_ipsec_set_rx_sa(hw, i, r->xs->id.spi,
 						      r->key, r->salt,
@@ -330,7 +330,7 @@ void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
 
 		if (t->used) {
 			if (t->mode & IXGBE_RXTXMOD_VF)
-				ixgbe_ipsec_del_sa(t->xs);
+				ixgbe_ipsec_del_sa(adapter->netdev, t->xs);
 			else
 				ixgbe_ipsec_set_tx_sa(hw, i, t->key, t->salt);
 		}
@@ -424,10 +424,10 @@ static struct xfrm_state *ixgbe_ipsec_find_rx_state(struct ixgbe_ipsec *ipsec,
  * This copies the protocol keys and salt to our own data tables.  The
  * 82599 family only supports the one algorithm.
  **/
-static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
+static int ixgbe_ipsec_parse_proto_keys(struct net_device *dev,
+					struct xfrm_state *xs,
 					u32 *mykey, u32 *mysalt)
 {
-	struct net_device *dev = xs->xso.real_dev;
 	unsigned char *key_data;
 	char *alg_name = NULL;
 	int key_len;
@@ -473,11 +473,12 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
 
 /**
  * ixgbe_ipsec_check_mgmt_ip - make sure there is no clash with mgmt IP filters
+ * @dev: pointer to net device
  * @xs: pointer to transformer state struct
  **/
-static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
+static int ixgbe_ipsec_check_mgmt_ip(struct net_device *dev,
+				     struct xfrm_state *xs)
 {
-	struct net_device *dev = xs->xso.real_dev;
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 mfval, manc, reg;
@@ -556,13 +557,14 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 
 /**
  * ixgbe_ipsec_add_sa - program device with a security association
+ * @dev: pointer to device to program
  * @xs: pointer to transformer state struct
  * @extack: extack point to fill failure reason
  **/
-static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
+static int ixgbe_ipsec_add_sa(struct net_device *dev,
+			      struct xfrm_state *xs,
 			      struct netlink_ext_ack *extack)
 {
-	struct net_device *dev = xs->xso.real_dev;
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_ipsec *ipsec = adapter->ipsec;
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -581,7 +583,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
 		return -EINVAL;
 	}
 
-	if (ixgbe_ipsec_check_mgmt_ip(xs)) {
+	if (ixgbe_ipsec_check_mgmt_ip(dev, xs)) {
 		NL_SET_ERR_MSG_MOD(extack, "IPsec IP addr clash with mgmt filters");
 		return -EINVAL;
 	}
@@ -615,7 +617,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
 			rsa.decrypt = xs->ealg || xs->aead;
 
 		/* get the key and salt */
-		ret = ixgbe_ipsec_parse_proto_keys(xs, rsa.key, &rsa.salt);
+		ret = ixgbe_ipsec_parse_proto_keys(dev, xs, rsa.key, &rsa.salt);
 		if (ret) {
 			NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Rx SA table");
 			return ret;
@@ -724,7 +726,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
 		if (xs->id.proto & IPPROTO_ESP)
 			tsa.encrypt = xs->ealg || xs->aead;
 
-		ret = ixgbe_ipsec_parse_proto_keys(xs, tsa.key, &tsa.salt);
+		ret = ixgbe_ipsec_parse_proto_keys(dev, xs, tsa.key, &tsa.salt);
 		if (ret) {
 			NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Tx SA table");
 			memset(&tsa, 0, sizeof(tsa));
@@ -752,11 +754,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
 
 /**
  * ixgbe_ipsec_del_sa - clear out this specific SA
+ * @dev: pointer to device to program
  * @xs: pointer to transformer state struct
  **/
-static void ixgbe_ipsec_del_sa(struct xfrm_state *xs)
+static void ixgbe_ipsec_del_sa(struct net_device *dev, struct xfrm_state *xs)
 {
-	struct net_device *dev = xs->xso.real_dev;
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_ipsec *ipsec = adapter->ipsec;
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -841,7 +843,8 @@ void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf)
 			continue;
 		if (ipsec->rx_tbl[i].mode & IXGBE_RXTXMOD_VF &&
 		    ipsec->rx_tbl[i].vf == vf)
-			ixgbe_ipsec_del_sa(ipsec->rx_tbl[i].xs);
+			ixgbe_ipsec_del_sa(adapter->netdev,
+					   ipsec->rx_tbl[i].xs);
 	}
 
 	/* search tx sa table */
@@ -850,7 +853,8 @@ void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf)
 			continue;
 		if (ipsec->tx_tbl[i].mode & IXGBE_RXTXMOD_VF &&
 		    ipsec->tx_tbl[i].vf == vf)
-			ixgbe_ipsec_del_sa(ipsec->tx_tbl[i].xs);
+			ixgbe_ipsec_del_sa(adapter->netdev,
+					   ipsec->tx_tbl[i].xs);
 	}
 }
 
@@ -930,7 +934,7 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 	memcpy(xs->aead->alg_name, aes_gcm_name, sizeof(aes_gcm_name));
 
 	/* set up the HW offload */
-	err = ixgbe_ipsec_add_sa(xs, NULL);
+	err = ixgbe_ipsec_add_sa(adapter->netdev, xs, NULL);
 	if (err)
 		goto err_aead;
 
@@ -1034,7 +1038,7 @@ int ixgbe_ipsec_vf_del_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 		xs = ipsec->tx_tbl[sa_idx].xs;
 	}
 
-	ixgbe_ipsec_del_sa(xs);
+	ixgbe_ipsec_del_sa(adapter->netdev, xs);
 
 	/* remove the xs that was made-up in the add request */
 	kfree_sensitive(xs);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 8ba037e3d9c2..70002c89c884 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -208,10 +208,10 @@ struct xfrm_state *ixgbevf_ipsec_find_rx_state(struct ixgbevf_ipsec *ipsec,
  * This copies the protocol keys and salt to our own data tables.  The
  * 82599 family only supports the one algorithm.
  **/
-static int ixgbevf_ipsec_parse_proto_keys(struct xfrm_state *xs,
+static int ixgbevf_ipsec_parse_proto_keys(struct net_device *dev,
+					  struct xfrm_state *xs,
 					  u32 *mykey, u32 *mysalt)
 {
-	struct net_device *dev = xs->xso.real_dev;
 	unsigned char *key_data;
 	char *alg_name = NULL;
 	int key_len;
@@ -256,13 +256,14 @@ static int ixgbevf_ipsec_parse_proto_keys(struct xfrm_state *xs,
 
 /**
  * ixgbevf_ipsec_add_sa - program device with a security association
+ * @dev: pointer to net device to program
  * @xs: pointer to transformer state struct
  * @extack: extack point to fill failure reason
  **/
-static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs,
+static int ixgbevf_ipsec_add_sa(struct net_device *dev,
+				struct xfrm_state *xs,
 				struct netlink_ext_ack *extack)
 {
-	struct net_device *dev = xs->xso.real_dev;
 	struct ixgbevf_adapter *adapter;
 	struct ixgbevf_ipsec *ipsec;
 	u16 sa_idx;
@@ -310,7 +311,8 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs,
 			rsa.decrypt = xs->ealg || xs->aead;
 
 		/* get the key and salt */
-		ret = ixgbevf_ipsec_parse_proto_keys(xs, rsa.key, &rsa.salt);
+		ret = ixgbevf_ipsec_parse_proto_keys(dev, xs, rsa.key,
+						     &rsa.salt);
 		if (ret) {
 			NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Rx SA table");
 			return ret;
@@ -363,7 +365,8 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs,
 		if (xs->id.proto & IPPROTO_ESP)
 			tsa.encrypt = xs->ealg || xs->aead;
 
-		ret = ixgbevf_ipsec_parse_proto_keys(xs, tsa.key, &tsa.salt);
+		ret = ixgbevf_ipsec_parse_proto_keys(dev, xs, tsa.key,
+						     &tsa.salt);
 		if (ret) {
 			NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Tx SA table");
 			memset(&tsa, 0, sizeof(tsa));
@@ -389,10 +392,11 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs,
 /**
  * ixgbevf_ipsec_del_sa - clear out this specific SA
  * @xs: pointer to transformer state struct
+ * @dev: pointer to net device to program
  **/
-static void ixgbevf_ipsec_del_sa(struct xfrm_state *xs)
+static void ixgbevf_ipsec_del_sa(struct net_device *dev,
+				 struct xfrm_state *xs)
 {
-	struct net_device *dev = xs->xso.real_dev;
 	struct ixgbevf_adapter *adapter;
 	struct ixgbevf_ipsec *ipsec;
 	u16 sa_idx;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c
index fc59e50bafce..a6500e3673f2 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c
@@ -663,10 +663,10 @@ static int cn10k_ipsec_inb_add_state(struct xfrm_state *x,
 	return -EOPNOTSUPP;
 }
 
-static int cn10k_ipsec_outb_add_state(struct xfrm_state *x,
+static int cn10k_ipsec_outb_add_state(struct net_device *dev,
+				      struct xfrm_state *x,
 				      struct netlink_ext_ack *extack)
 {
-	struct net_device *netdev = x->xso.dev;
 	struct cn10k_tx_sa_s *sa_entry;
 	struct qmem *sa_info;
 	struct otx2_nic *pf;
@@ -676,7 +676,7 @@ static int cn10k_ipsec_outb_add_state(struct xfrm_state *x,
 	if (err)
 		return err;
 
-	pf = netdev_priv(netdev);
+	pf = netdev_priv(dev);
 
 	err = qmem_alloc(pf->dev, &sa_info, pf->ipsec.sa_size, OTX2_ALIGN);
 	if (err)
@@ -700,18 +700,18 @@ static int cn10k_ipsec_outb_add_state(struct xfrm_state *x,
 	return 0;
 }
 
-static int cn10k_ipsec_add_state(struct xfrm_state *x,
+static int cn10k_ipsec_add_state(struct net_device *dev,
+				 struct xfrm_state *x,
 				 struct netlink_ext_ack *extack)
 {
 	if (x->xso.dir == XFRM_DEV_OFFLOAD_IN)
 		return cn10k_ipsec_inb_add_state(x, extack);
 	else
-		return cn10k_ipsec_outb_add_state(x, extack);
+		return cn10k_ipsec_outb_add_state(dev, x, extack);
 }
 
-static void cn10k_ipsec_del_state(struct xfrm_state *x)
+static void cn10k_ipsec_del_state(struct net_device *dev, struct xfrm_state *x)
 {
-	struct net_device *netdev = x->xso.dev;
 	struct cn10k_tx_sa_s *sa_entry;
 	struct qmem *sa_info;
 	struct otx2_nic *pf;
@@ -720,7 +720,7 @@ static void cn10k_ipsec_del_state(struct xfrm_state *x)
 	if (x->xso.dir == XFRM_DEV_OFFLOAD_IN)
 		return;
 
-	pf = netdev_priv(netdev);
+	pf = netdev_priv(dev);
 
 	sa_info = (struct qmem *)x->xso.offload_handle;
 	sa_entry = (struct cn10k_tx_sa_s *)sa_info->base;
@@ -732,7 +732,7 @@ static void cn10k_ipsec_del_state(struct xfrm_state *x)
 
 	err = cn10k_outb_write_sa(pf, sa_info);
 	if (err)
-		netdev_err(netdev, "Error (%d) deleting SA\n", err);
+		netdev_err(dev, "Error (%d) deleting SA\n", err);
 
 	x->xso.offload_handle = 0;
 	qmem_free(pf->dev, sa_info);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 0dfbbe21936f..77f61cd28a79 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -689,17 +689,17 @@ static int mlx5e_ipsec_create_dwork(struct mlx5e_ipsec_sa_entry *sa_entry)
 	return 0;
 }
 
-static int mlx5e_xfrm_add_state(struct xfrm_state *x,
+static int mlx5e_xfrm_add_state(struct net_device *dev,
+				struct xfrm_state *x,
 				struct netlink_ext_ack *extack)
 {
 	struct mlx5e_ipsec_sa_entry *sa_entry = NULL;
-	struct net_device *netdev = x->xso.real_dev;
 	struct mlx5e_ipsec *ipsec;
 	struct mlx5e_priv *priv;
 	gfp_t gfp;
 	int err;
 
-	priv = netdev_priv(netdev);
+	priv = netdev_priv(dev);
 	if (!priv->ipsec)
 		return -EOPNOTSUPP;
 
@@ -710,7 +710,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
 		return -ENOMEM;
 
 	sa_entry->x = x;
-	sa_entry->dev = netdev;
+	sa_entry->dev = dev;
 	sa_entry->ipsec = ipsec;
 	/* Check if this SA is originated from acquire flow temporary SA */
 	if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ)
@@ -807,7 +807,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
 	return err;
 }
 
-static void mlx5e_xfrm_del_state(struct xfrm_state *x)
+static void mlx5e_xfrm_del_state(struct net_device *dev, struct xfrm_state *x)
 {
 	struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x);
 	struct mlx5e_ipsec *ipsec = sa_entry->ipsec;
@@ -820,7 +820,7 @@ static void mlx5e_xfrm_del_state(struct xfrm_state *x)
 	WARN_ON(old != sa_entry);
 }
 
-static void mlx5e_xfrm_free_state(struct xfrm_state *x)
+static void mlx5e_xfrm_free_state(struct net_device *dev, struct xfrm_state *x)
 {
 	struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x);
 	struct mlx5e_ipsec *ipsec = sa_entry->ipsec;
diff --git a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
index 671af5d4c5d2..9e7c285eaa6b 100644
--- a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
+++ b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
@@ -266,17 +266,17 @@ static void set_sha2_512hmac(struct nfp_ipsec_cfg_add_sa *cfg, int *trunc_len)
 	}
 }
 
-static int nfp_net_xfrm_add_state(struct xfrm_state *x,
+static int nfp_net_xfrm_add_state(struct net_device *dev,
+				  struct xfrm_state *x,
 				  struct netlink_ext_ack *extack)
 {
-	struct net_device *netdev = x->xso.real_dev;
 	struct nfp_ipsec_cfg_mssg msg = {};
 	int i, key_len, trunc_len, err = 0;
 	struct nfp_ipsec_cfg_add_sa *cfg;
 	struct nfp_net *nn;
 	unsigned int saidx;
 
-	nn = netdev_priv(netdev);
+	nn = netdev_priv(dev);
 	cfg = &msg.cfg_add_sa;
 
 	/* General */
@@ -546,17 +546,16 @@ static int nfp_net_xfrm_add_state(struct xfrm_state *x,
 	return 0;
 }
 
-static void nfp_net_xfrm_del_state(struct xfrm_state *x)
+static void nfp_net_xfrm_del_state(struct net_device *dev, struct xfrm_state *x)
 {
 	struct nfp_ipsec_cfg_mssg msg = {
 		.cmd = NFP_IPSEC_CFG_MSSG_INV_SA,
 		.sa_idx = x->xso.offload_handle - 1,
 	};
-	struct net_device *netdev = x->xso.real_dev;
 	struct nfp_net *nn;
 	int err;
 
-	nn = netdev_priv(netdev);
+	nn = netdev_priv(dev);
 	err = nfp_net_sched_mbox_amsg_work(nn, NFP_NET_CFG_MBOX_CMD_IPSEC, &msg,
 					   sizeof(msg), nfp_net_ipsec_cfg);
 	if (err)
diff --git a/drivers/net/netdevsim/ipsec.c b/drivers/net/netdevsim/ipsec.c
index d88bdb9a1717..47cdee5577d4 100644
--- a/drivers/net/netdevsim/ipsec.c
+++ b/drivers/net/netdevsim/ipsec.c
@@ -85,11 +85,11 @@ static int nsim_ipsec_find_empty_idx(struct nsim_ipsec *ipsec)
 	return -ENOSPC;
 }
 
-static int nsim_ipsec_parse_proto_keys(struct xfrm_state *xs,
+static int nsim_ipsec_parse_proto_keys(struct net_device *dev,
+				       struct xfrm_state *xs,
 				       u32 *mykey, u32 *mysalt)
 {
 	const char aes_gcm_name[] = "rfc4106(gcm(aes))";
-	struct net_device *dev = xs->xso.real_dev;
 	unsigned char *key_data;
 	char *alg_name = NULL;
 	int key_len;
@@ -129,17 +129,16 @@ static int nsim_ipsec_parse_proto_keys(struct xfrm_state *xs,
 	return 0;
 }
 
-static int nsim_ipsec_add_sa(struct xfrm_state *xs,
+static int nsim_ipsec_add_sa(struct net_device *dev,
+			     struct xfrm_state *xs,
 			     struct netlink_ext_ack *extack)
 {
 	struct nsim_ipsec *ipsec;
-	struct net_device *dev;
 	struct netdevsim *ns;
 	struct nsim_sa sa;
 	u16 sa_idx;
 	int ret;
 
-	dev = xs->xso.real_dev;
 	ns = netdev_priv(dev);
 	ipsec = &ns->ipsec;
 
@@ -174,7 +173,7 @@ static int nsim_ipsec_add_sa(struct xfrm_state *xs,
 		sa.crypt = xs->ealg || xs->aead;
 
 	/* get the key and salt */
-	ret = nsim_ipsec_parse_proto_keys(xs, sa.key, &sa.salt);
+	ret = nsim_ipsec_parse_proto_keys(dev, xs, sa.key, &sa.salt);
 	if (ret) {
 		NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for SA table");
 		return ret;
@@ -200,9 +199,9 @@ static int nsim_ipsec_add_sa(struct xfrm_state *xs,
 	return 0;
 }
 
-static void nsim_ipsec_del_sa(struct xfrm_state *xs)
+static void nsim_ipsec_del_sa(struct net_device *dev, struct xfrm_state *xs)
 {
-	struct netdevsim *ns = netdev_priv(xs->xso.real_dev);
+	struct netdevsim *ns = netdev_priv(dev);
 	struct nsim_ipsec *ipsec = &ns->ipsec;
 	u16 sa_idx;
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa79145518d1..4f8163783956 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1012,9 +1012,13 @@ struct netdev_bpf {
 
 #ifdef CONFIG_XFRM_OFFLOAD
 struct xfrmdev_ops {
-	int	(*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack);
-	void	(*xdo_dev_state_delete) (struct xfrm_state *x);
-	void	(*xdo_dev_state_free) (struct xfrm_state *x);
+	int	(*xdo_dev_state_add)(struct net_device *dev,
+				     struct xfrm_state *x,
+				     struct netlink_ext_ack *extack);
+	void	(*xdo_dev_state_delete)(struct net_device *dev,
+					struct xfrm_state *x);
+	void	(*xdo_dev_state_free)(struct net_device *dev,
+				      struct xfrm_state *x);
 	bool	(*xdo_dev_offload_ok) (struct sk_buff *skb,
 				       struct xfrm_state *x);
 	void	(*xdo_dev_state_advance_esn) (struct xfrm_state *x);
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 39365fd2ea17..3d2f6c879311 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -147,8 +147,16 @@ enum {
 };
 
 struct xfrm_dev_offload {
+	/* The device for this offload.
+	 * Device drivers should not use this directly, as that will prevent
+	 * them from working with bonding device. Instead, the device passed
+	 * to the add/delete callbacks should be used.
+	 */
 	struct net_device	*dev;
 	netdevice_tracker	dev_tracker;
+	/* This is a private pointer used by the bonding driver.
+	 * Device drivers should not use it.
+	 */
 	struct net_device	*real_dev;
 	unsigned long		offload_handle;
 	u8			dir : 2;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 0be5f7ffd019..3be0139373f7 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -309,7 +309,6 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 
 	xso->dev = dev;
 	netdev_tracker_alloc(dev, &xso->dev_tracker, GFP_ATOMIC);
-	xso->real_dev = dev;
 
 	if (xuo->flags & XFRM_OFFLOAD_INBOUND)
 		xso->dir = XFRM_DEV_OFFLOAD_IN;
@@ -321,11 +320,10 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	else
 		xso->type = XFRM_DEV_OFFLOAD_CRYPTO;
 
-	err = dev->xfrmdev_ops->xdo_dev_state_add(x, extack);
+	err = dev->xfrmdev_ops->xdo_dev_state_add(dev, x, extack);
 	if (err) {
 		xso->dev = NULL;
 		xso->dir = 0;
-		xso->real_dev = NULL;
 		netdev_put(dev, &xso->dev_tracker);
 		xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 33d1f5679e8b..e9b6e25508f6 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -767,7 +767,7 @@ void xfrm_dev_state_delete(struct xfrm_state *x)
 	struct net_device *dev = READ_ONCE(xso->dev);
 
 	if (dev) {
-		dev->xfrmdev_ops->xdo_dev_state_delete(x);
+		dev->xfrmdev_ops->xdo_dev_state_delete(dev, x);
 		spin_lock_bh(&xfrm_state_dev_gc_lock);
 		hlist_add_head(&x->dev_gclist, &xfrm_state_dev_gc_list);
 		spin_unlock_bh(&xfrm_state_dev_gc_lock);
@@ -789,7 +789,7 @@ void xfrm_dev_state_free(struct xfrm_state *x)
 		spin_unlock_bh(&xfrm_state_dev_gc_lock);
 
 		if (dev->xfrmdev_ops->xdo_dev_state_free)
-			dev->xfrmdev_ops->xdo_dev_state_free(x);
+			dev->xfrmdev_ops->xdo_dev_state_free(dev, x);
 		WRITE_ONCE(xso->dev, NULL);
 		xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 		netdev_put(dev, &xso->dev_tracker);
@@ -1551,16 +1551,18 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 		if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) {
 			struct xfrm_dev_offload *xdo = &pol->xdo;
 			struct xfrm_dev_offload *xso = &x->xso;
+			struct net_device *dev = xdo->dev;
 
 			xso->type = XFRM_DEV_OFFLOAD_PACKET;
 			xso->dir = xdo->dir;
-			xso->dev = xdo->dev;
+			xso->dev = dev;
 			xso->flags = XFRM_DEV_OFFLOAD_FLAG_ACQ;
-			netdev_hold(xso->dev, &xso->dev_tracker, GFP_ATOMIC);
-			error = xso->dev->xfrmdev_ops->xdo_dev_state_add(x, NULL);
+			netdev_hold(dev, &xso->dev_tracker, GFP_ATOMIC);
+			error = dev->xfrmdev_ops->xdo_dev_state_add(dev, x,
+								    NULL);
 			if (error) {
 				xso->dir = 0;
-				netdev_put(xso->dev, &xso->dev_tracker);
+				netdev_put(dev, &xso->dev_tracker);
 				xso->dev = NULL;
 				xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 				x->km.state = XFRM_STATE_DEAD;
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 5/6] bonding: Mark active offloaded xfrm_states
  2025-04-07 13:35 [PATCH net-next 0/6] xfrm & bonding: Correct use of xso.real_dev Cosmin Ratiu
                   ` (3 preceding siblings ...)
  2025-04-07 13:35 ` [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Cosmin Ratiu
@ 2025-04-07 13:35 ` Cosmin Ratiu
  2025-04-07 13:35 ` [PATCH net-next 6/6] bonding: Fix multiple long standing offload races Cosmin Ratiu
  5 siblings, 0 replies; 13+ messages in thread
From: Cosmin Ratiu @ 2025-04-07 13:35 UTC (permalink / raw)
  To: netdev, cratiu
  Cc: Hangbin Liu, Jay Vosburgh, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov,
	Simon Horman, Saeed Mahameed, Tariq Toukan, Jianbo Liu,
	Steffen Klassert, Herbert Xu, Ayush Sawal, Tony Nguyen,
	Przemek Kitszel, Sunil Goutham, Geetha sowjanya,
	Subbaraya Sundeep, hariprasad, Bharat Bhushan, Louis Peens,
	Leon Romanovsky, linux-kselftest

When the active link is changed for a bond device, the existing xfrm
states need to be migrated over to the new link. This is done with:
- bond_ipsec_del_sa_all() goes through the offloaded states list and
  removes all of them from hw.
- bond_ipsec_add_sa_all() re-offloads all states to the new device.

But because the offload status of xfrm states isn't marked in any way,
there can be bugs.

When all bond links are down, bond_ipsec_del_sa_all() unoffloads
everything from the previous active link. If the same link then comes
back up, nothing gets reoffloaded by bond_ipsec_add_sa_all().
This results in a stack trace like this a bit later when user space
removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to
remove a rule that's no longer offloaded:

 [] Call Trace:
 []  <TASK>
 []  ? __warn+0x7d/0x110
 []  ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
 []  ? report_bug+0x16d/0x180
 []  ? handle_bug+0x4f/0x90
 []  ? exc_invalid_op+0x14/0x70
 []  ? asm_exc_invalid_op+0x16/0x20
 []  ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core]
 []  ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
 []  bond_ipsec_del_sa+0x1ab/0x200 [bonding]
 []  xfrm_dev_state_delete+0x1f/0x60
 []  __xfrm_state_delete+0x196/0x200
 []  xfrm_state_delete+0x21/0x40
 []  xfrm_del_sa+0x69/0x110
 []  xfrm_user_rcv_msg+0x11d/0x300
 []  ? release_pages+0xca/0x140
 []  ? copy_to_user_tmpl.part.0+0x110/0x110
 []  netlink_rcv_skb+0x54/0x100
 []  xfrm_netlink_rcv+0x31/0x40
 []  netlink_unicast+0x1fc/0x2d0
 []  netlink_sendmsg+0x1e4/0x410
 []  __sock_sendmsg+0x38/0x60
 []  sock_write_iter+0x94/0xf0
 []  vfs_write+0x338/0x3f0
 []  ksys_write+0xba/0xd0
 []  do_syscall_64+0x4c/0x100
 []  entry_SYSCALL_64_after_hwframe+0x4b/0x53

There's also another theoretical bug:
Calling bond_ipsec_del_sa_all() multiple times can result in corruption
in the driver implementation if the double-free isn't tolerated. This
isn't nice.

Before the "Fixes" commit, xs->xso.real_dev was set to NULL when an xfrm
state was unoffloaded from a device, but a race with netdevsim's
.xdo_dev_offload_ok() accessing real_dev was considered a sufficient
reason to not set real_dev to NULL anymore. This unfortunately
introduced the new bugs.

Since .xdo_dev_offload_ok() was significantly refactored by [1] and
there are no more users in the stack of xso.real_dev, that
race is now gone and xs->xso.real_dev can now once again be used to
represent which device (if any) currently holds the offloaded rule.

Go one step further and set real_dev after add/before delete calls, to
catch any future driver misuses of real_dev.

[1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/
Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/net/bonding/bond_main.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2ffde8f672c2..443624504767 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -495,9 +495,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev,
 		goto out;
 	}
 
-	xs->xso.real_dev = real_dev;
 	err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack);
 	if (!err) {
+		xs->xso.real_dev = real_dev;
 		ipsec->xs = xs;
 		INIT_LIST_HEAD(&ipsec->list);
 		mutex_lock(&bond->ipsec_lock);
@@ -539,12 +539,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 		if (ipsec->xs->xso.real_dev == real_dev)
 			continue;
 
-		ipsec->xs->xso.real_dev = real_dev;
 		if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev,
 							     ipsec->xs, NULL)) {
 			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
-			ipsec->xs->xso.real_dev = NULL;
+			continue;
 		}
+		ipsec->xs->xso.real_dev = real_dev;
 	}
 out:
 	mutex_unlock(&bond->ipsec_lock);
@@ -627,6 +627,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 				   __func__);
 			continue;
 		}
+		ipsec->xs->xso.real_dev = NULL;
 		real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
 							    ipsec->xs);
 		if (real_dev->xfrmdev_ops->xdo_dev_state_free)
@@ -662,6 +663,7 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev,
 
 	WARN_ON(xs->xso.real_dev != real_dev);
 
+	xs->xso.real_dev = NULL;
 	if (real_dev && real_dev->xfrmdev_ops &&
 	    real_dev->xfrmdev_ops->xdo_dev_state_free)
 		real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 6/6] bonding: Fix multiple long standing offload races
  2025-04-07 13:35 [PATCH net-next 0/6] xfrm & bonding: Correct use of xso.real_dev Cosmin Ratiu
                   ` (4 preceding siblings ...)
  2025-04-07 13:35 ` [PATCH net-next 5/6] bonding: Mark active offloaded xfrm_states Cosmin Ratiu
@ 2025-04-07 13:35 ` Cosmin Ratiu
  2025-04-08  8:08   ` Hangbin Liu
  5 siblings, 1 reply; 13+ messages in thread
From: Cosmin Ratiu @ 2025-04-07 13:35 UTC (permalink / raw)
  To: netdev, cratiu
  Cc: Hangbin Liu, Jay Vosburgh, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov,
	Simon Horman, Saeed Mahameed, Tariq Toukan, Jianbo Liu,
	Steffen Klassert, Herbert Xu, Ayush Sawal, Tony Nguyen,
	Przemek Kitszel, Sunil Goutham, Geetha sowjanya,
	Subbaraya Sundeep, hariprasad, Bharat Bhushan, Louis Peens,
	Leon Romanovsky, linux-kselftest

Refactor the bonding ipsec offload operations to fix a number of
long-standing control plane races between state migration and user
deletion and a few other issues.

xfrm state deletion can happen concurrently with
bond_change_active_slave() operation. This manifests itself as a
bond_ipsec_del_sa() call with x->lock held, followed by a
bond_ipsec_free_sa() a bit later from a wq. The alternate path of
these calls coming from xfrm_dev_state_flush() can't happen, as that
needs the RTNL lock and bond_change_active_slave() already holds it.

1. bond_ipsec_del_sa_all() might call xdo_dev_state_delete() a second
   time on an xfrm state that was concurrently killed. This is bad.
2. bond_ipsec_add_sa_all() can add a state on the new device, but
   pending bond_ipsec_free_sa() calls from the old device will then hit
   the WARN_ON() and then, worse, call xdo_dev_state_free() on the new
   device without a corresponding xdo_dev_state_delete().
3. Resolve a sleeping in atomic context introduced by the mentioned
   "Fixes" commit.

bond_ipsec_del_sa_all() and bond_ipsec_add_sa_all() now acquire x->lock
and check for x->km.state to help with problems 1 and 2. And since
xso.real_dev is now a private pointer managed by the bonding driver in
xfrm state, make better use of it to fully fix problems 1 and 2. In
bond_ipsec_del_sa_all(), set xso.real_dev to NULL while holding both the
mutex and x->lock, which makes sure that neither bond_ipsec_del_sa() nor
bond_ipsec_free_sa() could run concurrently.

Fix problem 3 by moving the list cleanup (which requires the mutex) from
bond_ipsec_del_sa() (called from atomic context) to bond_ipsec_free_sa()

Finally, simplify bond_ipsec_free_sa() by not using current_active_slave
at all, because now that xso.real_dev is protected by locks it can be
trusted to always reflect the offload device.

Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 443624504767..ede3287318f8 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -544,7 +544,20 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
 			continue;
 		}
+
+		spin_lock_bh(&ipsec->xs->lock);
+		/* xs might have been killed by the user during the migration
+		 * to the new dev, but bond_ipsec_del_sa() should have done
+		 * nothing, as xso.real_dev is NULL.
+		 * Delete it from the device we just added it to. The pending
+		 * bond_ipsec_free_sa() call will do the rest of the cleanup.
+		 */
+		if (ipsec->xs->km.state == XFRM_STATE_DEAD &&
+		    real_dev->xfrmdev_ops->xdo_dev_state_delete)
+			real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
+								    ipsec->xs);
 		ipsec->xs->xso.real_dev = real_dev;
+		spin_unlock_bh(&ipsec->xs->lock);
 	}
 out:
 	mutex_unlock(&bond->ipsec_lock);
@@ -559,7 +572,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev,
 {
 	struct net_device *real_dev;
 	netdevice_tracker tracker;
-	struct bond_ipsec *ipsec;
 	struct bonding *bond;
 	struct slave *slave;
 
@@ -591,15 +603,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev,
 	real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs);
 out:
 	netdev_put(real_dev, &tracker);
-	mutex_lock(&bond->ipsec_lock);
-	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
-		if (ipsec->xs == xs) {
-			list_del(&ipsec->list);
-			kfree(ipsec);
-			break;
-		}
-	}
-	mutex_unlock(&bond->ipsec_lock);
 }
 
 static void bond_ipsec_del_sa_all(struct bonding *bond)
@@ -627,9 +630,15 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 				   __func__);
 			continue;
 		}
+
+		spin_lock_bh(&ipsec->xs->lock);
 		ipsec->xs->xso.real_dev = NULL;
-		real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
-							    ipsec->xs);
+		/* Don't double delete states killed by the user. */
+		if (ipsec->xs->km.state != XFRM_STATE_DEAD)
+			real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
+								    ipsec->xs);
+		spin_unlock_bh(&ipsec->xs->lock);
+
 		if (real_dev->xfrmdev_ops->xdo_dev_state_free)
 			real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev,
 								  ipsec->xs);
@@ -641,34 +650,33 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev,
 			       struct xfrm_state *xs)
 {
 	struct net_device *real_dev;
-	netdevice_tracker tracker;
+	struct bond_ipsec *ipsec;
 	struct bonding *bond;
-	struct slave *slave;
 
 	if (!bond_dev)
 		return;
 
-	rcu_read_lock();
 	bond = netdev_priv(bond_dev);
-	slave = rcu_dereference(bond->curr_active_slave);
-	real_dev = slave ? slave->dev : NULL;
-	netdev_hold(real_dev, &tracker, GFP_ATOMIC);
-	rcu_read_unlock();
-
-	if (!slave)
-		goto out;
 
+	mutex_lock(&bond->ipsec_lock);
 	if (!xs->xso.real_dev)
 		goto out;
 
-	WARN_ON(xs->xso.real_dev != real_dev);
+	real_dev = xs->xso.real_dev;
 
 	xs->xso.real_dev = NULL;
-	if (real_dev && real_dev->xfrmdev_ops &&
+	if (real_dev->xfrmdev_ops &&
 	    real_dev->xfrmdev_ops->xdo_dev_state_free)
 		real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
 out:
-	netdev_put(real_dev, &tracker);
+	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+		if (ipsec->xs == xs) {
+			list_del(&ipsec->list);
+			kfree(ipsec);
+			break;
+		}
+	}
+	mutex_unlock(&bond->ipsec_lock);
 }
 
 /**
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 6/6] bonding: Fix multiple long standing offload races
  2025-04-07 13:35 ` [PATCH net-next 6/6] bonding: Fix multiple long standing offload races Cosmin Ratiu
@ 2025-04-08  8:08   ` Hangbin Liu
  2025-04-09 14:33     ` Cosmin Ratiu
  0 siblings, 1 reply; 13+ messages in thread
From: Hangbin Liu @ 2025-04-08  8:08 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: netdev, Jay Vosburgh, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
	Saeed Mahameed, Tariq Toukan, Jianbo Liu, Steffen Klassert,
	Herbert Xu, Ayush Sawal, Tony Nguyen, Przemek Kitszel,
	Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep, hariprasad,
	Bharat Bhushan, Louis Peens, Leon Romanovsky, linux-kselftest

Hi Cosmin,
On Mon, Apr 07, 2025 at 04:35:42PM +0300, Cosmin Ratiu wrote:
> Refactor the bonding ipsec offload operations to fix a number of
> long-standing control plane races between state migration and user
> deletion and a few other issues.
> 
> xfrm state deletion can happen concurrently with
> bond_change_active_slave() operation. This manifests itself as a
> bond_ipsec_del_sa() call with x->lock held, followed by a
> bond_ipsec_free_sa() a bit later from a wq. The alternate path of
> these calls coming from xfrm_dev_state_flush() can't happen, as that
> needs the RTNL lock and bond_change_active_slave() already holds it.
> 
> 1. bond_ipsec_del_sa_all() might call xdo_dev_state_delete() a second
>    time on an xfrm state that was concurrently killed. This is bad.
> 2. bond_ipsec_add_sa_all() can add a state on the new device, but
>    pending bond_ipsec_free_sa() calls from the old device will then hit
>    the WARN_ON() and then, worse, call xdo_dev_state_free() on the new
>    device without a corresponding xdo_dev_state_delete().
> 3. Resolve a sleeping in atomic context introduced by the mentioned
>    "Fixes" commit.
> 
> bond_ipsec_del_sa_all() and bond_ipsec_add_sa_all() now acquire x->lock
> and check for x->km.state to help with problems 1 and 2. And since
> xso.real_dev is now a private pointer managed by the bonding driver in
> xfrm state, make better use of it to fully fix problems 1 and 2. In
> bond_ipsec_del_sa_all(), set xso.real_dev to NULL while holding both the
> mutex and x->lock, which makes sure that neither bond_ipsec_del_sa() nor
> bond_ipsec_free_sa() could run concurrently.
> 
> Fix problem 3 by moving the list cleanup (which requires the mutex) from
> bond_ipsec_del_sa() (called from atomic context) to bond_ipsec_free_sa()
> 
> Finally, simplify bond_ipsec_free_sa() by not using current_active_slave
> at all, because now that xso.real_dev is protected by locks it can be
> trusted to always reflect the offload device.
> 
> Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex")
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++--------------
>  1 file changed, 33 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 443624504767..ede3287318f8 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -544,7 +544,20 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
>  			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
>  			continue;
>  		}
> +
> +		spin_lock_bh(&ipsec->xs->lock);
> +		/* xs might have been killed by the user during the migration
> +		 * to the new dev, but bond_ipsec_del_sa() should have done
> +		 * nothing, as xso.real_dev is NULL.
> +		 * Delete it from the device we just added it to. The pending
> +		 * bond_ipsec_free_sa() call will do the rest of the cleanup.
> +		 */
> +		if (ipsec->xs->km.state == XFRM_STATE_DEAD &&
> +		    real_dev->xfrmdev_ops->xdo_dev_state_delete)
> +			real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
> +								    ipsec->xs);
>  		ipsec->xs->xso.real_dev = real_dev;
> +		spin_unlock_bh(&ipsec->xs->lock);
>  	}
>  out:
>  	mutex_unlock(&bond->ipsec_lock);
> @@ -559,7 +572,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev,
>  {
>  	struct net_device *real_dev;
>  	netdevice_tracker tracker;
> -	struct bond_ipsec *ipsec;
>  	struct bonding *bond;
>  	struct slave *slave;
>  
> @@ -591,15 +603,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev,
>  	real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs);

Thanks a lot for the fixes. With your patch applied. I see the
bond_ipsec_del_sa() still has WARN_ON(xs->xso.real_dev != real_dev);

Do you think if we still has this possibility? If yes, should we do
xdo_dev_state_delete() on xs->xso.real_dev or real_dev?

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 1/6] net/mlx5: Avoid using xso.real_dev unnecessarily
  2025-04-07 13:35 ` [PATCH net-next 1/6] net/mlx5: Avoid using xso.real_dev unnecessarily Cosmin Ratiu
@ 2025-04-08 17:06   ` Tariq Toukan
  0 siblings, 0 replies; 13+ messages in thread
From: Tariq Toukan @ 2025-04-08 17:06 UTC (permalink / raw)
  To: Cosmin Ratiu, netdev
  Cc: Hangbin Liu, Jay Vosburgh, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov,
	Simon Horman, Saeed Mahameed, Tariq Toukan, Jianbo Liu,
	Steffen Klassert, Herbert Xu, Ayush Sawal, Tony Nguyen,
	Przemek Kitszel, Sunil Goutham, Geetha sowjanya,
	Subbaraya Sundeep, hariprasad, Bharat Bhushan, Louis Peens,
	Leon Romanovsky, linux-kselftest



On 07/04/2025 16:35, Cosmin Ratiu wrote:
> xso.real_dev is the active device of an offloaded xfrm state and is
> managed by bonding. As such, it's subject to change when states are
> migrated to a new device. Using it in places other than
> offloading/unoffloading the states is risky.
> 
> This commit saves the device into the driver-specific struct
> mlx5e_ipsec_sa_entry and switches mlx5e_ipsec_init_macs() and
> mlx5e_ipsec_netevent_event() to make use of it.
> 
> Additionally, mlx5e_xfrm_update_stats() used xso.real_dev to validate
> that correct net locks are held. But in a bonding config, the net of the
> master device is the same as the underlying devices, and the net is
> already a local var, so use that instead.
> 
> The only remaining references to xso.real_dev are now in the
> .xdo_dev_state_add() / .xdo_dev_state_delete() path.
> 
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>

Acked-by: Tariq Toukan <tariqt@nvidia.com>

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
  2025-04-07 13:35 ` [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Cosmin Ratiu
@ 2025-04-08 23:20   ` Jakub Kicinski
  2025-04-09 14:08     ` Cosmin Ratiu
  2025-04-09 12:30   ` kernel test robot
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-04-08 23:20 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: netdev, Hangbin Liu, Jay Vosburgh, Andrew Lunn, David S . Miller,
	Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
	Saeed Mahameed, Tariq Toukan, Jianbo Liu, Steffen Klassert,
	Herbert Xu, Ayush Sawal, Tony Nguyen, Przemek Kitszel,
	Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep, hariprasad,
	Bharat Bhushan, Louis Peens, Leon Romanovsky, linux-kselftest

On Mon, 7 Apr 2025 16:35:40 +0300 Cosmin Ratiu wrote:
> @@ -424,10 +424,10 @@ static struct xfrm_state *ixgbe_ipsec_find_rx_state(struct ixgbe_ipsec *ipsec,
>   * This copies the protocol keys and salt to our own data tables.  The
>   * 82599 family only supports the one algorithm.
>   **/
> -static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
> +static int ixgbe_ipsec_parse_proto_keys(struct net_device *dev,
> +					struct xfrm_state *xs,
>  					u32 *mykey, u32 *mysalt)

nit: you missed a few kdoc changes in this patch
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
  2025-04-07 13:35 ` [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Cosmin Ratiu
  2025-04-08 23:20   ` Jakub Kicinski
@ 2025-04-09 12:30   ` kernel test robot
  1 sibling, 0 replies; 13+ messages in thread
From: kernel test robot @ 2025-04-09 12:30 UTC (permalink / raw)
  To: Cosmin Ratiu, netdev
  Cc: llvm, oe-kbuild-all, Hangbin Liu, Jay Vosburgh, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Nikolay Aleksandrov, Simon Horman, Saeed Mahameed, Tariq Toukan,
	Jianbo Liu, Steffen Klassert, Herbert Xu, Ayush Sawal,
	Tony Nguyen, Przemek Kitszel, Sunil Goutham, Geetha sowjanya,
	Subbaraya Sundeep, hariprasad, Bharat Bhushan, Louis Peens,
	Leon Romanovsky, linux-kselftest

Hi Cosmin,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Cosmin-Ratiu/net-mlx5-Avoid-using-xso-real_dev-unnecessarily/20250407-214239
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250407133542.2668491-5-cratiu%40nvidia.com
patch subject: [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
config: hexagon-randconfig-002-20250408 (https://download.01.org/0day-ci/archive/20250409/202504091346.cvaZAxVI-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project 92c93f5286b9ff33f27ff694d2dc33da1c07afdd)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250409/202504091346.cvaZAxVI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504091346.cvaZAxVI-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/net/bonding/bond_main.c:462: warning: Function parameter or struct member 'bond_dev' not described in 'bond_ipsec_add_sa'
>> drivers/net/bonding/bond_main.c:559: warning: Function parameter or struct member 'bond_dev' not described in 'bond_ipsec_del_sa'


vim +462 drivers/net/bonding/bond_main.c

1ddec5d0eec4b7 Hangbin Liu     2024-09-04  453  
18cb261afd7bf5 Jarod Wilson    2020-06-19  454  /**
18cb261afd7bf5 Jarod Wilson    2020-06-19  455   * bond_ipsec_add_sa - program device with a security association
18cb261afd7bf5 Jarod Wilson    2020-06-19  456   * @xs: pointer to transformer state struct
7681a4f58fb9c3 Leon Romanovsky 2023-01-24  457   * @extack: extack point to fill failure reason
18cb261afd7bf5 Jarod Wilson    2020-06-19  458   **/
1e4d9370eb7223 Cosmin Ratiu    2025-04-07  459  static int bond_ipsec_add_sa(struct net_device *bond_dev,
1e4d9370eb7223 Cosmin Ratiu    2025-04-07  460  			     struct xfrm_state *xs,
7681a4f58fb9c3 Leon Romanovsky 2023-01-24  461  			     struct netlink_ext_ack *extack)
18cb261afd7bf5 Jarod Wilson    2020-06-19 @462  {
907ed83a7583e8 Jianbo Liu      2024-08-23  463  	struct net_device *real_dev;
2aeeef906d5a52 Jianbo Liu      2024-08-23  464  	netdevice_tracker tracker;
9a5605505d9c7d Taehee Yoo      2021-07-05  465  	struct bond_ipsec *ipsec;
5cd24cbe7dca62 Jarod Wilson    2020-07-08  466  	struct bonding *bond;
5cd24cbe7dca62 Jarod Wilson    2020-07-08  467  	struct slave *slave;
b648eba4c69e58 Taehee Yoo      2021-07-05  468  	int err;
5cd24cbe7dca62 Jarod Wilson    2020-07-08  469  
5cd24cbe7dca62 Jarod Wilson    2020-07-08  470  	if (!bond_dev)
5cd24cbe7dca62 Jarod Wilson    2020-07-08  471  		return -EINVAL;
18cb261afd7bf5 Jarod Wilson    2020-06-19  472  
b648eba4c69e58 Taehee Yoo      2021-07-05  473  	rcu_read_lock();
5cd24cbe7dca62 Jarod Wilson    2020-07-08  474  	bond = netdev_priv(bond_dev);
f548a476268d62 Jarod Wilson    2020-07-08  475  	slave = rcu_dereference(bond->curr_active_slave);
2aeeef906d5a52 Jianbo Liu      2024-08-23  476  	real_dev = slave ? slave->dev : NULL;
2aeeef906d5a52 Jianbo Liu      2024-08-23  477  	netdev_hold(real_dev, &tracker, GFP_ATOMIC);
105cd17a866017 Taehee Yoo      2021-07-05  478  	rcu_read_unlock();
2aeeef906d5a52 Jianbo Liu      2024-08-23  479  	if (!real_dev) {
2aeeef906d5a52 Jianbo Liu      2024-08-23  480  		err = -ENODEV;
2aeeef906d5a52 Jianbo Liu      2024-08-23  481  		goto out;
105cd17a866017 Taehee Yoo      2021-07-05  482  	}
105cd17a866017 Taehee Yoo      2021-07-05  483  
907ed83a7583e8 Jianbo Liu      2024-08-23  484  	if (!real_dev->xfrmdev_ops ||
907ed83a7583e8 Jianbo Liu      2024-08-23  485  	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
907ed83a7583e8 Jianbo Liu      2024-08-23  486  	    netif_is_bond_master(real_dev)) {
3fe57986271aee Leon Romanovsky 2023-01-24  487  		NL_SET_ERR_MSG_MOD(extack, "Slave does not support ipsec offload");
2aeeef906d5a52 Jianbo Liu      2024-08-23  488  		err = -EINVAL;
2aeeef906d5a52 Jianbo Liu      2024-08-23  489  		goto out;
18cb261afd7bf5 Jarod Wilson    2020-06-19  490  	}
18cb261afd7bf5 Jarod Wilson    2020-06-19  491  
2aeeef906d5a52 Jianbo Liu      2024-08-23  492  	ipsec = kmalloc(sizeof(*ipsec), GFP_KERNEL);
9a5605505d9c7d Taehee Yoo      2021-07-05  493  	if (!ipsec) {
2aeeef906d5a52 Jianbo Liu      2024-08-23  494  		err = -ENOMEM;
2aeeef906d5a52 Jianbo Liu      2024-08-23  495  		goto out;
9a5605505d9c7d Taehee Yoo      2021-07-05  496  	}
9a5605505d9c7d Taehee Yoo      2021-07-05  497  
907ed83a7583e8 Jianbo Liu      2024-08-23  498  	xs->xso.real_dev = real_dev;
1e4d9370eb7223 Cosmin Ratiu    2025-04-07  499  	err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack);
9a5605505d9c7d Taehee Yoo      2021-07-05  500  	if (!err) {
9a5605505d9c7d Taehee Yoo      2021-07-05  501  		ipsec->xs = xs;
9a5605505d9c7d Taehee Yoo      2021-07-05  502  		INIT_LIST_HEAD(&ipsec->list);
2aeeef906d5a52 Jianbo Liu      2024-08-23  503  		mutex_lock(&bond->ipsec_lock);
9a5605505d9c7d Taehee Yoo      2021-07-05  504  		list_add(&ipsec->list, &bond->ipsec_list);
2aeeef906d5a52 Jianbo Liu      2024-08-23  505  		mutex_unlock(&bond->ipsec_lock);
9a5605505d9c7d Taehee Yoo      2021-07-05  506  	} else {
9a5605505d9c7d Taehee Yoo      2021-07-05  507  		kfree(ipsec);
9a5605505d9c7d Taehee Yoo      2021-07-05  508  	}
2aeeef906d5a52 Jianbo Liu      2024-08-23  509  out:
2aeeef906d5a52 Jianbo Liu      2024-08-23  510  	netdev_put(real_dev, &tracker);
b648eba4c69e58 Taehee Yoo      2021-07-05  511  	return err;
18cb261afd7bf5 Jarod Wilson    2020-06-19  512  }
18cb261afd7bf5 Jarod Wilson    2020-06-19  513  
9a5605505d9c7d Taehee Yoo      2021-07-05  514  static void bond_ipsec_add_sa_all(struct bonding *bond)
9a5605505d9c7d Taehee Yoo      2021-07-05  515  {
9a5605505d9c7d Taehee Yoo      2021-07-05  516  	struct net_device *bond_dev = bond->dev;
907ed83a7583e8 Jianbo Liu      2024-08-23  517  	struct net_device *real_dev;
9a5605505d9c7d Taehee Yoo      2021-07-05  518  	struct bond_ipsec *ipsec;
9a5605505d9c7d Taehee Yoo      2021-07-05  519  	struct slave *slave;
9a5605505d9c7d Taehee Yoo      2021-07-05  520  
2aeeef906d5a52 Jianbo Liu      2024-08-23  521  	slave = rtnl_dereference(bond->curr_active_slave);
2aeeef906d5a52 Jianbo Liu      2024-08-23  522  	real_dev = slave ? slave->dev : NULL;
2aeeef906d5a52 Jianbo Liu      2024-08-23  523  	if (!real_dev)
2aeeef906d5a52 Jianbo Liu      2024-08-23  524  		return;
9a5605505d9c7d Taehee Yoo      2021-07-05  525  
2aeeef906d5a52 Jianbo Liu      2024-08-23  526  	mutex_lock(&bond->ipsec_lock);
907ed83a7583e8 Jianbo Liu      2024-08-23  527  	if (!real_dev->xfrmdev_ops ||
907ed83a7583e8 Jianbo Liu      2024-08-23  528  	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
907ed83a7583e8 Jianbo Liu      2024-08-23  529  	    netif_is_bond_master(real_dev)) {
9a5605505d9c7d Taehee Yoo      2021-07-05  530  		if (!list_empty(&bond->ipsec_list))
907ed83a7583e8 Jianbo Liu      2024-08-23  531  			slave_warn(bond_dev, real_dev,
9a5605505d9c7d Taehee Yoo      2021-07-05  532  				   "%s: no slave xdo_dev_state_add\n",
9a5605505d9c7d Taehee Yoo      2021-07-05  533  				   __func__);
9a5605505d9c7d Taehee Yoo      2021-07-05  534  		goto out;
9a5605505d9c7d Taehee Yoo      2021-07-05  535  	}
9a5605505d9c7d Taehee Yoo      2021-07-05  536  
9a5605505d9c7d Taehee Yoo      2021-07-05  537  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
2aeeef906d5a52 Jianbo Liu      2024-08-23  538  		/* If new state is added before ipsec_lock acquired */
2aeeef906d5a52 Jianbo Liu      2024-08-23  539  		if (ipsec->xs->xso.real_dev == real_dev)
2aeeef906d5a52 Jianbo Liu      2024-08-23  540  			continue;
2aeeef906d5a52 Jianbo Liu      2024-08-23  541  
907ed83a7583e8 Jianbo Liu      2024-08-23  542  		ipsec->xs->xso.real_dev = real_dev;
1e4d9370eb7223 Cosmin Ratiu    2025-04-07  543  		if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev,
1e4d9370eb7223 Cosmin Ratiu    2025-04-07  544  							     ipsec->xs, NULL)) {
907ed83a7583e8 Jianbo Liu      2024-08-23  545  			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
9a5605505d9c7d Taehee Yoo      2021-07-05  546  			ipsec->xs->xso.real_dev = NULL;
9a5605505d9c7d Taehee Yoo      2021-07-05  547  		}
9a5605505d9c7d Taehee Yoo      2021-07-05  548  	}
9a5605505d9c7d Taehee Yoo      2021-07-05  549  out:
2aeeef906d5a52 Jianbo Liu      2024-08-23  550  	mutex_unlock(&bond->ipsec_lock);
9a5605505d9c7d Taehee Yoo      2021-07-05  551  }
9a5605505d9c7d Taehee Yoo      2021-07-05  552  
18cb261afd7bf5 Jarod Wilson    2020-06-19  553  /**
18cb261afd7bf5 Jarod Wilson    2020-06-19  554   * bond_ipsec_del_sa - clear out this specific SA
18cb261afd7bf5 Jarod Wilson    2020-06-19  555   * @xs: pointer to transformer state struct
18cb261afd7bf5 Jarod Wilson    2020-06-19  556   **/
1e4d9370eb7223 Cosmin Ratiu    2025-04-07  557  static void bond_ipsec_del_sa(struct net_device *bond_dev,
1e4d9370eb7223 Cosmin Ratiu    2025-04-07  558  			      struct xfrm_state *xs)
18cb261afd7bf5 Jarod Wilson    2020-06-19 @559  {
907ed83a7583e8 Jianbo Liu      2024-08-23  560  	struct net_device *real_dev;
2aeeef906d5a52 Jianbo Liu      2024-08-23  561  	netdevice_tracker tracker;
9a5605505d9c7d Taehee Yoo      2021-07-05  562  	struct bond_ipsec *ipsec;
5cd24cbe7dca62 Jarod Wilson    2020-07-08  563  	struct bonding *bond;
5cd24cbe7dca62 Jarod Wilson    2020-07-08  564  	struct slave *slave;
5cd24cbe7dca62 Jarod Wilson    2020-07-08  565  
5cd24cbe7dca62 Jarod Wilson    2020-07-08  566  	if (!bond_dev)
5cd24cbe7dca62 Jarod Wilson    2020-07-08  567  		return;
5cd24cbe7dca62 Jarod Wilson    2020-07-08  568  
a22c39b831a081 Taehee Yoo      2021-07-05  569  	rcu_read_lock();
5cd24cbe7dca62 Jarod Wilson    2020-07-08  570  	bond = netdev_priv(bond_dev);
f548a476268d62 Jarod Wilson    2020-07-08  571  	slave = rcu_dereference(bond->curr_active_slave);
2aeeef906d5a52 Jianbo Liu      2024-08-23  572  	real_dev = slave ? slave->dev : NULL;
2aeeef906d5a52 Jianbo Liu      2024-08-23  573  	netdev_hold(real_dev, &tracker, GFP_ATOMIC);
2aeeef906d5a52 Jianbo Liu      2024-08-23  574  	rcu_read_unlock();
18cb261afd7bf5 Jarod Wilson    2020-06-19  575  
18cb261afd7bf5 Jarod Wilson    2020-06-19  576  	if (!slave)
a22c39b831a081 Taehee Yoo      2021-07-05  577  		goto out;
18cb261afd7bf5 Jarod Wilson    2020-06-19  578  
9a5605505d9c7d Taehee Yoo      2021-07-05  579  	if (!xs->xso.real_dev)
9a5605505d9c7d Taehee Yoo      2021-07-05  580  		goto out;
9a5605505d9c7d Taehee Yoo      2021-07-05  581  
907ed83a7583e8 Jianbo Liu      2024-08-23  582  	WARN_ON(xs->xso.real_dev != real_dev);
18cb261afd7bf5 Jarod Wilson    2020-06-19  583  
907ed83a7583e8 Jianbo Liu      2024-08-23  584  	if (!real_dev->xfrmdev_ops ||
907ed83a7583e8 Jianbo Liu      2024-08-23  585  	    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
907ed83a7583e8 Jianbo Liu      2024-08-23  586  	    netif_is_bond_master(real_dev)) {
907ed83a7583e8 Jianbo Liu      2024-08-23  587  		slave_warn(bond_dev, real_dev, "%s: no slave xdo_dev_state_delete\n", __func__);
a22c39b831a081 Taehee Yoo      2021-07-05  588  		goto out;
18cb261afd7bf5 Jarod Wilson    2020-06-19  589  	}
18cb261afd7bf5 Jarod Wilson    2020-06-19  590  
1e4d9370eb7223 Cosmin Ratiu    2025-04-07  591  	real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs);
a22c39b831a081 Taehee Yoo      2021-07-05  592  out:
2aeeef906d5a52 Jianbo Liu      2024-08-23  593  	netdev_put(real_dev, &tracker);
2aeeef906d5a52 Jianbo Liu      2024-08-23  594  	mutex_lock(&bond->ipsec_lock);
9a5605505d9c7d Taehee Yoo      2021-07-05  595  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
9a5605505d9c7d Taehee Yoo      2021-07-05  596  		if (ipsec->xs == xs) {
9a5605505d9c7d Taehee Yoo      2021-07-05  597  			list_del(&ipsec->list);
9a5605505d9c7d Taehee Yoo      2021-07-05  598  			kfree(ipsec);
9a5605505d9c7d Taehee Yoo      2021-07-05  599  			break;
9a5605505d9c7d Taehee Yoo      2021-07-05  600  		}
9a5605505d9c7d Taehee Yoo      2021-07-05  601  	}
2aeeef906d5a52 Jianbo Liu      2024-08-23  602  	mutex_unlock(&bond->ipsec_lock);
9a5605505d9c7d Taehee Yoo      2021-07-05  603  }
9a5605505d9c7d Taehee Yoo      2021-07-05  604  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
  2025-04-08 23:20   ` Jakub Kicinski
@ 2025-04-09 14:08     ` Cosmin Ratiu
  0 siblings, 0 replies; 13+ messages in thread
From: Cosmin Ratiu @ 2025-04-09 14:08 UTC (permalink / raw)
  To: kuba@kernel.org
  Cc: anthony.l.nguyen@intel.com, bbhushan2@marvell.com,
	andrew+netdev@lunn.ch, hkelam@marvell.com, davem@davemloft.net,
	jv@jvosburgh.net, liuhangbin@gmail.com, razor@blackwall.org,
	pabeni@redhat.com, edumazet@google.com, Leon Romanovsky,
	herbert@gondor.apana.org.au, ayush.sawal@chelsio.com, Jianbo Liu,
	sbhatta@marvell.com, horms@kernel.org, Tariq Toukan,
	Saeed Mahameed, netdev@vger.kernel.org,
	steffen.klassert@secunet.com, gakula@marvell.com,
	przemyslaw.kitszel@intel.com, louis.peens@corigine.com,
	linux-kselftest@vger.kernel.org, sgoutham@marvell.com

On Tue, 2025-04-08 at 16:20 -0700, Jakub Kicinski wrote:
> nit: you missed a few kdoc changes in this patch

Got it, will address in V2.

Cosmin.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 6/6] bonding: Fix multiple long standing offload races
  2025-04-08  8:08   ` Hangbin Liu
@ 2025-04-09 14:33     ` Cosmin Ratiu
  0 siblings, 0 replies; 13+ messages in thread
From: Cosmin Ratiu @ 2025-04-09 14:33 UTC (permalink / raw)
  To: liuhangbin@gmail.com
  Cc: anthony.l.nguyen@intel.com, bbhushan2@marvell.com,
	andrew+netdev@lunn.ch, hkelam@marvell.com, davem@davemloft.net,
	jv@jvosburgh.net, razor@blackwall.org,
	herbert@gondor.apana.org.au, edumazet@google.com,
	pabeni@redhat.com, Leon Romanovsky, ayush.sawal@chelsio.com,
	Jianbo Liu, sbhatta@marvell.com, horms@kernel.org,
	kuba@kernel.org, Tariq Toukan, Saeed Mahameed,
	netdev@vger.kernel.org, steffen.klassert@secunet.com,
	gakula@marvell.com, przemyslaw.kitszel@intel.com,
	louis.peens@corigine.com, linux-kselftest@vger.kernel.org,
	sgoutham@marvell.com

On Tue, 2025-04-08 at 08:08 +0000, Hangbin Liu wrote:
> Hi Cosmin,
> 
> Thanks a lot for the fixes. With your patch applied. I see the
> bond_ipsec_del_sa() still has WARN_ON(xs->xso.real_dev != real_dev);
> 
> Do you think if we still has this possibility? If yes, should we do
> xdo_dev_state_delete() on xs->xso.real_dev or real_dev?

You are right, the WARN_ON doesn't make sense any more.
bond_ipsec_del_sa() can get the same treatment as bond_ipsec_free_sa(),
I'll do so in the next version.

Cosmin.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-04-09 14:33 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-07 13:35 [PATCH net-next 0/6] xfrm & bonding: Correct use of xso.real_dev Cosmin Ratiu
2025-04-07 13:35 ` [PATCH net-next 1/6] net/mlx5: Avoid using xso.real_dev unnecessarily Cosmin Ratiu
2025-04-08 17:06   ` Tariq Toukan
2025-04-07 13:35 ` [PATCH net-next 2/6] xfrm: Use xdo.dev instead of xdo.real_dev Cosmin Ratiu
2025-04-07 13:35 ` [PATCH net-next 3/6] xfrm: Remove unneeded device check from validate_xmit_xfrm Cosmin Ratiu
2025-04-07 13:35 ` [PATCH net-next 4/6] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Cosmin Ratiu
2025-04-08 23:20   ` Jakub Kicinski
2025-04-09 14:08     ` Cosmin Ratiu
2025-04-09 12:30   ` kernel test robot
2025-04-07 13:35 ` [PATCH net-next 5/6] bonding: Mark active offloaded xfrm_states Cosmin Ratiu
2025-04-07 13:35 ` [PATCH net-next 6/6] bonding: Fix multiple long standing offload races Cosmin Ratiu
2025-04-08  8:08   ` Hangbin Liu
2025-04-09 14:33     ` Cosmin Ratiu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).