* [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23
@ 2025-05-23 7:55 Steffen Klassert
2025-05-23 7:56 ` [PATCH 01/12] xfrm: Remove unnecessary strscpy_pad() size arguments Steffen Klassert
` (12 more replies)
0 siblings, 13 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:55 UTC
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
1) Remove some unnecessary strscpy_pad() size arguments.
From Thorsten Blum.
2) Correct use of xso.real_dev on bonding offloads.
Patchset from Cosmin Ratiu.
3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
From Chiachang Wang.
4) Refactor migration setup during cloning. This was
done after the clone was created. Now it is done
in the cloning function itself.
From Chiachang Wang.
5) Validate assignment of the maximal possible SEQ number.
Prevent setting the sequence number to its maximum value,
as this would cause traffic to be dropped.
From Leon Romanovsky.
6) Prevent configuration of interface index when offload
is used. Hardware can't handle this case.
From Leon Romanovsky.
7) Always use kfree_sensitive() for SA secret zeroization.
From Zilin Guan.
Please pull or let me know if there are problems.
Thanks!
The following changes since commit 0c49baf099ba2147a6ff3bbdc3197c6ddbee5469:
r8169: add helper rtl8125_phy_param (2025-04-10 20:18:11 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git tags/ipsec-next-2025-05-23
for you to fetch changes up to e7a37c9e428a2912a4eec160e633503cd72e1ee6:
xfrm: use kfree_sensitive() for SA secret zeroization (2025-05-20 07:55:00 +0200)
----------------------------------------------------------------
ipsec-next-2025-05-23
----------------------------------------------------------------
Chiachang Wang (2):
xfrm: Migrate offload configuration
xfrm: Refactor migration setup during the cloning process
Cosmin Ratiu (6):
net/mlx5: Avoid using xso.real_dev unnecessarily
xfrm: Use xdo.dev instead of xdo.real_dev
xfrm: Remove unneeded device check from validate_xmit_xfrm
xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
bonding: Mark active offloaded xfrm_states
bonding: Fix multiple long standing offload races
Leon Romanovsky (2):
xfrm: validate assignment of maximal possible SEQ number
xfrm: prevent configuration of interface index when offload is used
Steffen Klassert (2):
Merge branch 'xfrm & bonding: Correct use of xso.real_dev'
Merge branch 'Update offload configuration with SA'
Thorsten Blum (1):
xfrm: Remove unnecessary strscpy_pad() size arguments
Zilin Guan (1):
xfrm: use kfree_sensitive() for SA secret zeroization
Documentation/networking/xfrm_device.rst | 10 +-
drivers/net/bonding/bond_main.c | 119 ++++++++++-----------
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 20 ++--
.../chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c | 18 ++--
drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 41 +++----
drivers/net/ethernet/intel/ixgbevf/ipsec.c | 21 ++--
.../ethernet/marvell/octeontx2/nic/cn10k_ipsec.c | 18 ++--
.../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 28 ++---
.../ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 1 +
drivers/net/ethernet/netronome/nfp/crypto/ipsec.c | 11 +-
drivers/net/netdevsim/ipsec.c | 15 ++-
include/linux/netdevice.h | 10 +-
include/net/xfrm.h | 19 +++-
net/key/af_key.c | 2 +-
net/xfrm/xfrm_device.c | 18 ++--
net/xfrm/xfrm_policy.c | 4 +-
net/xfrm/xfrm_state.c | 46 ++++----
net/xfrm/xfrm_user.c | 77 +++++++++----
18 files changed, 277 insertions(+), 201 deletions(-)
* [PATCH 01/12] xfrm: Remove unnecessary strscpy_pad() size arguments
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-26 16:50 ` patchwork-bot+netdevbpf
2025-05-23 7:56 ` [PATCH 02/12] net/mlx5: Avoid using xso.real_dev unnecessarily Steffen Klassert
` (11 subsequent siblings)
12 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Thorsten Blum <thorsten.blum@linux.dev>
If the destination buffer has a fixed length, strscpy_pad()
automatically determines its size using sizeof() when the argument is
omitted. This makes the explicit sizeof() calls unnecessary - remove
them.
No functional changes intended.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_user.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 784a2d124749..0a3d3f3ae5a3 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1173,7 +1173,7 @@ static int copy_to_user_auth(struct xfrm_algo_auth *auth, struct sk_buff *skb)
if (!nla)
return -EMSGSIZE;
algo = nla_data(nla);
- strscpy_pad(algo->alg_name, auth->alg_name, sizeof(algo->alg_name));
+ strscpy_pad(algo->alg_name, auth->alg_name);
if (redact_secret && auth->alg_key_len)
memset(algo->alg_key, 0, (auth->alg_key_len + 7) / 8);
@@ -1186,7 +1186,7 @@ static int copy_to_user_auth(struct xfrm_algo_auth *auth, struct sk_buff *skb)
if (!nla)
return -EMSGSIZE;
ap = nla_data(nla);
- strscpy_pad(ap->alg_name, auth->alg_name, sizeof(ap->alg_name));
+ strscpy_pad(ap->alg_name, auth->alg_name);
ap->alg_key_len = auth->alg_key_len;
ap->alg_trunc_len = auth->alg_trunc_len;
if (redact_secret && auth->alg_key_len)
@@ -1207,7 +1207,7 @@ static int copy_to_user_aead(struct xfrm_algo_aead *aead, struct sk_buff *skb)
return -EMSGSIZE;
ap = nla_data(nla);
- strscpy_pad(ap->alg_name, aead->alg_name, sizeof(ap->alg_name));
+ strscpy_pad(ap->alg_name, aead->alg_name);
ap->alg_key_len = aead->alg_key_len;
ap->alg_icv_len = aead->alg_icv_len;
@@ -1229,7 +1229,7 @@ static int copy_to_user_ealg(struct xfrm_algo *ealg, struct sk_buff *skb)
return -EMSGSIZE;
ap = nla_data(nla);
- strscpy_pad(ap->alg_name, ealg->alg_name, sizeof(ap->alg_name));
+ strscpy_pad(ap->alg_name, ealg->alg_name);
ap->alg_key_len = ealg->alg_key_len;
if (redact_secret && ealg->alg_key_len)
@@ -1250,7 +1250,7 @@ static int copy_to_user_calg(struct xfrm_algo *calg, struct sk_buff *skb)
return -EMSGSIZE;
ap = nla_data(nla);
- strscpy_pad(ap->alg_name, calg->alg_name, sizeof(ap->alg_name));
+ strscpy_pad(ap->alg_name, calg->alg_name);
ap->alg_key_len = 0;
return 0;
--
2.34.1
* [PATCH 02/12] net/mlx5: Avoid using xso.real_dev unnecessarily
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
2025-05-23 7:56 ` [PATCH 01/12] xfrm: Remove unnecessary strscpy_pad() size arguments Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 03/12] xfrm: Use xdo.dev instead of xdo.real_dev Steffen Klassert
` (10 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Cosmin Ratiu <cratiu@nvidia.com>
xso.real_dev is the active device of an offloaded xfrm state and is
managed by bonding. As such, it's subject to change when states are
migrated to a new device. Using it in places other than
offloading/unoffloading the states is risky.
This commit saves the device into the driver-specific struct
mlx5e_ipsec_sa_entry and switches mlx5e_ipsec_init_macs() and
mlx5e_ipsec_netevent_event() to make use of it.
Additionally, mlx5e_xfrm_update_stats() used xso.real_dev to validate
that the correct net locks are held. But in a bonding config, the net of
the master device is the same as that of the underlying devices, and the
net is already a local var, so use that instead.
The only remaining references to xso.real_dev are now in the
.xdo_dev_state_add() / .xdo_dev_state_delete() path.
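For illustration only (not part of the patch): a simplified userspace model of why the driver keeps its own device pointer instead of reading one that another subsystem may re-point. The demo_* types are hypothetical and only loosely mirror xfrm_state and mlx5e_ipsec_sa_entry.

```c
#include <assert.h>
#include <stddef.h>

struct demo_netdev {
	const char *name;
};

/* Loosely models xfrm_state: real_dev may be re-pointed by bonding. */
struct demo_state {
	struct demo_netdev *real_dev;
};

/* Loosely models the driver-private SA entry. */
struct demo_sa_entry {
	struct demo_state *x;
	struct demo_netdev *dev;	/* captured once, when the SA is added */
};

/* Record the device the SA is being programmed on. */
static void demo_add_state(struct demo_sa_entry *entry,
			   struct demo_state *x,
			   struct demo_netdev *dev)
{
	assert(entry && x && dev);
	entry->x = x;
	entry->dev = dev;
}

/* Later users (MAC setup, neighbour events) read the cached pointer,
 * so they are unaffected by changes to x->real_dev. */
static struct demo_netdev *demo_entry_dev(const struct demo_sa_entry *entry)
{
	return entry->dev;
}
```

In this model, re-pointing x->real_dev after demo_add_state() does not change what demo_entry_dev() returns, so the entry stays tied to the device it was programmed on.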
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
.../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 16 +++++-----------
.../ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 1 +
2 files changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 2dd842aac6fc..626e525c0f0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -259,8 +259,7 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry,
struct mlx5_accel_esp_xfrm_attrs *attrs)
{
struct mlx5_core_dev *mdev = mlx5e_ipsec_sa2dev(sa_entry);
- struct xfrm_state *x = sa_entry->x;
- struct net_device *netdev;
+ struct net_device *netdev = sa_entry->dev;
struct neighbour *n;
u8 addr[ETH_ALEN];
const void *pkey;
@@ -270,8 +269,6 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry,
attrs->type != XFRM_DEV_OFFLOAD_PACKET)
return;
- netdev = x->xso.real_dev;
-
mlx5_query_mac_address(mdev, addr);
switch (attrs->dir) {
case XFRM_DEV_OFFLOAD_IN:
@@ -713,6 +710,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
return -ENOMEM;
sa_entry->x = x;
+ sa_entry->dev = netdev;
sa_entry->ipsec = ipsec;
/* Check if this SA is originated from acquire flow temporary SA */
if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ)
@@ -855,8 +853,6 @@ static int mlx5e_ipsec_netevent_event(struct notifier_block *nb,
struct mlx5e_ipsec_sa_entry *sa_entry;
struct mlx5e_ipsec *ipsec;
struct neighbour *n = ptr;
- struct net_device *netdev;
- struct xfrm_state *x;
unsigned long idx;
if (event != NETEVENT_NEIGH_UPDATE || !(n->nud_state & NUD_VALID))
@@ -876,11 +872,9 @@ static int mlx5e_ipsec_netevent_event(struct notifier_block *nb,
continue;
}
- x = sa_entry->x;
- netdev = x->xso.real_dev;
data = sa_entry->work->data;
- neigh_ha_snapshot(data->addr, n, netdev);
+ neigh_ha_snapshot(data->addr, n, sa_entry->dev);
queue_work(ipsec->wq, &sa_entry->work->work);
}
@@ -996,8 +990,8 @@ static void mlx5e_xfrm_update_stats(struct xfrm_state *x)
size_t headers;
lockdep_assert(lockdep_is_held(&x->lock) ||
- lockdep_is_held(&dev_net(x->xso.real_dev)->xfrm.xfrm_cfg_mutex) ||
- lockdep_is_held(&dev_net(x->xso.real_dev)->xfrm.xfrm_state_lock));
+ lockdep_is_held(&net->xfrm.xfrm_cfg_mutex) ||
+ lockdep_is_held(&net->xfrm.xfrm_state_lock));
if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ)
return;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index a63c2289f8af..ffcd0cdeb775 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -274,6 +274,7 @@ struct mlx5e_ipsec_limits {
struct mlx5e_ipsec_sa_entry {
struct mlx5e_ipsec_esn_state esn_state;
struct xfrm_state *x;
+ struct net_device *dev;
struct mlx5e_ipsec *ipsec;
struct mlx5_accel_esp_xfrm_attrs attrs;
void (*set_iv_op)(struct sk_buff *skb, struct xfrm_state *x,
--
2.34.1
* [PATCH 03/12] xfrm: Use xdo.dev instead of xdo.real_dev
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
2025-05-23 7:56 ` [PATCH 01/12] xfrm: Remove unnecessary strscpy_pad() size arguments Steffen Klassert
2025-05-23 7:56 ` [PATCH 02/12] net/mlx5: Avoid using xso.real_dev unnecessarily Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 04/12] xfrm: Remove unneeded device check from validate_xmit_xfrm Steffen Klassert
` (9 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Cosmin Ratiu <cratiu@nvidia.com>
The policy offload struct was reused from the state offload and
real_dev was copied from dev, but it was never set to anything else.
Simplify the code by always using xdo.dev for policies.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 2 +-
net/xfrm/xfrm_device.c | 2 --
net/xfrm/xfrm_state.c | 2 --
3 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 626e525c0f0d..0dfbbe21936f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -1164,7 +1164,7 @@ mlx5e_ipsec_build_accel_pol_attrs(struct mlx5e_ipsec_pol_entry *pol_entry,
static int mlx5e_xfrm_add_policy(struct xfrm_policy *x,
struct netlink_ext_ack *extack)
{
- struct net_device *netdev = x->xdo.real_dev;
+ struct net_device *netdev = x->xdo.dev;
struct mlx5e_ipsec_pol_entry *pol_entry;
struct mlx5e_priv *priv;
int err;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index d62f76161d83..4f4165ff738d 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -378,7 +378,6 @@ int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
xdo->dev = dev;
netdev_tracker_alloc(dev, &xdo->dev_tracker, GFP_ATOMIC);
- xdo->real_dev = dev;
xdo->type = XFRM_DEV_OFFLOAD_PACKET;
switch (dir) {
case XFRM_POLICY_IN:
@@ -400,7 +399,6 @@ int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
err = dev->xfrmdev_ops->xdo_dev_policy_add(xp, extack);
if (err) {
xdo->dev = NULL;
- xdo->real_dev = NULL;
xdo->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
xdo->dir = 0;
netdev_put(dev, &xdo->dev_tracker);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 341d79ecb5c2..d67420e76389 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1555,7 +1555,6 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
xso->type = XFRM_DEV_OFFLOAD_PACKET;
xso->dir = xdo->dir;
xso->dev = xdo->dev;
- xso->real_dev = xdo->real_dev;
xso->flags = XFRM_DEV_OFFLOAD_FLAG_ACQ;
netdev_hold(xso->dev, &xso->dev_tracker, GFP_ATOMIC);
error = xso->dev->xfrmdev_ops->xdo_dev_state_add(x, NULL);
@@ -1563,7 +1562,6 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
xso->dir = 0;
netdev_put(xso->dev, &xso->dev_tracker);
xso->dev = NULL;
- xso->real_dev = NULL;
xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
x->km.state = XFRM_STATE_DEAD;
to_put = x;
--
2.34.1
* [PATCH 04/12] xfrm: Remove unneeded device check from validate_xmit_xfrm
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (2 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 03/12] xfrm: Use xdo.dev instead of xdo.real_dev Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 05/12] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Steffen Klassert
` (8 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Cosmin Ratiu <cratiu@nvidia.com>
validate_xmit_xfrm checks whether a packet already passed through it on
the master device (xso.dev) and skips processing the skb again on the
slave device (xso.real_dev).
This check was added in commit [1] to prevent tx packets on a bond device
from passing through xfrm twice and getting two sets of headers, but the
check was soon obsoleted by commit [2], which was added around the same
time to fix a similar but unrelated problem. Commit [3] set XFRM_XMIT only
when packets are hw offloaded.
xso.dev is usually equal to xso.real_dev, unless bonding is used, in
which case the bonding driver uses xso.real_dev to manage offloaded xfrm
states.
Since commit [3], the check added in commit [1] is unused in all cases,
since packets going through validate_xmit_xfrm twice bail out on the
check added in commit [2]. Here's a breakdown of relevant scenarios:
1. ESP offload off: validate_xmit_xfrm returns early on !xo.
2. ESP offload on, no bond: skb->dev == xso.real_dev == xso.dev.
3. ESP offload on, bond, xs on bond dev: 1st pass adds XFRM_XMIT, 2nd
pass returns early on XFRM_XMIT.
4. ESP offload on, bond, xs on slave dev: 1st pass returns early on
!xo, 2nd pass adds XFRM_XMIT.
5. ESP offload on, bond, xs on both bond AND slave dev: only 1 offload
possible in secpath. Either 1st pass adds XFRM_XMIT and 2nd pass returns
early on XFRM_XMIT, or 1st pass is sw and returns early on !xo.
6. ESP offload on, crypto fallback triggered in esp_xmit/esp6_xmit: 1st
pass does sw crypto & secpath_reset, 2nd pass returns on !xo.
This commit removes the unnecessary check, so xso.real_dev becomes what
it is in practice: a private field managed by the bonding driver.
The check immediately below that can be simplified as well.
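For illustration only (not part of the patch): a deliberately simplified userspace model of the two early-return conditions the scenarios above rely on. The demo_* names and DEMO_XFRM_XMIT are hypothetical stand-ins; the real validate_xmit_xfrm() does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_XFRM_XMIT 0x1u	/* stand-in for the kernel's XFRM_XMIT flag */

struct demo_skb {
	bool has_offload;	/* models a non-NULL xo */
	unsigned int flags;	/* models xo->flags */
	int esp_passes;		/* how many times ESP processing ran */
};

/* Returns true if this pass did the ESP work, false if it bailed out. */
static bool demo_validate_xmit(struct demo_skb *skb)
{
	assert(skb != NULL);

	if (!skb->has_offload)			/* "returns early on !xo" */
		return false;
	if (skb->flags & DEMO_XFRM_XMIT)	/* "returns early on XFRM_XMIT" */
		return false;

	skb->flags |= DEMO_XFRM_XMIT;		/* "adds XFRM_XMIT" */
	skb->esp_passes++;
	return true;
}
```

Calling demo_validate_xmit() twice on the same offloaded skb, as happens when a packet traverses the bond and then the slave device, performs the ESP work only once in this model, without any comparison of device pointers.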
[1] commit 272c2330adc9 ("xfrm: bail early on slave pass over skb")
[2] commit 94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in
IPsec crypto offload.")
[3] commit c7dbf4c08868 ("xfrm: Provide private skb extensions for
segmented and hw offloaded ESP packets")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_device.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 4f4165ff738d..0be5f7ffd019 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -145,10 +145,6 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
return NULL;
}
- /* This skb was already validated on the upper/virtual dev */
- if ((x->xso.dev != dev) && (x->xso.real_dev == dev))
- return skb;
-
local_irq_save(flags);
sd = this_cpu_ptr(&softnet_data);
err = !skb_queue_empty(&sd->xfrm_backlog);
@@ -159,8 +155,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
return skb;
}
- if (skb_is_gso(skb) && (unlikely(x->xso.dev != dev) ||
- unlikely(xmit_xfrm_check_overflow(skb)))) {
+ if (skb_is_gso(skb) && unlikely(xmit_xfrm_check_overflow(skb))) {
struct sk_buff *segs;
/* Packet got rerouted, fixup features and segment it. */
--
2.34.1
* [PATCH 05/12] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (3 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 04/12] xfrm: Remove unneeded device check from validate_xmit_xfrm Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 06/12] bonding: Mark active offloaded xfrm_states Steffen Klassert
` (7 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Cosmin Ratiu <cratiu@nvidia.com>
Previously, device driver IPSec offload implementations would fall into
two categories:
1. Those that used xso.dev to determine the offload device.
2. Those that used xso.real_dev to determine the offload device.
The first category didn't work with bonding while the second did.
In a non-bonding setup the two pointers are the same.
This commit adds explicit pointers for the offload netdevice to
.xdo_dev_state_add() / .xdo_dev_state_delete() / .xdo_dev_state_free()
which eliminates the confusion and allows drivers from the first
category to work with bonding.
xso.real_dev now becomes a private pointer managed by the bonding
driver.
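For illustration only (not part of the patch): a reduced userspace model of the new callback shape, in which the offload device is passed explicitly and a bond forwards the call to its active slave. All demo_* names are hypothetical; error handling and locking are omitted.

```c
#include <assert.h>
#include <stddef.h>

struct demo_netdev;

/* Loosely models the xso part of xfrm_state. */
struct demo_state {
	struct demo_netdev *dev;	/* device the SA was configured on */
	struct demo_netdev *real_dev;	/* private to the bond in this model */
};

/* Reduced ops table: the device is now an explicit argument. */
struct demo_xfrmdev_ops {
	int (*state_add)(struct demo_netdev *dev, struct demo_state *x);
};

struct demo_netdev {
	const struct demo_xfrmdev_ops *ops;
	struct demo_netdev *active_slave;	/* only meaningful for a bond */
	int programmed;				/* SAs programmed on this device */
};

/* A NIC driver programs the device it is handed; it no longer needs
 * to choose between x->dev and x->real_dev. */
static int demo_nic_state_add(struct demo_netdev *dev, struct demo_state *x)
{
	(void)x;
	dev->programmed++;
	return 0;
}

/* The bond picks the slave and passes it down explicitly. */
static int demo_bond_state_add(struct demo_netdev *bond_dev,
			       struct demo_state *x)
{
	struct demo_netdev *real_dev = bond_dev->active_slave;

	if (!real_dev || !real_dev->ops || !real_dev->ops->state_add)
		return -1;

	x->real_dev = real_dev;
	return real_dev->ops->state_add(real_dev, x);
}

static const struct demo_xfrmdev_ops demo_nic_ops = {
	.state_add = demo_nic_state_add,
};

static const struct demo_xfrmdev_ops demo_bond_ops = {
	.state_add = demo_bond_state_add,
};
```

Because demo_nic_state_add() only looks at its dev argument, the same driver code works whether it is called directly or through the bond, which is the property the commit message describes for first-category drivers.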
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
Documentation/networking/xfrm_device.rst | 10 +++--
drivers/net/bonding/bond_main.c | 33 ++++++++-------
.../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 20 +++++----
.../inline_crypto/ch_ipsec/chcr_ipsec.c | 18 +++++---
.../net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 41 +++++++++++--------
drivers/net/ethernet/intel/ixgbevf/ipsec.c | 21 ++++++----
.../marvell/octeontx2/nic/cn10k_ipsec.c | 18 ++++----
.../mellanox/mlx5/core/en_accel/ipsec.c | 12 +++---
.../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 +++--
drivers/net/netdevsim/ipsec.c | 15 ++++---
include/linux/netdevice.h | 10 +++--
include/net/xfrm.h | 8 ++++
net/xfrm/xfrm_device.c | 4 +-
net/xfrm/xfrm_state.c | 14 ++++---
14 files changed, 136 insertions(+), 99 deletions(-)
diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst
index 7f24c09f2694..122204da0fff 100644
--- a/Documentation/networking/xfrm_device.rst
+++ b/Documentation/networking/xfrm_device.rst
@@ -65,9 +65,13 @@ Callbacks to implement
/* from include/linux/netdevice.h */
struct xfrmdev_ops {
/* Crypto and Packet offload callbacks */
- int (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack);
- void (*xdo_dev_state_delete) (struct xfrm_state *x);
- void (*xdo_dev_state_free) (struct xfrm_state *x);
+ int (*xdo_dev_state_add)(struct net_device *dev,
+ struct xfrm_state *x,
+ struct netlink_ext_ack *extack);
+ void (*xdo_dev_state_delete)(struct net_device *dev,
+ struct xfrm_state *x);
+ void (*xdo_dev_state_free)(struct net_device *dev,
+ struct xfrm_state *x);
bool (*xdo_dev_offload_ok) (struct sk_buff *skb,
struct xfrm_state *x);
void (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 950d8e4d86f8..4ba525a564c5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -453,13 +453,14 @@ static struct net_device *bond_ipsec_dev(struct xfrm_state *xs)
/**
* bond_ipsec_add_sa - program device with a security association
+ * @bond_dev: pointer to the bond net device
* @xs: pointer to transformer state struct
* @extack: extack point to fill failure reason
**/
-static int bond_ipsec_add_sa(struct xfrm_state *xs,
+static int bond_ipsec_add_sa(struct net_device *bond_dev,
+ struct xfrm_state *xs,
struct netlink_ext_ack *extack)
{
- struct net_device *bond_dev = xs->xso.dev;
struct net_device *real_dev;
netdevice_tracker tracker;
struct bond_ipsec *ipsec;
@@ -496,7 +497,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs,
}
xs->xso.real_dev = real_dev;
- err = real_dev->xfrmdev_ops->xdo_dev_state_add(xs, extack);
+ err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack);
if (!err) {
ipsec->xs = xs;
INIT_LIST_HEAD(&ipsec->list);
@@ -540,7 +541,8 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
continue;
ipsec->xs->xso.real_dev = real_dev;
- if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
+ if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev,
+ ipsec->xs, NULL)) {
slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
ipsec->xs->xso.real_dev = NULL;
}
@@ -551,11 +553,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
/**
* bond_ipsec_del_sa - clear out this specific SA
+ * @bond_dev: pointer to the bond net device
* @xs: pointer to transformer state struct
**/
-static void bond_ipsec_del_sa(struct xfrm_state *xs)
+static void bond_ipsec_del_sa(struct net_device *bond_dev,
+ struct xfrm_state *xs)
{
- struct net_device *bond_dev = xs->xso.dev;
struct net_device *real_dev;
netdevice_tracker tracker;
struct bond_ipsec *ipsec;
@@ -587,7 +590,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
goto out;
}
- real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+ real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs);
out:
netdev_put(real_dev, &tracker);
mutex_lock(&bond->ipsec_lock);
@@ -624,18 +627,20 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
slave_warn(bond_dev, real_dev,
"%s: no slave xdo_dev_state_delete\n",
__func__);
- } else {
- real_dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
- if (real_dev->xfrmdev_ops->xdo_dev_state_free)
- real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
+ continue;
}
+ real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
+ ipsec->xs);
+ if (real_dev->xfrmdev_ops->xdo_dev_state_free)
+ real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev,
+ ipsec->xs);
}
mutex_unlock(&bond->ipsec_lock);
}
-static void bond_ipsec_free_sa(struct xfrm_state *xs)
+static void bond_ipsec_free_sa(struct net_device *bond_dev,
+ struct xfrm_state *xs)
{
- struct net_device *bond_dev = xs->xso.dev;
struct net_device *real_dev;
netdevice_tracker tracker;
struct bonding *bond;
@@ -661,7 +666,7 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs)
if (real_dev && real_dev->xfrmdev_ops &&
real_dev->xfrmdev_ops->xdo_dev_state_free)
- real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
+ real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
out:
netdev_put(real_dev, &tracker);
}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 551c279dc14b..51395c96b2e9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -6480,10 +6480,11 @@ static const struct tlsdev_ops cxgb4_ktls_ops = {
#if IS_ENABLED(CONFIG_CHELSIO_IPSEC_INLINE)
-static int cxgb4_xfrm_add_state(struct xfrm_state *x,
+static int cxgb4_xfrm_add_state(struct net_device *dev,
+ struct xfrm_state *x,
struct netlink_ext_ack *extack)
{
- struct adapter *adap = netdev2adap(x->xso.dev);
+ struct adapter *adap = netdev2adap(dev);
int ret;
if (!mutex_trylock(&uld_mutex)) {
@@ -6494,7 +6495,8 @@ static int cxgb4_xfrm_add_state(struct xfrm_state *x,
if (ret)
goto out_unlock;
- ret = adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_add(x, extack);
+ ret = adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_add(dev, x,
+ extack);
out_unlock:
mutex_unlock(&uld_mutex);
@@ -6502,9 +6504,9 @@ static int cxgb4_xfrm_add_state(struct xfrm_state *x,
return ret;
}
-static void cxgb4_xfrm_del_state(struct xfrm_state *x)
+static void cxgb4_xfrm_del_state(struct net_device *dev, struct xfrm_state *x)
{
- struct adapter *adap = netdev2adap(x->xso.dev);
+ struct adapter *adap = netdev2adap(dev);
if (!mutex_trylock(&uld_mutex)) {
dev_dbg(adap->pdev_dev,
@@ -6514,15 +6516,15 @@ static void cxgb4_xfrm_del_state(struct xfrm_state *x)
if (chcr_offload_state(adap, CXGB4_XFRMDEV_OPS))
goto out_unlock;
- adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_delete(x);
+ adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_delete(dev, x);
out_unlock:
mutex_unlock(&uld_mutex);
}
-static void cxgb4_xfrm_free_state(struct xfrm_state *x)
+static void cxgb4_xfrm_free_state(struct net_device *dev, struct xfrm_state *x)
{
- struct adapter *adap = netdev2adap(x->xso.dev);
+ struct adapter *adap = netdev2adap(dev);
if (!mutex_trylock(&uld_mutex)) {
dev_dbg(adap->pdev_dev,
@@ -6532,7 +6534,7 @@ static void cxgb4_xfrm_free_state(struct xfrm_state *x)
if (chcr_offload_state(adap, CXGB4_XFRMDEV_OPS))
goto out_unlock;
- adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_free(x);
+ adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_free(dev, x);
out_unlock:
mutex_unlock(&uld_mutex);
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
index baba96883f48..ecd9a0bd5e18 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
@@ -75,9 +75,12 @@ static int ch_ipsec_uld_state_change(void *handle, enum cxgb4_state new_state);
static int ch_ipsec_xmit(struct sk_buff *skb, struct net_device *dev);
static void *ch_ipsec_uld_add(const struct cxgb4_lld_info *infop);
static void ch_ipsec_advance_esn_state(struct xfrm_state *x);
-static void ch_ipsec_xfrm_free_state(struct xfrm_state *x);
-static void ch_ipsec_xfrm_del_state(struct xfrm_state *x);
-static int ch_ipsec_xfrm_add_state(struct xfrm_state *x,
+static void ch_ipsec_xfrm_free_state(struct net_device *dev,
+ struct xfrm_state *x);
+static void ch_ipsec_xfrm_del_state(struct net_device *dev,
+ struct xfrm_state *x);
+static int ch_ipsec_xfrm_add_state(struct net_device *dev,
+ struct xfrm_state *x,
struct netlink_ext_ack *extack);
static const struct xfrmdev_ops ch_ipsec_xfrmdev_ops = {
@@ -223,7 +226,8 @@ static int ch_ipsec_setkey(struct xfrm_state *x,
* returns 0 on success, negative error if failed to send message to FPGA
* positive error if FPGA returned a bad response
*/
-static int ch_ipsec_xfrm_add_state(struct xfrm_state *x,
+static int ch_ipsec_xfrm_add_state(struct net_device *dev,
+ struct xfrm_state *x,
struct netlink_ext_ack *extack)
{
struct ipsec_sa_entry *sa_entry;
@@ -302,14 +306,16 @@ static int ch_ipsec_xfrm_add_state(struct xfrm_state *x,
return res;
}
-static void ch_ipsec_xfrm_del_state(struct xfrm_state *x)
+static void ch_ipsec_xfrm_del_state(struct net_device *dev,
+ struct xfrm_state *x)
{
/* do nothing */
if (!x->xso.offload_handle)
return;
}
-static void ch_ipsec_xfrm_free_state(struct xfrm_state *x)
+static void ch_ipsec_xfrm_free_state(struct net_device *dev,
+ struct xfrm_state *x)
{
struct ipsec_sa_entry *sa_entry;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 07ea1954a276..796e90d741f0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -9,7 +9,7 @@
#define IXGBE_IPSEC_KEY_BITS 160
static const char aes_gcm_name[] = "rfc4106(gcm(aes))";
-static void ixgbe_ipsec_del_sa(struct xfrm_state *xs);
+static void ixgbe_ipsec_del_sa(struct net_device *dev, struct xfrm_state *xs);
/**
* ixgbe_ipsec_set_tx_sa - set the Tx SA registers
@@ -321,7 +321,7 @@ void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
if (r->used) {
if (r->mode & IXGBE_RXTXMOD_VF)
- ixgbe_ipsec_del_sa(r->xs);
+ ixgbe_ipsec_del_sa(adapter->netdev, r->xs);
else
ixgbe_ipsec_set_rx_sa(hw, i, r->xs->id.spi,
r->key, r->salt,
@@ -330,7 +330,7 @@ void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter)
if (t->used) {
if (t->mode & IXGBE_RXTXMOD_VF)
- ixgbe_ipsec_del_sa(t->xs);
+ ixgbe_ipsec_del_sa(adapter->netdev, t->xs);
else
ixgbe_ipsec_set_tx_sa(hw, i, t->key, t->salt);
}
@@ -417,6 +417,7 @@ static struct xfrm_state *ixgbe_ipsec_find_rx_state(struct ixgbe_ipsec *ipsec,
/**
* ixgbe_ipsec_parse_proto_keys - find the key and salt based on the protocol
+ * @dev: pointer to net device
* @xs: pointer to xfrm_state struct
* @mykey: pointer to key array to populate
* @mysalt: pointer to salt value to populate
@@ -424,10 +425,10 @@ static struct xfrm_state *ixgbe_ipsec_find_rx_state(struct ixgbe_ipsec *ipsec,
* This copies the protocol keys and salt to our own data tables. The
* 82599 family only supports the one algorithm.
**/
-static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
+static int ixgbe_ipsec_parse_proto_keys(struct net_device *dev,
+ struct xfrm_state *xs,
u32 *mykey, u32 *mysalt)
{
- struct net_device *dev = xs->xso.real_dev;
unsigned char *key_data;
char *alg_name = NULL;
int key_len;
@@ -473,11 +474,12 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
/**
* ixgbe_ipsec_check_mgmt_ip - make sure there is no clash with mgmt IP filters
+ * @dev: pointer to net device
* @xs: pointer to transformer state struct
**/
-static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
+static int ixgbe_ipsec_check_mgmt_ip(struct net_device *dev,
+ struct xfrm_state *xs)
{
- struct net_device *dev = xs->xso.real_dev;
struct ixgbe_adapter *adapter = netdev_priv(dev);
struct ixgbe_hw *hw = &adapter->hw;
u32 mfval, manc, reg;
@@ -556,13 +558,14 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
/**
* ixgbe_ipsec_add_sa - program device with a security association
+ * @dev: pointer to device to program
* @xs: pointer to transformer state struct
* @extack: extack point to fill failure reason
**/
-static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
+static int ixgbe_ipsec_add_sa(struct net_device *dev,
+ struct xfrm_state *xs,
struct netlink_ext_ack *extack)
{
- struct net_device *dev = xs->xso.real_dev;
struct ixgbe_adapter *adapter = netdev_priv(dev);
struct ixgbe_ipsec *ipsec = adapter->ipsec;
struct ixgbe_hw *hw = &adapter->hw;
@@ -581,7 +584,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
return -EINVAL;
}
- if (ixgbe_ipsec_check_mgmt_ip(xs)) {
+ if (ixgbe_ipsec_check_mgmt_ip(dev, xs)) {
NL_SET_ERR_MSG_MOD(extack, "IPsec IP addr clash with mgmt filters");
return -EINVAL;
}
@@ -615,7 +618,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
rsa.decrypt = xs->ealg || xs->aead;
/* get the key and salt */
- ret = ixgbe_ipsec_parse_proto_keys(xs, rsa.key, &rsa.salt);
+ ret = ixgbe_ipsec_parse_proto_keys(dev, xs, rsa.key, &rsa.salt);
if (ret) {
NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Rx SA table");
return ret;
@@ -724,7 +727,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
if (xs->id.proto & IPPROTO_ESP)
tsa.encrypt = xs->ealg || xs->aead;
- ret = ixgbe_ipsec_parse_proto_keys(xs, tsa.key, &tsa.salt);
+ ret = ixgbe_ipsec_parse_proto_keys(dev, xs, tsa.key, &tsa.salt);
if (ret) {
NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Tx SA table");
memset(&tsa, 0, sizeof(tsa));
@@ -752,11 +755,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs,
/**
* ixgbe_ipsec_del_sa - clear out this specific SA
+ * @dev: pointer to device to program
* @xs: pointer to transformer state struct
**/
-static void ixgbe_ipsec_del_sa(struct xfrm_state *xs)
+static void ixgbe_ipsec_del_sa(struct net_device *dev, struct xfrm_state *xs)
{
- struct net_device *dev = xs->xso.real_dev;
struct ixgbe_adapter *adapter = netdev_priv(dev);
struct ixgbe_ipsec *ipsec = adapter->ipsec;
struct ixgbe_hw *hw = &adapter->hw;
@@ -841,7 +844,8 @@ void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf)
continue;
if (ipsec->rx_tbl[i].mode & IXGBE_RXTXMOD_VF &&
ipsec->rx_tbl[i].vf == vf)
- ixgbe_ipsec_del_sa(ipsec->rx_tbl[i].xs);
+ ixgbe_ipsec_del_sa(adapter->netdev,
+ ipsec->rx_tbl[i].xs);
}
/* search tx sa table */
@@ -850,7 +854,8 @@ void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf)
continue;
if (ipsec->tx_tbl[i].mode & IXGBE_RXTXMOD_VF &&
ipsec->tx_tbl[i].vf == vf)
- ixgbe_ipsec_del_sa(ipsec->tx_tbl[i].xs);
+ ixgbe_ipsec_del_sa(adapter->netdev,
+ ipsec->tx_tbl[i].xs);
}
}
@@ -930,7 +935,7 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
memcpy(xs->aead->alg_name, aes_gcm_name, sizeof(aes_gcm_name));
/* set up the HW offload */
- err = ixgbe_ipsec_add_sa(xs, NULL);
+ err = ixgbe_ipsec_add_sa(adapter->netdev, xs, NULL);
if (err)
goto err_aead;
@@ -1034,7 +1039,7 @@ int ixgbe_ipsec_vf_del_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
xs = ipsec->tx_tbl[sa_idx].xs;
}
- ixgbe_ipsec_del_sa(xs);
+ ixgbe_ipsec_del_sa(adapter->netdev, xs);
/* remove the xs that was made-up in the add request */
kfree_sensitive(xs);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 8ba037e3d9c2..65580b9cb06f 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -201,6 +201,7 @@ struct xfrm_state *ixgbevf_ipsec_find_rx_state(struct ixgbevf_ipsec *ipsec,
/**
* ixgbevf_ipsec_parse_proto_keys - find the key and salt based on the protocol
+ * @dev: pointer to net device to program
* @xs: pointer to xfrm_state struct
* @mykey: pointer to key array to populate
* @mysalt: pointer to salt value to populate
@@ -208,10 +209,10 @@ struct xfrm_state *ixgbevf_ipsec_find_rx_state(struct ixgbevf_ipsec *ipsec,
* This copies the protocol keys and salt to our own data tables. The
* 82599 family only supports the one algorithm.
**/
-static int ixgbevf_ipsec_parse_proto_keys(struct xfrm_state *xs,
+static int ixgbevf_ipsec_parse_proto_keys(struct net_device *dev,
+ struct xfrm_state *xs,
u32 *mykey, u32 *mysalt)
{
- struct net_device *dev = xs->xso.real_dev;
unsigned char *key_data;
char *alg_name = NULL;
int key_len;
@@ -256,13 +257,14 @@ static int ixgbevf_ipsec_parse_proto_keys(struct xfrm_state *xs,
/**
* ixgbevf_ipsec_add_sa - program device with a security association
+ * @dev: pointer to net device to program
* @xs: pointer to transformer state struct
* @extack: extack point to fill failure reason
**/
-static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs,
+static int ixgbevf_ipsec_add_sa(struct net_device *dev,
+ struct xfrm_state *xs,
struct netlink_ext_ack *extack)
{
- struct net_device *dev = xs->xso.real_dev;
struct ixgbevf_adapter *adapter;
struct ixgbevf_ipsec *ipsec;
u16 sa_idx;
@@ -310,7 +312,8 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs,
rsa.decrypt = xs->ealg || xs->aead;
/* get the key and salt */
- ret = ixgbevf_ipsec_parse_proto_keys(xs, rsa.key, &rsa.salt);
+ ret = ixgbevf_ipsec_parse_proto_keys(dev, xs, rsa.key,
+ &rsa.salt);
if (ret) {
NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Rx SA table");
return ret;
@@ -363,7 +366,8 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs,
if (xs->id.proto & IPPROTO_ESP)
tsa.encrypt = xs->ealg || xs->aead;
- ret = ixgbevf_ipsec_parse_proto_keys(xs, tsa.key, &tsa.salt);
+ ret = ixgbevf_ipsec_parse_proto_keys(dev, xs, tsa.key,
+ &tsa.salt);
if (ret) {
NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for Tx SA table");
memset(&tsa, 0, sizeof(tsa));
@@ -388,11 +392,12 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs,
/**
* ixgbevf_ipsec_del_sa - clear out this specific SA
+ * @dev: pointer to net device to program
* @xs: pointer to transformer state struct
**/
-static void ixgbevf_ipsec_del_sa(struct xfrm_state *xs)
+static void ixgbevf_ipsec_del_sa(struct net_device *dev,
+ struct xfrm_state *xs)
{
- struct net_device *dev = xs->xso.real_dev;
struct ixgbevf_adapter *adapter;
struct ixgbevf_ipsec *ipsec;
u16 sa_idx;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c
index fc59e50bafce..a6500e3673f2 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c
@@ -663,10 +663,10 @@ static int cn10k_ipsec_inb_add_state(struct xfrm_state *x,
return -EOPNOTSUPP;
}
-static int cn10k_ipsec_outb_add_state(struct xfrm_state *x,
+static int cn10k_ipsec_outb_add_state(struct net_device *dev,
+ struct xfrm_state *x,
struct netlink_ext_ack *extack)
{
- struct net_device *netdev = x->xso.dev;
struct cn10k_tx_sa_s *sa_entry;
struct qmem *sa_info;
struct otx2_nic *pf;
@@ -676,7 +676,7 @@ static int cn10k_ipsec_outb_add_state(struct xfrm_state *x,
if (err)
return err;
- pf = netdev_priv(netdev);
+ pf = netdev_priv(dev);
err = qmem_alloc(pf->dev, &sa_info, pf->ipsec.sa_size, OTX2_ALIGN);
if (err)
@@ -700,18 +700,18 @@ static int cn10k_ipsec_outb_add_state(struct xfrm_state *x,
return 0;
}
-static int cn10k_ipsec_add_state(struct xfrm_state *x,
+static int cn10k_ipsec_add_state(struct net_device *dev,
+ struct xfrm_state *x,
struct netlink_ext_ack *extack)
{
if (x->xso.dir == XFRM_DEV_OFFLOAD_IN)
return cn10k_ipsec_inb_add_state(x, extack);
else
- return cn10k_ipsec_outb_add_state(x, extack);
+ return cn10k_ipsec_outb_add_state(dev, x, extack);
}
-static void cn10k_ipsec_del_state(struct xfrm_state *x)
+static void cn10k_ipsec_del_state(struct net_device *dev, struct xfrm_state *x)
{
- struct net_device *netdev = x->xso.dev;
struct cn10k_tx_sa_s *sa_entry;
struct qmem *sa_info;
struct otx2_nic *pf;
@@ -720,7 +720,7 @@ static void cn10k_ipsec_del_state(struct xfrm_state *x)
if (x->xso.dir == XFRM_DEV_OFFLOAD_IN)
return;
- pf = netdev_priv(netdev);
+ pf = netdev_priv(dev);
sa_info = (struct qmem *)x->xso.offload_handle;
sa_entry = (struct cn10k_tx_sa_s *)sa_info->base;
@@ -732,7 +732,7 @@ static void cn10k_ipsec_del_state(struct xfrm_state *x)
err = cn10k_outb_write_sa(pf, sa_info);
if (err)
- netdev_err(netdev, "Error (%d) deleting SA\n", err);
+ netdev_err(dev, "Error (%d) deleting SA\n", err);
x->xso.offload_handle = 0;
qmem_free(pf->dev, sa_info);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 0dfbbe21936f..77f61cd28a79 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -689,17 +689,17 @@ static int mlx5e_ipsec_create_dwork(struct mlx5e_ipsec_sa_entry *sa_entry)
return 0;
}
-static int mlx5e_xfrm_add_state(struct xfrm_state *x,
+static int mlx5e_xfrm_add_state(struct net_device *dev,
+ struct xfrm_state *x,
struct netlink_ext_ack *extack)
{
struct mlx5e_ipsec_sa_entry *sa_entry = NULL;
- struct net_device *netdev = x->xso.real_dev;
struct mlx5e_ipsec *ipsec;
struct mlx5e_priv *priv;
gfp_t gfp;
int err;
- priv = netdev_priv(netdev);
+ priv = netdev_priv(dev);
if (!priv->ipsec)
return -EOPNOTSUPP;
@@ -710,7 +710,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
return -ENOMEM;
sa_entry->x = x;
- sa_entry->dev = netdev;
+ sa_entry->dev = dev;
sa_entry->ipsec = ipsec;
/* Check if this SA is originated from acquire flow temporary SA */
if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ)
@@ -807,7 +807,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x,
return err;
}
-static void mlx5e_xfrm_del_state(struct xfrm_state *x)
+static void mlx5e_xfrm_del_state(struct net_device *dev, struct xfrm_state *x)
{
struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x);
struct mlx5e_ipsec *ipsec = sa_entry->ipsec;
@@ -820,7 +820,7 @@ static void mlx5e_xfrm_del_state(struct xfrm_state *x)
WARN_ON(old != sa_entry);
}
-static void mlx5e_xfrm_free_state(struct xfrm_state *x)
+static void mlx5e_xfrm_free_state(struct net_device *dev, struct xfrm_state *x)
{
struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x);
struct mlx5e_ipsec *ipsec = sa_entry->ipsec;
diff --git a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
index 671af5d4c5d2..9e7c285eaa6b 100644
--- a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
+++ b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
@@ -266,17 +266,17 @@ static void set_sha2_512hmac(struct nfp_ipsec_cfg_add_sa *cfg, int *trunc_len)
}
}
-static int nfp_net_xfrm_add_state(struct xfrm_state *x,
+static int nfp_net_xfrm_add_state(struct net_device *dev,
+ struct xfrm_state *x,
struct netlink_ext_ack *extack)
{
- struct net_device *netdev = x->xso.real_dev;
struct nfp_ipsec_cfg_mssg msg = {};
int i, key_len, trunc_len, err = 0;
struct nfp_ipsec_cfg_add_sa *cfg;
struct nfp_net *nn;
unsigned int saidx;
- nn = netdev_priv(netdev);
+ nn = netdev_priv(dev);
cfg = &msg.cfg_add_sa;
/* General */
@@ -546,17 +546,16 @@ static int nfp_net_xfrm_add_state(struct xfrm_state *x,
return 0;
}
-static void nfp_net_xfrm_del_state(struct xfrm_state *x)
+static void nfp_net_xfrm_del_state(struct net_device *dev, struct xfrm_state *x)
{
struct nfp_ipsec_cfg_mssg msg = {
.cmd = NFP_IPSEC_CFG_MSSG_INV_SA,
.sa_idx = x->xso.offload_handle - 1,
};
- struct net_device *netdev = x->xso.real_dev;
struct nfp_net *nn;
int err;
- nn = netdev_priv(netdev);
+ nn = netdev_priv(dev);
err = nfp_net_sched_mbox_amsg_work(nn, NFP_NET_CFG_MBOX_CMD_IPSEC, &msg,
sizeof(msg), nfp_net_ipsec_cfg);
if (err)
diff --git a/drivers/net/netdevsim/ipsec.c b/drivers/net/netdevsim/ipsec.c
index d88bdb9a1717..47cdee5577d4 100644
--- a/drivers/net/netdevsim/ipsec.c
+++ b/drivers/net/netdevsim/ipsec.c
@@ -85,11 +85,11 @@ static int nsim_ipsec_find_empty_idx(struct nsim_ipsec *ipsec)
return -ENOSPC;
}
-static int nsim_ipsec_parse_proto_keys(struct xfrm_state *xs,
+static int nsim_ipsec_parse_proto_keys(struct net_device *dev,
+ struct xfrm_state *xs,
u32 *mykey, u32 *mysalt)
{
const char aes_gcm_name[] = "rfc4106(gcm(aes))";
- struct net_device *dev = xs->xso.real_dev;
unsigned char *key_data;
char *alg_name = NULL;
int key_len;
@@ -129,17 +129,16 @@ static int nsim_ipsec_parse_proto_keys(struct xfrm_state *xs,
return 0;
}
-static int nsim_ipsec_add_sa(struct xfrm_state *xs,
+static int nsim_ipsec_add_sa(struct net_device *dev,
+ struct xfrm_state *xs,
struct netlink_ext_ack *extack)
{
struct nsim_ipsec *ipsec;
- struct net_device *dev;
struct netdevsim *ns;
struct nsim_sa sa;
u16 sa_idx;
int ret;
- dev = xs->xso.real_dev;
ns = netdev_priv(dev);
ipsec = &ns->ipsec;
@@ -174,7 +173,7 @@ static int nsim_ipsec_add_sa(struct xfrm_state *xs,
sa.crypt = xs->ealg || xs->aead;
/* get the key and salt */
- ret = nsim_ipsec_parse_proto_keys(xs, sa.key, &sa.salt);
+ ret = nsim_ipsec_parse_proto_keys(dev, xs, sa.key, &sa.salt);
if (ret) {
NL_SET_ERR_MSG_MOD(extack, "Failed to get key data for SA table");
return ret;
@@ -200,9 +199,9 @@ static int nsim_ipsec_add_sa(struct xfrm_state *xs,
return 0;
}
-static void nsim_ipsec_del_sa(struct xfrm_state *xs)
+static void nsim_ipsec_del_sa(struct net_device *dev, struct xfrm_state *xs)
{
- struct netdevsim *ns = netdev_priv(xs->xso.real_dev);
+ struct netdevsim *ns = netdev_priv(dev);
struct nsim_ipsec *ipsec = &ns->ipsec;
u16 sa_idx;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d8544f6a680c..88dfb8aeed3c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1013,9 +1013,13 @@ struct netdev_bpf {
#ifdef CONFIG_XFRM_OFFLOAD
struct xfrmdev_ops {
- int (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack);
- void (*xdo_dev_state_delete) (struct xfrm_state *x);
- void (*xdo_dev_state_free) (struct xfrm_state *x);
+ int (*xdo_dev_state_add)(struct net_device *dev,
+ struct xfrm_state *x,
+ struct netlink_ext_ack *extack);
+ void (*xdo_dev_state_delete)(struct net_device *dev,
+ struct xfrm_state *x);
+ void (*xdo_dev_state_free)(struct net_device *dev,
+ struct xfrm_state *x);
bool (*xdo_dev_offload_ok) (struct sk_buff *skb,
struct xfrm_state *x);
void (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 39365fd2ea17..3d2f6c879311 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -147,8 +147,16 @@ enum {
};
struct xfrm_dev_offload {
+ /* The device for this offload.
+ * Device drivers should not use this directly, as that will prevent
+ * them from working with bonding device. Instead, the device passed
+ * to the add/delete callbacks should be used.
+ */
struct net_device *dev;
netdevice_tracker dev_tracker;
+ /* This is a private pointer used by the bonding driver.
+ * Device drivers should not use it.
+ */
struct net_device *real_dev;
unsigned long offload_handle;
u8 dir : 2;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 0be5f7ffd019..3be0139373f7 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -309,7 +309,6 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
xso->dev = dev;
netdev_tracker_alloc(dev, &xso->dev_tracker, GFP_ATOMIC);
- xso->real_dev = dev;
if (xuo->flags & XFRM_OFFLOAD_INBOUND)
xso->dir = XFRM_DEV_OFFLOAD_IN;
@@ -321,11 +320,10 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
else
xso->type = XFRM_DEV_OFFLOAD_CRYPTO;
- err = dev->xfrmdev_ops->xdo_dev_state_add(x, extack);
+ err = dev->xfrmdev_ops->xdo_dev_state_add(dev, x, extack);
if (err) {
xso->dev = NULL;
xso->dir = 0;
- xso->real_dev = NULL;
netdev_put(dev, &xso->dev_tracker);
xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index d67420e76389..3c2e27e5a1e3 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -767,7 +767,7 @@ void xfrm_dev_state_delete(struct xfrm_state *x)
struct net_device *dev = READ_ONCE(xso->dev);
if (dev) {
- dev->xfrmdev_ops->xdo_dev_state_delete(x);
+ dev->xfrmdev_ops->xdo_dev_state_delete(dev, x);
spin_lock_bh(&xfrm_state_dev_gc_lock);
hlist_add_head(&x->dev_gclist, &xfrm_state_dev_gc_list);
spin_unlock_bh(&xfrm_state_dev_gc_lock);
@@ -789,7 +789,7 @@ void xfrm_dev_state_free(struct xfrm_state *x)
spin_unlock_bh(&xfrm_state_dev_gc_lock);
if (dev->xfrmdev_ops->xdo_dev_state_free)
- dev->xfrmdev_ops->xdo_dev_state_free(x);
+ dev->xfrmdev_ops->xdo_dev_state_free(dev, x);
WRITE_ONCE(xso->dev, NULL);
xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
netdev_put(dev, &xso->dev_tracker);
@@ -1551,16 +1551,18 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) {
struct xfrm_dev_offload *xdo = &pol->xdo;
struct xfrm_dev_offload *xso = &x->xso;
+ struct net_device *dev = xdo->dev;
xso->type = XFRM_DEV_OFFLOAD_PACKET;
xso->dir = xdo->dir;
- xso->dev = xdo->dev;
+ xso->dev = dev;
xso->flags = XFRM_DEV_OFFLOAD_FLAG_ACQ;
- netdev_hold(xso->dev, &xso->dev_tracker, GFP_ATOMIC);
- error = xso->dev->xfrmdev_ops->xdo_dev_state_add(x, NULL);
+ netdev_hold(dev, &xso->dev_tracker, GFP_ATOMIC);
+ error = dev->xfrmdev_ops->xdo_dev_state_add(dev, x,
+ NULL);
if (error) {
xso->dir = 0;
- netdev_put(xso->dev, &xso->dev_tracker);
+ netdev_put(dev, &xso->dev_tracker);
xso->dev = NULL;
xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
x->km.state = XFRM_STATE_DEAD;
--
2.34.1
* [PATCH 06/12] bonding: Mark active offloaded xfrm_states
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (4 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 05/12] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 07/12] bonding: Fix multiple long standing offload races Steffen Klassert
` (6 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Cosmin Ratiu <cratiu@nvidia.com>
When the active link is changed for a bond device, the existing xfrm
states need to be migrated over to the new link. This is done with:
- bond_ipsec_del_sa_all() goes through the offloaded states list and
removes all of them from hw.
- bond_ipsec_add_sa_all() re-offloads all states to the new device.
But because the offload status of xfrm states isn't marked in any way,
there can be bugs.
When all bond links are down, bond_ipsec_del_sa_all() unoffloads
everything from the previous active link. If the same link then comes
back up, nothing gets reoffloaded by bond_ipsec_add_sa_all().
This results in a stack trace like this a bit later when user space
removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to
remove a rule that's no longer offloaded:
[] Call Trace:
[] <TASK>
[] ? __warn+0x7d/0x110
[] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
[] ? report_bug+0x16d/0x180
[] ? handle_bug+0x4f/0x90
[] ? exc_invalid_op+0x14/0x70
[] ? asm_exc_invalid_op+0x16/0x20
[] ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core]
[] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
[] bond_ipsec_del_sa+0x1ab/0x200 [bonding]
[] xfrm_dev_state_delete+0x1f/0x60
[] __xfrm_state_delete+0x196/0x200
[] xfrm_state_delete+0x21/0x40
[] xfrm_del_sa+0x69/0x110
[] xfrm_user_rcv_msg+0x11d/0x300
[] ? release_pages+0xca/0x140
[] ? copy_to_user_tmpl.part.0+0x110/0x110
[] netlink_rcv_skb+0x54/0x100
[] xfrm_netlink_rcv+0x31/0x40
[] netlink_unicast+0x1fc/0x2d0
[] netlink_sendmsg+0x1e4/0x410
[] __sock_sendmsg+0x38/0x60
[] sock_write_iter+0x94/0xf0
[] vfs_write+0x338/0x3f0
[] ksys_write+0xba/0xd0
[] do_syscall_64+0x4c/0x100
[] entry_SYSCALL_64_after_hwframe+0x4b/0x53
There's also another theoretical bug:
Calling bond_ipsec_del_sa_all() multiple times can result in corruption
in the driver implementation if the double-free isn't tolerated. This
isn't nice.
Before the "Fixes" commit, xs->xso.real_dev was set to NULL when an xfrm
state was unoffloaded from a device, but a race with netdevsim's
.xdo_dev_offload_ok() accessing real_dev was considered a sufficient
reason to not set real_dev to NULL anymore. This unfortunately
introduced the new bugs.
Since .xdo_dev_offload_ok() was significantly refactored by [1] and
there are no more users in the stack of xso.real_dev, that
race is now gone and xs->xso.real_dev can now once again be used to
represent which device (if any) currently holds the offloaded rule.
Go one step further and set real_dev after add/before delete calls, to
catch any future driver misuses of real_dev.
[1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/
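The marking scheme described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct model_dev, struct model_state and the model_*() helpers are hypothetical stand-ins, not kernel APIs, and locking is left out entirely. It shows real_dev being written only after a successful add and cleared before the delete, so that a repeated delete becomes a no-op and a re-add to the same link is not skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device plus its "hardware" table. */
struct model_dev {
	int programmed;	/* number of SAs currently programmed */
	int fail_add;	/* when nonzero, model_hw_add() reports failure */
};

/* Hypothetical stand-in for the relevant part of struct xfrm_state. */
struct model_state {
	struct model_dev *real_dev;	/* NULL means "not offloaded anywhere" */
};

/* Driver-side callbacks: they rely on the explicit dev argument only. */
static int model_hw_add(struct model_dev *dev, struct model_state *xs)
{
	(void)xs;
	if (dev->fail_add)
		return -1;
	dev->programmed++;
	return 0;
}

static void model_hw_del(struct model_dev *dev, struct model_state *xs)
{
	(void)xs;
	/* Models the driver warning about deleting a rule it does not hold. */
	assert(dev->programmed > 0);
	dev->programmed--;
}

/* Bond-side add: mark the state only after the driver accepted it. */
static int model_bond_add(struct model_dev *dev, struct model_state *xs)
{
	int err;

	if (xs->real_dev == dev)
		return 0;	/* already offloaded to this device */
	err = model_hw_add(dev, xs);
	if (!err)
		xs->real_dev = dev;
	return err;
}

/* Bond-side delete: clear the mark before calling into the driver. */
static void model_bond_del(struct model_state *xs)
{
	struct model_dev *dev = xs->real_dev;

	if (!dev)
		return;	/* nothing offloaded, skip the driver call */
	xs->real_dev = NULL;
	model_hw_del(dev, xs);
}
```

With this model, calling model_bond_del() twice in a row reaches the driver only once, and a later model_bond_add() on the same device programs the SA again instead of being skipped.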
Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
drivers/net/bonding/bond_main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4ba525a564c5..14f7c9712ad4 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -496,9 +496,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev,
goto out;
}
- xs->xso.real_dev = real_dev;
err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack);
if (!err) {
+ xs->xso.real_dev = real_dev;
ipsec->xs = xs;
INIT_LIST_HEAD(&ipsec->list);
mutex_lock(&bond->ipsec_lock);
@@ -540,12 +540,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
if (ipsec->xs->xso.real_dev == real_dev)
continue;
- ipsec->xs->xso.real_dev = real_dev;
if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev,
ipsec->xs, NULL)) {
slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
- ipsec->xs->xso.real_dev = NULL;
+ continue;
}
+ ipsec->xs->xso.real_dev = real_dev;
}
out:
mutex_unlock(&bond->ipsec_lock);
@@ -629,6 +629,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
__func__);
continue;
}
+ ipsec->xs->xso.real_dev = NULL;
real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
ipsec->xs);
if (real_dev->xfrmdev_ops->xdo_dev_state_free)
@@ -664,6 +665,7 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev,
WARN_ON(xs->xso.real_dev != real_dev);
+ xs->xso.real_dev = NULL;
if (real_dev && real_dev->xfrmdev_ops &&
real_dev->xfrmdev_ops->xdo_dev_state_free)
real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
--
2.34.1
* [PATCH 07/12] bonding: Fix multiple long standing offload races
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (5 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 06/12] bonding: Mark active offloaded xfrm_states Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 08/12] xfrm: Migrate offload configuration Steffen Klassert
` (5 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Cosmin Ratiu <cratiu@nvidia.com>
Refactor the bonding ipsec offload operations to fix a number of
long-standing control plane races between state migration and user
deletion and a few other issues.
xfrm state deletion can happen concurrently with
bond_change_active_slave() operation. This manifests itself as a
bond_ipsec_del_sa() call with x->lock held, followed by a
bond_ipsec_free_sa() a bit later from a wq. The alternate path of
these calls coming from xfrm_dev_state_flush() can't happen, as that
needs the RTNL lock and bond_change_active_slave() already holds it.
1. bond_ipsec_del_sa_all() might call xdo_dev_state_delete() a second
time on an xfrm state that was concurrently killed. This is bad.
2. bond_ipsec_add_sa_all() can add a state on the new device, but
pending bond_ipsec_free_sa() calls from the old device will then hit
the WARN_ON() and then, worse, call xdo_dev_state_free() on the new
device without a corresponding xdo_dev_state_delete().
3. Resolve a sleeping-in-atomic-context bug introduced by the mentioned
"Fixes" commit.
bond_ipsec_del_sa_all() and bond_ipsec_add_sa_all() now acquire x->lock
and check for x->km.state to help with problems 1 and 2. And since
xso.real_dev is now a private pointer managed by the bonding driver in
xfrm state, make better use of it to fully fix problems 1 and 2. In
bond_ipsec_del_sa_all(), set xso.real_dev to NULL while holding both the
mutex and x->lock, which makes sure that neither bond_ipsec_del_sa() nor
bond_ipsec_free_sa() could run concurrently.
Fix problem 3 by moving the list cleanup (which requires the mutex) from
bond_ipsec_del_sa() (called from atomic context) to bond_ipsec_free_sa().
Finally, simplify bond_ipsec_del_sa() and bond_ipsec_free_sa() by using
xso->real_dev directly, since it's now protected by locks and can be
trusted to always reflect the offload device.
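The interplay between user deletion and migration described above can be sketched as a single-threaded user-space model. Everything here is an assumption made for illustration: the model_*() names and structs are hypothetical, the spinlock and mutex are omitted (each helper stands for one critical section), and the add is assumed to always succeed. The point is only to show how checking for a dead state and using real_dev as the ownership mark keeps delete and free calls balanced per device.

```c
#include <assert.h>
#include <stddef.h>

enum model_km_state { MODEL_STATE_VALID, MODEL_STATE_DEAD };

/* Hypothetical device with counters for the callbacks it received. */
struct model_dev {
	int adds;
	int deletes;
	int frees;
};

/* Hypothetical stand-in for the relevant parts of struct xfrm_state. */
struct model_state {
	enum model_km_state km_state;
	struct model_dev *real_dev;	/* device holding the offload, or NULL */
};

/* User deletion: the state is marked dead, then deleted from real_dev. */
static void model_user_del(struct model_state *xs)
{
	xs->km_state = MODEL_STATE_DEAD;
	if (!xs->real_dev)
		return;	/* migration already took it off the device */
	xs->real_dev->deletes++;
}

/* Migration, old device: never delete a state the user already killed. */
static void model_migrate_del(struct model_state *xs)
{
	struct model_dev *dev = xs->real_dev;

	if (!dev)
		return;
	xs->real_dev = NULL;
	if (xs->km_state != MODEL_STATE_DEAD)
		dev->deletes++;
	dev->frees++;
}

/* Migration, new device: undo the add at once if the state died meanwhile. */
static void model_migrate_add(struct model_dev *dev, struct model_state *xs)
{
	dev->adds++;
	if (xs->km_state == MODEL_STATE_DEAD)
		dev->deletes++;
	xs->real_dev = dev;	/* the pending free will target this device */
}

/* Deferred free: acts on whichever device currently holds the state. */
static void model_free(struct model_state *xs)
{
	struct model_dev *dev = xs->real_dev;

	if (!dev)
		return;
	xs->real_dev = NULL;
	dev->frees++;
}
```

In this model every device ends up with matching add, delete and free counts regardless of whether the user deletion lands before or after the migration steps.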
Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
drivers/net/bonding/bond_main.c | 82 +++++++++++++++------------------
include/net/xfrm.h | 7 ++-
2 files changed, 41 insertions(+), 48 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 14f7c9712ad4..8ed8c29659a0 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -545,7 +545,20 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
continue;
}
+
+ spin_lock_bh(&ipsec->xs->lock);
+ /* xs might have been killed by the user during the migration
+ * to the new dev, but bond_ipsec_del_sa() should have done
+ * nothing, as xso.real_dev is NULL.
+ * Delete it from the device we just added it to. The pending
+ * bond_ipsec_free_sa() call will do the rest of the cleanup.
+ */
+ if (ipsec->xs->km.state == XFRM_STATE_DEAD &&
+ real_dev->xfrmdev_ops->xdo_dev_state_delete)
+ real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
+ ipsec->xs);
ipsec->xs->xso.real_dev = real_dev;
+ spin_unlock_bh(&ipsec->xs->lock);
}
out:
mutex_unlock(&bond->ipsec_lock);
@@ -560,48 +573,20 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev,
struct xfrm_state *xs)
{
struct net_device *real_dev;
- netdevice_tracker tracker;
- struct bond_ipsec *ipsec;
- struct bonding *bond;
- struct slave *slave;
- if (!bond_dev)
+ if (!bond_dev || !xs->xso.real_dev)
return;
- rcu_read_lock();
- bond = netdev_priv(bond_dev);
- slave = rcu_dereference(bond->curr_active_slave);
- real_dev = slave ? slave->dev : NULL;
- netdev_hold(real_dev, &tracker, GFP_ATOMIC);
- rcu_read_unlock();
-
- if (!slave)
- goto out;
-
- if (!xs->xso.real_dev)
- goto out;
-
- WARN_ON(xs->xso.real_dev != real_dev);
+ real_dev = xs->xso.real_dev;
if (!real_dev->xfrmdev_ops ||
!real_dev->xfrmdev_ops->xdo_dev_state_delete ||
netif_is_bond_master(real_dev)) {
slave_warn(bond_dev, real_dev, "%s: no slave xdo_dev_state_delete\n", __func__);
- goto out;
+ return;
}
real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs);
-out:
- netdev_put(real_dev, &tracker);
- mutex_lock(&bond->ipsec_lock);
- list_for_each_entry(ipsec, &bond->ipsec_list, list) {
- if (ipsec->xs == xs) {
- list_del(&ipsec->list);
- kfree(ipsec);
- break;
- }
- }
- mutex_unlock(&bond->ipsec_lock);
}
static void bond_ipsec_del_sa_all(struct bonding *bond)
@@ -629,9 +614,15 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
__func__);
continue;
}
+
+ spin_lock_bh(&ipsec->xs->lock);
ipsec->xs->xso.real_dev = NULL;
- real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
- ipsec->xs);
+ /* Don't double delete states killed by the user. */
+ if (ipsec->xs->km.state != XFRM_STATE_DEAD)
+ real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
+ ipsec->xs);
+ spin_unlock_bh(&ipsec->xs->lock);
+
if (real_dev->xfrmdev_ops->xdo_dev_state_free)
real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev,
ipsec->xs);
@@ -643,34 +634,33 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev,
struct xfrm_state *xs)
{
struct net_device *real_dev;
- netdevice_tracker tracker;
+ struct bond_ipsec *ipsec;
struct bonding *bond;
- struct slave *slave;
if (!bond_dev)
return;
- rcu_read_lock();
bond = netdev_priv(bond_dev);
- slave = rcu_dereference(bond->curr_active_slave);
- real_dev = slave ? slave->dev : NULL;
- netdev_hold(real_dev, &tracker, GFP_ATOMIC);
- rcu_read_unlock();
-
- if (!slave)
- goto out;
+ mutex_lock(&bond->ipsec_lock);
if (!xs->xso.real_dev)
goto out;
- WARN_ON(xs->xso.real_dev != real_dev);
+ real_dev = xs->xso.real_dev;
xs->xso.real_dev = NULL;
- if (real_dev && real_dev->xfrmdev_ops &&
+ if (real_dev->xfrmdev_ops &&
real_dev->xfrmdev_ops->xdo_dev_state_free)
real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
out:
- netdev_put(real_dev, &tracker);
+ list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+ if (ipsec->xs == xs) {
+ list_del(&ipsec->list);
+ kfree(ipsec);
+ break;
+ }
+ }
+ mutex_unlock(&bond->ipsec_lock);
}
/**
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 3d2f6c879311..b7e8f3f49627 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -154,8 +154,11 @@ struct xfrm_dev_offload {
*/
struct net_device *dev;
netdevice_tracker dev_tracker;
- /* This is a private pointer used by the bonding driver.
- * Device drivers should not use it.
+ /* This is a private pointer used by the bonding driver (and eventually
+ * should be moved there). Device drivers should not use it.
+ * Protected by xfrm_state.lock AND bond.ipsec_lock in most cases,
+ * except in the .xdo_dev_state_del() flow, where only xfrm_state.lock
+ * is held.
*/
struct net_device *real_dev;
unsigned long offload_handle;
--
2.34.1
* [PATCH 08/12] xfrm: Migrate offload configuration
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (6 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 07/12] bonding: Fix multiple long standing offload races Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 09/12] xfrm: Refactor migration setup during the cloning process Steffen Klassert
` (4 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Chiachang Wang <chiachangwang@google.com>
Add hardware offload configuration to XFRM_MSG_MIGRATE
using an optional netlink attribute XFRMA_OFFLOAD_DEV.
In the existing xfrm_state_migrate(), the xfrm_init_state()
is called assuming no hardware offload by default. Even if the
original xfrm_state is configured with offload, the setting will
be reset. If the device is configured with hardware offload,
it's reasonable to allow the device to maintain its hardware
offload mode. But the device will end up with offload disabled
after receiving a migration event when the device migrates the
connection from one netdev to another one.
The devices that support migration may work with different
underlying networks, such as mobile devices. The hardware setting
should be forwarded to the different netdev based on the
migration configuration. This change provides the capability
for user space to migrate from one netdev to another.
Test: Tested with kernel test in the Android tree located
in https://android.googlesource.com/kernel/tests/
The xfrm_tunnel_test.py under the tests folder in
particular.
Signed-off-by: Chiachang Wang <chiachangwang@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
include/net/xfrm.h | 8 ++++++--
net/key/af_key.c | 2 +-
net/xfrm/xfrm_policy.c | 4 ++--
net/xfrm/xfrm_state.c | 9 ++++++++-
net/xfrm/xfrm_user.c | 15 ++++++++++++---
5 files changed, 29 insertions(+), 9 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b7e8f3f49627..466423a1a70a 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1904,12 +1904,16 @@ struct xfrm_state *xfrm_migrate_state_find(struct xfrm_migrate *m, struct net *n
u32 if_id);
struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
struct xfrm_migrate *m,
- struct xfrm_encap_tmpl *encap);
+ struct xfrm_encap_tmpl *encap,
+ struct net *net,
+ struct xfrm_user_offload *xuo,
+ struct netlink_ext_ack *extack);
int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
struct xfrm_migrate *m, int num_bundles,
struct xfrm_kmaddress *k, struct net *net,
struct xfrm_encap_tmpl *encap, u32 if_id,
- struct netlink_ext_ack *extack);
+ struct netlink_ext_ack *extack,
+ struct xfrm_user_offload *xuo);
#endif
int km_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr, __be16 sport);
diff --git a/net/key/af_key.c b/net/key/af_key.c
index c56bb4f451e6..efc2a91f4c48 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -2630,7 +2630,7 @@ static int pfkey_migrate(struct sock *sk, struct sk_buff *skb,
}
return xfrm_migrate(&sel, dir, XFRM_POLICY_TYPE_MAIN, m, i,
- kma ? &k : NULL, net, NULL, 0, NULL);
+ kma ? &k : NULL, net, NULL, 0, NULL, NULL);
out:
return err;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 143ac3aa7537..7200ba8de936 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -4630,7 +4630,7 @@ int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
struct xfrm_migrate *m, int num_migrate,
struct xfrm_kmaddress *k, struct net *net,
struct xfrm_encap_tmpl *encap, u32 if_id,
- struct netlink_ext_ack *extack)
+ struct netlink_ext_ack *extack, struct xfrm_user_offload *xuo)
{
int i, err, nx_cur = 0, nx_new = 0;
struct xfrm_policy *pol = NULL;
@@ -4663,7 +4663,7 @@ int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
if ((x = xfrm_migrate_state_find(mp, net, if_id))) {
x_cur[nx_cur] = x;
nx_cur++;
- xc = xfrm_state_migrate(x, mp, encap);
+ xc = xfrm_state_migrate(x, mp, encap, net, xuo, extack);
if (xc) {
x_new[nx_new] = xc;
nx_new++;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 3c2e27e5a1e3..1c5fe1b0b6d6 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2120,7 +2120,10 @@ EXPORT_SYMBOL(xfrm_migrate_state_find);
struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
struct xfrm_migrate *m,
- struct xfrm_encap_tmpl *encap)
+ struct xfrm_encap_tmpl *encap,
+ struct net *net,
+ struct xfrm_user_offload *xuo,
+ struct netlink_ext_ack *extack)
{
struct xfrm_state *xc;
@@ -2136,6 +2139,10 @@ struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
memcpy(&xc->id.daddr, &m->new_daddr, sizeof(xc->id.daddr));
memcpy(&xc->props.saddr, &m->new_saddr, sizeof(xc->props.saddr));
+ /* configure the hardware if offload is requested */
+ if (xuo && xfrm_dev_state_add(net, xc, xuo, extack))
+ goto error;
+
/* add state */
if (xfrm_addr_equal(&x->id.daddr, &m->new_daddr, m->new_family)) {
/* a care is needed when the destination address of the
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 0a3d3f3ae5a3..ae8e06573639 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3069,6 +3069,7 @@ static int xfrm_do_migrate(struct sk_buff *skb, struct nlmsghdr *nlh,
int n = 0;
struct net *net = sock_net(skb->sk);
struct xfrm_encap_tmpl *encap = NULL;
+ struct xfrm_user_offload *xuo = NULL;
u32 if_id = 0;
if (!attrs[XFRMA_MIGRATE]) {
@@ -3099,11 +3100,19 @@ static int xfrm_do_migrate(struct sk_buff *skb, struct nlmsghdr *nlh,
if (attrs[XFRMA_IF_ID])
if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+ if (attrs[XFRMA_OFFLOAD_DEV]) {
+ xuo = kmemdup(nla_data(attrs[XFRMA_OFFLOAD_DEV]),
+ sizeof(*xuo), GFP_KERNEL);
+ if (!xuo) {
+ err = -ENOMEM;
+ goto error;
+ }
+ }
err = xfrm_migrate(&pi->sel, pi->dir, type, m, n, kmp, net, encap,
- if_id, extack);
-
+ if_id, extack, xuo);
+error:
kfree(encap);
-
+ kfree(xuo);
return err;
}
#else
--
2.34.1
* [PATCH 09/12] xfrm: Refactor migration setup during the cloning process
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (7 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 08/12] xfrm: Migrate offload configuration Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 10/12] xfrm: validate assignment of maximal possible SEQ number Steffen Klassert
` (3 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Chiachang Wang <chiachangwang@google.com>
Previously, migration related setup, such as updating family,
destination address, and source address, was performed after
the clone was created in `xfrm_state_migrate`. This change
moves this setup into the cloning function itself, improving
code locality and reducing redundancy.
The `xfrm_state_clone_and_setup` function now conditionally
applies the migration parameters from struct xfrm_migrate
if it is provided. This allows the function to be used both
for simple cloning and for cloning with migration setup.
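A minimal userspace sketch of the "clone, then optionally apply migration
parameters" pattern described above. The toy_* types are hypothetical and only
loosely mirror the kernel fields. Note that the hunk posted in this message
assigns from m without a NULL check, so the NULL handling here follows the
wording of the description rather than the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy types for illustration; not the kernel definitions. */
struct toy_addr { unsigned char a[16]; };
struct toy_migrate { int new_family; struct toy_addr new_daddr, new_saddr; };
struct toy_state { int family; struct toy_addr daddr, saddr; };

/* Copy orig into out and, if m is non-NULL, apply the migration
 * parameters in the same step. */
void toy_clone_and_setup(const struct toy_state *orig, struct toy_state *out,
			 const struct toy_migrate *m)
{
	assert(orig && out);
	*out = *orig;
	if (!m)
		return;		/* plain clone, nothing to migrate */

	out->family = m->new_family;
	memcpy(&out->daddr, &m->new_daddr, sizeof(out->daddr));
	memcpy(&out->saddr, &m->new_saddr, sizeof(out->saddr));
}
```

Keeping the address update inside the clone helper means a caller cannot
forget one of the three fields when migrating.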
Test: Tested with kernel test in the Android tree located
in https://android.googlesource.com/kernel/tests/
The xfrm_tunnel_test.py under the tests folder in
particular.
Signed-off-by: Chiachang Wang <chiachangwang@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_state.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1c5fe1b0b6d6..4bf7a4a8f9d4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1958,8 +1958,9 @@ static inline int clone_security(struct xfrm_state *x, struct xfrm_sec_ctx *secu
return 0;
}
-static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig,
- struct xfrm_encap_tmpl *encap)
+static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
+ struct xfrm_encap_tmpl *encap,
+ struct xfrm_migrate *m)
{
struct net *net = xs_net(orig);
struct xfrm_state *x = xfrm_state_alloc(net);
@@ -2058,6 +2059,11 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig,
goto error;
}
+
+ x->props.family = m->new_family;
+ memcpy(&x->id.daddr, &m->new_daddr, sizeof(x->id.daddr));
+ memcpy(&x->props.saddr, &m->new_saddr, sizeof(x->props.saddr));
+
return x;
error:
@@ -2127,18 +2133,13 @@ struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
{
struct xfrm_state *xc;
- xc = xfrm_state_clone(x, encap);
+ xc = xfrm_state_clone_and_setup(x, encap, m);
if (!xc)
return NULL;
- xc->props.family = m->new_family;
-
if (xfrm_init_state(xc) < 0)
goto error;
- memcpy(&xc->id.daddr, &m->new_daddr, sizeof(xc->id.daddr));
- memcpy(&xc->props.saddr, &m->new_saddr, sizeof(xc->props.saddr));
-
/* configure the hardware if offload is requested */
if (xuo && xfrm_dev_state_add(net, xc, xuo, extack))
goto error;
--
2.34.1
* [PATCH 10/12] xfrm: validate assignment of maximal possible SEQ number
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (8 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 09/12] xfrm: Refactor migration setup during the cloning process Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 11/12] xfrm: prevent configuration of interface index when offload is used Steffen Klassert
` (2 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Leon Romanovsky <leonro@nvidia.com>
Users can set any seq/seq_hi/oseq/oseq_hi values. The XFRM core code
doesn't prevent them from setting even 0xFFFFFFFF, but this value
causes traffic drops.
This happens because a SEQ number here means that the packet with that
number was already processed and the next number should be sent on the
wire. In this case, the next number will be 0, which means overflow and
causes (expected) packet drops.
While this can be considered a misconfiguration and handled by the XFRM
datapath in the same manner as any other SEQ number, let's add
validation to make things easier for packet offload implementations,
which need to configure HW with the next SEQ to send and not with the
current SEQ as is done in the core code.
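The wraparound argument can be checked with a small standalone sketch. The
helpers below are hypothetical (they are not the verify_replay() code from
this patch): one mirrors the validation rule for a single direction, the
other computes the "next SEQ to send" that a packet offload implementation
would program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the user-supplied sequence number leaves
 * room for at least one more packet before overflow. */
bool toy_seq_is_valid(uint32_t seq, uint32_t seq_hi, bool esn)
{
	if (!esn)
		return seq_hi == 0 && seq != UINT32_MAX;
	return !(seq == UINT32_MAX && seq_hi == UINT32_MAX);
}

/* Value to program as the next SEQ to send. The assert documents that
 * the input must already have passed validation. */
uint64_t toy_next_seq(uint32_t seq, uint32_t seq_hi, bool esn)
{
	assert(toy_seq_is_valid(seq, seq_hi, esn));
	if (!esn)
		return (uint64_t)seq + 1;
	return (((uint64_t)seq_hi << 32) | seq) + 1;
}
```

In non-ESN mode a current value of 0xFFFFFFFF has no usable successor, since
the 32-bit increment wraps to 0. With ESN the low word may be 0xFFFFFFFF as
long as the high word is not also at its maximum.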
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_user.c | 52 +++++++++++++++++++++++++++++++++++---------
1 file changed, 42 insertions(+), 10 deletions(-)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index ae8e06573639..59f258daf830 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -178,11 +178,27 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
"Replay seq and seq_hi should be 0 for output SA");
return -EINVAL;
}
- if (rs->oseq_hi && !(p->flags & XFRM_STATE_ESN)) {
- NL_SET_ERR_MSG(
- extack,
- "Replay oseq_hi should be 0 in non-ESN mode for output SA");
- return -EINVAL;
+
+ if (!(p->flags & XFRM_STATE_ESN)) {
+ if (rs->oseq_hi) {
+ NL_SET_ERR_MSG(
+ extack,
+ "Replay oseq_hi should be 0 in non-ESN mode for output SA");
+ return -EINVAL;
+ }
+ if (rs->oseq == U32_MAX) {
+ NL_SET_ERR_MSG(
+ extack,
+ "Replay oseq should be less than 0xFFFFFFFF in non-ESN mode for output SA");
+ return -EINVAL;
+ }
+ } else {
+ if (rs->oseq == U32_MAX && rs->oseq_hi == U32_MAX) {
+ NL_SET_ERR_MSG(
+ extack,
+ "Replay oseq and oseq_hi should be less than 0xFFFFFFFF for output SA");
+ return -EINVAL;
+ }
}
if (rs->bmp_len) {
NL_SET_ERR_MSG(extack, "Replay bmp_len should 0 for output SA");
@@ -196,11 +212,27 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
"Replay oseq and oseq_hi should be 0 for input SA");
return -EINVAL;
}
- if (rs->seq_hi && !(p->flags & XFRM_STATE_ESN)) {
- NL_SET_ERR_MSG(
- extack,
- "Replay seq_hi should be 0 in non-ESN mode for input SA");
- return -EINVAL;
+ if (!(p->flags & XFRM_STATE_ESN)) {
+ if (rs->seq_hi) {
+ NL_SET_ERR_MSG(
+ extack,
+ "Replay seq_hi should be 0 in non-ESN mode for input SA");
+ return -EINVAL;
+ }
+
+ if (rs->seq == U32_MAX) {
+ NL_SET_ERR_MSG(
+ extack,
+ "Replay seq should be less than 0xFFFFFFFF in non-ESN mode for input SA");
+ return -EINVAL;
+ }
+ } else {
+ if (rs->seq == U32_MAX && rs->seq_hi == U32_MAX) {
+ NL_SET_ERR_MSG(
+ extack,
+ "Replay seq and seq_hi should be less than 0xFFFFFFFF for input SA");
+ return -EINVAL;
+ }
}
}
--
2.34.1
* [PATCH 11/12] xfrm: prevent configuration of interface index when offload is used
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (9 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 10/12] xfrm: validate assignment of maximal possible SEQ number Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 7:56 ` [PATCH 12/12] xfrm: use kfree_sensitive() for SA secret zeroization Steffen Klassert
2025-05-23 10:36 ` [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Leon Romanovsky <leonro@nvidia.com>
Both packet and crypto offloads perform decryption while the packet is
arriving at the HW from the wire. This means there is no possible
way to perform a lookup on the XFRM if_id, as it can't be set "before" HW.
So instead of silently ignoring this configuration, let's warn users about
the misconfiguration.
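A standalone sketch of the rule, assuming a toy flag value and a toy error
reporting convention. The real check in this patch uses XFRM_OFFLOAD_INBOUND
from the kernel UAPI headers and NL_SET_ERR_MSG(); TOY_OFFLOAD_INBOUND and
the msg out-parameter below are stand-ins for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag value; take the real one from the UAPI headers. */
#define TOY_OFFLOAD_INBOUND 0x2u

/* Reject an inbound offload request that also carries an if_id.
 * Returns 0 on success or -EINVAL with *msg set to a reason string. */
int toy_check_offload(unsigned int flags, uint32_t if_id, const char **msg)
{
	assert(msg);
	*msg = NULL;

	if ((flags & TOY_OFFLOAD_INBOUND) && if_id) {
		*msg = "XFRM if_id is not supported in RX path";
		return -EINVAL;
	}
	return 0;
}
```

Outbound states with an if_id, and inbound states without one, pass the
check unchanged.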
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_device.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 3be0139373f7..81fd486b5e56 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -251,6 +251,11 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
return -EINVAL;
}
+ if (xuo->flags & XFRM_OFFLOAD_INBOUND && x->if_id) {
+ NL_SET_ERR_MSG(extack, "XFRM if_id is not supported in RX path");
+ return -EINVAL;
+ }
+
is_packet_offload = xuo->flags & XFRM_OFFLOAD_PACKET;
/* We don't yet support TFC padding. */
--
2.34.1
* [PATCH 12/12] xfrm: use kfree_sensitive() for SA secret zeroization
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (10 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 11/12] xfrm: prevent configuration of interface index when offload is used Steffen Klassert
@ 2025-05-23 7:56 ` Steffen Klassert
2025-05-23 10:36 ` [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 7:56 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, Steffen Klassert, netdev
From: Zilin Guan <zilin@seu.edu.cn>
High-level copy_to_user_* APIs already redact SA secret fields when
redaction is enabled, but the state teardown path still freed aead,
aalg and ealg structs with plain kfree(), which does not clear memory
before deallocation. This can leave SA keys and other confidential
data in memory, risking exposure via post-free vulnerabilities.
Since this path is outside the packet fast path, the cost of zeroization
is acceptable and prevents any residual key material. This patch
replaces those kfree() calls unconditionally with kfree_sensitive(),
which zeroizes the entire buffer before freeing.
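For readers outside the kernel, a userspace analogue of "zeroize, then free"
might look like the sketch below. It is an assumption-laden illustration, not
the kernel implementation: kfree_sensitive() takes only the pointer, while
the toy_free_sensitive() helper here needs an explicit length because plain
malloc() offers no portable way to query the allocation size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Zero a buffer through a volatile pointer so the compiler does not
 * drop the stores as dead writes ahead of free(). */
void toy_zeroize(void *p, size_t len)
{
	volatile unsigned char *b = p;

	assert(p || len == 0);
	while (len--)
		*b++ = 0;
}

/* Clear len bytes of key material, then release the allocation.
 * A NULL pointer is accepted and ignored, like free(). */
void toy_free_sensitive(void *p, size_t len)
{
	if (!p)
		return;
	toy_zeroize(p, len);
	free(p);
}
```

A plain memset() right before free() can legally be optimized away, which is
why the stores go through a volatile pointer here.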
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_state.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 4bf7a4a8f9d4..5e1c736ea708 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -599,9 +599,9 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
x->mode_cbs->destroy_state(x);
hrtimer_cancel(&x->mtimer);
timer_delete_sync(&x->rtimer);
- kfree(x->aead);
- kfree(x->aalg);
- kfree(x->ealg);
+ kfree_sensitive(x->aead);
+ kfree_sensitive(x->aalg);
+ kfree_sensitive(x->ealg);
kfree(x->calg);
kfree(x->encap);
kfree(x->coaddr);
--
2.34.1
* Re: [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23
2025-05-23 7:55 [PATCH 0/12] pull request (net-next): ipsec-next 2025-05-23 Steffen Klassert
` (11 preceding siblings ...)
2025-05-23 7:56 ` [PATCH 12/12] xfrm: use kfree_sensitive() for SA secret zeroization Steffen Klassert
@ 2025-05-23 10:36 ` Steffen Klassert
12 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2025-05-23 10:36 UTC (permalink / raw)
To: David Miller, Jakub Kicinski; +Cc: Herbert Xu, netdev
On Fri, May 23, 2025 at 09:55:59AM +0200, Steffen Klassert wrote:
> 1) Remove some unnecessary strscpy_pad() size arguments.
> From Thorsten Blum.
>
> 2) Correct use of xso.real_dev on bonding offloads.
> Patchset from Cosmin Ratiu.
>
> 3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
> From Chiachang Wang.
>
> 4) Refactor migration setup during cloning. This was
> done after the clone was created. Now it is done
> in the cloning function itself.
> From Chiachang Wang.
>
> 5) Validate assignment of maximal possible SEQ number.
> Prevent setting the maximum sequence number,
> as this would cause traffic drops.
> From Leon Romanovsky.
>
> 6) Prevent configuration of interface index when offload
> is used. Hardware can't handle this case.
> From Leon Romanovsky.
>
> 7) Always use kfree_sensitive() for SA secret zeroization.
> From Zilin Guan.
>
> Please pull or let me know if there are problems.
>
> Thanks!
I forgot to mention a merge conflict between
fd5ef5203ce6 ("ixgbe: wrap netdev_priv() usage")
from the net-next tree and commit:
43eca05b6a3b ("xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}")
from the ipsec-next tree.
It can be solved as done in linux-next.
Thanks!
* Re: [PATCH 01/12] xfrm: Remove unnecessary strscpy_pad() size arguments
2025-05-23 7:56 ` [PATCH 01/12] xfrm: Remove unnecessary strscpy_pad() size arguments Steffen Klassert
@ 2025-05-26 16:50 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-05-26 16:50 UTC (permalink / raw)
To: Steffen Klassert; +Cc: davem, kuba, herbert, netdev
Hello:
This series was applied to netdev/net-next.git (main)
by Steffen Klassert <steffen.klassert@secunet.com>:
On Fri, 23 May 2025 09:56:00 +0200 you wrote:
> From: Thorsten Blum <thorsten.blum@linux.dev>
>
> If the destination buffer has a fixed length, strscpy_pad()
> automatically determines its size using sizeof() when the argument is
> omitted. This makes the explicit sizeof() calls unnecessary - remove
> them.
>
> [...]
Here is the summary with links:
- [01/12] xfrm: Remove unnecessary strscpy_pad() size arguments
https://git.kernel.org/netdev/net-next/c/20eb35da409f
- [02/12] net/mlx5: Avoid using xso.real_dev unnecessarily
https://git.kernel.org/netdev/net-next/c/d79444e8c3d4
- [03/12] xfrm: Use xdo.dev instead of xdo.real_dev
https://git.kernel.org/netdev/net-next/c/25ac138f58e7
- [04/12] xfrm: Remove unneeded device check from validate_xmit_xfrm
https://git.kernel.org/netdev/net-next/c/d53dda291bbd
- [05/12] xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
https://git.kernel.org/netdev/net-next/c/43eca05b6a3b
- [06/12] bonding: Mark active offloaded xfrm_states
https://git.kernel.org/netdev/net-next/c/fd4e41ebf66c
- [07/12] bonding: Fix multiple long standing offload races
https://git.kernel.org/netdev/net-next/c/d2fddbd34799
- [08/12] xfrm: Migrate offload configuration
https://git.kernel.org/netdev/net-next/c/ab244a394c7f
- [09/12] xfrm: Refactor migration setup during the cloning process
https://git.kernel.org/netdev/net-next/c/e8961c50ee9c
- [10/12] xfrm: validate assignment of maximal possible SEQ number
https://git.kernel.org/netdev/net-next/c/e86212b6b13a
- [11/12] xfrm: prevent configuration of interface index when offload is used
https://git.kernel.org/netdev/net-next/c/c82b48b63a93
- [12/12] xfrm: use kfree_sensitive() for SA secret zeroization
https://git.kernel.org/netdev/net-next/c/e7a37c9e428a
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html