netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH ipsec-next v1 0/5] Support PMTU in tunnel mode for packet offload
@ 2025-02-19 13:50 Leon Romanovsky
  2025-02-19 13:50 ` [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested Leon Romanovsky
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Leon Romanovsky @ 2025-02-19 13:50 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Andrew Lunn, Ayush Sawal, Bharat Bhushan, Eric Dumazet,
	Geetha sowjanya, hariprasad, Herbert Xu, intel-wired-lan,
	Jakub Kicinski, Jay Vosburgh, Jonathan Corbet, linux-doc,
	linux-rdma, Louis Peens, netdev, oss-drivers, Paolo Abeni,
	Potnuri Bharat Teja, Przemek Kitszel, Saeed Mahameed,
	Subbaraya Sundeep, Sunil Goutham, Tariq Toukan, Tony Nguyen,
	Zhu Yanjun, Bharat Bhushan

Changelog:
v1:
 * Changed signature and names of functions which set and clear type_offload
 * Fixed typos
 * Add Zhu's ROB tag
v0: https://lore.kernel.org/all/cover.1738778580.git.leon@kernel.org

Hi,

This series refactors the xdo_dev_offload_ok() to be global place for
drivers to check if their offload can perform encryption for xmit
packets.

Such common place gives us an option to check MTU and PMTU at one place.

Thanks

Leon Romanovsky (5):
  xfrm: delay initialization of offload path till its actually requested
  xfrm: simplify SA initialization routine
  xfrm: rely on XFRM offload
  xfrm: provide common xdo_dev_offload_ok callback implementation
  xfrm: check for PMTU in tunnel mode for packet offload

 Documentation/networking/xfrm_device.rst      |  3 +-
 drivers/net/bonding/bond_main.c               | 16 ++----
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   | 21 --------
 .../inline_crypto/ch_ipsec/chcr_ipsec.c       | 16 ------
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    | 21 --------
 drivers/net/ethernet/intel/ixgbevf/ipsec.c    | 21 --------
 .../marvell/octeontx2/nic/cn10k_ipsec.c       | 15 ------
 .../mellanox/mlx5/core/en_accel/ipsec.c       | 16 ------
 .../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 ----
 drivers/net/netdevsim/ipsec.c                 | 11 ----
 drivers/net/netdevsim/netdevsim.h             |  1 -
 include/net/xfrm.h                            | 21 +++++++-
 net/xfrm/xfrm_device.c                        | 46 ++++++++++++-----
 net/xfrm/xfrm_output.c                        |  6 ++-
 net/xfrm/xfrm_state.c                         | 50 ++++++++-----------
 net/xfrm/xfrm_user.c                          |  2 +-
 16 files changed, 87 insertions(+), 190 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested
  2025-02-19 13:50 [PATCH ipsec-next v1 0/5] Support PMTU in tunnel mode for packet offload Leon Romanovsky
@ 2025-02-19 13:50 ` Leon Romanovsky
  2025-06-05 13:09   ` Sabrina Dubroca
  2025-02-19 13:50 ` [PATCH ipsec-next v1 2/5] xfrm: simplify SA initialization routine Leon Romanovsky
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Leon Romanovsky @ 2025-02-19 13:50 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni

From: Leon Romanovsky <leonro@nvidia.com>

XFRM offload path is probed even if offload isn't needed at all. Let's
make sure that x->type_offload pointer stays NULL for such path to
reduce ambiguity.

Fixes: 9d389d7f84bb ("xfrm: Add a xfrm type offload.")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/net/xfrm.h     | 11 ++++++++++-
 net/xfrm/xfrm_device.c | 13 ++++++++-----
 net/xfrm/xfrm_state.c  | 32 ++++++++++++++------------------
 net/xfrm/xfrm_user.c   |  2 +-
 4 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index ed4b83696c77..e1eed5d47d07 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -464,6 +464,15 @@ struct xfrm_type_offload {
 
 int xfrm_register_type_offload(const struct xfrm_type_offload *type, unsigned short family);
 void xfrm_unregister_type_offload(const struct xfrm_type_offload *type, unsigned short family);
+void xfrm_set_type_offload(struct xfrm_state *x);
+static inline void xfrm_unset_type_offload(struct xfrm_state *x)
+{
+	if (!x->type_offload)
+		return;
+
+	module_put(x->type_offload->owner);
+	x->type_offload = NULL;
+}
 
 /**
  * struct xfrm_mode_cbs - XFRM mode callbacks
@@ -1760,7 +1769,7 @@ void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si);
 u32 xfrm_replay_seqhi(struct xfrm_state *x, __be32 net_seq);
 int xfrm_init_replay(struct xfrm_state *x, struct netlink_ext_ack *extack);
 u32 xfrm_state_mtu(struct xfrm_state *x, int mtu);
-int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload,
+int __xfrm_init_state(struct xfrm_state *x, bool init_replay,
 		      struct netlink_ext_ack *extack);
 int xfrm_init_state(struct xfrm_state *x);
 int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index d1fa94e52cea..97c8030cc417 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -244,11 +244,6 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	xfrm_address_t *daddr;
 	bool is_packet_offload;
 
-	if (!x->type_offload) {
-		NL_SET_ERR_MSG(extack, "Type doesn't support offload");
-		return -EINVAL;
-	}
-
 	if (xuo->flags &
 	    ~(XFRM_OFFLOAD_IPV6 | XFRM_OFFLOAD_INBOUND | XFRM_OFFLOAD_PACKET)) {
 		NL_SET_ERR_MSG(extack, "Unrecognized flags in offload request");
@@ -310,6 +305,13 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 		return -EINVAL;
 	}
 
+	xfrm_set_type_offload(x);
+	if (!x->type_offload) {
+		NL_SET_ERR_MSG(extack, "Type doesn't support offload");
+		dev_put(dev);
+		return -EINVAL;
+	}
+
 	xso->dev = dev;
 	netdev_tracker_alloc(dev, &xso->dev_tracker, GFP_ATOMIC);
 	xso->real_dev = dev;
@@ -332,6 +334,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 		netdev_put(dev, &xso->dev_tracker);
 		xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 
+		xfrm_unset_type_offload(x);
 		/* User explicitly requested packet offload mode and configured
 		 * policy in addition to the XFRM state. So be civil to users,
 		 * and return an error instead of taking fallback path.
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index ad2202fa82f3..69af5964c886 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -424,18 +424,18 @@ void xfrm_unregister_type_offload(const struct xfrm_type_offload *type,
 }
 EXPORT_SYMBOL(xfrm_unregister_type_offload);
 
-static const struct xfrm_type_offload *
-xfrm_get_type_offload(u8 proto, unsigned short family, bool try_load)
+void xfrm_set_type_offload(struct xfrm_state *x)
 {
 	const struct xfrm_type_offload *type = NULL;
 	struct xfrm_state_afinfo *afinfo;
+	bool try_load = true;
 
 retry:
-	afinfo = xfrm_state_get_afinfo(family);
+	afinfo = xfrm_state_get_afinfo(x->props.family);
 	if (unlikely(afinfo == NULL))
-		return NULL;
+		goto out;
 
-	switch (proto) {
+	switch (x->id.proto) {
 	case IPPROTO_ESP:
 		type = afinfo->type_offload_esp;
 		break;
@@ -449,18 +449,16 @@ xfrm_get_type_offload(u8 proto, unsigned short family, bool try_load)
 	rcu_read_unlock();
 
 	if (!type && try_load) {
-		request_module("xfrm-offload-%d-%d", family, proto);
+		request_module("xfrm-offload-%d-%d", x->props.family,
+			       x->id.proto);
 		try_load = false;
 		goto retry;
 	}
 
-	return type;
-}
-
-static void xfrm_put_type_offload(const struct xfrm_type_offload *type)
-{
-	module_put(type->owner);
+out:
+	x->type_offload = type;
 }
+EXPORT_SYMBOL(xfrm_set_type_offload);
 
 static const struct xfrm_mode xfrm4_mode_map[XFRM_MODE_MAX] = {
 	[XFRM_MODE_BEET] = {
@@ -609,8 +607,6 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
 	kfree(x->coaddr);
 	kfree(x->replay_esn);
 	kfree(x->preplay_esn);
-	if (x->type_offload)
-		xfrm_put_type_offload(x->type_offload);
 	if (x->type) {
 		x->type->destructor(x);
 		xfrm_put_type(x->type);
@@ -784,6 +780,8 @@ void xfrm_dev_state_free(struct xfrm_state *x)
 	struct xfrm_dev_offload *xso = &x->xso;
 	struct net_device *dev = READ_ONCE(xso->dev);
 
+	xfrm_unset_type_offload(x);
+
 	if (dev && dev->xfrmdev_ops) {
 		spin_lock_bh(&xfrm_state_dev_gc_lock);
 		if (!hlist_unhashed(&x->dev_gclist))
@@ -3122,7 +3120,7 @@ u32 xfrm_state_mtu(struct xfrm_state *x, int mtu)
 }
 EXPORT_SYMBOL_GPL(xfrm_state_mtu);
 
-int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload,
+int __xfrm_init_state(struct xfrm_state *x, bool init_replay,
 		      struct netlink_ext_ack *extack)
 {
 	const struct xfrm_mode *inner_mode;
@@ -3178,8 +3176,6 @@ int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload,
 		goto error;
 	}
 
-	x->type_offload = xfrm_get_type_offload(x->id.proto, family, offload);
-
 	err = x->type->init_state(x, extack);
 	if (err)
 		goto error;
@@ -3229,7 +3225,7 @@ int xfrm_init_state(struct xfrm_state *x)
 {
 	int err;
 
-	err = __xfrm_init_state(x, true, false, NULL);
+	err = __xfrm_init_state(x, true, NULL);
 	if (!err)
 		x->km.state = XFRM_STATE_VALID;
 
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 08c6d6f0179f..82a768500999 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -907,7 +907,7 @@ static struct xfrm_state *xfrm_state_construct(struct net *net,
 			goto error;
 	}
 
-	err = __xfrm_init_state(x, false, attrs[XFRMA_OFFLOAD_DEV], extack);
+	err = __xfrm_init_state(x, false, extack);
 	if (err)
 		goto error;
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH ipsec-next v1 2/5] xfrm: simplify SA initialization routine
  2025-02-19 13:50 [PATCH ipsec-next v1 0/5] Support PMTU in tunnel mode for packet offload Leon Romanovsky
  2025-02-19 13:50 ` [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested Leon Romanovsky
@ 2025-02-19 13:50 ` Leon Romanovsky
  2025-02-19 13:50 ` [PATCH ipsec-next v1 3/5] xfrm: rely on XFRM offload Leon Romanovsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2025-02-19 13:50 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni

From: Leon Romanovsky <leonro@nvidia.com>

SA replay mode is initialized differently for user-space and
kernel-space users, but the call to xfrm_init_replay() existed in
common path with boolean protection. That caused to situation where
we have two different function orders.

So let's rewrite the SA initialization flow to have same order for
both in-kernel and user-space callers.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/net/xfrm.h    |  3 +--
 net/xfrm/xfrm_state.c | 22 ++++++++++------------
 net/xfrm/xfrm_user.c  |  2 +-
 3 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e1eed5d47d07..15997374a594 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1769,8 +1769,7 @@ void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si);
 u32 xfrm_replay_seqhi(struct xfrm_state *x, __be32 net_seq);
 int xfrm_init_replay(struct xfrm_state *x, struct netlink_ext_ack *extack);
 u32 xfrm_state_mtu(struct xfrm_state *x, int mtu);
-int __xfrm_init_state(struct xfrm_state *x, bool init_replay,
-		      struct netlink_ext_ack *extack);
+int __xfrm_init_state(struct xfrm_state *x, struct netlink_ext_ack *extack);
 int xfrm_init_state(struct xfrm_state *x);
 int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
 int xfrm_input_resume(struct sk_buff *skb, int nexthdr);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 69af5964c886..7b1028671144 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -3120,8 +3120,7 @@ u32 xfrm_state_mtu(struct xfrm_state *x, int mtu)
 }
 EXPORT_SYMBOL_GPL(xfrm_state_mtu);
 
-int __xfrm_init_state(struct xfrm_state *x, bool init_replay,
-		      struct netlink_ext_ack *extack)
+int __xfrm_init_state(struct xfrm_state *x, struct netlink_ext_ack *extack)
 {
 	const struct xfrm_mode *inner_mode;
 	const struct xfrm_mode *outer_mode;
@@ -3188,12 +3187,6 @@ int __xfrm_init_state(struct xfrm_state *x, bool init_replay,
 	}
 
 	x->outer_mode = *outer_mode;
-	if (init_replay) {
-		err = xfrm_init_replay(x, extack);
-		if (err)
-			goto error;
-	}
-
 	if (x->nat_keepalive_interval) {
 		if (x->dir != XFRM_SA_DIR_OUT) {
 			NL_SET_ERR_MSG(extack, "NAT keepalive is only supported for outbound SAs");
@@ -3225,11 +3218,16 @@ int xfrm_init_state(struct xfrm_state *x)
 {
 	int err;
 
-	err = __xfrm_init_state(x, true, NULL);
-	if (!err)
-		x->km.state = XFRM_STATE_VALID;
+	err = __xfrm_init_state(x, NULL);
+	if (err)
+		return err;
 
-	return err;
+	err = xfrm_init_replay(x, NULL);
+	if (err)
+		return err;
+
+	x->km.state = XFRM_STATE_VALID;
+	return 0;
 }
 
 EXPORT_SYMBOL(xfrm_init_state);
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 82a768500999..d1d422f68978 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -907,7 +907,7 @@ static struct xfrm_state *xfrm_state_construct(struct net *net,
 			goto error;
 	}
 
-	err = __xfrm_init_state(x, false, extack);
+	err = __xfrm_init_state(x, extack);
 	if (err)
 		goto error;
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH ipsec-next v1 3/5] xfrm: rely on XFRM offload
  2025-02-19 13:50 [PATCH ipsec-next v1 0/5] Support PMTU in tunnel mode for packet offload Leon Romanovsky
  2025-02-19 13:50 ` [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested Leon Romanovsky
  2025-02-19 13:50 ` [PATCH ipsec-next v1 2/5] xfrm: simplify SA initialization routine Leon Romanovsky
@ 2025-02-19 13:50 ` Leon Romanovsky
  2025-02-19 13:51 ` [PATCH ipsec-next v1 4/5] xfrm: provide common xdo_dev_offload_ok callback implementation Leon Romanovsky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2025-02-19 13:50 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni, Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

After change of initialization of x->type_offload pointer to be valid
only for offloaded SAs. There is no need to rely on both x->type_offload
and x->xso.type to determine if SA is offloaded or not.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 net/xfrm/xfrm_device.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 97c8030cc417..8d24f4743107 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -419,13 +419,11 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 	struct net_device *dev = x->xso.dev;
 
-	if (!x->type_offload ||
-	    (x->xso.type == XFRM_DEV_OFFLOAD_UNSPECIFIED && x->encap))
+	if (x->xso.type == XFRM_DEV_OFFLOAD_UNSPECIFIED)
 		return false;
 
 	if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET ||
-	    ((!dev || (dev == xfrm_dst_path(dst)->dev)) &&
-	     !xdst->child->xfrm)) {
+	    ((dev == xfrm_dst_path(dst)->dev) && !xdst->child->xfrm)) {
 		mtu = xfrm_state_mtu(x, xdst->child_mtu_cached);
 		if (skb->len <= mtu)
 			goto ok;
@@ -437,8 +435,8 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 	return false;
 
 ok:
-	if (dev && dev->xfrmdev_ops && dev->xfrmdev_ops->xdo_dev_offload_ok)
-		return x->xso.dev->xfrmdev_ops->xdo_dev_offload_ok(skb, x);
+	if (dev->xfrmdev_ops->xdo_dev_offload_ok)
+		return dev->xfrmdev_ops->xdo_dev_offload_ok(skb, x);
 
 	return true;
 }
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH ipsec-next v1 4/5] xfrm: provide common xdo_dev_offload_ok callback implementation
  2025-02-19 13:50 [PATCH ipsec-next v1 0/5] Support PMTU in tunnel mode for packet offload Leon Romanovsky
                   ` (2 preceding siblings ...)
  2025-02-19 13:50 ` [PATCH ipsec-next v1 3/5] xfrm: rely on XFRM offload Leon Romanovsky
@ 2025-02-19 13:51 ` Leon Romanovsky
  2025-02-19 13:51 ` [PATCH ipsec-next v1 5/5] xfrm: check for PMTU in tunnel mode for packet offload Leon Romanovsky
  2025-02-25  8:57 ` [PATCH ipsec-next v1 0/5] Support " Steffen Klassert
  5 siblings, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2025-02-19 13:51 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, Andrew Lunn, Ayush Sawal, Bharat Bhushan,
	Eric Dumazet, Geetha sowjanya, hariprasad, Herbert Xu,
	intel-wired-lan, Jakub Kicinski, Jay Vosburgh, Jonathan Corbet,
	linux-doc, linux-rdma, Louis Peens, netdev, oss-drivers,
	Paolo Abeni, Potnuri Bharat Teja, Przemek Kitszel, Saeed Mahameed,
	Subbaraya Sundeep, Sunil Goutham, Tariq Toukan, Tony Nguyen,
	Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

Almost all drivers except bond and nsim had same check if device
can perform XFRM offload on that specific packet. The check was that
packet doesn't have IPv4 options and IPv6 extensions.

In NIC drivers, the IPv4 HELEN comparison was slightly different, but
the intent was to check for the same conditions. So let's chose more
strict variant as a common base.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 Documentation/networking/xfrm_device.rst      |  3 ++-
 drivers/net/bonding/bond_main.c               | 16 +++++---------
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   | 21 -------------------
 .../inline_crypto/ch_ipsec/chcr_ipsec.c       | 16 --------------
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    | 21 -------------------
 drivers/net/ethernet/intel/ixgbevf/ipsec.c    | 21 -------------------
 .../marvell/octeontx2/nic/cn10k_ipsec.c       | 15 -------------
 .../mellanox/mlx5/core/en_accel/ipsec.c       | 16 --------------
 .../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 ----------
 drivers/net/netdevsim/ipsec.c                 | 11 ----------
 drivers/net/netdevsim/netdevsim.h             |  1 -
 net/xfrm/xfrm_device.c                        | 15 +++++++++++++
 12 files changed, 22 insertions(+), 145 deletions(-)

diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst
index 66f6e9a9b59a..7f24c09f2694 100644
--- a/Documentation/networking/xfrm_device.rst
+++ b/Documentation/networking/xfrm_device.rst
@@ -126,7 +126,8 @@ been setup for offload, it first calls into xdo_dev_offload_ok() with
 the skb and the intended offload state to ask the driver if the offload
 will serviceable.  This can check the packet information to be sure the
 offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc) and
-return true of false to signify its support.
+return true or false to signify its support. In case driver doesn't implement
+this callback, the stack provides reasonable defaults.
 
 Crypto offload mode:
 When ready to send, the driver needs to inspect the Tx packet for the
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f6d0628a36d9..154e670d8075 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -673,22 +673,16 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs)
 static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
 {
 	struct net_device *real_dev;
-	bool ok = false;
 
 	rcu_read_lock();
 	real_dev = bond_ipsec_dev(xs);
-	if (!real_dev)
-		goto out;
-
-	if (!real_dev->xfrmdev_ops ||
-	    !real_dev->xfrmdev_ops->xdo_dev_offload_ok ||
-	    netif_is_bond_master(real_dev))
-		goto out;
+	if (!real_dev || netif_is_bond_master(real_dev)) {
+		rcu_read_unlock();
+		return false;
+	}
 
-	ok = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
-out:
 	rcu_read_unlock();
-	return ok;
+	return true;
 }
 
 /**
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 2f0b3e389e62..551c279dc14b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -6538,26 +6538,6 @@ static void cxgb4_xfrm_free_state(struct xfrm_state *x)
 	mutex_unlock(&uld_mutex);
 }
 
-static bool cxgb4_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
-{
-	struct adapter *adap = netdev2adap(x->xso.dev);
-	bool ret = false;
-
-	if (!mutex_trylock(&uld_mutex)) {
-		dev_dbg(adap->pdev_dev,
-			"crypto uld critical resource is under use\n");
-		return ret;
-	}
-	if (chcr_offload_state(adap, CXGB4_XFRMDEV_OPS))
-		goto out_unlock;
-
-	ret = adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_offload_ok(skb, x);
-
-out_unlock:
-	mutex_unlock(&uld_mutex);
-	return ret;
-}
-
 static void cxgb4_advance_esn_state(struct xfrm_state *x)
 {
 	struct adapter *adap = netdev2adap(x->xso.dev);
@@ -6583,7 +6563,6 @@ static const struct xfrmdev_ops cxgb4_xfrmdev_ops = {
 	.xdo_dev_state_add      = cxgb4_xfrm_add_state,
 	.xdo_dev_state_delete   = cxgb4_xfrm_del_state,
 	.xdo_dev_state_free     = cxgb4_xfrm_free_state,
-	.xdo_dev_offload_ok     = cxgb4_ipsec_offload_ok,
 	.xdo_dev_state_advance_esn = cxgb4_advance_esn_state,
 };
 
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
index c7338ac6a5bb..baba96883f48 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
@@ -71,7 +71,6 @@
 static LIST_HEAD(uld_ctx_list);
 static DEFINE_MUTEX(dev_mutex);
 
-static bool ch_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x);
 static int ch_ipsec_uld_state_change(void *handle, enum cxgb4_state new_state);
 static int ch_ipsec_xmit(struct sk_buff *skb, struct net_device *dev);
 static void *ch_ipsec_uld_add(const struct cxgb4_lld_info *infop);
@@ -85,7 +84,6 @@ static const struct xfrmdev_ops ch_ipsec_xfrmdev_ops = {
 	.xdo_dev_state_add      = ch_ipsec_xfrm_add_state,
 	.xdo_dev_state_delete   = ch_ipsec_xfrm_del_state,
 	.xdo_dev_state_free     = ch_ipsec_xfrm_free_state,
-	.xdo_dev_offload_ok     = ch_ipsec_offload_ok,
 	.xdo_dev_state_advance_esn = ch_ipsec_advance_esn_state,
 };
 
@@ -323,20 +321,6 @@ static void ch_ipsec_xfrm_free_state(struct xfrm_state *x)
 	module_put(THIS_MODULE);
 }
 
-static bool ch_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
-{
-	if (x->props.family == AF_INET) {
-		/* Offload with IP options is not supported yet */
-		if (ip_hdr(skb)->ihl > 5)
-			return false;
-	} else {
-		/* Offload with IPv6 extension headers is not support yet */
-		if (ipv6_ext_hdr(ipv6_hdr(skb)->nexthdr))
-			return false;
-	}
-	return true;
-}
-
 static void ch_ipsec_advance_esn_state(struct xfrm_state *x)
 {
 	/* do nothing */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 866024f2b9ee..07ea1954a276 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -817,30 +817,9 @@ static void ixgbe_ipsec_del_sa(struct xfrm_state *xs)
 	}
 }
 
-/**
- * ixgbe_ipsec_offload_ok - can this packet use the xfrm hw offload
- * @skb: current data packet
- * @xs: pointer to transformer state struct
- **/
-static bool ixgbe_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
-{
-	if (xs->props.family == AF_INET) {
-		/* Offload with IPv4 options is not supported yet */
-		if (ip_hdr(skb)->ihl != 5)
-			return false;
-	} else {
-		/* Offload with IPv6 extension headers is not support yet */
-		if (ipv6_ext_hdr(ipv6_hdr(skb)->nexthdr))
-			return false;
-	}
-
-	return true;
-}
-
 static const struct xfrmdev_ops ixgbe_xfrmdev_ops = {
 	.xdo_dev_state_add = ixgbe_ipsec_add_sa,
 	.xdo_dev_state_delete = ixgbe_ipsec_del_sa,
-	.xdo_dev_offload_ok = ixgbe_ipsec_offload_ok,
 };
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index f804b35d79c7..8ba037e3d9c2 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -428,30 +428,9 @@ static void ixgbevf_ipsec_del_sa(struct xfrm_state *xs)
 	}
 }
 
-/**
- * ixgbevf_ipsec_offload_ok - can this packet use the xfrm hw offload
- * @skb: current data packet
- * @xs: pointer to transformer state struct
- **/
-static bool ixgbevf_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
-{
-	if (xs->props.family == AF_INET) {
-		/* Offload with IPv4 options is not supported yet */
-		if (ip_hdr(skb)->ihl != 5)
-			return false;
-	} else {
-		/* Offload with IPv6 extension headers is not support yet */
-		if (ipv6_ext_hdr(ipv6_hdr(skb)->nexthdr))
-			return false;
-	}
-
-	return true;
-}
-
 static const struct xfrmdev_ops ixgbevf_xfrmdev_ops = {
 	.xdo_dev_state_add = ixgbevf_ipsec_add_sa,
 	.xdo_dev_state_delete = ixgbevf_ipsec_del_sa,
-	.xdo_dev_offload_ok = ixgbevf_ipsec_offload_ok,
 };
 
 /**
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c
index 09a5b5268205..fc59e50bafce 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_ipsec.c
@@ -744,24 +744,9 @@ static void cn10k_ipsec_del_state(struct xfrm_state *x)
 		queue_work(pf->ipsec.sa_workq, &pf->ipsec.sa_work);
 }
 
-static bool cn10k_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
-{
-	if (x->props.family == AF_INET) {
-		/* Offload with IPv4 options is not supported yet */
-		if (ip_hdr(skb)->ihl > 5)
-			return false;
-	} else {
-		/* Offload with IPv6 extension headers is not support yet */
-		if (ipv6_ext_hdr(ipv6_hdr(skb)->nexthdr))
-			return false;
-	}
-	return true;
-}
-
 static const struct xfrmdev_ops cn10k_ipsec_xfrmdev_ops = {
 	.xdo_dev_state_add	= cn10k_ipsec_add_state,
 	.xdo_dev_state_delete	= cn10k_ipsec_del_state,
-	.xdo_dev_offload_ok	= cn10k_ipsec_offload_ok,
 };
 
 static void cn10k_ipsec_sa_wq_handler(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 501709ac310f..3b81e7b8ce23 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -953,21 +953,6 @@ void mlx5e_ipsec_cleanup(struct mlx5e_priv *priv)
 	priv->ipsec = NULL;
 }
 
-static bool mlx5e_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
-{
-	if (x->props.family == AF_INET) {
-		/* Offload with IPv4 options is not supported yet */
-		if (ip_hdr(skb)->ihl > 5)
-			return false;
-	} else {
-		/* Offload with IPv6 extension headers is not support yet */
-		if (ipv6_ext_hdr(ipv6_hdr(skb)->nexthdr))
-			return false;
-	}
-
-	return true;
-}
-
 static void mlx5e_xfrm_advance_esn_state(struct xfrm_state *x)
 {
 	struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x);
@@ -1196,7 +1181,6 @@ static const struct xfrmdev_ops mlx5e_ipsec_xfrmdev_ops = {
 	.xdo_dev_state_add	= mlx5e_xfrm_add_state,
 	.xdo_dev_state_delete	= mlx5e_xfrm_del_state,
 	.xdo_dev_state_free	= mlx5e_xfrm_free_state,
-	.xdo_dev_offload_ok	= mlx5e_ipsec_offload_ok,
 	.xdo_dev_state_advance_esn = mlx5e_xfrm_advance_esn_state,
 
 	.xdo_dev_state_update_stats = mlx5e_xfrm_update_stats,
diff --git a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
index 515069d5637b..671af5d4c5d2 100644
--- a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
+++ b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
@@ -565,20 +565,9 @@ static void nfp_net_xfrm_del_state(struct xfrm_state *x)
 	xa_erase(&nn->xa_ipsec, x->xso.offload_handle - 1);
 }
 
-static bool nfp_net_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
-{
-	if (x->props.family == AF_INET)
-		/* Offload with IPv4 options is not supported yet */
-		return ip_hdr(skb)->ihl == 5;
-
-	/* Offload with IPv6 extension headers is not support yet */
-	return !(ipv6_ext_hdr(ipv6_hdr(skb)->nexthdr));
-}
-
 static const struct xfrmdev_ops nfp_net_ipsec_xfrmdev_ops = {
 	.xdo_dev_state_add = nfp_net_xfrm_add_state,
 	.xdo_dev_state_delete = nfp_net_xfrm_del_state,
-	.xdo_dev_offload_ok = nfp_net_ipsec_offload_ok,
 };
 
 void nfp_net_ipsec_init(struct nfp_net *nn)
diff --git a/drivers/net/netdevsim/ipsec.c b/drivers/net/netdevsim/ipsec.c
index 88187dd4eb2d..d88bdb9a1717 100644
--- a/drivers/net/netdevsim/ipsec.c
+++ b/drivers/net/netdevsim/ipsec.c
@@ -217,20 +217,9 @@ static void nsim_ipsec_del_sa(struct xfrm_state *xs)
 	ipsec->count--;
 }
 
-static bool nsim_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
-{
-	struct netdevsim *ns = netdev_priv(xs->xso.real_dev);
-	struct nsim_ipsec *ipsec = &ns->ipsec;
-
-	ipsec->ok++;
-
-	return true;
-}
-
 static const struct xfrmdev_ops nsim_xfrmdev_ops = {
 	.xdo_dev_state_add	= nsim_ipsec_add_sa,
 	.xdo_dev_state_delete	= nsim_ipsec_del_sa,
-	.xdo_dev_offload_ok	= nsim_ipsec_offload_ok,
 };
 
 bool nsim_ipsec_tx(struct netdevsim *ns, struct sk_buff *skb)
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 96d54c08043d..ca8f1a620044 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -54,7 +54,6 @@ struct nsim_ipsec {
 	struct dentry *pfile;
 	u32 count;
 	u32 tx;
-	u32 ok;
 };
 
 #define NSIM_MACSEC_MAX_SECY_COUNT 3
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 8d24f4743107..f9d985ef30f2 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -435,6 +435,21 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 	return false;
 
 ok:
+	switch (x->props.family) {
+	case AF_INET:
+		/* Check for IPv4 options */
+		if (ip_hdr(skb)->ihl != 5)
+			return false;
+		break;
+	case AF_INET6:
+		/* Check for IPv6 extensions */
+		if (ipv6_ext_hdr(ipv6_hdr(skb)->nexthdr))
+			return false;
+		break;
+	default:
+		break;
+	}
+
 	if (dev->xfrmdev_ops->xdo_dev_offload_ok)
 		return dev->xfrmdev_ops->xdo_dev_offload_ok(skb, x);
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH ipsec-next v1 5/5] xfrm: check for PMTU in tunnel mode for packet offload
  2025-02-19 13:50 [PATCH ipsec-next v1 0/5] Support PMTU in tunnel mode for packet offload Leon Romanovsky
                   ` (3 preceding siblings ...)
  2025-02-19 13:51 ` [PATCH ipsec-next v1 4/5] xfrm: provide common xdo_dev_offload_ok callback implementation Leon Romanovsky
@ 2025-02-19 13:51 ` Leon Romanovsky
  2025-02-25  8:57 ` [PATCH ipsec-next v1 0/5] Support " Steffen Klassert
  5 siblings, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2025-02-19 13:51 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni

From: Leon Romanovsky <leonro@nvidia.com>

In tunnel mode, for the packet offload, there were no PMTU signaling
to the upper level about need to fragment the packet. As a solution,
call to already existing xfrm[4|6]_tunnel_check_size() to perform that.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/net/xfrm.h     |  9 +++++++++
 net/xfrm/xfrm_device.c | 10 ++++++++--
 net/xfrm/xfrm_output.c |  6 ++++--
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 15997374a594..39365fd2ea17 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1781,6 +1781,15 @@ int xfrm_trans_queue(struct sk_buff *skb,
 				   struct sk_buff *));
 int xfrm_output_resume(struct sock *sk, struct sk_buff *skb, int err);
 int xfrm_output(struct sock *sk, struct sk_buff *skb);
+int xfrm4_tunnel_check_size(struct sk_buff *skb);
+#if IS_ENABLED(CONFIG_IPV6)
+int xfrm6_tunnel_check_size(struct sk_buff *skb);
+#else
+static inline int xfrm6_tunnel_check_size(struct sk_buff *skb)
+{
+	return -EMSGSIZE;
+}
+#endif
 
 #if IS_ENABLED(CONFIG_NET_PKTGEN)
 int pktgen_xfrm_outer_mode_output(struct xfrm_state *x, struct sk_buff *skb);
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index f9d985ef30f2..d62f76161d83 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -418,12 +418,12 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 	struct dst_entry *dst = skb_dst(skb);
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 	struct net_device *dev = x->xso.dev;
+	bool check_tunnel_size;
 
 	if (x->xso.type == XFRM_DEV_OFFLOAD_UNSPECIFIED)
 		return false;
 
-	if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET ||
-	    ((dev == xfrm_dst_path(dst)->dev) && !xdst->child->xfrm)) {
+	if ((dev == xfrm_dst_path(dst)->dev) && !xdst->child->xfrm) {
 		mtu = xfrm_state_mtu(x, xdst->child_mtu_cached);
 		if (skb->len <= mtu)
 			goto ok;
@@ -435,16 +435,22 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 	return false;
 
 ok:
+	check_tunnel_size = x->xso.type == XFRM_DEV_OFFLOAD_PACKET &&
+			    x->props.mode == XFRM_MODE_TUNNEL;
 	switch (x->props.family) {
 	case AF_INET:
 		/* Check for IPv4 options */
 		if (ip_hdr(skb)->ihl != 5)
 			return false;
+		if (check_tunnel_size && xfrm4_tunnel_check_size(skb))
+			return false;
 		break;
 	case AF_INET6:
 		/* Check for IPv6 extensions */
 		if (ipv6_ext_hdr(ipv6_hdr(skb)->nexthdr))
 			return false;
+		if (check_tunnel_size && xfrm6_tunnel_check_size(skb))
+			return false;
 		break;
 	default:
 		break;
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index f7abd42c077d..34c8e266641c 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -786,7 +786,7 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(xfrm_output);
 
-static int xfrm4_tunnel_check_size(struct sk_buff *skb)
+int xfrm4_tunnel_check_size(struct sk_buff *skb)
 {
 	int mtu, ret = 0;
 
@@ -812,6 +812,7 @@ static int xfrm4_tunnel_check_size(struct sk_buff *skb)
 out:
 	return ret;
 }
+EXPORT_SYMBOL_GPL(xfrm4_tunnel_check_size);
 
 static int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb)
 {
@@ -834,7 +835,7 @@ static int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb)
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static int xfrm6_tunnel_check_size(struct sk_buff *skb)
+int xfrm6_tunnel_check_size(struct sk_buff *skb)
 {
 	int mtu, ret = 0;
 	struct dst_entry *dst = skb_dst(skb);
@@ -864,6 +865,7 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
 out:
 	return ret;
 }
+EXPORT_SYMBOL_GPL(xfrm6_tunnel_check_size);
 #endif
 
 static int xfrm6_extract_output(struct xfrm_state *x, struct sk_buff *skb)
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH ipsec-next v1 0/5] Support PMTU in tunnel mode for packet offload
  2025-02-19 13:50 [PATCH ipsec-next v1 0/5] Support PMTU in tunnel mode for packet offload Leon Romanovsky
                   ` (4 preceding siblings ...)
  2025-02-19 13:51 ` [PATCH ipsec-next v1 5/5] xfrm: check for PMTU in tunnel mode for packet offload Leon Romanovsky
@ 2025-02-25  8:57 ` Steffen Klassert
  5 siblings, 0 replies; 11+ messages in thread
From: Steffen Klassert @ 2025-02-25  8:57 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Andrew Lunn, Ayush Sawal, Bharat Bhushan, Eric Dumazet,
	Geetha sowjanya, hariprasad, Herbert Xu, intel-wired-lan,
	Jakub Kicinski, Jay Vosburgh, Jonathan Corbet, linux-doc,
	linux-rdma, Louis Peens, netdev, oss-drivers, Paolo Abeni,
	Potnuri Bharat Teja, Przemek Kitszel, Saeed Mahameed,
	Subbaraya Sundeep, Sunil Goutham, Tariq Toukan, Tony Nguyen,
	Zhu Yanjun, Bharat Bhushan

On Wed, Feb 19, 2025 at 03:50:56PM +0200, Leon Romanovsky wrote:
> Changelog:
> v1:
>  * Changed signature and names of functions which set and clear type_offload
>  * Fixed typos
>  * Add Zhu's ROB tag
> v0: https://lore.kernel.org/all/cover.1738778580.git.leon@kernel.org
> 
> Hi,
> 
> This series refactors the xdo_dev_offload_ok() to be global place for
> drivers to check if their offload can perform encryption for xmit
> packets.
> 
> Such common place gives us an option to check MTU and PMTU at one place.
> 
> Thanks
> 
> Leon Romanovsky (5):
>   xfrm: delay initialization of offload path till its actually requested
>   xfrm: simplify SA initialization routine
>   xfrm: rely on XFRM offload
>   xfrm: provide common xdo_dev_offload_ok callback implementation
>   xfrm: check for PMTU in tunnel mode for packet offload

Series applied, thanks a lot Leon!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested
  2025-02-19 13:50 ` [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested Leon Romanovsky
@ 2025-06-05 13:09   ` Sabrina Dubroca
  2025-06-05 14:16     ` Leon Romanovsky
  0 siblings, 1 reply; 11+ messages in thread
From: Sabrina Dubroca @ 2025-06-05 13:09 UTC (permalink / raw)
  To: Steffen Klassert, Leon Romanovsky
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni

Hello,

I think we need to revert this patch. It causes a severe performance
regression for SW IPsec (around 40-50%).

2025-02-19, 15:50:57 +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> XFRM offload path is probed even if offload isn't needed at all. Let's
> make sure that x->type_offload pointer stays NULL for such path to
> reduce ambiguity.

x->type_offload is used for GRO with SW IPsec, not just for HW offload.

-- 
Sabrina

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested
  2025-06-05 13:09   ` Sabrina Dubroca
@ 2025-06-05 14:16     ` Leon Romanovsky
  2025-06-06 15:12       ` Sabrina Dubroca
  0 siblings, 1 reply; 11+ messages in thread
From: Leon Romanovsky @ 2025-06-05 14:16 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Steffen Klassert, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni

On Thu, Jun 05, 2025 at 03:09:19PM +0200, Sabrina Dubroca wrote:
> Hello,
> 
> I think we need to revert this patch. It causes a severe performance
> regression for SW IPsec (around 40-50%).
> 
> 2025-02-19, 15:50:57 +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > XFRM offload path is probed even if offload isn't needed at all. Let's
> > make sure that x->type_offload pointer stays NULL for such path to
> > reduce ambiguity.
> 
> x->type_offload is used for GRO with SW IPsec, not just for HW offload.

Thanks for the report, can you please try the following fix?

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index a21e276dbe447..e45a275fca26d 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -474,7 +474,7 @@ struct xfrm_type_offload {

 int xfrm_register_type_offload(const struct xfrm_type_offload *type, unsigned short family);
 void xfrm_unregister_type_offload(const struct xfrm_type_offload *type, unsigned short family);
-void xfrm_set_type_offload(struct xfrm_state *x);
+void xfrm_set_type_offload(struct xfrm_state *x, bool try_load);
 static inline void xfrm_unset_type_offload(struct xfrm_state *x)
 {
        if (!x->type_offload)
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 81fd486b5e566..d2819baea414f 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -305,7 +305,6 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
                return -EINVAL;
        }

-       xfrm_set_type_offload(x);
        if (!x->type_offload) {
                NL_SET_ERR_MSG(extack, "Type doesn't support offload");
                dev_put(dev);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 203b585c2ae28..2da7281f8344b 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -424,11 +424,10 @@ void xfrm_unregister_type_offload(const struct xfrm_type_offload *type,
 }
 EXPORT_SYMBOL(xfrm_unregister_type_offload);

-void xfrm_set_type_offload(struct xfrm_state *x)
+void xfrm_set_type_offload(struct xfrm_state *x, bool try_load)
 {
        const struct xfrm_type_offload *type = NULL;
        struct xfrm_state_afinfo *afinfo;
-       bool try_load = true;
 
 retry:
        afinfo = xfrm_state_get_afinfo(x->props.family);
@@ -607,6 +606,7 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
        kfree(x->coaddr);
        kfree(x->replay_esn);
        kfree(x->preplay_esn);
+       xfrm_unset_type_offload(x);
        if (x->type) {
                x->type->destructor(x);
                xfrm_put_type(x->type);
@@ -780,8 +780,6 @@ void xfrm_dev_state_free(struct xfrm_state *x)
        struct xfrm_dev_offload *xso = &x->xso;
        struct net_device *dev = READ_ONCE(xso->dev);
 
-       xfrm_unset_type_offload(x);
-
        if (dev && dev->xfrmdev_ops) {
                spin_lock_bh(&xfrm_state_dev_gc_lock);
                if (!hlist_unhashed(&x->dev_gclist))
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 59f258daf8305..1db18f470f429 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -977,6 +977,7 @@ static struct xfrm_state *xfrm_state_construct(struct net *net,
        /* override default values from above */
        xfrm_update_ae_params(x, attrs, 0);
 
+       xfrm_set_type_offload(x, attrs[XFRMA_OFFLOAD_DEV]);
        /* configure the hardware if offload is requested */
        if (attrs[XFRMA_OFFLOAD_DEV]) {
                err = xfrm_dev_state_add(net, x,


> 
> -- 
> Sabrina

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested
  2025-06-05 14:16     ` Leon Romanovsky
@ 2025-06-06 15:12       ` Sabrina Dubroca
  2025-06-06 17:05         ` Leon Romanovsky
  0 siblings, 1 reply; 11+ messages in thread
From: Sabrina Dubroca @ 2025-06-06 15:12 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Steffen Klassert, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni

2025-06-05, 17:16:24 +0300, Leon Romanovsky wrote:
> On Thu, Jun 05, 2025 at 03:09:19PM +0200, Sabrina Dubroca wrote:
> > Hello,
> > 
> > I think we need to revert this patch. It causes a severe performance
> > regression for SW IPsec (around 40-50%).
> > 
> > 2025-02-19, 15:50:57 +0200, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > > 
> > > XFRM offload path is probed even if offload isn't needed at all. Let's
> > > make sure that x->type_offload pointer stays NULL for such path to
> > > reduce ambiguity.
> > 
> > x->type_offload is used for GRO with SW IPsec, not just for HW offload.
> 
> Thanks for the report, can you please try the following fix?

Seems to work in my setup. That's basically a revert of every
functional change in 585b64f5a620 ("xfrm: delay initialization of
offload path till its actually requested"), except that now we set
->type_offload during xfrm_state_construct instead of
__xfrm_init_state, so other callers of __xfrm_init_state
(xfrm_state_migrate and pfkey - we can ignore ipcomp since it doesn't
have ->type_offload) won't get ->type_offload set correctly. I'm not
sure we want that.

Do you need to also revert 49431af6c4ef ("xfrm: rely on XFRM offload")
from this series? The assumption that x->type_offload is set only for
HW offload isn't correct.

-- 
Sabrina

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested
  2025-06-06 15:12       ` Sabrina Dubroca
@ 2025-06-06 17:05         ` Leon Romanovsky
  0 siblings, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2025-06-06 17:05 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Steffen Klassert, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni

On Fri, Jun 06, 2025 at 05:12:52PM +0200, Sabrina Dubroca wrote:
> 2025-06-05, 17:16:24 +0300, Leon Romanovsky wrote:
> > On Thu, Jun 05, 2025 at 03:09:19PM +0200, Sabrina Dubroca wrote:
> > > Hello,
> > > 
> > > I think we need to revert this patch. It causes a severe performance
> > > regression for SW IPsec (around 40-50%).
> > > 
> > > 2025-02-19, 15:50:57 +0200, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > 
> > > > XFRM offload path is probed even if offload isn't needed at all. Let's
> > > > make sure that x->type_offload pointer stays NULL for such path to
> > > > reduce ambiguity.
> > > 
> > > x->type_offload is used for GRO with SW IPsec, not just for HW offload.
> > 
> > Thanks for the report, can you please try the following fix?
> 
> Seems to work in my setup. That's basically a revert of every
> functional change in 585b64f5a620 ("xfrm: delay initialization of
> offload path till its actually requested"), except that now we set
> ->type_offload during xfrm_state_construct instead of
> __xfrm_init_state, so other callers of __xfrm_init_state
> (xfrm_state_migrate and pfkey - we can ignore ipcomp since it doesn't
> have ->type_offload) won't get ->type_offload set correctly. I'm not
> sure we want that.
> 
> Do you need to also revert 49431af6c4ef ("xfrm: rely on XFRM offload")
> from this series? The assumption that x->type_offload is set only for
> HW offload isn't correct.

I don't think so, we are not setting x->type_offload in crypto and packet
offload modes and it is enough for us to rely on offload type.

  230 int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
  231                        struct xfrm_user_offload *xuo,
  232                        struct netlink_ext_ack *extack)
  233 {
  ...
  308         if (!x->type_offload) {
  309                 NL_SET_ERR_MSG(extack, "Type doesn't support offload");
  310                 dev_put(dev);
  311                 return -EINVAL;
  312         }

Thanks

> 
> -- 
> Sabrina

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-06-06 17:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-19 13:50 [PATCH ipsec-next v1 0/5] Support PMTU in tunnel mode for packet offload Leon Romanovsky
2025-02-19 13:50 ` [PATCH ipsec-next v1 1/5] xfrm: delay initialization of offload path till its actually requested Leon Romanovsky
2025-06-05 13:09   ` Sabrina Dubroca
2025-06-05 14:16     ` Leon Romanovsky
2025-06-06 15:12       ` Sabrina Dubroca
2025-06-06 17:05         ` Leon Romanovsky
2025-02-19 13:50 ` [PATCH ipsec-next v1 2/5] xfrm: simplify SA initialization routine Leon Romanovsky
2025-02-19 13:50 ` [PATCH ipsec-next v1 3/5] xfrm: rely on XFRM offload Leon Romanovsky
2025-02-19 13:51 ` [PATCH ipsec-next v1 4/5] xfrm: provide common xdo_dev_offload_ok callback implementation Leon Romanovsky
2025-02-19 13:51 ` [PATCH ipsec-next v1 5/5] xfrm: check for PMTU in tunnel mode for packet offload Leon Romanovsky
2025-02-25  8:57 ` [PATCH ipsec-next v1 0/5] Support " Steffen Klassert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).