* [PATCH 0/4] xfrm: Add support for RFC 9611 per cpu xfrm states
@ 2024-10-07  6:44 Steffen Klassert
  2024-10-07  6:44 ` [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling Steffen Klassert
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Steffen Klassert @ 2024-10-07  6:44 UTC (permalink / raw)
  To: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Simon Horman, Sabrina Dubroca, netdev
  Cc: Steffen Klassert, devel

This patchset implements the xfrm part of per cpu SAs as specified in
RFC 9611.

Patch 1 adds the cpu as a lookup key and a config option to generate
acquire messages for each cpu.

Patch 2 caches outbound states at the policy.

Patch 3 caches inbound states on a new percpu state cache.

Patch 4 restricts percpu SA attributes to specific netlink message types.

Please review and test.

---

Changes from v1:

- Add compat layer attributes

- Fix a 'use always slowpath' condition

- Document get_cpu() usage

- Fix forgotten update of xfrm_expire_msgsize()
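
A minimal userspace sketch of the two new uapi knobs this series adds
(an untested illustration; it assumes libmnl, a real XFRM_MSG_NEWSA also
needs addresses, an SPI and algorithm attributes, and the
XFRM_POLICY_CPU_ACQUIRE / XFRMA_SA_PCPU definitions exist only with this
series applied):

#include <libmnl/libmnl.h>
#include <linux/xfrm.h>

/* Ask the kernel to generate one acquire per cpu for this policy. */
static void policy_request_per_cpu_sas(struct xfrm_userpolicy_info *p)
{
	p->flags |= XFRM_POLICY_CPU_ACQUIRE;
}

/* Bind an SA being installed via XFRM_MSG_NEWSA to one cpu. The kernel
 * accepts XFRMA_SA_PCPU only together with XFRMA_SA_DIR (patch 1). */
static void sa_bind_to_cpu(struct nlmsghdr *nlh, __u32 cpu)
{
	mnl_attr_put_u8(nlh, XFRMA_SA_DIR, XFRM_SA_DIR_OUT);
	mnl_attr_put_u32(nlh, XFRMA_SA_PCPU, cpu);
}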


* [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling.
  2024-10-07  6:44 [PATCH 0/4] xfrm: Add support for RFC 9611 per cpu xfrm states Steffen Klassert
@ 2024-10-07  6:44 ` Steffen Klassert
  2024-10-08 16:47   ` Simon Horman
  2024-10-10 18:22   ` kernel test robot
  2024-10-07  6:44 ` [PATCH 2/4] xfrm: Cache used outbound xfrm states at the policy Steffen Klassert
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Steffen Klassert @ 2024-10-07  6:44 UTC (permalink / raw)
  To: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Simon Horman, Sabrina Dubroca, netdev
  Cc: Steffen Klassert, devel

Currently all flows for a certain SA must be processed by the same
cpu to avoid packet reordering and contention on the xfrm
state lock.

To get rid of this limitation, the IETF is about to standardize
per cpu SAs. This patch implements the xfrm part of it:

https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/

This adds the cpu as a lookup key for xfrm states and a config option
to generate acquire messages for each cpu.

With that, we can have an SA with an identical traffic selector on
each cpu so that flows can be processed in parallel on all cpus.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h        |  5 ++--
 include/uapi/linux/xfrm.h |  2 ++
 net/key/af_key.c          |  7 +++--
 net/xfrm/xfrm_compat.c    |  6 ++--
 net/xfrm/xfrm_state.c     | 58 +++++++++++++++++++++++++++++++--------
 net/xfrm/xfrm_user.c      | 54 ++++++++++++++++++++++++++++++++++--
 6 files changed, 111 insertions(+), 21 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b6bfdc6416c7..e23ad52824e2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -188,6 +188,7 @@ struct xfrm_state {
 	refcount_t		refcnt;
 	spinlock_t		lock;
 
+	u32			pcpu_num;
 	struct xfrm_id		id;
 	struct xfrm_selector	sel;
 	struct xfrm_mark	mark;
@@ -1679,7 +1680,7 @@ struct xfrmk_spdinfo {
 	u32 spdhmcnt;
 };
 
-struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq);
+struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq, u32 pcpu_num);
 int xfrm_state_delete(struct xfrm_state *x);
 int xfrm_state_flush(struct net *net, u8 proto, bool task_valid, bool sync);
 int xfrm_dev_state_flush(struct net *net, struct net_device *dev, bool task_valid);
@@ -1794,7 +1795,7 @@ int verify_spi_info(u8 proto, u32 min, u32 max, struct netlink_ext_ack *extack);
 int xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi,
 		   struct netlink_ext_ack *extack);
 struct xfrm_state *xfrm_find_acq(struct net *net, const struct xfrm_mark *mark,
-				 u8 mode, u32 reqid, u32 if_id, u8 proto,
+				 u8 mode, u32 reqid, u32 if_id, u32 pcpu_num, u8 proto,
 				 const xfrm_address_t *daddr,
 				 const xfrm_address_t *saddr, int create,
 				 unsigned short family);
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index f28701500714..d73a97e3030a 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -322,6 +322,7 @@ enum xfrm_attr_type_t {
 	XFRMA_MTIMER_THRESH,	/* __u32 in seconds for input SA */
 	XFRMA_SA_DIR,		/* __u8 */
 	XFRMA_NAT_KEEPALIVE_INTERVAL,	/* __u32 in seconds for NAT keepalive */
+	XFRMA_SA_PCPU,		/* __u32 */
 	__XFRMA_MAX
 
 #define XFRMA_OUTPUT_MARK XFRMA_SET_MARK	/* Compatibility */
@@ -437,6 +438,7 @@ struct xfrm_userpolicy_info {
 #define XFRM_POLICY_LOCALOK	1	/* Allow user to override global policy */
 	/* Automatically expand selector to include matching ICMP payloads. */
 #define XFRM_POLICY_ICMP	2
+#define XFRM_POLICY_CPU_ACQUIRE	4
 	__u8				share;
 };
 
diff --git a/net/key/af_key.c b/net/key/af_key.c
index f79fb99271ed..c56bb4f451e6 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1354,7 +1354,7 @@ static int pfkey_getspi(struct sock *sk, struct sk_buff *skb, const struct sadb_
 	}
 
 	if (hdr->sadb_msg_seq) {
-		x = xfrm_find_acq_byseq(net, DUMMY_MARK, hdr->sadb_msg_seq);
+		x = xfrm_find_acq_byseq(net, DUMMY_MARK, hdr->sadb_msg_seq, UINT_MAX);
 		if (x && !xfrm_addr_equal(&x->id.daddr, xdaddr, family)) {
 			xfrm_state_put(x);
 			x = NULL;
@@ -1362,7 +1362,8 @@ static int pfkey_getspi(struct sock *sk, struct sk_buff *skb, const struct sadb_
 	}
 
 	if (!x)
-		x = xfrm_find_acq(net, &dummy_mark, mode, reqid, 0, proto, xdaddr, xsaddr, 1, family);
+		x = xfrm_find_acq(net, &dummy_mark, mode, reqid, 0, UINT_MAX,
+				  proto, xdaddr, xsaddr, 1, family);
 
 	if (x == NULL)
 		return -ENOENT;
@@ -1417,7 +1418,7 @@ static int pfkey_acquire(struct sock *sk, struct sk_buff *skb, const struct sadb
 	if (hdr->sadb_msg_seq == 0 || hdr->sadb_msg_errno == 0)
 		return 0;
 
-	x = xfrm_find_acq_byseq(net, DUMMY_MARK, hdr->sadb_msg_seq);
+	x = xfrm_find_acq_byseq(net, DUMMY_MARK, hdr->sadb_msg_seq, UINT_MAX);
 	if (x == NULL)
 		return 0;
 
diff --git a/net/xfrm/xfrm_compat.c b/net/xfrm/xfrm_compat.c
index 91357ccaf4af..5b9ee63e30b6 100644
--- a/net/xfrm/xfrm_compat.c
+++ b/net/xfrm/xfrm_compat.c
@@ -132,6 +132,7 @@ static const struct nla_policy compat_policy[XFRMA_MAX+1] = {
 	[XFRMA_MTIMER_THRESH]	= { .type = NLA_U32 },
 	[XFRMA_SA_DIR]          = NLA_POLICY_RANGE(NLA_U8, XFRM_SA_DIR_IN, XFRM_SA_DIR_OUT),
 	[XFRMA_NAT_KEEPALIVE_INTERVAL]	= { .type = NLA_U32 },
+	[XFRMA_SA_PCPU]		= { .type = NLA_U32 },
 };
 
 static struct nlmsghdr *xfrm_nlmsg_put_compat(struct sk_buff *skb,
@@ -282,9 +283,10 @@ static int xfrm_xlate64_attr(struct sk_buff *dst, const struct nlattr *src)
 	case XFRMA_MTIMER_THRESH:
 	case XFRMA_SA_DIR:
 	case XFRMA_NAT_KEEPALIVE_INTERVAL:
+	case XFRMA_SA_PCPU:
 		return xfrm_nla_cpy(dst, src, nla_len(src));
 	default:
-		BUILD_BUG_ON(XFRMA_MAX != XFRMA_NAT_KEEPALIVE_INTERVAL);
+		BUILD_BUG_ON(XFRMA_MAX != XFRMA_SA_PCPU);
 		pr_warn_once("unsupported nla_type %d\n", src->nla_type);
 		return -EOPNOTSUPP;
 	}
@@ -439,7 +441,7 @@ static int xfrm_xlate32_attr(void *dst, const struct nlattr *nla,
 	int err;
 
 	if (type > XFRMA_MAX) {
-		BUILD_BUG_ON(XFRMA_MAX != XFRMA_NAT_KEEPALIVE_INTERVAL);
+		BUILD_BUG_ON(XFRMA_MAX != XFRMA_SA_PCPU);
 		NL_SET_ERR_MSG(extack, "Bad attribute");
 		return -EOPNOTSUPP;
 	}
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 37478d36a8df..ebef07b80afa 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -679,6 +679,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net)
 		x->lft.hard_packet_limit = XFRM_INF;
 		x->replay_maxage = 0;
 		x->replay_maxdiff = 0;
+		x->pcpu_num = UINT_MAX;
 		spin_lock_init(&x->lock);
 	}
 	return x;
@@ -1155,6 +1156,12 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x,
 			       struct xfrm_state **best, int *acq_in_progress,
 			       int *error)
 {
+	/* We need the cpu id just as a lookup key,
+	 * we don't require it to be stable.
+	 */
+	unsigned int pcpu_id = get_cpu();
+	put_cpu();
+
 	/* Resolution logic:
 	 * 1. There is a valid state with matching selector. Done.
 	 * 2. Valid state with inappropriate selector. Skip.
@@ -1174,13 +1181,18 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x,
 							&fl->u.__fl_common))
 			return;
 
+		if (x->pcpu_num != UINT_MAX && x->pcpu_num != pcpu_id)
+			return;
+
 		if (!*best ||
+		    ((*best)->pcpu_num == UINT_MAX && x->pcpu_num == pcpu_id) ||
 		    (*best)->km.dying > x->km.dying ||
 		    ((*best)->km.dying == x->km.dying &&
 		     (*best)->curlft.add_time < x->curlft.add_time))
 			*best = x;
 	} else if (x->km.state == XFRM_STATE_ACQ) {
-		*acq_in_progress = 1;
+		if (!*best || x->pcpu_num == pcpu_id)
+			*acq_in_progress = 1;
 	} else if (x->km.state == XFRM_STATE_ERROR ||
 		   x->km.state == XFRM_STATE_EXPIRED) {
 		if ((!x->sel.family ||
@@ -1209,6 +1221,13 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 	unsigned short encap_family = tmpl->encap_family;
 	unsigned int sequence;
 	struct km_event c;
+	unsigned int pcpu_id;
+
+	/* We need the cpu id just as a lookup key,
+	 * we don't require it to be stable.
+	 */
+	pcpu_id = get_cpu();
+	put_cpu();
 
 	to_put = NULL;
 
@@ -1282,7 +1301,10 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 	}
 
 found:
-	x = best;
+	if (!(pol->flags & XFRM_POLICY_CPU_ACQUIRE) ||
+	    (best && (best->pcpu_num == pcpu_id)))
+		x = best;
+
 	if (!x && !error && !acquire_in_progress) {
 		if (tmpl->id.spi &&
 		    (x0 = __xfrm_state_lookup_all(net, mark, daddr,
@@ -1314,6 +1336,8 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 		xfrm_init_tempstate(x, fl, tmpl, daddr, saddr, family);
 		memcpy(&x->mark, &pol->mark, sizeof(x->mark));
 		x->if_id = if_id;
+		if ((pol->flags & XFRM_POLICY_CPU_ACQUIRE) && best)
+			x->pcpu_num = pcpu_id;
 
 		error = security_xfrm_state_alloc_acquire(x, pol->security, fl->flowi_secid);
 		if (error) {
@@ -1392,6 +1416,11 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 			x = NULL;
 			error = -ESRCH;
 		}
+
+		/* Use the already installed 'fallback' while the CPU-specific
+		 * SA acquire is handled. */
+		if (best)
+			x = best;
 	}
 out:
 	if (x) {
@@ -1524,12 +1553,14 @@ static void __xfrm_state_bump_genids(struct xfrm_state *xnew)
 	unsigned int h;
 	u32 mark = xnew->mark.v & xnew->mark.m;
 	u32 if_id = xnew->if_id;
+	u32 cpu_id = xnew->pcpu_num;
 
 	h = xfrm_dst_hash(net, &xnew->id.daddr, &xnew->props.saddr, reqid, family);
 	hlist_for_each_entry(x, net->xfrm.state_bydst+h, bydst) {
 		if (x->props.family	== family &&
 		    x->props.reqid	== reqid &&
 		    x->if_id		== if_id &&
+		    x->pcpu_num		== cpu_id &&
 		    (mark & x->mark.m) == x->mark.v &&
 		    xfrm_addr_equal(&x->id.daddr, &xnew->id.daddr, family) &&
 		    xfrm_addr_equal(&x->props.saddr, &xnew->props.saddr, family))
@@ -1552,7 +1583,7 @@ EXPORT_SYMBOL(xfrm_state_insert);
 static struct xfrm_state *__find_acq_core(struct net *net,
 					  const struct xfrm_mark *m,
 					  unsigned short family, u8 mode,
-					  u32 reqid, u32 if_id, u8 proto,
+					  u32 reqid, u32 if_id, u32 pcpu_num, u8 proto,
 					  const xfrm_address_t *daddr,
 					  const xfrm_address_t *saddr,
 					  int create)
@@ -1569,6 +1600,7 @@ static struct xfrm_state *__find_acq_core(struct net *net,
 		    x->id.spi       != 0 ||
 		    x->id.proto	    != proto ||
 		    (mark & x->mark.m) != x->mark.v ||
+		    x->pcpu_num != pcpu_num ||
 		    !xfrm_addr_equal(&x->id.daddr, daddr, family) ||
 		    !xfrm_addr_equal(&x->props.saddr, saddr, family))
 			continue;
@@ -1602,6 +1634,7 @@ static struct xfrm_state *__find_acq_core(struct net *net,
 			break;
 		}
 
+		x->pcpu_num = pcpu_num;
 		x->km.state = XFRM_STATE_ACQ;
 		x->id.proto = proto;
 		x->props.family = family;
@@ -1630,7 +1663,7 @@ static struct xfrm_state *__find_acq_core(struct net *net,
 	return x;
 }
 
-static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq);
+static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq, u32 pcpu_num);
 
 int xfrm_state_add(struct xfrm_state *x)
 {
@@ -1656,7 +1689,7 @@ int xfrm_state_add(struct xfrm_state *x)
 	}
 
 	if (use_spi && x->km.seq) {
-		x1 = __xfrm_find_acq_byseq(net, mark, x->km.seq);
+		x1 = __xfrm_find_acq_byseq(net, mark, x->km.seq, x->pcpu_num);
 		if (x1 && ((x1->id.proto != x->id.proto) ||
 		    !xfrm_addr_equal(&x1->id.daddr, &x->id.daddr, family))) {
 			to_put = x1;
@@ -1666,7 +1699,7 @@ int xfrm_state_add(struct xfrm_state *x)
 
 	if (use_spi && !x1)
 		x1 = __find_acq_core(net, &x->mark, family, x->props.mode,
-				     x->props.reqid, x->if_id, x->id.proto,
+				     x->props.reqid, x->if_id, x->pcpu_num, x->id.proto,
 				     &x->id.daddr, &x->props.saddr, 0);
 
 	__xfrm_state_bump_genids(x);
@@ -1791,6 +1824,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig,
 	x->props.flags = orig->props.flags;
 	x->props.extra_flags = orig->props.extra_flags;
 
+	x->pcpu_num = orig->pcpu_num;
 	x->if_id = orig->if_id;
 	x->tfcpad = orig->tfcpad;
 	x->replay_maxdiff = orig->replay_maxdiff;
@@ -2066,13 +2100,14 @@ EXPORT_SYMBOL(xfrm_state_lookup_byaddr);
 
 struct xfrm_state *
 xfrm_find_acq(struct net *net, const struct xfrm_mark *mark, u8 mode, u32 reqid,
-	      u32 if_id, u8 proto, const xfrm_address_t *daddr,
+	      u32 if_id, u32 pcpu_num, u8 proto, const xfrm_address_t *daddr,
 	      const xfrm_address_t *saddr, int create, unsigned short family)
 {
 	struct xfrm_state *x;
 
 	spin_lock_bh(&net->xfrm.xfrm_state_lock);
-	x = __find_acq_core(net, mark, family, mode, reqid, if_id, proto, daddr, saddr, create);
+	x = __find_acq_core(net, mark, family, mode, reqid, if_id, pcpu_num,
+			    proto, daddr, saddr, create);
 	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
 
 	return x;
@@ -2207,7 +2242,7 @@ xfrm_state_sort(struct xfrm_state **dst, struct xfrm_state **src, int n,
 
 /* Silly enough, but I'm lazy to build resolution list */
 
-static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq)
+static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq, u32 pcpu_num)
 {
 	unsigned int h = xfrm_seq_hash(net, seq);
 	struct xfrm_state *x;
@@ -2215,6 +2250,7 @@ static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 s
 	hlist_for_each_entry_rcu(x, net->xfrm.state_byseq + h, byseq) {
 		if (x->km.seq == seq &&
 		    (mark & x->mark.m) == x->mark.v &&
+		    x->pcpu_num == pcpu_num &&
 		    x->km.state == XFRM_STATE_ACQ) {
 			xfrm_state_hold(x);
 			return x;
@@ -2224,12 +2260,12 @@ static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 s
 	return NULL;
 }
 
-struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq)
+struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq, u32 pcpu_num)
 {
 	struct xfrm_state *x;
 
 	spin_lock_bh(&net->xfrm.xfrm_state_lock);
-	x = __xfrm_find_acq_byseq(net, mark, seq);
+	x = __xfrm_find_acq_byseq(net, mark, seq, pcpu_num);
 	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
 	return x;
 }
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2b10a45ff124..6bf53e17d382 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -456,6 +456,12 @@ static int verify_newsa_info(struct xfrm_usersa_info *p,
 		}
 	}
 
+	if (!sa_dir && attrs[XFRMA_SA_PCPU]) {
+		NL_SET_ERR_MSG(extack, "SA_PCPU only supported with SA_DIR");
+		err = -EINVAL;
+		goto out;
+	}
+
 out:
 	return err;
 }
@@ -837,6 +843,12 @@ static struct xfrm_state *xfrm_state_construct(struct net *net,
 		x->nat_keepalive_interval =
 			nla_get_u32(attrs[XFRMA_NAT_KEEPALIVE_INTERVAL]);
 
+	if (attrs[XFRMA_SA_PCPU]) {
+		x->pcpu_num = nla_get_u32(attrs[XFRMA_SA_PCPU]);
+		if (x->pcpu_num >= num_possible_cpus())
+			goto error;
+	}
+
 	err = __xfrm_init_state(x, false, attrs[XFRMA_OFFLOAD_DEV], extack);
 	if (err)
 		goto error;
@@ -1290,6 +1302,11 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
 		if (ret)
 			goto out;
 	}
+	if (x->pcpu_num != UINT_MAX) {
+		ret = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num);
+		if (ret)
+			goto out;
+	}
 	if (x->dir)
 		ret = nla_put_u8(skb, XFRMA_SA_DIR, x->dir);
 
@@ -1694,6 +1711,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
 	u32 mark;
 	struct xfrm_mark m;
 	u32 if_id = 0;
+	u32 pcpu_num = UINT_MAX;
 
 	p = nlmsg_data(nlh);
 	err = verify_spi_info(p->info.id.proto, p->min, p->max, extack);
@@ -1710,8 +1728,16 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (attrs[XFRMA_IF_ID])
 		if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
 
+	if (attrs[XFRMA_SA_PCPU]) {
+		pcpu_num = nla_get_u32(attrs[XFRMA_SA_PCPU]);
+		if (pcpu_num >= num_possible_cpus()) {
+			err = -EINVAL;
+			goto out_noput;
+		}
+	}
+
 	if (p->info.seq) {
-		x = xfrm_find_acq_byseq(net, mark, p->info.seq);
+		x = xfrm_find_acq_byseq(net, mark, p->info.seq, pcpu_num);
 		if (x && !xfrm_addr_equal(&x->id.daddr, daddr, family)) {
 			xfrm_state_put(x);
 			x = NULL;
@@ -1720,7 +1746,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	if (!x)
 		x = xfrm_find_acq(net, &m, p->info.mode, p->info.reqid,
-				  if_id, p->info.id.proto, daddr,
+				  if_id, pcpu_num, p->info.id.proto, daddr,
 				  &p->info.saddr, 1,
 				  family);
 	err = -ENOENT;
@@ -2521,6 +2547,7 @@ static inline unsigned int xfrm_aevent_msgsize(struct xfrm_state *x)
 	       + nla_total_size(4) /* XFRM_AE_RTHR */
 	       + nla_total_size(4) /* XFRM_AE_ETHR */
 	       + nla_total_size(sizeof(x->dir)); /* XFRMA_SA_DIR */
+	       + nla_total_size(4); /* XFRMA_SA_PCPU */
 }
 
 static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct km_event *c)
@@ -2576,6 +2603,8 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
 	err = xfrm_if_id_put(skb, x->if_id);
 	if (err)
 		goto out_cancel;
+	if (x->pcpu_num != UINT_MAX)
+		err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num);
 
 	if (x->dir) {
 		err = nla_put_u8(skb, XFRMA_SA_DIR, x->dir);
@@ -2846,6 +2875,13 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	xfrm_mark_get(attrs, &mark);
 
+	if (attrs[XFRMA_SA_PCPU]) {
+		x->pcpu_num = nla_get_u32(attrs[XFRMA_SA_PCPU]);
+		err = -EINVAL;
+		if (x->pcpu_num >= num_possible_cpus())
+			goto free_state;
+	}
+
 	err = verify_newpolicy_info(&ua->policy, extack);
 	if (err)
 		goto free_state;
@@ -3176,6 +3212,7 @@ const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
 	[XFRMA_MTIMER_THRESH]   = { .type = NLA_U32 },
 	[XFRMA_SA_DIR]          = NLA_POLICY_RANGE(NLA_U8, XFRM_SA_DIR_IN, XFRM_SA_DIR_OUT),
 	[XFRMA_NAT_KEEPALIVE_INTERVAL] = { .type = NLA_U32 },
+	[XFRMA_SA_PCPU]		= { .type = NLA_U32 },
 };
 EXPORT_SYMBOL_GPL(xfrma_policy);
 
@@ -3342,7 +3379,8 @@ static inline unsigned int xfrm_expire_msgsize(void)
 {
 	return NLMSG_ALIGN(sizeof(struct xfrm_user_expire)) +
 	       nla_total_size(sizeof(struct xfrm_mark)) +
-	       nla_total_size(sizeof_field(struct xfrm_state, dir));
+	       nla_total_size(sizeof_field(struct xfrm_state, dir)) +
+	       nla_total_size(4); /* XFRMA_SA_PCPU */
 }
 
 static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct km_event *c)
@@ -3368,6 +3406,11 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct
 	err = xfrm_if_id_put(skb, x->if_id);
 	if (err)
 		return err;
+	if (x->pcpu_num != UINT_MAX) {
+		err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num);
+		if (err)
+			return err;
+	}
 
 	if (x->dir) {
 		err = nla_put_u8(skb, XFRMA_SA_DIR, x->dir);
@@ -3475,6 +3518,8 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
 	}
 	if (x->if_id)
 		l += nla_total_size(sizeof(x->if_id));
+	if (x->pcpu_num)
+		l += nla_total_size(sizeof(x->pcpu_num));
 
 	/* Must count x->lastused as it may become non-zero behind our back. */
 	l += nla_total_size_64bit(sizeof(u64));
@@ -3581,6 +3626,7 @@ static inline unsigned int xfrm_acquire_msgsize(struct xfrm_state *x,
 	       + nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr)
 	       + nla_total_size(sizeof(struct xfrm_mark))
 	       + nla_total_size(xfrm_user_sec_ctx_size(x->security))
+	       + nla_total_size(4) /* XFRMA_SA_PCPU */
 	       + userpolicy_type_attrsize();
 }
 
@@ -3617,6 +3663,8 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
 		err = xfrm_if_id_put(skb, xp->if_id);
 	if (!err && xp->xdo.dev)
 		err = copy_user_offload(&xp->xdo, skb);
+	if (!err && x->pcpu_num != UINT_MAX)
+		err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
-- 
2.34.1
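
The cpu preference rule added to xfrm_state_look_at() above is easy to
model in isolation. A standalone sketch (hypothetical plain C, not
kernel code): states bound to another cpu are skipped, and a state
bound to the current cpu wins over the pcpu_num == UINT_MAX fallback:

#include <limits.h>
#include <stdio.h>

struct sa {
	unsigned int pcpu_num;	/* UINT_MAX: not bound to any cpu */
	int dying;
	long add_time;
};

static const struct sa *pick(const struct sa *c, int n, unsigned int cpu)
{
	const struct sa *best = NULL;

	for (int i = 0; i < n; i++) {
		const struct sa *x = &c[i];

		if (x->pcpu_num != UINT_MAX && x->pcpu_num != cpu)
			continue;	/* bound to another cpu: unusable */

		if (!best ||
		    (best->pcpu_num == UINT_MAX && x->pcpu_num == cpu) ||
		    best->dying > x->dying ||
		    (best->dying == x->dying && best->add_time < x->add_time))
			best = x;	/* a per-cpu SA beats the fallback */
	}
	return best;
}

int main(void)
{
	struct sa c[] = {
		{ UINT_MAX, 0, 100 },	/* fallback SA, usable on any cpu */
		{ 2,        0,  50 },	/* SA installed for cpu 2 */
	};

	printf("cpu 2 picks pcpu_num=%u\n", pick(c, 2, 2)->pcpu_num);
	printf("cpu 0 picks pcpu_num=%u\n", pick(c, 2, 0)->pcpu_num);
	return 0;
}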



* [PATCH 2/4] xfrm: Cache used outbound xfrm states at the policy.
  2024-10-07  6:44 [PATCH 0/4] xfrm: Add support for RFC 9611 per cpu xfrm states Steffen Klassert
  2024-10-07  6:44 ` [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling Steffen Klassert
@ 2024-10-07  6:44 ` Steffen Klassert
  2024-10-07 14:26   ` Jakub Kicinski
  2024-10-07  6:44 ` [PATCH 3/4] xfrm: Add an inbound percpu state cache Steffen Klassert
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2024-10-07  6:44 UTC (permalink / raw)
  To: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Simon Horman, Sabrina Dubroca, netdev
  Cc: Steffen Klassert, devel

Now that we can have percpu xfrm states, the number of active
states might increase. To get better lookup performance,
we cache the used xfrm states at the policy for outbound
IPsec traffic.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h     |  3 +++
 net/xfrm/xfrm_policy.c | 12 +++++++++
 net/xfrm/xfrm_state.c  | 55 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e23ad52824e2..17e5edc58b89 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -184,6 +184,7 @@ struct xfrm_state {
 	};
 	struct hlist_node	byspi;
 	struct hlist_node	byseq;
+	struct hlist_node	state_cache;
 
 	refcount_t		refcnt;
 	spinlock_t		lock;
@@ -562,6 +563,8 @@ struct xfrm_policy {
 	struct hlist_node	bydst;
 	struct hlist_node	byidx;
 
+	struct hlist_head	state_cache_list;
+
 	/* This lock only affects elements except for entry. */
 	rwlock_t		lock;
 	refcount_t		refcnt;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 914bac03b52a..82d1e0b9be70 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -413,6 +413,7 @@ struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp)
 	if (policy) {
 		write_pnet(&policy->xp_net, net);
 		INIT_LIST_HEAD(&policy->walk.all);
+		INIT_HLIST_HEAD(&policy->state_cache_list);
 		INIT_HLIST_NODE(&policy->bydst);
 		INIT_HLIST_NODE(&policy->byidx);
 		rwlock_init(&policy->lock);
@@ -454,6 +455,9 @@ EXPORT_SYMBOL(xfrm_policy_destroy);
 
 static void xfrm_policy_kill(struct xfrm_policy *policy)
 {
+	struct net *net = xp_net(policy);
+	struct xfrm_state *x;
+
 	xfrm_dev_policy_delete(policy);
 
 	write_lock_bh(&policy->lock);
@@ -469,6 +473,13 @@ static void xfrm_policy_kill(struct xfrm_policy *policy)
 	if (del_timer(&policy->timer))
 		xfrm_pol_put(policy);
 
+	/* XXX: Flush state cache */
+	spin_lock_bh(&net->xfrm.xfrm_state_lock);
+	hlist_for_each_entry_rcu(x, &policy->state_cache_list, state_cache) {
+		hlist_del_init_rcu(&x->state_cache);
+	}
+	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+
 	xfrm_pol_put(policy);
 }
 
@@ -3249,6 +3260,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
 		dst_release(dst);
 		dst = dst_orig;
 	}
+
 ok:
 	xfrm_pols_put(pols, drop_pols);
 	if (dst && dst->xfrm &&
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index ebef07b80afa..a2047825f6c8 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -665,6 +665,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net)
 		refcount_set(&x->refcnt, 1);
 		atomic_set(&x->tunnel_users, 0);
 		INIT_LIST_HEAD(&x->km.all);
+		INIT_HLIST_NODE(&x->state_cache);
 		INIT_HLIST_NODE(&x->bydst);
 		INIT_HLIST_NODE(&x->bysrc);
 		INIT_HLIST_NODE(&x->byspi);
@@ -744,12 +745,15 @@ int __xfrm_state_delete(struct xfrm_state *x)
 
 	if (x->km.state != XFRM_STATE_DEAD) {
 		x->km.state = XFRM_STATE_DEAD;
+
 		spin_lock(&net->xfrm.xfrm_state_lock);
 		list_del(&x->km.all);
 		hlist_del_rcu(&x->bydst);
 		hlist_del_rcu(&x->bysrc);
 		if (x->km.seq)
 			hlist_del_rcu(&x->byseq);
+		if (!hlist_unhashed(&x->state_cache))
+			hlist_del_rcu(&x->state_cache);
 		if (x->id.spi)
 			hlist_del_rcu(&x->byspi);
 		net->xfrm.state_num--;
@@ -1222,6 +1226,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 	unsigned int sequence;
 	struct km_event c;
 	unsigned int pcpu_id;
+	bool cached = false;
 
 	/* We need the cpu id just as a lookup key,
 	 * we don't require it to be stable.
@@ -1234,6 +1239,46 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 	sequence = read_seqcount_begin(&net->xfrm.xfrm_state_hash_generation);
 
 	rcu_read_lock();
+	hlist_for_each_entry_rcu(x, &pol->state_cache_list, state_cache) {
+		if (x->props.family == encap_family &&
+		    x->props.reqid == tmpl->reqid &&
+		    (mark & x->mark.m) == x->mark.v &&
+		    x->if_id == if_id &&
+		    !(x->props.flags & XFRM_STATE_WILDRECV) &&
+		    xfrm_state_addr_check(x, daddr, saddr, encap_family) &&
+		    tmpl->mode == x->props.mode &&
+		    tmpl->id.proto == x->id.proto &&
+		    (tmpl->id.spi == x->id.spi || !tmpl->id.spi))
+			xfrm_state_look_at(pol, x, fl, encap_family,
+					   &best, &acquire_in_progress, &error);
+	}
+
+	if (best)
+		goto cached;
+
+	hlist_for_each_entry_rcu(x, &pol->state_cache_list, state_cache) {
+		if (x->props.family == encap_family &&
+		    x->props.reqid == tmpl->reqid &&
+		    (mark & x->mark.m) == x->mark.v &&
+		    x->if_id == if_id &&
+		    !(x->props.flags & XFRM_STATE_WILDRECV) &&
+		    xfrm_addr_equal(&x->id.daddr, daddr, encap_family) &&
+		    tmpl->mode == x->props.mode &&
+		    tmpl->id.proto == x->id.proto &&
+		    (tmpl->id.spi == x->id.spi || !tmpl->id.spi))
+			xfrm_state_look_at(pol, x, fl, family,
+					   &best, &acquire_in_progress, &error);
+	}
+
+cached:
+	cached = true;
+	if (best)
+		goto found;
+	else if (error)
+		best = NULL;
+	else if (acquire_in_progress) /* XXX: acquire_in_progress should not happen */
+		WARN_ON(1);
+
 	h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family);
 	hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) {
 #ifdef CONFIG_XFRM_OFFLOAD
@@ -1383,6 +1428,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 			XFRM_STATE_INSERT(bysrc, &x->bysrc,
 					  net->xfrm.state_bysrc + h,
 					  x->xso.type);
+			INIT_HLIST_NODE(&x->state_cache);
 			if (x->id.spi) {
 				h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, encap_family);
 				XFRM_STATE_INSERT(byspi, &x->byspi,
@@ -1431,6 +1477,15 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 	} else {
 		*err = acquire_in_progress ? -EAGAIN : error;
 	}
+
+	if (x && x->km.state == XFRM_STATE_VALID && !cached &&
+	    (!(pol->flags & XFRM_POLICY_CPU_ACQUIRE) || x->pcpu_num == pcpu_id)) {
+		spin_lock_bh(&net->xfrm.xfrm_state_lock);
+		if (hlist_unhashed(&x->state_cache))
+			hlist_add_head_rcu(&x->state_cache, &pol->state_cache_list);
+		spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+	}
+
 	rcu_read_unlock();
 	if (to_put)
 		xfrm_state_put(to_put);
-- 
2.34.1
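
A deliberately simplified, single-threaded model of the policy cache
added here (a sketch, not kernel code: the real lookup also matches
family/mark/if_id/addresses, keeps an RCU hlist under xfrm_state_lock,
and xfrm_policy_kill() empties the list when the policy dies):

#include <stdbool.h>
#include <stdio.h>

struct state { unsigned int reqid; bool cached; };

struct policy {
	struct state *cache[8];	/* stand-in for state_cache_list */
	int n;
};

static struct state *policy_lookup(struct policy *pol, struct state *tbl,
				   int ntbl, unsigned int reqid)
{
	for (int i = 0; i < pol->n; i++)	/* fast path: policy cache */
		if (pol->cache[i]->reqid == reqid)
			return pol->cache[i];

	for (int i = 0; i < ntbl; i++) {	/* slow path: global table */
		if (tbl[i].reqid != reqid)
			continue;
		if (!tbl[i].cached && pol->n < 8) {	/* ~hlist_unhashed() */
			pol->cache[pol->n++] = &tbl[i];
			tbl[i].cached = true;
		}
		return &tbl[i];
	}
	return NULL;
}

static void policy_kill(struct policy *pol)
{
	for (int i = 0; i < pol->n; i++)	/* flush on policy death */
		pol->cache[i]->cached = false;
	pol->n = 0;
}

int main(void)
{
	struct state tbl[] = { { 1, false }, { 2, false } };
	struct policy pol = { .n = 0 };

	policy_lookup(&pol, tbl, 2, 2);	/* slow path, state gets cached */
	policy_lookup(&pol, tbl, 2, 2);	/* fast path, served from cache */
	policy_kill(&pol);
	printf("cached entries after kill: %d\n", pol.n);
	return 0;
}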



* [PATCH 3/4] xfrm: Add an inbound percpu state cache.
  2024-10-07  6:44 [PATCH 0/4] xfrm: Add support for RFC 9611 per cpu xfrm states Steffen Klassert
  2024-10-07  6:44 ` [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling Steffen Klassert
  2024-10-07  6:44 ` [PATCH 2/4] xfrm: Cache used outbound xfrm states at the policy Steffen Klassert
@ 2024-10-07  6:44 ` Steffen Klassert
  2024-10-07  6:44 ` [PATCH 4/4] xfrm: Restrict percpu SA attribute to specific netlink message types Steffen Klassert
  2024-10-07  6:48 ` [PATCH 0/4] xfrm: Add support for RFC 9611 per cpu xfrm states Steffen Klassert
  4 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2024-10-07  6:44 UTC (permalink / raw)
  To: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Simon Horman, Sabrina Dubroca, netdev
  Cc: Steffen Klassert, devel

Now that we can have percpu xfrm states, the number of active
states might increase. To get better lookup performance,
we add a percpu cache for the used inbound xfrm states.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/netns/xfrm.h |  1 +
 include/net/xfrm.h       |  5 ++++
 net/ipv4/esp4_offload.c  |  6 ++---
 net/ipv6/esp6_offload.c  |  6 ++---
 net/xfrm/xfrm_input.c    |  2 +-
 net/xfrm/xfrm_state.c    | 57 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 70 insertions(+), 7 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index d489d9250bff..4e0702598d52 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -43,6 +43,7 @@ struct netns_xfrm {
 	struct hlist_head	__rcu *state_bysrc;
 	struct hlist_head	__rcu *state_byspi;
 	struct hlist_head	__rcu *state_byseq;
+	struct hlist_head	 __percpu *state_cache_input;
 	unsigned int		state_hmask;
 	unsigned int		state_num;
 	struct work_struct	state_hash_work;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 17e5edc58b89..ae25d18e3236 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -185,6 +185,7 @@ struct xfrm_state {
 	struct hlist_node	byspi;
 	struct hlist_node	byseq;
 	struct hlist_node	state_cache;
+	struct hlist_node	state_cache_input;
 
 	refcount_t		refcnt;
 	spinlock_t		lock;
@@ -1644,6 +1645,10 @@ int xfrm_state_update(struct xfrm_state *x);
 struct xfrm_state *xfrm_state_lookup(struct net *net, u32 mark,
 				     const xfrm_address_t *daddr, __be32 spi,
 				     u8 proto, unsigned short family);
+struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
+					   const xfrm_address_t *daddr,
+					   __be32 spi, u8 proto,
+					   unsigned short family);
 struct xfrm_state *xfrm_state_lookup_byaddr(struct net *net, u32 mark,
 					    const xfrm_address_t *daddr,
 					    const xfrm_address_t *saddr,
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index 80c4ea0e12f4..e0d94270da28 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -53,9 +53,9 @@ static struct sk_buff *esp4_gro_receive(struct list_head *head,
 		if (sp->len == XFRM_MAX_DEPTH)
 			goto out_reset;
 
-		x = xfrm_state_lookup(dev_net(skb->dev), skb->mark,
-				      (xfrm_address_t *)&ip_hdr(skb)->daddr,
-				      spi, IPPROTO_ESP, AF_INET);
+		x = xfrm_input_state_lookup(dev_net(skb->dev), skb->mark,
+					    (xfrm_address_t *)&ip_hdr(skb)->daddr,
+					    spi, IPPROTO_ESP, AF_INET);
 
 		if (unlikely(x && x->dir && x->dir != XFRM_SA_DIR_IN)) {
 			/* non-offload path will record the error and audit log */
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index 919ebfabbe4e..7b41fb4f00b5 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -80,9 +80,9 @@ static struct sk_buff *esp6_gro_receive(struct list_head *head,
 		if (sp->len == XFRM_MAX_DEPTH)
 			goto out_reset;
 
-		x = xfrm_state_lookup(dev_net(skb->dev), skb->mark,
-				      (xfrm_address_t *)&ipv6_hdr(skb)->daddr,
-				      spi, IPPROTO_ESP, AF_INET6);
+		x = xfrm_input_state_lookup(dev_net(skb->dev), skb->mark,
+					    (xfrm_address_t *)&ipv6_hdr(skb)->daddr,
+					    spi, IPPROTO_ESP, AF_INET6);
 
 		if (unlikely(x && x->dir && x->dir != XFRM_SA_DIR_IN)) {
 			/* non-offload path will record the error and audit log */
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 749e7eea99e4..841a60a6fbfe 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -572,7 +572,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			goto drop;
 		}
 
-		x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family);
+		x = xfrm_input_state_lookup(net, mark, daddr, spi, nexthdr, family);
 		if (x == NULL) {
 			secpath_reset(skb);
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index a2047825f6c8..e3266a5d4f90 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -754,6 +754,9 @@ int __xfrm_state_delete(struct xfrm_state *x)
 			hlist_del_rcu(&x->byseq);
 		if (!hlist_unhashed(&x->state_cache))
 			hlist_del_rcu(&x->state_cache);
+		if (!hlist_unhashed(&x->state_cache_input))
+			hlist_del_rcu(&x->state_cache_input);
+
 		if (x->id.spi)
 			hlist_del_rcu(&x->byspi);
 		net->xfrm.state_num--;
@@ -1106,6 +1109,52 @@ static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark,
 	return NULL;
 }
 
+struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
+					   const xfrm_address_t *daddr,
+					   __be32 spi, u8 proto,
+					   unsigned short family)
+{
+	struct hlist_head *state_cache_input;
+	struct xfrm_state *x = NULL;
+	int cpu = get_cpu();
+
+	state_cache_input = per_cpu_ptr(net->xfrm.state_cache_input, cpu);
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
+		if (x->props.family != family ||
+		    x->id.spi       != spi ||
+		    x->id.proto     != proto ||
+		    !xfrm_addr_equal(&x->id.daddr, daddr, family))
+			continue;
+
+		if ((mark & x->mark.m) != x->mark.v)
+			continue;
+		if (!xfrm_state_hold_rcu(x))
+			continue;
+		goto out;
+	}
+
+	x = __xfrm_state_lookup(net, mark, daddr, spi, proto, family);
+
+	if (x && x->km.state == XFRM_STATE_VALID) {
+		spin_lock_bh(&net->xfrm.xfrm_state_lock);
+		if (hlist_unhashed(&x->state_cache_input)) {
+			hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
+		} else {
+			hlist_del_rcu(&x->state_cache_input);
+			hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
+		}
+		spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+	}
+
+out:
+	rcu_read_unlock();
+	put_cpu();
+	return x;
+}
+EXPORT_SYMBOL(xfrm_input_state_lookup);
+
 static struct xfrm_state *__xfrm_state_lookup_byaddr(struct net *net, u32 mark,
 						     const xfrm_address_t *daddr,
 						     const xfrm_address_t *saddr,
@@ -3079,6 +3128,11 @@ int __net_init xfrm_state_init(struct net *net)
 	net->xfrm.state_byseq = xfrm_hash_alloc(sz);
 	if (!net->xfrm.state_byseq)
 		goto out_byseq;
+
+	net->xfrm.state_cache_input = alloc_percpu(struct hlist_head);
+	if (!net->xfrm.state_cache_input)
+		goto out_state_cache_input;
+
 	net->xfrm.state_hmask = ((sz / sizeof(struct hlist_head)) - 1);
 
 	net->xfrm.state_num = 0;
@@ -3088,6 +3142,8 @@ int __net_init xfrm_state_init(struct net *net)
 			       &net->xfrm.xfrm_state_lock);
 	return 0;
 
+out_state_cache_input:
+	xfrm_hash_free(net->xfrm.state_byseq, sz);
 out_byseq:
 	xfrm_hash_free(net->xfrm.state_byspi, sz);
 out_byspi:
@@ -3117,6 +3173,7 @@ void xfrm_state_fini(struct net *net)
 	xfrm_hash_free(net->xfrm.state_bysrc, sz);
 	WARN_ON(!hlist_empty(net->xfrm.state_bydst));
 	xfrm_hash_free(net->xfrm.state_bydst, sz);
+	free_percpu(net->xfrm.state_cache_input);
 }
 
 #ifdef CONFIG_AUDITSYSCALL
-- 
2.34.1
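
A single-slot model of the inbound cache (a sketch, not kernel code:
the real xfrm_input_state_lookup() keeps a per-cpu hlist, matches
spi/proto/daddr/mark, and re-inserts hits at the head of the list
under xfrm_state_lock):

#include <stdio.h>

#define NR_CPUS 4

struct state { unsigned int spi; };

static struct state table[] = { { 0x100 }, { 0x200 }, { 0x300 } };
static struct state *cache[NR_CPUS];	/* one cached entry per cpu */

static struct state *input_lookup(unsigned int cpu, unsigned int spi)
{
	if (cache[cpu] && cache[cpu]->spi == spi)
		return cache[cpu];	/* fast path: per-cpu cache hit */

	for (size_t i = 0; i < sizeof(table) / sizeof(*table); i++) {
		if (table[i].spi == spi) {
			cache[cpu] = &table[i];	/* refresh this cpu's slot */
			return &table[i];
		}
	}
	return NULL;
}

int main(void)
{
	struct state *x = input_lookup(1, 0x200);	/* slow path */
	printf("cached spi 0x%x for cpu 1\n", x->spi);
	x = input_lookup(1, 0x200);			/* fast path */
	printf("hit spi 0x%x in cpu 1 cache\n", x->spi);
	return 0;
}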



* [PATCH 4/4] xfrm: Restrict percpu SA attribute to specific netlink message types
  2024-10-07  6:44 [PATCH 0/4] xfrm: Add support for RFC 9611 per cpu xfrm states Steffen Klassert
                   ` (2 preceding siblings ...)
  2024-10-07  6:44 ` [PATCH 3/4] xfrm: Add an inbound percpu state cache Steffen Klassert
@ 2024-10-07  6:44 ` Steffen Klassert
  2024-10-07  6:48 ` [PATCH 0/4] xfrm: Add support for RFC 9611 per cpu xfrm states Steffen Klassert
  4 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2024-10-07  6:44 UTC (permalink / raw)
  To: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Simon Horman, Sabrina Dubroca, netdev
  Cc: Steffen Klassert, devel

Reject the usage of XFRMA_SA_PCPU in xfrm netlink messages when
it's not applicable.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_user.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 6bf53e17d382..291bc320c072 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3276,6 +3276,20 @@ static int xfrm_reject_unused_attr(int type, struct nlattr **attrs,
 		}
 	}
 
+	if (attrs[XFRMA_SA_PCPU]) {
+		switch (type) {
+		case XFRM_MSG_NEWSA:
+		case XFRM_MSG_UPDSA:
+		case XFRM_MSG_ALLOCSPI:
+		case XFRM_MSG_ACQUIRE:
+
+			break;
+		default:
+			NL_SET_ERR_MSG(extack, "Invalid attribute SA_PCPU");
+			return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
-- 
2.34.1



* Re: [PATCH 0/4] xfrm: Add support for RFC 9611 per cpu xfrm states
  2024-10-07  6:44 [PATCH 0/4] xfrm: Add support for RFC 9611 per cpu xfrm states Steffen Klassert
                   ` (3 preceding siblings ...)
  2024-10-07  6:44 ` [PATCH 4/4] xfrm: Restrict percpu SA attribute to specific netlink message types Steffen Klassert
@ 2024-10-07  6:48 ` Steffen Klassert
  4 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2024-10-07  6:48 UTC (permalink / raw)
  To: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Simon Horman, Sabrina Dubroca, netdev
  Cc: devel

I forgot to mention that this is the v2 patchset and is based on
the ipsec-next tree...

On Mon, Oct 07, 2024 at 08:44:49AM +0200, Steffen Klassert wrote:
> This patchset implements the xfrm part of per cpu SAs as specified in
> RFC 9611.
> 
> Patch 1 adds the cpu as a lookup key and a config option to generate
> acquire messages for each cpu.
> 
> Patch 2 caches outbound states at the policy.
> 
> Patch 3 caches inbound states on a new percpu state cache.
> 
> Patch 4 restricts percpu SA attributes to specific netlink message types.
> 
> Please review and test.
> 
> ---
> 
> Changes from v1:
> 
> - Add compat layer attributes
> 
> - Fix a 'use always slowpath' condition
> 
> - Document get_cpu() usage
> 
> - Fix forgotten update of xfrm_expire_msgsize()


* Re: [PATCH 2/4] xfrm: Cache used outbound xfrm states at the policy.
  2024-10-07  6:44 ` [PATCH 2/4] xfrm: Cache used outbound xfrm states at the policy Steffen Klassert
@ 2024-10-07 14:26   ` Jakub Kicinski
  2024-10-11  8:21     ` Steffen Klassert
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2024-10-07 14:26 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Simon Horman, Sabrina Dubroca, netdev, devel

On Mon, 7 Oct 2024 08:44:51 +0200 Steffen Klassert wrote:
> Now that we can have percpu xfrm states, the number of active
> states might increase. To get better lookup performance,
> we cache the used xfrm states at the policy for outbound
> IPsec traffic.

missing kdoc here, FWIW:

include/net/xfrm.h:595: warning: Function parameter or struct member 'state_cache_list' not described in 'xfrm_policy'


* Re: [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling.
  2024-10-07  6:44 ` [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling Steffen Klassert
@ 2024-10-08 16:47   ` Simon Horman
  2024-10-11  8:22     ` Steffen Klassert
  2024-10-10 18:22   ` kernel test robot
  1 sibling, 1 reply; 15+ messages in thread
From: Simon Horman @ 2024-10-08 16:47 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Sabrina Dubroca, netdev, devel

On Mon, Oct 07, 2024 at 08:44:50AM +0200, Steffen Klassert wrote:
> Currently all flows for a certain SA must be processed by the same
> cpu to avoid packet reordering and contention on the xfrm
> state lock.
> 
> To get rid of this limitation, the IETF is about to standardize
> per cpu SAs. This patch implements the xfrm part of it:
> 
> https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/
> 
> This adds the cpu as a lookup key for xfrm states and a config option
> to generate acquire messages for each cpu.
> 
> With that, we can have an SA with an identical traffic selector on
> each cpu so that flows can be processed in parallel on all cpus.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

...

> @@ -2521,6 +2547,7 @@ static inline unsigned int xfrm_aevent_msgsize(struct xfrm_state *x)
>  	       + nla_total_size(4) /* XFRM_AE_RTHR */
>  	       + nla_total_size(4) /* XFRM_AE_ETHR */
>  	       + nla_total_size(sizeof(x->dir)); /* XFRMA_SA_DIR */
> +	       + nla_total_size(4); /* XFRMA_SA_PCPU */

Hi Steffen,

It looks like the ';' needs to be dropped from the x->dir line.
(Completely untested!)

	       + nla_total_size(sizeof(x->dir)) /* XFRMA_SA_DIR */
	       + nla_total_size(4); /* XFRMA_SA_PCPU */

Flagged by Smatch.

>  }
>  
>  static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct km_event *c)

...


* Re: [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling.
  2024-10-07  6:44 ` [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling Steffen Klassert
  2024-10-08 16:47   ` Simon Horman
@ 2024-10-10 18:22   ` kernel test robot
  1 sibling, 0 replies; 15+ messages in thread
From: kernel test robot @ 2024-10-10 18:22 UTC (permalink / raw)
  To: Steffen Klassert, Tobias Brunner, Antony Antony, Daniel Xu,
	Paul Wouters, Simon Horman, Sabrina Dubroca, netdev
  Cc: oe-kbuild-all, Steffen Klassert, devel

Hi Steffen,

kernel test robot noticed the following build warnings:

[auto build test WARNING on klassert-ipsec-next/master]
[also build test WARNING on klassert-ipsec/master net/main net-next/main linus/master v6.12-rc2 next-20241010]
[cannot apply to horms-ipvs/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steffen-Klassert/xfrm-Add-support-for-per-cpu-xfrm-state-handling/20241007-145514
base:   https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master
patch link:    https://lore.kernel.org/r/20241007064453.2171933-2-steffen.klassert%40secunet.com
patch subject: [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling.
config: x86_64-randconfig-161-20241010 (https://download.01.org/0day-ci/archive/20241011/202410110224.x8I8OGK4-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410110224.x8I8OGK4-lkp@intel.com/

New smatch warnings:
net/xfrm/xfrm_user.c:2550 xfrm_aevent_msgsize() warn: inconsistent indenting

Old smatch warnings:
net/xfrm/xfrm_user.c:912 xfrm_add_sa() warn: missing error code? 'err'
net/xfrm/xfrm_user.c:2128 xfrm_add_policy() warn: missing error code? 'err'
net/xfrm/xfrm_user.c:2895 xfrm_add_acquire() warn: missing error code 'err'

vim +2550 net/xfrm/xfrm_user.c

  2536	
  2537	static inline unsigned int xfrm_aevent_msgsize(struct xfrm_state *x)
  2538	{
  2539		unsigned int replay_size = x->replay_esn ?
  2540				      xfrm_replay_state_esn_len(x->replay_esn) :
  2541				      sizeof(struct xfrm_replay_state);
  2542	
  2543		return NLMSG_ALIGN(sizeof(struct xfrm_aevent_id))
  2544		       + nla_total_size(replay_size)
  2545		       + nla_total_size_64bit(sizeof(struct xfrm_lifetime_cur))
  2546		       + nla_total_size(sizeof(struct xfrm_mark))
  2547		       + nla_total_size(4) /* XFRM_AE_RTHR */
  2548		       + nla_total_size(4) /* XFRM_AE_ETHR */
  2549		       + nla_total_size(sizeof(x->dir)); /* XFRMA_SA_DIR */
> 2550		       + nla_total_size(4); /* XFRMA_SA_PCPU */
  2551	}
  2552	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 2/4] xfrm: Cache used outbound xfrm states at the policy.
  2024-10-07 14:26   ` Jakub Kicinski
@ 2024-10-11  8:21     ` Steffen Klassert
  0 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2024-10-11  8:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Simon Horman, Sabrina Dubroca, netdev, devel

On Mon, Oct 07, 2024 at 07:26:23AM -0700, Jakub Kicinski wrote:
> On Mon, 7 Oct 2024 08:44:51 +0200 Steffen Klassert wrote:
> > Now that we can have percpu xfrm states, the number of active
> > states might increase. To get better lookup performance,
> > we cache the used xfrm states at the policy for outbound
> > IPsec traffic.
> 
> missing kdoc here, FWIW:
> 
> include/net/xfrm.h:595: warning: Function parameter or struct member 'state_cache_list' not described in 'xfrm_policy'

Fixed, thanks!


* Re: [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling.
  2024-10-08 16:47   ` Simon Horman
@ 2024-10-11  8:22     ` Steffen Klassert
  0 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2024-10-11  8:22 UTC (permalink / raw)
  To: Simon Horman
  Cc: Tobias Brunner, Antony Antony, Daniel Xu, Paul Wouters,
	Sabrina Dubroca, netdev, devel

On Tue, Oct 08, 2024 at 05:47:26PM +0100, Simon Horman wrote:
> On Mon, Oct 07, 2024 at 08:44:50AM +0200, Steffen Klassert wrote:
> > Currently all flows for a certain SA must be processed by the same
> > cpu to avoid packet reordering and lock contention of the xfrm
> > state lock.
> > 
> > To get rid of this limitation, the IETF is about to standardize
> > per cpu SAs. This patch implements the xfrm part of it:
> > 
> > https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/
> > 
> > This adds the cpu as a lookup key for xfrm states and a config option
> > to generate acquire messages for each cpu.
> > 
> > With that, we can have on each cpu a SA with identical traffic selector
> > so that flows can be processed in parallel on all cpu.
> > 
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> 
> ...
> 
> > @@ -2521,6 +2547,7 @@ static inline unsigned int xfrm_aevent_msgsize(struct xfrm_state *x)
> >  	       + nla_total_size(4) /* XFRM_AE_RTHR */
> >  	       + nla_total_size(4) /* XFRM_AE_ETHR */
> >  	       + nla_total_size(sizeof(x->dir)); /* XFRMA_SA_DIR */
> > +	       + nla_total_size(4); /* XFRMA_SA_PCPU */
> 
> Hi Steffen,
> 
> It looks like the ';' needs to be dropped from the x->dir line.
> (Completely untested!)
> 
> 	       + nla_total_size(sizeof(x->dir)) /* XFRMA_SA_DIR */
> 	       + nla_total_size(4); /* XFRMA_SA_PCPU */

Uhm, yes apparently!

Fixed now, thanks!



* Re: [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling.
@ 2024-11-11 20:42 Kees Bakker
  2024-11-12 11:03 ` Steffen Klassert
  0 siblings, 1 reply; 15+ messages in thread
From: Kees Bakker @ 2024-11-11 20:42 UTC (permalink / raw)
  To: Steffen Klassert, netdev

Hi Steffen,

Sorry for the direct email. Did you perhaps forget a "goto out_cancel" here?

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
[...]
@@ -2576,6 +2603,8 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
      err = xfrm_if_id_put(skb, x->if_id);
      if (err)
          goto out_cancel;
+    if (x->pcpu_num != UINT_MAX)
+        err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num);

      if (x->dir) {
          err = nla_put_u8(skb, XFRMA_SA_DIR, x->dir);

-- 
Kees Bakker
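
For reference, the presumably intended form, mirroring the
XFRMA_SA_PCPU hunk in build_expire() from the same patch:

	if (x->pcpu_num != UINT_MAX) {
		err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num);
		if (err)
			goto out_cancel;
	}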


* Re: [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling.
  2024-11-11 20:42 [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling Kees Bakker
@ 2024-11-12 11:03 ` Steffen Klassert
  2024-11-12 19:21   ` Kees Bakker
  0 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2024-11-12 11:03 UTC (permalink / raw)
  To: Kees Bakker; +Cc: netdev

On Mon, Nov 11, 2024 at 09:42:02PM +0100, Kees Bakker wrote:
> Hi Steffen,
> 
> Sorry for the direct email. Did you perhaps forget a "goto out_cancel" here?

Yes, looks like that. Do you want to send a patch?

> 
> diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
> [...]
> @@ -2576,6 +2603,8 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
>      err = xfrm_if_id_put(skb, x->if_id);
>      if (err)
>          goto out_cancel;
> +    if (x->pcpu_num != UINT_MAX)
> +        err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num);
> 
>      if (x->dir) {
>          err = nla_put_u8(skb, XFRMA_SA_DIR, x->dir);
> 
> -- 
> Kees Bakker


* Re: [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling.
  2024-11-12 11:03 ` Steffen Klassert
@ 2024-11-12 19:21   ` Kees Bakker
  2024-11-14 10:59     ` Steffen Klassert
  0 siblings, 1 reply; 15+ messages in thread
From: Kees Bakker @ 2024-11-12 19:21 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev

Op 12-11-2024 om 12:03 schreef Steffen Klassert:
> On Mon, Nov 11, 2024 at 09:42:02PM +0100, Kees Bakker wrote:
>> Hi Steffen,
>>
>> Sorry for the direct email. Did you perhaps forget a "goto out_cancel" here?
> Yes, looks like that. Do you want to send a patch?
I prefer that you create the patch.
>
>> diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
>> [...]
>> @@ -2576,6 +2603,8 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
>>       err = xfrm_if_id_put(skb, x->if_id);
>>       if (err)
>>           goto out_cancel;
>> +    if (x->pcpu_num != UINT_MAX)
>> +        err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num);
>>
>>       if (x->dir) {
>>           err = nla_put_u8(skb, XFRMA_SA_DIR, x->dir);
>>
>> -- 
>> Kees Bakker



* Re: [PATCH 1/4] xfrm: Add support for per cpu xfrm state handling.
  2024-11-12 19:21   ` Kees Bakker
@ 2024-11-14 10:59     ` Steffen Klassert
  0 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2024-11-14 10:59 UTC (permalink / raw)
  To: Kees Bakker; +Cc: netdev

On Tue, Nov 12, 2024 at 08:21:07PM +0100, Kees Bakker wrote:
> Op 12-11-2024 om 12:03 schreef Steffen Klassert:
> > On Mon, Nov 11, 2024 at 09:42:02PM +0100, Kees Bakker wrote:
> > > Hi Steffen,
> > > 
> > > Sorry for the direct email. Did you perhaps forget a "goto out_cancel" here?
> > Yes, looks like that. Do you want to send a patch?
> I prefer that you create the patch.

Someone else sent a patch already. Thanks for the report!

