netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init
@ 2023-12-11 18:18 Pedro Tammela
  2023-12-11 18:18 ` [PATCH net-next v2 1/2] net/sched: act_api: rely on rcu in tcf_idr_check_alloc Pedro Tammela
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Pedro Tammela @ 2023-12-11 18:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, jhs, xiyou.wangcong, jiri,
	marcelo.leitner, vladbu, Pedro Tammela

Scaling optimizations for action binding in rtnl-less filters.
We saw a noticeable lock contention around idrinfo->lock when
testing in a 56 core system, which disappeared after the patches.

v1->v2:
- Address comments from Vlad

Pedro Tammela (2):
  net/sched: act_api: rely on rcu in tcf_idr_check_alloc
  net/sched: act_api: skip idr replace on bound actions

 include/net/act_api.h |  2 +-
 net/sched/act_api.c   | 76 ++++++++++++++++++++++++++++---------------
 net/sched/cls_api.c   |  2 +-
 3 files changed, 51 insertions(+), 29 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next v2 1/2] net/sched: act_api: rely on rcu in tcf_idr_check_alloc
  2023-12-11 18:18 [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init Pedro Tammela
@ 2023-12-11 18:18 ` Pedro Tammela
  2023-12-11 18:18 ` [PATCH net-next v2 2/2] net/sched: act_api: skip idr replace on bound actions Pedro Tammela
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Pedro Tammela @ 2023-12-11 18:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, jhs, xiyou.wangcong, jiri,
	marcelo.leitner, vladbu, Pedro Tammela

Instead of relying only on the idrinfo->lock mutex for
bind/alloc logic, rely on a combination of rcu + mutex + atomics
to better scale the case where multiple rtnl-less filters are
binding to the same action object.

Action binding happens when an action index is specified explicitly and
an action exists which such index exists. Example:
  tc actions add action drop index 1
  tc filter add ... matchall action drop index 1
  tc filter add ... matchall action drop index 1
  tc filter add ... matchall action drop index 1
  tc filter ls ...
     filter protocol all pref 49150 matchall chain 0 filter protocol all pref 49150 matchall chain 0 handle 0x1
     not_in_hw
           action order 1: gact action drop
            random type none pass val 0
            index 1 ref 4 bind 3

   filter protocol all pref 49151 matchall chain 0 filter protocol all pref 49151 matchall chain 0 handle 0x1
     not_in_hw
           action order 1: gact action drop
            random type none pass val 0
            index 1 ref 4 bind 3

   filter protocol all pref 49152 matchall chain 0 filter protocol all pref 49152 matchall chain 0 handle 0x1
     not_in_hw
           action order 1: gact action drop
            random type none pass val 0
            index 1 ref 4 bind 3

When no index is specified, as before, grab the mutex and allocate
in the idr the next available id. In this version, as opposed to before,
it's simplified to store the -EBUSY pointer instead of the previous
alloc + replace combination.

When an index is specified, rely on rcu to find if there's an object in
such index. If there's none, fallback to the above, serializing on the
mutex and reserving the specified id. If there's one, it can be an -EBUSY
pointer, in which case we just try again until it's an action, or an action.
Given the rcu guarantees, the action found could be dead and therefore
we need to bump the refcount if it's not 0, handling the case it's
in fact 0.

As bind and the action refcount are already atomics, these increments can
happen without the mutex protection while many tcf_idr_check_alloc race
to bind to the same action instance.

In case binding encounters a parallel delete or add, it will return
-EAGAIN in order to try again. Both filter and action apis already
have the retry machinery in-place. In case it's an unlocked filter it
retries under the rtnl lock.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
---
 net/sched/act_api.c | 65 ++++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 22 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index abec5c45b5a4..688227acac45 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -816,6 +816,9 @@ EXPORT_SYMBOL(tcf_idr_cleanup);
  * its reference and bind counters, and return 1. Otherwise insert temporary
  * error pointer (to prevent concurrent users from inserting actions with same
  * index) and return 0.
+ *
+ * May return -EAGAIN for binding actions in case of a parallel add/delete on
+ * the requested index.
  */
 
 int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
@@ -824,43 +827,61 @@ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
 	struct tc_action *p;
 	int ret;
+	u32 max;
 
-again:
-	mutex_lock(&idrinfo->lock);
 	if (*index) {
+again:
+		rcu_read_lock();
 		p = idr_find(&idrinfo->action_idr, *index);
+
 		if (IS_ERR(p)) {
 			/* This means that another process allocated
 			 * index but did not assign the pointer yet.
 			 */
-			mutex_unlock(&idrinfo->lock);
+			rcu_read_unlock();
 			goto again;
 		}
 
-		if (p) {
-			refcount_inc(&p->tcfa_refcnt);
-			if (bind)
-				atomic_inc(&p->tcfa_bindcnt);
-			*a = p;
-			ret = 1;
-		} else {
-			*a = NULL;
-			ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
-					    *index, GFP_KERNEL);
-			if (!ret)
-				idr_replace(&idrinfo->action_idr,
-					    ERR_PTR(-EBUSY), *index);
+		if (!p) {
+			/* Empty slot, try to allocate it */
+			max = *index;
+			rcu_read_unlock();
+			goto new;
+		}
+
+		if (!refcount_inc_not_zero(&p->tcfa_refcnt)) {
+			/* Action was deleted in parallel */
+			rcu_read_unlock();
+			return -EAGAIN;
 		}
+
+		if (bind)
+			atomic_inc(&p->tcfa_bindcnt);
+		*a = p;
+
+		rcu_read_unlock();
+
+		return 1;
 	} else {
+		/* Find a slot */
 		*index = 1;
-		*a = NULL;
-		ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
-				    UINT_MAX, GFP_KERNEL);
-		if (!ret)
-			idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY),
-				    *index);
+		max = UINT_MAX;
 	}
+
+new:
+	*a = NULL;
+
+	mutex_lock(&idrinfo->lock);
+	ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max,
+			    GFP_KERNEL);
 	mutex_unlock(&idrinfo->lock);
+
+	/* N binds raced for action allocation,
+	 * retry for all the ones that failed.
+	 */
+	if (ret == -ENOSPC && *index == max)
+		ret = -EAGAIN;
+
 	return ret;
 }
 EXPORT_SYMBOL(tcf_idr_check_alloc);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next v2 2/2] net/sched: act_api: skip idr replace on bound actions
  2023-12-11 18:18 [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init Pedro Tammela
  2023-12-11 18:18 ` [PATCH net-next v2 1/2] net/sched: act_api: rely on rcu in tcf_idr_check_alloc Pedro Tammela
@ 2023-12-11 18:18 ` Pedro Tammela
  2023-12-12 15:59 ` [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init Jamal Hadi Salim
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Pedro Tammela @ 2023-12-11 18:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, jhs, xiyou.wangcong, jiri,
	marcelo.leitner, vladbu, Pedro Tammela

tcf_idr_insert_many will replace the allocated -EBUSY pointer in
tcf_idr_check_alloc with the real action pointer, exposing it
to all operations. This operation is only needed when the action pointer
is created (ACT_P_CREATED). For actions which are bound to (returned 0),
the pointer already resides in the idr making such operation a nop.

Even though it's a nop, it's still not a cheap operation as internally
the idr code walks the idr and then does a replace on the appropriate slot.
So if the action was bound, better skip the idr replace entirely.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
---
 include/net/act_api.h |  2 +-
 net/sched/act_api.c   | 11 ++++++-----
 net/sched/cls_api.c   |  2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 4ae0580b63ca..ea13e1e4a7c2 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -191,7 +191,7 @@ int tcf_idr_create_from_flags(struct tc_action_net *tn, u32 index,
 			      struct nlattr *est, struct tc_action **a,
 			      const struct tc_action_ops *ops, int bind,
 			      u32 flags);
-void tcf_idr_insert_many(struct tc_action *actions[]);
+void tcf_idr_insert_many(struct tc_action *actions[], int init_res[]);
 void tcf_idr_cleanup(struct tc_action_net *tn, u32 index);
 int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
 			struct tc_action **a, int bind);
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 688227acac45..1e3de528e005 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1304,7 +1304,7 @@ static const struct nla_policy tcf_action_policy[TCA_ACT_MAX + 1] = {
 	[TCA_ACT_HW_STATS]	= NLA_POLICY_BITFIELD32(TCA_ACT_HW_STATS_ANY),
 };
 
-void tcf_idr_insert_many(struct tc_action *actions[])
+void tcf_idr_insert_many(struct tc_action *actions[], int init_res[])
 {
 	struct tc_action *a;
 	int i;
@@ -1312,11 +1312,12 @@ void tcf_idr_insert_many(struct tc_action *actions[])
 	tcf_act_for_each_action(i, a, actions) {
 		struct tcf_idrinfo *idrinfo;
 
+		if (init_res[i] == 0) /* Bound */
+			continue;
+
 		idrinfo = a->idrinfo;
 		mutex_lock(&idrinfo->lock);
-		/* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc if
-		 * it is just created, otherwise this is just a nop.
-		 */
+		/* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */
 		idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
 		mutex_unlock(&idrinfo->lock);
 	}
@@ -1516,7 +1517,7 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 	/* We have to commit them all together, because if any error happened in
 	 * between, we could not handle the failure gracefully.
 	 */
-	tcf_idr_insert_many(actions);
+	tcf_idr_insert_many(actions, init_res);
 
 	*attr_size = tcf_action_full_attrs_size(sz);
 	err = i - 1;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1976bd163986..6391b341284e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3309,7 +3309,7 @@ int tcf_exts_validate_ex(struct net *net, struct tcf_proto *tp, struct nlattr **
 			act->type = exts->type = TCA_OLD_COMPAT;
 			exts->actions[0] = act;
 			exts->nr_actions = 1;
-			tcf_idr_insert_many(exts->actions);
+			tcf_idr_insert_many(exts->actions, init_res);
 		} else if (exts->action && tb[exts->action]) {
 			int err;
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init
  2023-12-11 18:18 [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init Pedro Tammela
  2023-12-11 18:18 ` [PATCH net-next v2 1/2] net/sched: act_api: rely on rcu in tcf_idr_check_alloc Pedro Tammela
  2023-12-11 18:18 ` [PATCH net-next v2 2/2] net/sched: act_api: skip idr replace on bound actions Pedro Tammela
@ 2023-12-12 15:59 ` Jamal Hadi Salim
  2023-12-12 17:34 ` Vlad Buslov
  2023-12-14  2:00 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: Jamal Hadi Salim @ 2023-12-12 15:59 UTC (permalink / raw)
  To: Pedro Tammela
  Cc: netdev, davem, edumazet, kuba, pabeni, xiyou.wangcong, jiri,
	marcelo.leitner, vladbu

On Mon, Dec 11, 2023 at 1:18 PM Pedro Tammela <pctammela@mojatatu.com> wrote:
>
> Scaling optimizations for action binding in rtnl-less filters.
> We saw a noticeable lock contention around idrinfo->lock when
> testing in a 56 core system, which disappeared after the patches.
>
> v1->v2:
> - Address comments from Vlad
>

For the series:
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

> Pedro Tammela (2):
>   net/sched: act_api: rely on rcu in tcf_idr_check_alloc
>   net/sched: act_api: skip idr replace on bound actions
>
>  include/net/act_api.h |  2 +-
>  net/sched/act_api.c   | 76 ++++++++++++++++++++++++++++---------------
>  net/sched/cls_api.c   |  2 +-
>  3 files changed, 51 insertions(+), 29 deletions(-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init
  2023-12-11 18:18 [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init Pedro Tammela
                   ` (2 preceding siblings ...)
  2023-12-12 15:59 ` [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init Jamal Hadi Salim
@ 2023-12-12 17:34 ` Vlad Buslov
  2023-12-14  2:00 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: Vlad Buslov @ 2023-12-12 17:34 UTC (permalink / raw)
  To: Pedro Tammela
  Cc: netdev, davem, edumazet, kuba, pabeni, jhs, xiyou.wangcong, jiri,
	marcelo.leitner

On Mon 11 Dec 2023 at 15:18, Pedro Tammela <pctammela@mojatatu.com> wrote:
> Scaling optimizations for action binding in rtnl-less filters.
> We saw a noticeable lock contention around idrinfo->lock when
> testing in a 56 core system, which disappeared after the patches.
>
> v1->v2:
> - Address comments from Vlad
>
> Pedro Tammela (2):
>   net/sched: act_api: rely on rcu in tcf_idr_check_alloc
>   net/sched: act_api: skip idr replace on bound actions

Reviewed-by: Vlad Buslov <vladbu@nvidia.com>

[...]



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init
  2023-12-11 18:18 [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init Pedro Tammela
                   ` (3 preceding siblings ...)
  2023-12-12 17:34 ` Vlad Buslov
@ 2023-12-14  2:00 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-12-14  2:00 UTC (permalink / raw)
  To: Pedro Tammela
  Cc: netdev, davem, edumazet, kuba, pabeni, jhs, xiyou.wangcong, jiri,
	marcelo.leitner, vladbu

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 11 Dec 2023 15:18:05 -0300 you wrote:
> Scaling optimizations for action binding in rtnl-less filters.
> We saw a noticeable lock contention around idrinfo->lock when
> testing in a 56 core system, which disappeared after the patches.
> 
> v1->v2:
> - Address comments from Vlad
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/2] net/sched: act_api: rely on rcu in tcf_idr_check_alloc
    https://git.kernel.org/netdev/net-next/c/4b55e86736d5
  - [net-next,v2,2/2] net/sched: act_api: skip idr replace on bound actions
    https://git.kernel.org/netdev/net-next/c/1dd7f18fc0ed

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-12-14  2:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-11 18:18 [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init Pedro Tammela
2023-12-11 18:18 ` [PATCH net-next v2 1/2] net/sched: act_api: rely on rcu in tcf_idr_check_alloc Pedro Tammela
2023-12-11 18:18 ` [PATCH net-next v2 2/2] net/sched: act_api: skip idr replace on bound actions Pedro Tammela
2023-12-12 15:59 ` [PATCH net-next v2 0/2] net/sched: optimizations around action binding and init Jamal Hadi Salim
2023-12-12 17:34 ` Vlad Buslov
2023-12-14  2:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).