From: Vlad Buslov <vladbu@nvidia.com>
To: Pedro Tammela <pctammela@mojatatu.com>
Cc: <netdev@vger.kernel.org>, <davem@davemloft.net>,
<edumazet@google.com>, <kuba@kernel.org>, <pabeni@redhat.com>,
<jhs@mojatatu.com>, <xiyou.wangcong@gmail.com>,
<jiri@resnulli.us>, <marcelo.leitner@gmail.com>
Subject: Re: [PATCH net-next 1/2] net/sched: act_api: rely on rcu in tcf_idr_check_alloc
Date: Tue, 5 Dec 2023 20:34:03 +0200 [thread overview]
Message-ID: <87jzpso53a.fsf@nvidia.com> (raw)
In-Reply-To: <20231205153012.484687-2-pctammela@mojatatu.com>
On Tue 05 Dec 2023 at 12:30, Pedro Tammela <pctammela@mojatatu.com> wrote:
> Instead of relying only on the idrinfo->lock mutex for
> bind/alloc logic, rely on a combination of rcu + mutex + atomics
> to better scale the case where multiple rtnl-less filters are
> binding to the same action object.
>
> Action binding happens when an action index is specified explicitly and
> an action exists which such index exists. Example:
Nit: the first sentence looks mangled, extra 'exists' word and probably
'which' should be 'with'.
> tc actions add action drop index 1
> tc filter add ... matchall action drop index 1
> tc filter add ... matchall action drop index 1
> tc filter add ... matchall action drop index 1
> tc filter ls ...
> filter protocol all pref 49150 matchall chain 0 filter protocol all pref 49150 matchall chain 0 handle 0x1
> not_in_hw
> action order 1: gact action drop
> random type none pass val 0
> index 1 ref 4 bind 3
>
> filter protocol all pref 49151 matchall chain 0 filter protocol all pref 49151 matchall chain 0 handle 0x1
> not_in_hw
> action order 1: gact action drop
> random type none pass val 0
> index 1 ref 4 bind 3
>
> filter protocol all pref 49152 matchall chain 0 filter protocol all pref 49152 matchall chain 0 handle 0x1
> not_in_hw
> action order 1: gact action drop
> random type none pass val 0
> index 1 ref 4 bind 3
>
> When no index is specified, as before, grab the mutex and allocate
> in the idr the next available id. In this version, as opposed to before,
> it's simplified to store the -EBUSY pointer instead of the previous
> alloc + replace combination.
>
> When an index is specified, rely on rcu to find if there's an object in
> such index. If there's none, fallback to the above, serializing on the
> mutex and reserving the specified id. If there's one, it can be an -EBUSY
> pointer, in which case we just try again until it's an action, or an action.
> Given the rcu guarantees, the action found could be dead and therefore
> we need to bump the refcount if it's not 0, handling the case it's
> in fact 0.
>
> As bind and the action refcount are already atomics, these increments can
> happen without the mutex protection while many tcf_idr_check_alloc race
> to bind to the same action instance.
>
> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
> ---
> net/sched/act_api.c | 56 +++++++++++++++++++++++++++------------------
> 1 file changed, 34 insertions(+), 22 deletions(-)
>
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index abec5c45b5a4..79a044d2ae02 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -824,43 +824,55 @@ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
> struct tcf_idrinfo *idrinfo = tn->idrinfo;
> struct tc_action *p;
> int ret;
> + u32 max;
>
> -again:
> - mutex_lock(&idrinfo->lock);
> if (*index) {
> +again:
> + rcu_read_lock();
> p = idr_find(&idrinfo->action_idr, *index);
> +
> if (IS_ERR(p)) {
> /* This means that another process allocated
> * index but did not assign the pointer yet.
> */
> - mutex_unlock(&idrinfo->lock);
> + rcu_read_unlock();
> goto again;
> }
>
> - if (p) {
> - refcount_inc(&p->tcfa_refcnt);
> - if (bind)
> - atomic_inc(&p->tcfa_bindcnt);
> - *a = p;
> - ret = 1;
> - } else {
> - *a = NULL;
> - ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
> - *index, GFP_KERNEL);
> - if (!ret)
> - idr_replace(&idrinfo->action_idr,
> - ERR_PTR(-EBUSY), *index);
> + if (!p) {
> + /* Empty slot, try to allocate it */
> + max = *index;
> + rcu_read_unlock();
> + goto new;
> + }
> +
> + if (!refcount_inc_not_zero(&p->tcfa_refcnt)) {
> + /* Action was deleted in parallel */
> + rcu_read_unlock();
> + return -ENOENT;
Current version doesn't return ENOENT since it is synchronous. You are
now introducing basically a change to UAPI since users of this function
(individual actions) are not prepared to retry on ENOENT and will
propagate the error up the call chain. I guess you need to try to create
a new action with specified index instead.
> }
> +
> + if (bind)
> + atomic_inc(&p->tcfa_bindcnt);
> + *a = p;
> +
> + rcu_read_unlock();
> +
> + return 1;
> } else {
> + /* Find a slot */
> *index = 1;
> - *a = NULL;
> - ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
> - UINT_MAX, GFP_KERNEL);
> - if (!ret)
> - idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY),
> - *index);
> + max = UINT_MAX;
> }
> +
> +new:
> + *a = NULL;
> +
> + mutex_lock(&idrinfo->lock);
> + ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max,
> + GFP_KERNEL);
What if multiple concurrent tasks didn't find the action by index with
rcu and get here, synchronizing on the idrinfo->lock? It looks like
after the one who got the lock first successfully allocates the index
everyone else will fail (also propagating ENOSPACE to the user). I guess
you need some mechanism to account for such case and retry.
> mutex_unlock(&idrinfo->lock);
> +
> return ret;
> }
> EXPORT_SYMBOL(tcf_idr_check_alloc);
next prev parent reply other threads:[~2023-12-05 18:52 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-05 15:30 [PATCH net-next 0/2] net/sched: optimizations around action binding and init Pedro Tammela
2023-12-05 15:30 ` [PATCH net-next 1/2] net/sched: act_api: rely on rcu in tcf_idr_check_alloc Pedro Tammela
2023-12-05 18:34 ` Vlad Buslov [this message]
2023-12-05 20:19 ` Pedro Tammela
2023-12-06 9:52 ` Vlad Buslov
2023-12-08 21:07 ` Pedro Tammela
2023-12-11 16:10 ` Vlad Buslov
2023-12-05 15:30 ` [PATCH net-next 2/2] net/sched: act_api: skip idr replace on bound actions Pedro Tammela
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87jzpso53a.fsf@nvidia.com \
--to=vladbu@nvidia.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=marcelo.leitner@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pctammela@mojatatu.com \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.