From: Vlad Buslov <vladbu@nvidia.com>
To: Pedro Tammela <pctammela@mojatatu.com>
Cc: <netdev@vger.kernel.org>, <davem@davemloft.net>,
<edumazet@google.com>, <kuba@kernel.org>, <pabeni@redhat.com>,
<jhs@mojatatu.com>, <xiyou.wangcong@gmail.com>,
<jiri@resnulli.us>, <marcelo.leitner@gmail.com>
Subject: Re: [PATCH net-next 1/2] net/sched: act_api: rely on rcu in tcf_idr_check_alloc
Date: Wed, 6 Dec 2023 11:52:45 +0200
Message-ID: <87fs0fodmu.fsf@nvidia.com>
In-Reply-To: <77b8d1d8-4ad9-49c7-9c42-612e9de29881@mojatatu.com>
On Tue 05 Dec 2023 at 17:19, Pedro Tammela <pctammela@mojatatu.com> wrote:
> On 05/12/2023 15:34, Vlad Buslov wrote:
>> On Tue 05 Dec 2023 at 12:30, Pedro Tammela <pctammela@mojatatu.com> wrote:
>>> Instead of relying only on the idrinfo->lock mutex for
>>> bind/alloc logic, rely on a combination of rcu + mutex + atomics
>>> to better scale the case where multiple rtnl-less filters are
>>> binding to the same action object.
>>>
>>> Action binding happens when an action index is specified explicitly and
>>> an action exists which such index exists. Example:
>> Nit: the first sentence looks mangled, extra 'exists' word and probably
>> 'which' should be 'with'.
>>
>>> tc actions add action drop index 1
>>> tc filter add ... matchall action drop index 1
>>> tc filter add ... matchall action drop index 1
>>> tc filter add ... matchall action drop index 1
>>> tc filter ls ...
>>> filter protocol all pref 49150 matchall chain 0
>>> filter protocol all pref 49150 matchall chain 0 handle 0x1
>>>   not_in_hw
>>> 	action order 1: gact action drop
>>> 	 random type none pass val 0
>>> 	 index 1 ref 4 bind 3
>>>
>>> filter protocol all pref 49151 matchall chain 0
>>> filter protocol all pref 49151 matchall chain 0 handle 0x1
>>>   not_in_hw
>>> 	action order 1: gact action drop
>>> 	 random type none pass val 0
>>> 	 index 1 ref 4 bind 3
>>>
>>> filter protocol all pref 49152 matchall chain 0
>>> filter protocol all pref 49152 matchall chain 0 handle 0x1
>>>   not_in_hw
>>> 	action order 1: gact action drop
>>> 	 random type none pass val 0
>>> 	 index 1 ref 4 bind 3
>>>
>>> When no index is specified, as before, grab the mutex and allocate the
>>> next available id in the idr. In this version, as opposed to before, it's
>>> simplified to store the -EBUSY pointer directly instead of the previous
>>> alloc + replace combination.
>>>
>>> When an index is specified, rely on rcu to find if there's an object at
>>> such index. If there's none, fall back to the above, serializing on the
>>> mutex and reserving the specified id. If there's one, it can be an -EBUSY
>>> pointer, in which case we just try again until it's replaced with a real
>>> action, or it can be the action itself. Given the rcu guarantees, the
>>> action found could be dead and therefore we need to bump the refcount if
>>> it's not 0, handling the case it's in fact 0.
>>>
>>> As bind and the action refcount are already atomics, these increments can
>>> happen without the mutex protection while many tcf_idr_check_alloc calls
>>> race to bind to the same action instance.
>>>
>>> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
>>> ---
>>> net/sched/act_api.c | 56 +++++++++++++++++++++++++++------------------
>>> 1 file changed, 34 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>> index abec5c45b5a4..79a044d2ae02 100644
>>> --- a/net/sched/act_api.c
>>> +++ b/net/sched/act_api.c
>>> @@ -824,43 +824,55 @@ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
>>>  	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>>  	struct tc_action *p;
>>>  	int ret;
>>> +	u32 max;
>>>  
>>> -again:
>>> -	mutex_lock(&idrinfo->lock);
>>>  	if (*index) {
>>> +again:
>>> +		rcu_read_lock();
>>>  		p = idr_find(&idrinfo->action_idr, *index);
>>> +
>>>  		if (IS_ERR(p)) {
>>>  			/* This means that another process allocated
>>>  			 * index but did not assign the pointer yet.
>>>  			 */
>>> -			mutex_unlock(&idrinfo->lock);
>>> +			rcu_read_unlock();
>>>  			goto again;
>>>  		}
>>>  
>>> -		if (p) {
>>> -			refcount_inc(&p->tcfa_refcnt);
>>> -			if (bind)
>>> -				atomic_inc(&p->tcfa_bindcnt);
>>> -			*a = p;
>>> -			ret = 1;
>>> -		} else {
>>> -			*a = NULL;
>>> -			ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
>>> -					    *index, GFP_KERNEL);
>>> -			if (!ret)
>>> -				idr_replace(&idrinfo->action_idr,
>>> -					    ERR_PTR(-EBUSY), *index);
>>> +		if (!p) {
>>> +			/* Empty slot, try to allocate it */
>>> +			max = *index;
>>> +			rcu_read_unlock();
>>> +			goto new;
>>> +		}
>>> +
>>> +		if (!refcount_inc_not_zero(&p->tcfa_refcnt)) {
>>> +			/* Action was deleted in parallel */
>>> +			rcu_read_unlock();
>>> +			return -ENOENT;
>> The current version doesn't return ENOENT since it is synchronous. You are
>> now introducing basically a change to UAPI since users of this function
>> (individual actions) are not prepared to retry on ENOENT and will
>> propagate the error up the call chain. I guess you need to try to create
>> a new action with the specified index instead.
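>>
>> For example, an action init callback consumes this function roughly as
>> follows (a hedged sketch of the common pattern, e.g. in tcf_gact_init();
>> written from memory, not copied from the source):
>>
>> 	err = tcf_idr_check_alloc(tn, &index, a, bind);
>> 	if (err < 0)
>> 		return err;	/* a new -ENOENT escapes to user space here */
>> 	if (err) {
>> 		/* existing action was found and bound */
>> 		...
>> 		return 0;
>> 	}
>> 	/* err == 0: index reserved, create the new action */
>> 	err = tcf_idr_create(tn, index, est, a, ops, bind, ...);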
>
> I see.
> So you are saying that in the case where action foo is deleted and a binding
> in parallel observes the deleted action, it should fall back to trying to
> allocate the index.
Correct.
>
> We could goto again and hope that idr_find will observe the idr index being
> freed; if it does, it falls back into action allocation, and otherwise it
> simply goes via the same path as before (jumping to 'again').
>
> I don't see many problems here; it seems to converge in this scenario,
> as it eventually transforms into a race for action allocation (more below)
> if you have an unfortunate delete with many bindings in flight, roughly
> sketched below.
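>
> A rough timeline of that convergence (a sketch, assuming both retry
> fallbacks discussed in this thread are in place):
>
>     delete                binder 1                binder 2
>     refcount hits 0
>     idr index freed
>                           idr_find (free index)   idr_find (free index)
>                           goto new                goto new
>                           idr_alloc_u32 (ok)      idr_alloc_u32 (ENOSPC)
>                           (creates the action)    goto again
>                                                   idr_find (EBUSY), retry
>                                                   idr_find (action), bind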
>
>>
>>>  		}
>>> +
>>> +		if (bind)
>>> +			atomic_inc(&p->tcfa_bindcnt);
>>> +		*a = p;
>>> +
>>> +		rcu_read_unlock();
>>> +
>>> +		return 1;
>>>  	} else {
>>> +		/* Find a slot */
>>>  		*index = 1;
>>> -		*a = NULL;
>>> -		ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
>>> -				    UINT_MAX, GFP_KERNEL);
>>> -		if (!ret)
>>> -			idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY),
>>> -				    *index);
>>> +		max = UINT_MAX;
>>>  	}
>>> +
>>> +new:
>>> +	*a = NULL;
>>> +
>>> +	mutex_lock(&idrinfo->lock);
>>> +	ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max,
>>> +			    GFP_KERNEL);
>> What if multiple concurrent tasks didn't find the action by index with
>> rcu and get here, synchronizing on the idrinfo->lock? It looks like
>> after the one who got the lock first successfully allocates the index
>> everyone else will fail (also propagating ENOSPC to the user).
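>>
>> For example (a sketch of the interleaving, not an actual trace):
>>
>>     task A                          task B
>>     rcu_read_lock()                 rcu_read_lock()
>>     idr_find (free index)           idr_find (free index)
>>     goto new                        goto new
>>     mutex_lock(&idrinfo->lock)
>>     idr_alloc_u32() reserves index
>>     mutex_unlock(&idrinfo->lock)
>>                                     mutex_lock(&idrinfo->lock)
>>                                     idr_alloc_u32() returns -ENOSPC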
>
> Correct
>
>> I guess you need some mechanism to account for such a case and retry.
>
> Ok, so if I'm binding and a free index is observed, which means "try to
> allocate", and I get an ENOSPC after jumping to 'new', try again, but this
> time binding to the allocated action.
>
> In this scenario, when we come back to 'again' we will wait until -EBUSY is
> replaced with the real pointer. That seems like a big enough window that any
> race to allocate from a binding would most probably end up in this
> contention loop.
>
> However, I think when we have these two retry mechanisms there's an
> extremely small window for an infinite loop if an action delete is timed
> just right, between when the action pointer is found and when we grab the
> tcfa_refcnt.
>
> idr_find (pointer)
> tcfa_refcnt (0)        <-------|
> again:                         |
> idr_find (free index!)         |
> new:                           |
> idr_alloc_u32 (ENOSPC)         |
> again:                         |
> idr_find (EBUSY)               |
> again:                         |
> idr_find (pointer)             |
> <evil delete happens>          |
> ------->>>>--------------------|
I'm not sure I'm following. Why would this sequence cause an infinite loop?
>
> Another potential problem is that this will race with non-binding actions,
> so the ENOSPC could actually come from another unrelated action. A practical
> example would be a race between a binding to an 'action drop index 1' and an
> 'action ok' allocation. Actually, it's a general problem and not particular
> to this case, but it seems like we could be amplifying it.
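>
> E.g. (a sketch; both are gact, so they share the same kind-wide idrinfo):
>
>     binder ('drop index 1')        creator ('action ok', no index)
>     idr_find (1) -> free index
>     goto new (min = max = 1)
>                                    idr_alloc_u32 (1, UINT_MAX) takes 1
>     idr_alloc_u32 (1, 1) (ENOSPC)
>     a retry would then find and bind the unrelated 'action ok' at index 1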
>
> I'm conflicted here. If I were to choose one of the two, I would pick the
> action respawning, as to me it seems to converge much more quickly and
> removes the UAPI change (my bad! :).
>
> As for usability, if all of a sudden there's a huge influx of ENOSPC errors
> because users are abusing 'tc filter add ... action index 1 ...' in parallel
> _before_ actually creating the action object, the fix is to just:
>
>    tc actions add action index 1 ...
>    tc filter add ...
>
> as tc has always supported.