From: Oz Shlomo <ozsh@nvidia.com>
To: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>, <netdev@vger.kernel.org>,
	<netfilter-devel@vger.kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	"Paul Blakey" <paulb@nvidia.com>
Subject: Re: [PATCH nf-next] netfilter: flowtable: separate replace, destroy and stats to different workqueues
Date: Thu, 25 Mar 2021 10:46:12 +0200
Message-ID: <b89d8340-ca1c-1424-bbaa-0e85d37a84bb@nvidia.com>
In-Reply-To: <YFutK3Mn+h5OWNXe@horizon.localdomain>

Hi Marcelo,

On 3/24/2021 11:20 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Mar 24, 2021 at 01:24:53PM +0200, Oz Shlomo wrote:
>> Hi,
> 
> Hi,
> 
>>
>> On 3/24/2021 3:38 AM, Pablo Neira Ayuso wrote:
>>> Hi Marcelo,
>>>
>>> On Mon, Mar 22, 2021 at 03:09:51PM -0300, Marcelo Ricardo Leitner wrote:
>>>> On Wed, Mar 03, 2021 at 05:11:47PM +0100, Pablo Neira Ayuso wrote:
>>> [...]
>>>>> Or perhaps making the cookie unique is sufficient? The cookie refers to
>>>>> the memory address but memory can be recycled very quickly. If the
>>>>> cookie helps to catch the reorder scenario, then the conntrack id
>>>>> could be used instead of the memory address as cookie.
>>>>
>>>> Something like this, if I got the idea right, would be even better. If
>>>> the entry actually expired before it had a chance of being offloaded,
>>>> there is no point in offloading it only to remove it right away.
>>>
>>> It would be interesting to explore the idea you describe. Maybe a
>>> flag can be set on stale objects, or the stale object can simply be
>>> removed from the offload queue. It should then be possible to regain
>>> control over the list of pending requests as a batch that is passed
>>> through a single queue_work call.
>>>
>>
>> Removing stale objects is a good optimization for cases when the rate of
>> established connections is greater than the hardware offload insertion rate.
>> However, with a single workqueue design, a burst of del commands may postpone connection offload tasks.
>> Postponed offloads may cause additional packets to go through software, thus
>> creating a chain effect which may diminish the system's connection rate.
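
For context, the patch routes each offload command to its own workqueue,
roughly along these lines (abridged sketch, exact names illustrative):

static struct workqueue_struct *nf_flow_offload_add_wq;
static struct workqueue_struct *nf_flow_offload_del_wq;
static struct workqueue_struct *nf_flow_offload_stats_wq;

static void flow_offload_queue_work(struct flow_offload_work *offload)
{
	if (offload->cmd == FLOW_CLS_REPLACE)
		queue_work(nf_flow_offload_add_wq, &offload->work);
	else if (offload->cmd == FLOW_CLS_DESTROY)
		queue_work(nf_flow_offload_del_wq, &offload->work);
	else
		queue_work(nf_flow_offload_stats_wq, &offload->work);
}

This way a burst of FLOW_CLS_DESTROY requests cannot delay pending
FLOW_CLS_REPLACE work.
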
> 
> Right. I didn't intend to object to multiqueues. I'm sorry if it
> sounded that way.
> 
>>
>> Marcelo, AFAIU add/del are synchronized by design since the del is triggered by the gc thread.
>> A del workqueue item will be instantiated only after a connection is in hardware.
> 
> They were synchronized, but after this patch, not anymore AFAICT:
> 
> tcf_ct_flow_table_add()
>    flow_offload_add()
>                if (nf_flowtable_hw_offload(flow_table)) {
>                    __set_bit(NF_FLOW_HW, &flow->flags);    [A]
>                    nf_flow_offload_add(flow_table, flow);
>                             ^--- schedules on _add workqueue
> 
> then the gc thread:
> nf_flow_offload_gc_step()
>            if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct))
>                    set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> 
>            if (test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
> 	                   ^-- can also be set by tcf_ct_flow_table_lookup()
> 			       on FINs, by calling flow_offload_teardown()
>                    if (test_bit(NF_FLOW_HW, &flow->flags)) {
>                                      ^--- this is set in [A], even if the _add is still queued
>                            if (!test_bit(NF_FLOW_HW_DYING, &flow->flags))
>                                    nf_flow_offload_del(flow_table, flow);
> 
> nf_flow_offload_del()
>            offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_DESTROY);
>            if (!offload)
>                    return;
> 
>            set_bit(NF_FLOW_HW_DYING, &flow->flags);
>            flow_offload_queue_work(offload);
> 
> NF_FLOW_HW_DYING only avoids a double _del here.
> 
> Maybe I'm just missing it, but I don't see how removals would only
> happen after the entry is actually offloaded. As in, if the add queue
> is very long and the datapath sees a FIN, it seems the next gc iteration
> could try to remove the entry before it is actually offloaded. I think
> this is also what Pablo meant in his original reply here, hence his idea
> of having add and del work on the same queue.
> 

The destroy work item will not be allocated while the hardware offload (add) is still pending:

nf_flow_offload_work_alloc()
	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
		return NULL;
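
NF_FLOW_HW_PENDING is cleared only once the queued command has actually
run, at the end of the work handler (abridged sketch of
flow_offload_work_handler() in nf_flow_table_offload.c):

static void flow_offload_work_handler(struct work_struct *work)
{
	struct flow_offload_work *offload;

	offload = container_of(work, struct flow_offload_work, work);
	switch (offload->cmd) {
	case FLOW_CLS_REPLACE:
		flow_offload_work_add(offload);
		break;
	case FLOW_CLS_DESTROY:
		flow_offload_work_del(offload);
		break;
	case FLOW_CLS_STATS:
		flow_offload_work_stats(offload);
		break;
	default:
		WARN_ON_ONCE(1);
	}

	/* only after the command ran may another work item be allocated */
	clear_bit(NF_FLOW_HW_PENDING, &offload->flow->flags);
	kfree(offload);
}

So if the gc step hits a teardown while the add is still queued,
nf_flow_offload_work_alloc() returns NULL, nf_flow_offload_del() bails
out without setting NF_FLOW_HW_DYING, and the destroy is retried on a
later gc iteration, after the entry has actually been offloaded.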


