From: Vlad Buslov <vladbu@nvidia.com>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: <netfilter-devel@vger.kernel.org>, <kadlec@netfilter.org>,
	<fw@strlen.de>, <ozsh@nvidia.com>, <paulb@nvidia.com>
Subject: Re: [PATCH net-next 2/8] netfilter: introduce total count of hw offloaded flow table entries
Date: Sat, 12 Mar 2022 20:56:49 +0200
Message-ID: <87o82bug3d.fsf@nvidia.com>
In-Reply-To: <YiZ9fQ8oMSOn5Su2@salvia>

On Mon 07 Mar 2022 at 22:47, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Tue, Feb 22, 2022 at 05:09:57PM +0200, Vlad Buslov wrote:
>> To improve hardware offload debuggability, and to allow capping the total
>> number of offloaded entries in the following patch, extend struct
>> netns_nftables with a 'count_hw' counter and expose it to userspace as the
>> 'nf_flowtable_count_hw' sysctl entry. Increment the counter together with
>> setting the NF_FLOW_HW flag when scheduling the offload add task on the
>> workqueue, and decrement it after successfully scheduling the offload del
>> task.
>> 
>> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
>> Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
>> Reviewed-by: Paul Blakey <paulb@nvidia.com>
>> ---
>>  include/net/netns/nftables.h            |  1 +
>>  net/netfilter/nf_conntrack_standalone.c | 12 ++++++++++++
>>  net/netfilter/nf_flow_table_core.c      | 12 ++++++++++--
>>  3 files changed, 23 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h
>> index 8c77832d0240..262b8b3213cb 100644
>> --- a/include/net/netns/nftables.h
>> +++ b/include/net/netns/nftables.h
>> @@ -6,6 +6,7 @@
>>  
>>  struct netns_nftables {
>>  	u8			gencursor;
>> +	atomic_t		count_hw;
>>  };
>>  
>>  #endif
>> diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
>> index 3e1afd10a9b6..8cd55d502ffd 100644
>> --- a/net/netfilter/nf_conntrack_standalone.c
>> +++ b/net/netfilter/nf_conntrack_standalone.c
>> @@ -618,6 +618,9 @@ enum nf_ct_sysctl_index {
>>  #ifdef CONFIG_LWTUNNEL
>>  	NF_SYSCTL_CT_LWTUNNEL,
>>  #endif
>> +#if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>> +	NF_SYSCTL_CT_COUNT_HW,
>> +#endif
>>  
>>  	__NF_SYSCTL_CT_LAST_SYSCTL,
>>  };
>> @@ -973,6 +976,14 @@ static struct ctl_table nf_ct_sysctl_table[] = {
>>  		.mode		= 0644,
>>  		.proc_handler	= nf_hooks_lwtunnel_sysctl_handler,
>>  	},
>> +#endif
>> +#if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>> +	[NF_SYSCTL_CT_COUNT_HW] = {
>> +		.procname	= "nf_flowtable_count_hw",
>> +		.maxlen		= sizeof(int),
>> +		.mode		= 0444,
>> +		.proc_handler	= proc_dointvec,
>> +	},
>>  #endif
>>  	{}
>>  };
>> @@ -1111,6 +1122,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>>  	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>>  #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>  	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
>> +	table[NF_SYSCTL_CT_COUNT_HW].data = &net->nft.count_hw;
>>  #endif
>>  
>>  	nf_conntrack_standalone_init_tcp_sysctl(net, table);
>> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
>> index b90eca7a2f22..9f2b68bc6f80 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -315,6 +315,9 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
>>  	nf_ct_offload_timeout(flow->ct);
>>  
>>  	if (nf_flowtable_hw_offload(flow_table)) {
>> +		struct net *net = read_pnet(&flow_table->net);
>> +
>> +		atomic_inc(&net->nft.count_hw);
>>  		__set_bit(NF_FLOW_HW, &flow->flags);
>>  		nf_flow_offload_add(flow_table, flow);
>
> Better to increment this new counter from flow_offload_work_add()? There
> is offload->flowtable->net, you could use it from there to bump the
> counter once the entry is placed in hardware (that is, when
> IPS_HW_OFFLOAD_BIT is set).

Hi Pablo,

Thanks for reviewing my code and sorry for the late reply.

We explored the approach you propose and found several issues with it.
First, the nice benefit of the implementation in this patch is that
incrementing the counter in flow_offload_add() (and testing it in the
following patch) completely avoids spamming the workqueue when the limit
is reached, which is an important concern for slower embedded DPU cores.
Second, it is not possible to change the counter when IPS_HW_OFFLOAD_BIT
is set at the very end of flow_offload_work_add(), because in the
following patch we need to verify that the counter is within the
user-specified limit before attempting the offload. Third, changing the
counter in the wq tasks makes it hard to balance correctly. Consider the
following cases:

- flow_offload_work_add() can be called an arbitrary number of times per
  flow due to the refresh logic. However, any such flow is still deleted
  only once.

- flow_offload_work_del() can be called for flows that were never
  actually offloaded (it is called for flows that have the NF_FLOW_HW
  bit, which is unconditionally set before attempting to schedule the
  offload task on the wq).

The counter-balancing issues could perhaps be solved by carefully
changing the counter conditionally, based on the current value of
IPS_HW_OFFLOAD_BIT, but spamming the workqueue can't be prevented with
such a design.
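
To make this concrete, here is a rough sketch of the shape the limit
check could take with the counter kept in flow_offload_add(); the
'count_hw_max' field and the helper name are illustrative assumptions
for this example, not the actual names used in the series:

/* Illustrative sketch only: 'count_hw_max' and the helper name are
 * assumed for this example.  The point is that the check-and-increment
 * happens before the 'add' work item is ever scheduled, so a full
 * table never spams the offload workqueue.
 */
static bool nf_flow_table_inc_count_hw(struct nf_flowtable *flow_table)
{
	struct net *net = read_pnet(&flow_table->net);
	int max = READ_ONCE(net->nft.count_hw_max);	/* assumed limit field */
	int count = atomic_inc_return(&net->nft.count_hw);

	if (max && count > max) {
		/* Over the user-configured limit: undo and skip offload. */
		atomic_dec(&net->nft.count_hw);
		return false;
	}
	return true;
}

/* Caller side, inside flow_offload_add():
 *
 *	if (nf_flowtable_hw_offload(flow_table) &&
 *	    nf_flow_table_inc_count_hw(flow_table)) {
 *		__set_bit(NF_FLOW_HW, &flow->flags);
 *		nf_flow_offload_add(flow_table, flow);
 *	}
 */

Doing the equivalent from the work handlers would instead require
conditioning both the increment and the decrement on IPS_HW_OFFLOAD_BIT,
and by that point the work item has already been queued, so the limit
could not stop the workqueue from filling up.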

>
> That would also move the atomic away from the packet path.

I understand your concern. However, note that this atomic is normally
changed only once when an offloaded flow is added and once when it is
removed. The code path is only executed per packet in error cases where
the flow has failed to offload and refresh is called repeatedly for the
same flow.

>
>>  	}
>> @@ -462,10 +465,15 @@ static void nf_flow_offload_gc_step(struct flow_offload *flow, void *data)
>>  
>>  	if (test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
>>  		if (test_bit(NF_FLOW_HW, &flow->flags)) {
>> -			if (!test_bit(NF_FLOW_HW_DYING, &flow->flags))
>> +			if (!test_bit(NF_FLOW_HW_DYING, &flow->flags)) {
>> +				struct net *net = read_pnet(&flow_table->net);
>> +
>>  				nf_flow_offload_del(flow_table, flow);
>> -			else if (test_bit(NF_FLOW_HW_DEAD, &flow->flags))
>> +				if (test_bit(NF_FLOW_HW_DYING, &flow->flags))
>> +					atomic_dec(&net->nft.count_hw);
>
> Same with this, I'd suggest to decrement it from flow_offload_work_del().
>
>> +			} else if (test_bit(NF_FLOW_HW_DEAD, &flow->flags)) {
>>  				flow_offload_del(flow_table, flow);
>> +			}
>>  		} else {
>>  			flow_offload_del(flow_table, flow);
>>  		}
>> -- 
>> 2.31.1
>> 


Thread overview: 27+ messages
2022-02-22 15:09 [PATCH net-next 0/8] Conntrack offload debuggability improvements Vlad Buslov
2022-02-22 15:09 ` [PATCH net-next 1/8] net/sched: act_ct: set 'net' pointer when creating new nf_flow_table Vlad Buslov
2022-03-07 21:09   ` Pablo Neira Ayuso
2022-02-22 15:09 ` [PATCH net-next 2/8] netfilter: introduce total count of hw offloaded flow table entries Vlad Buslov
2022-03-07 21:47   ` Pablo Neira Ayuso
2022-03-12 18:56     ` Vlad Buslov [this message]
2022-03-15 10:23       ` Pablo Neira Ayuso
2022-03-15 16:18         ` Vlad Buslov
2022-03-07 21:56   ` Pablo Neira Ayuso
2022-03-12 19:51     ` Vlad Buslov
2022-03-15 10:41       ` Pablo Neira Ayuso
2022-03-15 16:34         ` Vlad Buslov
2022-02-22 15:09 ` [PATCH net-next 3/8] netfilter: introduce max " Vlad Buslov
2022-03-07 22:13   ` Pablo Neira Ayuso
2022-03-12 19:32     ` Vlad Buslov
2022-02-22 15:09 ` [PATCH net-next 4/8] netfilter: introduce total count of hw offload 'add' workqueue tasks Vlad Buslov
2022-03-07 22:46   ` Pablo Neira Ayuso
2022-02-22 15:10 ` [PATCH net-next 5/8] netfilter: introduce max " Vlad Buslov
2022-03-07 22:43   ` Pablo Neira Ayuso
2022-03-12 19:59     ` Vlad Buslov
2022-02-22 15:10 ` [PATCH net-next 6/8] netfilter: introduce total count of hw offload 'del' " Vlad Buslov
2022-02-22 15:10 ` [PATCH net-next 7/8] netfilter: introduce total count of hw offload 'stats' wq tasks Vlad Buslov
2022-02-22 15:10 ` [PATCH net-next 8/8] netfilter: flowtable: add hardware offload tracepoints Vlad Buslov
2022-03-07 22:49   ` Pablo Neira Ayuso
2022-03-12 20:05     ` Vlad Buslov
2022-03-15 10:29       ` Pablo Neira Ayuso
2022-03-15 16:36         ` Vlad Buslov
