* [PATCH net-next v2 0/3] Conntrack offload debuggability improvements
@ 2022-05-16 19:10 Vlad Buslov
2022-05-16 19:10 ` [PATCH net-next v2 1/3] net/sched: act_ct: set 'net' pointer when creating new nf_flow_table Vlad Buslov
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Vlad Buslov @ 2022-05-16 19:10 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, kadlec, fw, ozsh, paulb, Vlad Buslov
Current conntrack offload implementation doesn't provide much visibility
and control over offload code. The code just tries to offload new flows,
even if current amount of flows is beyond what can be reasonably
processed by target hardware. On top of that there is no way to
determine current load on workqueues that process the offload tasks
which makes it hard to debug the cases where offload is significantly
delayed due to rate of new connections being higher than driver or
hardware offload rate.
Improve the debuggability situation by implementing following new
functionality:
- Sysctls for current total count of offloaded flow and
user-configurable maximum. Capping the amount of offloaded flows can
be useful for the allocations of hardware resources. Note that the
flow can still be offloaded afterwards via 'refresh' mechanism if
total hardware count.
- Procfs for current total of workqueue tasks for nf_ft_offload_add,
nf_ft_offload_del and nf_ft_offload_stats queues. This allows
visibility for flow offload delay due to system scheduling offload
tasks faster than driver/hardware can process them.
Vlad Buslov (3):
net/sched: act_ct: set 'net' pointer when creating new nf_flow_table
netfilter: nf_flow_table: count and limit hw offloaded entries
netfilter: nf_flow_table: count pending offload workqueue tasks
.../networking/nf_flowtable-sysctl.rst | 17 ++
include/net/net_namespace.h | 6 +
include/net/netfilter/nf_flow_table.h | 35 ++++
include/net/netns/flow_table.h | 14 ++
net/netfilter/Kconfig | 8 +
net/netfilter/Makefile | 3 +-
net/netfilter/nf_flow_table_core.c | 89 ++++++++-
net/netfilter/nf_flow_table_offload.c | 53 +++++-
net/netfilter/nf_flow_table_sysctl.c | 171 ++++++++++++++++++
net/sched/act_ct.c | 5 +-
10 files changed, 387 insertions(+), 14 deletions(-)
create mode 100644 Documentation/networking/nf_flowtable-sysctl.rst
create mode 100644 include/net/netns/flow_table.h
create mode 100644 net/netfilter/nf_flow_table_sysctl.c
--
2.31.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH net-next v2 1/3] net/sched: act_ct: set 'net' pointer when creating new nf_flow_table
2022-05-16 19:10 [PATCH net-next v2 0/3] Conntrack offload debuggability improvements Vlad Buslov
@ 2022-05-16 19:10 ` Vlad Buslov
2022-05-16 19:10 ` [PATCH net-next v2 2/3] netfilter: nf_flow_table: count and limit hw offloaded entries Vlad Buslov
2022-05-16 19:10 ` [PATCH net-next v2 3/3] netfilter: nf_flow_table: count pending offload workqueue tasks Vlad Buslov
2 siblings, 0 replies; 10+ messages in thread
From: Vlad Buslov @ 2022-05-16 19:10 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, kadlec, fw, ozsh, paulb, Vlad Buslov
Following patches in series use the pointer to access flow table offload
debug variables.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
---
net/sched/act_ct.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 8af9d6e5ba61..58e0dfed22c6 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -277,7 +277,7 @@ static struct nf_flowtable_type flowtable_ct = {
.owner = THIS_MODULE,
};
-static int tcf_ct_flow_table_get(struct tcf_ct_params *params)
+static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params)
{
struct tcf_ct_flow_table *ct_ft;
int err = -ENOMEM;
@@ -303,6 +303,7 @@ static int tcf_ct_flow_table_get(struct tcf_ct_params *params)
err = nf_flow_table_init(&ct_ft->nf_ft);
if (err)
goto err_init;
+ write_pnet(&ct_ft->nf_ft.net, net);
__module_get(THIS_MODULE);
out_unlock:
@@ -1391,7 +1392,7 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla,
if (err)
goto cleanup;
- err = tcf_ct_flow_table_get(params);
+ err = tcf_ct_flow_table_get(net, params);
if (err)
goto cleanup;
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net-next v2 2/3] netfilter: nf_flow_table: count and limit hw offloaded entries
2022-05-16 19:10 [PATCH net-next v2 0/3] Conntrack offload debuggability improvements Vlad Buslov
2022-05-16 19:10 ` [PATCH net-next v2 1/3] net/sched: act_ct: set 'net' pointer when creating new nf_flow_table Vlad Buslov
@ 2022-05-16 19:10 ` Vlad Buslov
2022-05-17 11:28 ` Pablo Neira Ayuso
2022-05-16 19:10 ` [PATCH net-next v2 3/3] netfilter: nf_flow_table: count pending offload workqueue tasks Vlad Buslov
2 siblings, 1 reply; 10+ messages in thread
From: Vlad Buslov @ 2022-05-16 19:10 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, kadlec, fw, ozsh, paulb, Vlad Buslov
To improve hardware offload debuggability and scalability introduce
'nf_flowtable_count_hw' and 'nf_flowtable_max_hw' sysctl entries in new
dedicated 'net/netfilter/ft' namespace. Add new pernet struct nf_ft_net in
order to store the counter and sysctl header of new sysctl table.
Count the offloaded flows in workqueue add task handler. Verify that
offloaded flow total is lower than allowed maximum before calling the
driver callbacks. To prevent spamming the 'add' workqueue with tasks when
flows can't be offloaded anymore also check that count is below limit
before queuing offload work. This doesn't prevent all redundant workqueue
task since counter can be taken by concurrent work handler after the check
had been performed but before the offload job is executed but it still
greatly reduces such occurrences. Note that flows that were not offloaded
due to counter being larger than the cap can still be offloaded via refresh
function.
Ensure that flows are accounted correctly by verifying IPS_HW_OFFLOAD_BIT
value before counting them. This ensures that add/refresh code path
increments the counter exactly once per flow when setting the bit and
decrements it only for accounted flows when deleting the flow with the bit
set.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
---
Notes:
Changes V1 -> V2:
- Combine patches that expose offloaded flow count and limit total
offloaded flows.
- Rework the counting logic to count in workqueue context.
- Create dedicated namespace for flow table sysctls.
.../networking/nf_flowtable-sysctl.rst | 17 ++++
include/net/netfilter/nf_flow_table.h | 25 ++++++
net/netfilter/Makefile | 3 +-
net/netfilter/nf_flow_table_core.c | 55 +++++++++++-
net/netfilter/nf_flow_table_offload.c | 36 ++++++--
net/netfilter/nf_flow_table_sysctl.c | 83 +++++++++++++++++++
6 files changed, 210 insertions(+), 9 deletions(-)
create mode 100644 Documentation/networking/nf_flowtable-sysctl.rst
create mode 100644 net/netfilter/nf_flow_table_sysctl.c
diff --git a/Documentation/networking/nf_flowtable-sysctl.rst b/Documentation/networking/nf_flowtable-sysctl.rst
new file mode 100644
index 000000000000..932b4abeca6c
--- /dev/null
+++ b/Documentation/networking/nf_flowtable-sysctl.rst
@@ -0,0 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Netfilter Flowtable Sysfs variables
+===================================
+
+/proc/sys/net/netfilter/ft/nf_flowtable_* Variables:
+=================================================
+
+nf_flowtable_count_hw - INTEGER (read-only)
+ Number of flowtable entries that are currently offloaded to hardware.
+
+nf_flowtable_max_hw - INTEGER
+ - 0 - disabled (default)
+ - not 0 - enabled
+
+ Cap on maximum total amount of flowtable entries offloaded to hardware.
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 64daafd1fc41..e09c29fa034e 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -335,4 +335,29 @@ static inline __be16 nf_flow_pppoe_proto(const struct sk_buff *skb)
return 0;
}
+struct nf_ft_net {
+ atomic_t count_hw;
+#ifdef CONFIG_SYSCTL
+ struct ctl_table_header *sysctl_header;
+#endif
+};
+
+extern unsigned int nf_ft_hw_max;
+extern unsigned int nf_ft_net_id;
+
+#include <net/netns/generic.h>
+
+static inline struct nf_ft_net *nf_ft_pernet(const struct net *net)
+{
+ return net_generic(net, nf_ft_net_id);
+}
+
+static inline struct nf_ft_net *nf_ft_pernet_get(struct nf_flowtable *flow_table)
+{
+ return nf_ft_pernet(read_pnet(&flow_table->net));
+}
+
+int nf_flow_table_init_sysctl(struct net *net);
+void nf_flow_table_fini_sysctl(struct net *net);
+
#endif /* _NF_FLOW_TABLE_H */
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 238b6a620e88..67e281842b61 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -127,7 +127,8 @@ obj-$(CONFIG_NFT_FWD_NETDEV) += nft_fwd_netdev.o
# flow table infrastructure
obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
- nf_flow_table_offload.o
+ nf_flow_table_offload.o \
+ nf_flow_table_sysctl.o
obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 3db256da919b..e2598f98017c 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -277,6 +277,13 @@ static const struct rhashtable_params nf_flow_offload_rhash_params = {
.automatic_shrinking = true,
};
+static bool flow_max_hw_entries(struct nf_flowtable *flow_table)
+{
+ struct nf_ft_net *fnet = nf_ft_pernet_get(flow_table);
+
+ return nf_ft_hw_max && atomic_read(&fnet->count_hw) >= nf_ft_hw_max;
+}
+
unsigned long flow_offload_get_timeout(struct flow_offload *flow)
{
unsigned long timeout = NF_FLOW_TIMEOUT;
@@ -320,7 +327,8 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
nf_ct_offload_timeout(flow->ct);
- if (nf_flowtable_hw_offload(flow_table)) {
+ if (nf_flowtable_hw_offload(flow_table) &&
+ !flow_max_hw_entries(flow_table)) {
__set_bit(NF_FLOW_HW, &flow->flags);
nf_flow_offload_add(flow_table, flow);
}
@@ -338,9 +346,11 @@ void flow_offload_refresh(struct nf_flowtable *flow_table,
if (READ_ONCE(flow->timeout) != timeout)
WRITE_ONCE(flow->timeout, timeout);
- if (likely(!nf_flowtable_hw_offload(flow_table)))
+ if (likely(!nf_flowtable_hw_offload(flow_table) ||
+ flow_max_hw_entries(flow_table)))
return;
+ set_bit(NF_FLOW_HW, &flow->flags);
nf_flow_offload_add(flow_table, flow);
}
EXPORT_SYMBOL_GPL(flow_offload_refresh);
@@ -652,14 +662,53 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
}
EXPORT_SYMBOL_GPL(nf_flow_table_free);
+static int nf_flow_table_pernet_init(struct net *net)
+{
+ return nf_flow_table_init_sysctl(net);
+}
+
+static void nf_flow_table_pernet_exit(struct list_head *net_exit_list)
+{
+ struct net *net;
+
+ list_for_each_entry(net, net_exit_list, exit_list)
+ nf_flow_table_fini_sysctl(net);
+}
+
+unsigned int nf_ft_net_id __read_mostly;
+
+static struct pernet_operations nf_flow_table_net_ops = {
+ .init = nf_flow_table_pernet_init,
+ .exit_batch = nf_flow_table_pernet_exit,
+ .id = &nf_ft_net_id,
+ .size = sizeof(struct nf_ft_net),
+};
+
static int __init nf_flow_table_module_init(void)
{
- return nf_flow_table_offload_init();
+ int ret;
+
+ nf_ft_hw_max = 0;
+
+ ret = register_pernet_subsys(&nf_flow_table_net_ops);
+ if (ret < 0)
+ return ret;
+
+ ret = nf_flow_table_offload_init();
+ if (ret)
+ goto out_offload;
+
+ return 0;
+
+out_offload:
+ unregister_pernet_subsys(&nf_flow_table_net_ops);
+ return ret;
}
static void __exit nf_flow_table_module_exit(void)
{
nf_flow_table_offload_exit();
+ unregister_pernet_subsys(&nf_flow_table_net_ops);
}
module_init(nf_flow_table_module_init);
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 11b6e1942092..a6e763901eb9 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -903,30 +903,56 @@ static int flow_offload_rule_add(struct flow_offload_work *offload,
return 0;
}
+static bool flow_get_max_hw_entries(struct nf_flowtable *flow_table)
+{
+ struct nf_ft_net *fnet = nf_ft_pernet_get(flow_table);
+ int count_hw = atomic_inc_return(&fnet->count_hw);
+
+ if (nf_ft_hw_max && count_hw > nf_ft_hw_max) {
+ atomic_dec(&fnet->count_hw);
+ return false;
+ }
+ return true;
+}
+
static void flow_offload_work_add(struct flow_offload_work *offload)
{
+ struct nf_ft_net *fnet = nf_ft_pernet_get(offload->flowtable);
struct nf_flow_rule *flow_rule[FLOW_OFFLOAD_DIR_MAX];
int err;
+ if (!flow_get_max_hw_entries(offload->flowtable))
+ return;
+
err = nf_flow_offload_alloc(offload, flow_rule);
if (err < 0)
- return;
+ goto out_alloc;
err = flow_offload_rule_add(offload, flow_rule);
if (err < 0)
- goto out;
+ goto out_add;
- set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
+ if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status))
+ atomic_dec(&fnet->count_hw);
+ nf_flow_offload_destroy(flow_rule);
+ return;
-out:
+out_add:
nf_flow_offload_destroy(flow_rule);
+out_alloc:
+ atomic_dec(&fnet->count_hw);
}
static void flow_offload_work_del(struct flow_offload_work *offload)
{
- clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
+ bool offloaded = test_and_clear_bit(IPS_HW_OFFLOAD_BIT,
+ &offload->flow->ct->status);
+ struct nf_ft_net *fnet = nf_ft_pernet_get(offload->flowtable);
+
flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
+ if (offloaded)
+ atomic_dec(&fnet->count_hw);
set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
}
diff --git a/net/netfilter/nf_flow_table_sysctl.c b/net/netfilter/nf_flow_table_sysctl.c
new file mode 100644
index 000000000000..ce8c0a2c4bdb
--- /dev/null
+++ b/net/netfilter/nf_flow_table_sysctl.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kernel.h>
+#include <net/netfilter/nf_flow_table.h>
+
+unsigned int nf_ft_hw_max __read_mostly;
+
+#ifdef CONFIG_SYSCTL
+enum nf_ct_sysctl_index {
+ NF_SYSCTL_FLOW_TABLE_MAX_HW,
+ NF_SYSCTL_FLOW_TABLE_COUNT_HW,
+
+ __NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL,
+};
+
+#define NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL \
+ (__NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL + 1)
+
+static struct ctl_table nf_flow_table_sysctl_table[] = {
+ [NF_SYSCTL_FLOW_TABLE_MAX_HW] = {
+ .procname = "nf_flowtable_max_hw",
+ .data = &nf_ft_hw_max,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ [NF_SYSCTL_FLOW_TABLE_COUNT_HW] = {
+ .procname = "nf_flowtable_count_hw",
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = proc_dointvec,
+ },
+ {}
+};
+
+int nf_flow_table_init_sysctl(struct net *net)
+{
+ struct nf_ft_net *fnet = nf_ft_pernet(net);
+ struct ctl_table *table;
+
+ BUILD_BUG_ON(ARRAY_SIZE(nf_flow_table_sysctl_table) != NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL);
+
+ table = kmemdup(nf_flow_table_sysctl_table, sizeof(nf_flow_table_sysctl_table),
+ GFP_KERNEL);
+ if (!table)
+ return -ENOMEM;
+
+ table[NF_SYSCTL_FLOW_TABLE_COUNT_HW].data = &fnet->count_hw;
+
+ /* Don't allow non-init_net ns to alter global sysctls */
+ if (!net_eq(&init_net, net))
+ table[NF_SYSCTL_FLOW_TABLE_MAX_HW].mode = 0444;
+
+ fnet->sysctl_header = register_net_sysctl(net, "net/netfilter/ft", table);
+ if (!fnet->sysctl_header)
+ goto out_unregister_netfilter;
+
+ return 0;
+
+out_unregister_netfilter:
+ kfree(table);
+ return -ENOMEM;
+}
+
+void nf_flow_table_fini_sysctl(struct net *net)
+{
+ struct nf_ft_net *fnet = nf_ft_pernet(net);
+ struct ctl_table *table;
+
+ table = fnet->sysctl_header->ctl_table_arg;
+ unregister_net_sysctl_table(fnet->sysctl_header);
+ kfree(table);
+}
+
+#else
+int nf_flow_table_init_sysctl(struct net *net)
+{
+ return 0;
+}
+
+void nf_flow_table_fini_sysctl(struct net *net)
+{
+}
+#endif /* CONFIG_SYSCTL */
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net-next v2 3/3] netfilter: nf_flow_table: count pending offload workqueue tasks
2022-05-16 19:10 [PATCH net-next v2 0/3] Conntrack offload debuggability improvements Vlad Buslov
2022-05-16 19:10 ` [PATCH net-next v2 1/3] net/sched: act_ct: set 'net' pointer when creating new nf_flow_table Vlad Buslov
2022-05-16 19:10 ` [PATCH net-next v2 2/3] netfilter: nf_flow_table: count and limit hw offloaded entries Vlad Buslov
@ 2022-05-16 19:10 ` Vlad Buslov
2022-05-17 11:20 ` Pablo Neira Ayuso
2 siblings, 1 reply; 10+ messages in thread
From: Vlad Buslov @ 2022-05-16 19:10 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, kadlec, fw, ozsh, paulb, Vlad Buslov
To improve hardware offload debuggability count pending 'add', 'del' and
'stats' flow_table offload workqueue tasks. Counters are incremented before
scheduling new task and decremented when workqueue handler finishes
executing. These counters allow user to diagnose congestion on hardware
offload workqueues that can happen when either CPU is starved and workqueue
jobs are executed at lower rate than new ones are added or when
hardware/driver can't keep up with the rate.
Implement the described counters as percpu counters inside new struct
netns_ft which is stored inside struct net. Expose them via new procfs file
'/proc/net/stats/nf_flowtable' that is similar to existing 'nf_conntrack'
file.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
---
Notes:
Changes V1 -> V2:
- Combine patches that expose add, del, stats tasks.
- Use percpu stats to count pending workqueue tasks instead of atomics.
- Expose the stats via /proc/net/stats/nf_flowtable file instead of
sysctls.
include/net/net_namespace.h | 6 ++
include/net/netfilter/nf_flow_table.h | 10 +++
include/net/netns/flow_table.h | 14 +++++
net/netfilter/Kconfig | 8 +++
net/netfilter/nf_flow_table_core.c | 38 +++++++++++-
net/netfilter/nf_flow_table_offload.c | 17 +++++-
net/netfilter/nf_flow_table_sysctl.c | 88 +++++++++++++++++++++++++++
7 files changed, 176 insertions(+), 5 deletions(-)
create mode 100644 include/net/netns/flow_table.h
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index c4f5601f6e32..bf770c13e19b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -26,6 +26,9 @@
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
#include <net/netns/conntrack.h>
#endif
+#if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
+#include <net/netns/flow_table.h>
+#endif
#include <net/netns/nftables.h>
#include <net/netns/xfrm.h>
#include <net/netns/mpls.h>
@@ -140,6 +143,9 @@ struct net {
#if defined(CONFIG_NF_TABLES) || defined(CONFIG_NF_TABLES_MODULE)
struct netns_nftables nft;
#endif
+#if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
+ struct netns_ft ft;
+#endif
#endif
#ifdef CONFIG_WEXT_CORE
struct sk_buff_head wext_nlevents;
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index e09c29fa034e..94606c04b6fe 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -360,4 +360,14 @@ static inline struct nf_ft_net *nf_ft_pernet_get(struct nf_flowtable *flow_table
int nf_flow_table_init_sysctl(struct net *net);
void nf_flow_table_fini_sysctl(struct net *net);
+#define NF_FLOW_TABLE_STAT_INC(net, count) __this_cpu_inc((net)->ft.stat->count)
+#define NF_FLOW_TABLE_STAT_DEC(net, count) __this_cpu_dec((net)->ft.stat->count)
+#define NF_FLOW_TABLE_STAT_INC_ATOMIC(net, count) \
+ this_cpu_inc((net)->ft.stat->count)
+#define NF_FLOW_TABLE_STAT_DEC_ATOMIC(net, count) \
+ this_cpu_dec((net)->ft.stat->count)
+
+int nf_flow_table_init_proc(struct net *net);
+void nf_flow_table_fini_proc(struct net *net);
+
#endif /* _NF_FLOW_TABLE_H */
diff --git a/include/net/netns/flow_table.h b/include/net/netns/flow_table.h
new file mode 100644
index 000000000000..1c5fc657e267
--- /dev/null
+++ b/include/net/netns/flow_table.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NETNS_FLOW_TABLE_H
+#define __NETNS_FLOW_TABLE_H
+
+struct nf_flow_table_stat {
+ unsigned int count_wq_add;
+ unsigned int count_wq_del;
+ unsigned int count_wq_stats;
+};
+
+struct netns_ft {
+ struct nf_flow_table_stat __percpu *stat;
+};
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index ddc54b6d18ee..c8fc5c7ef04a 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -734,6 +734,14 @@ config NF_FLOW_TABLE
To compile it as a module, choose M here.
+config NF_FLOW_TABLE_PROCFS
+ bool "Supply flow table statistics in procfs"
+ default y
+ depends on PROC_FS
+ help
+ This option enables for the flow table offload statistics
+ to be shown in procfs under net/netfilter/nf_flowtable.
+
config NETFILTER_XTABLES
tristate "Netfilter Xtables support (required for ip_tables)"
default m if NETFILTER_ADVANCED=n
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index e2598f98017c..c86dd627ef42 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -662,17 +662,51 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
}
EXPORT_SYMBOL_GPL(nf_flow_table_free);
+static int nf_flow_table_init_net(struct net *net)
+{
+ net->ft.stat = alloc_percpu(struct nf_flow_table_stat);
+ return net->ft.stat ? 0 : -ENOMEM;
+}
+
+static void nf_flow_table_fini_net(struct net *net)
+{
+ free_percpu(net->ft.stat);
+}
+
static int nf_flow_table_pernet_init(struct net *net)
{
- return nf_flow_table_init_sysctl(net);
+ int ret;
+
+ ret = nf_flow_table_init_net(net);
+ if (ret < 0)
+ return ret;
+
+ ret = nf_flow_table_init_sysctl(net);
+ if (ret < 0)
+ goto out_sysctl;
+
+ ret = nf_flow_table_init_proc(net);
+ if (ret < 0)
+ goto out_proc;
+
+ return 0;
+
+out_proc:
+ nf_flow_table_fini_sysctl(net);
+out_sysctl:
+ nf_flow_table_fini_net(net);
+ return ret;
}
static void nf_flow_table_pernet_exit(struct list_head *net_exit_list)
{
struct net *net;
- list_for_each_entry(net, net_exit_list, exit_list)
+ list_for_each_entry(net, net_exit_list, exit_list) {
+ nf_flow_table_fini_proc(net);
nf_flow_table_fini_sysctl(net);
+ nf_flow_table_fini_net(net);
+ }
}
unsigned int nf_ft_net_id __read_mostly;
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index a6e763901eb9..381bd598a16f 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -993,17 +993,22 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)
static void flow_offload_work_handler(struct work_struct *work)
{
struct flow_offload_work *offload;
+ struct net *net;
offload = container_of(work, struct flow_offload_work, work);
+ net = read_pnet(&offload->flowtable->net);
switch (offload->cmd) {
case FLOW_CLS_REPLACE:
flow_offload_work_add(offload);
+ NF_FLOW_TABLE_STAT_DEC_ATOMIC(net, count_wq_add);
break;
case FLOW_CLS_DESTROY:
flow_offload_work_del(offload);
+ NF_FLOW_TABLE_STAT_DEC_ATOMIC(net, count_wq_del);
break;
case FLOW_CLS_STATS:
flow_offload_work_stats(offload);
+ NF_FLOW_TABLE_STAT_DEC_ATOMIC(net, count_wq_stats);
break;
default:
WARN_ON_ONCE(1);
@@ -1015,12 +1020,18 @@ static void flow_offload_work_handler(struct work_struct *work)
static void flow_offload_queue_work(struct flow_offload_work *offload)
{
- if (offload->cmd == FLOW_CLS_REPLACE)
+ struct net *net = read_pnet(&offload->flowtable->net);
+
+ if (offload->cmd == FLOW_CLS_REPLACE) {
+ NF_FLOW_TABLE_STAT_INC(net, count_wq_add);
queue_work(nf_flow_offload_add_wq, &offload->work);
- else if (offload->cmd == FLOW_CLS_DESTROY)
+ } else if (offload->cmd == FLOW_CLS_DESTROY) {
+ NF_FLOW_TABLE_STAT_INC(net, count_wq_del);
queue_work(nf_flow_offload_del_wq, &offload->work);
- else
+ } else {
+ NF_FLOW_TABLE_STAT_INC(net, count_wq_stats);
queue_work(nf_flow_offload_stats_wq, &offload->work);
+ }
}
static struct flow_offload_work *
diff --git a/net/netfilter/nf_flow_table_sysctl.c b/net/netfilter/nf_flow_table_sysctl.c
index ce8c0a2c4bdb..2aefd7542527 100644
--- a/net/netfilter/nf_flow_table_sysctl.c
+++ b/net/netfilter/nf_flow_table_sysctl.c
@@ -1,7 +1,95 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/kernel.h>
+#include <linux/proc_fs.h>
#include <net/netfilter/nf_flow_table.h>
+#ifdef CONFIG_NF_FLOW_TABLE_PROCFS
+static void *nf_flow_table_cpu_seq_start(struct seq_file *seq, loff_t *pos)
+{
+ struct net *net = seq_file_net(seq);
+ int cpu;
+
+ if (*pos == 0)
+ return SEQ_START_TOKEN;
+
+ for (cpu = *pos - 1; cpu < nr_cpu_ids; ++cpu) {
+ if (!cpu_possible(cpu))
+ continue;
+ *pos = cpu + 1;
+ return per_cpu_ptr(net->ft.stat, cpu);
+ }
+
+ return NULL;
+}
+
+static void *nf_flow_table_cpu_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+ struct net *net = seq_file_net(seq);
+ int cpu;
+
+ for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
+ if (!cpu_possible(cpu))
+ continue;
+ *pos = cpu + 1;
+ return per_cpu_ptr(net->ft.stat, cpu);
+ }
+ (*pos)++;
+ return NULL;
+}
+
+static void nf_flow_table_cpu_seq_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int nf_flow_table_cpu_seq_show(struct seq_file *seq, void *v)
+{
+ const struct nf_flow_table_stat *st = v;
+
+ if (v == SEQ_START_TOKEN) {
+ seq_puts(seq, "wq_add wq_del wq_stats\n");
+ return 0;
+ }
+
+ seq_printf(seq, "%8d %8d %8d\n",
+ st->count_wq_add,
+ st->count_wq_del,
+ st->count_wq_stats
+ );
+ return 0;
+}
+
+static const struct seq_operations nf_flow_table_cpu_seq_ops = {
+ .start = nf_flow_table_cpu_seq_start,
+ .next = nf_flow_table_cpu_seq_next,
+ .stop = nf_flow_table_cpu_seq_stop,
+ .show = nf_flow_table_cpu_seq_show,
+};
+
+int nf_flow_table_init_proc(struct net *net)
+{
+ struct proc_dir_entry *pde;
+
+ pde = proc_create_net("nf_flowtable", 0444, net->proc_net_stat,
+ &nf_flow_table_cpu_seq_ops,
+ sizeof(struct seq_net_private));
+ return pde ? 0 : -ENOMEM;
+}
+
+void nf_flow_table_fini_proc(struct net *net)
+{
+ remove_proc_entry("nf_flowtable", net->proc_net_stat);
+}
+#else
+int nf_flow_table_init_proc(struct net *net)
+{
+ return 0;
+}
+
+void nf_flow_table_fini_proc(struct net *net)
+{
+}
+#endif /* CONFIG_NF_FLOW_TABLE_PROCFS */
+
unsigned int nf_ft_hw_max __read_mostly;
#ifdef CONFIG_SYSCTL
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v2 2/3] netfilter: nf_flow_table: count and limit hw offloaded entries
2022-05-17 11:28 ` Pablo Neira Ayuso
@ 2022-05-17 11:10 ` Vlad Buslov
0 siblings, 0 replies; 10+ messages in thread
From: Vlad Buslov @ 2022-05-17 11:10 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, kadlec, fw, ozsh, paulb
On Tue 17 May 2022 at 13:28, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, May 16, 2022 at 10:10:31PM +0300, Vlad Buslov wrote:
>> To improve hardware offload debuggability and scalability introduce
>> 'nf_flowtable_count_hw' and 'nf_flowtable_max_hw' sysctl entries in new
>> dedicated 'net/netfilter/ft' namespace. Add new pernet struct nf_ft_net in
>> order to store the counter and sysctl header of new sysctl table.
>>
>> Count the offloaded flows in workqueue add task handler. Verify that
>> offloaded flow total is lower than allowed maximum before calling the
>> driver callbacks. To prevent spamming the 'add' workqueue with tasks when
>> flows can't be offloaded anymore also check that count is below limit
>> before queuing offload work. This doesn't prevent all redundant workqueue
>> task since counter can be taken by concurrent work handler after the check
>> had been performed but before the offload job is executed but it still
>> greatly reduces such occurrences. Note that flows that were not offloaded
>> due to counter being larger than the cap can still be offloaded via refresh
>> function.
>>
>> Ensure that flows are accounted correctly by verifying IPS_HW_OFFLOAD_BIT
>> value before counting them. This ensures that add/refresh code path
>> increments the counter exactly once per flow when setting the bit and
>> decrements it only for accounted flows when deleting the flow with the bit
>> set.
>>
>> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
>> Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
>> ---
>>
>> Notes:
>> Changes V1 -> V2:
>>
>> - Combine patches that expose offloaded flow count and limit total
>> offloaded flows.
>>
>> - Rework the counting logic to count in workqueue context.
>>
>> - Create dedicated namespace for flow table sysctls.
>>
>> .../networking/nf_flowtable-sysctl.rst | 17 ++++
>> include/net/netfilter/nf_flow_table.h | 25 ++++++
>> net/netfilter/Makefile | 3 +-
>> net/netfilter/nf_flow_table_core.c | 55 +++++++++++-
>> net/netfilter/nf_flow_table_offload.c | 36 ++++++--
>> net/netfilter/nf_flow_table_sysctl.c | 83 +++++++++++++++++++
>> 6 files changed, 210 insertions(+), 9 deletions(-)
>> create mode 100644 Documentation/networking/nf_flowtable-sysctl.rst
>> create mode 100644 net/netfilter/nf_flow_table_sysctl.c
>>
>> diff --git a/Documentation/networking/nf_flowtable-sysctl.rst b/Documentation/networking/nf_flowtable-sysctl.rst
>> new file mode 100644
>> index 000000000000..932b4abeca6c
>> --- /dev/null
>> +++ b/Documentation/networking/nf_flowtable-sysctl.rst
>> @@ -0,0 +1,17 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +===================================
>> +Netfilter Flowtable Sysfs variables
>> +===================================
>
> Better document the sysctl entries in the existing nf_conntrack.txt
> file rather than this new file?
Sure, no problem.
>
>> +
>> +/proc/sys/net/netfilter/ft/nf_flowtable_* Variables:
>> +=================================================
>> +
>> +nf_flowtable_count_hw - INTEGER (read-only)
>> + Number of flowtable entries that are currently offloaded to hardware.
>> +
>> +nf_flowtable_max_hw - INTEGER
>> + - 0 - disabled (default)
>> + - not 0 - enabled
>> +
>> + Cap on maximum total amount of flowtable entries offloaded to hardware.
>> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
>> index 64daafd1fc41..e09c29fa034e 100644
>> --- a/include/net/netfilter/nf_flow_table.h
>> +++ b/include/net/netfilter/nf_flow_table.h
>> @@ -335,4 +335,29 @@ static inline __be16 nf_flow_pppoe_proto(const struct sk_buff *skb)
>> return 0;
>> }
>>
>> +struct nf_ft_net {
>> + atomic_t count_hw;
>> +#ifdef CONFIG_SYSCTL
>> + struct ctl_table_header *sysctl_header;
>> +#endif
>> +};
>> +
>> +extern unsigned int nf_ft_hw_max;
>> +extern unsigned int nf_ft_net_id;
>> +
>> +#include <net/netns/generic.h>
>> +
>> +static inline struct nf_ft_net *nf_ft_pernet(const struct net *net)
>> +{
>> + return net_generic(net, nf_ft_net_id);
>> +}
>> +
>> +static inline struct nf_ft_net *nf_ft_pernet_get(struct nf_flowtable *flow_table)
>> +{
>> + return nf_ft_pernet(read_pnet(&flow_table->net));
>> +}
>> +
>> +int nf_flow_table_init_sysctl(struct net *net);
>> +void nf_flow_table_fini_sysctl(struct net *net);
>> +
>> #endif /* _NF_FLOW_TABLE_H */
>> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
>> index 238b6a620e88..67e281842b61 100644
>> --- a/net/netfilter/Makefile
>> +++ b/net/netfilter/Makefile
>> @@ -127,7 +127,8 @@ obj-$(CONFIG_NFT_FWD_NETDEV) += nft_fwd_netdev.o
>> # flow table infrastructure
>> obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
>> nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
>> - nf_flow_table_offload.o
>> + nf_flow_table_offload.o \
>> + nf_flow_table_sysctl.o
>>
>> obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
>>
>> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
>> index 3db256da919b..e2598f98017c 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -277,6 +277,13 @@ static const struct rhashtable_params nf_flow_offload_rhash_params = {
>> .automatic_shrinking = true,
>> };
>>
>> +static bool flow_max_hw_entries(struct nf_flowtable *flow_table)
>> +{
>> + struct nf_ft_net *fnet = nf_ft_pernet_get(flow_table);
>> +
>> + return nf_ft_hw_max && atomic_read(&fnet->count_hw) >= nf_ft_hw_max;
>> +}
>> +
>> unsigned long flow_offload_get_timeout(struct flow_offload *flow)
>> {
>> unsigned long timeout = NF_FLOW_TIMEOUT;
>> @@ -320,7 +327,8 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
>>
>> nf_ct_offload_timeout(flow->ct);
>>
>> - if (nf_flowtable_hw_offload(flow_table)) {
>> + if (nf_flowtable_hw_offload(flow_table) &&
>> + !flow_max_hw_entries(flow_table)) {
>> __set_bit(NF_FLOW_HW, &flow->flags);
>> nf_flow_offload_add(flow_table, flow);
>> }
>> @@ -338,9 +346,11 @@ void flow_offload_refresh(struct nf_flowtable *flow_table,
>> if (READ_ONCE(flow->timeout) != timeout)
>> WRITE_ONCE(flow->timeout, timeout);
>>
>> - if (likely(!nf_flowtable_hw_offload(flow_table)))
>> + if (likely(!nf_flowtable_hw_offload(flow_table) ||
>> + flow_max_hw_entries(flow_table)))
>> return;
>>
>> + set_bit(NF_FLOW_HW, &flow->flags);
>> nf_flow_offload_add(flow_table, flow);
>> }
>> EXPORT_SYMBOL_GPL(flow_offload_refresh);
>> @@ -652,14 +662,53 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
>> }
>> EXPORT_SYMBOL_GPL(nf_flow_table_free);
>>
>> +static int nf_flow_table_pernet_init(struct net *net)
>> +{
>> + return nf_flow_table_init_sysctl(net);
>> +}
>> +
>> +static void nf_flow_table_pernet_exit(struct list_head *net_exit_list)
>> +{
>> + struct net *net;
>> +
>> + list_for_each_entry(net, net_exit_list, exit_list)
>> + nf_flow_table_fini_sysctl(net);
>> +}
>> +
>> +unsigned int nf_ft_net_id __read_mostly;
>> +
>> +static struct pernet_operations nf_flow_table_net_ops = {
>> + .init = nf_flow_table_pernet_init,
>> + .exit_batch = nf_flow_table_pernet_exit,
>> + .id = &nf_ft_net_id,
>> + .size = sizeof(struct nf_ft_net),
>> +};
>> +
>> static int __init nf_flow_table_module_init(void)
>> {
>> - return nf_flow_table_offload_init();
>> + int ret;
>> +
>> + nf_ft_hw_max = 0;
>> +
>> + ret = register_pernet_subsys(&nf_flow_table_net_ops);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = nf_flow_table_offload_init();
>> + if (ret)
>> + goto out_offload;
>> +
>> + return 0;
>> +
>> +out_offload:
>> + unregister_pernet_subsys(&nf_flow_table_net_ops);
>> + return ret;
>> }
>>
>> static void __exit nf_flow_table_module_exit(void)
>> {
>> nf_flow_table_offload_exit();
>> + unregister_pernet_subsys(&nf_flow_table_net_ops);
>> }
>>
>> module_init(nf_flow_table_module_init);
>> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
>> index 11b6e1942092..a6e763901eb9 100644
>> --- a/net/netfilter/nf_flow_table_offload.c
>> +++ b/net/netfilter/nf_flow_table_offload.c
>> @@ -903,30 +903,56 @@ static int flow_offload_rule_add(struct flow_offload_work *offload,
>> return 0;
>> }
>>
>> +static bool flow_get_max_hw_entries(struct nf_flowtable *flow_table)
>> +{
>> + struct nf_ft_net *fnet = nf_ft_pernet_get(flow_table);
>> + int count_hw = atomic_inc_return(&fnet->count_hw);
>> +
>> + if (nf_ft_hw_max && count_hw > nf_ft_hw_max) {
>> + atomic_dec(&fnet->count_hw);
>> + return false;
>> + }
>> + return true;
>> +}
>> +
>> static void flow_offload_work_add(struct flow_offload_work *offload)
>> {
>> + struct nf_ft_net *fnet = nf_ft_pernet_get(offload->flowtable);
>> struct nf_flow_rule *flow_rule[FLOW_OFFLOAD_DIR_MAX];
>> int err;
>>
>> + if (!flow_get_max_hw_entries(offload->flowtable))
>> + return;
>> +
>> err = nf_flow_offload_alloc(offload, flow_rule);
>> if (err < 0)
>> - return;
>> + goto out_alloc;
>>
>> err = flow_offload_rule_add(offload, flow_rule);
>> if (err < 0)
>> - goto out;
>> + goto out_add;
>>
>> - set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
>> + if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status))
>> + atomic_dec(&fnet->count_hw);
>> + nf_flow_offload_destroy(flow_rule);
>> + return;
>>
>> -out:
>> +out_add:
>> nf_flow_offload_destroy(flow_rule);
>> +out_alloc:
>> + atomic_dec(&fnet->count_hw);
>> }
>>
>> static void flow_offload_work_del(struct flow_offload_work *offload)
>> {
>> - clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
>> + bool offloaded = test_and_clear_bit(IPS_HW_OFFLOAD_BIT,
>> + &offload->flow->ct->status);
>> + struct nf_ft_net *fnet = nf_ft_pernet_get(offload->flowtable);
>> +
>> flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
>> flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
>> + if (offloaded)
>> + atomic_dec(&fnet->count_hw);
>> set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
>> }
>>
>> diff --git a/net/netfilter/nf_flow_table_sysctl.c b/net/netfilter/nf_flow_table_sysctl.c
>> new file mode 100644
>> index 000000000000..ce8c0a2c4bdb
>> --- /dev/null
>> +++ b/net/netfilter/nf_flow_table_sysctl.c
>> @@ -0,0 +1,83 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +#include <linux/kernel.h>
>> +#include <net/netfilter/nf_flow_table.h>
>> +
>> +unsigned int nf_ft_hw_max __read_mostly;
>
> You can move this to nf_flow_table_core_offload.c for the Makefile
> idea.
>
>> +#ifdef CONFIG_SYSCTL
>
> If you follow the Kconfig+Makefile approach, then this #ifdef can
> likely go away as the entire file would be on/off thing.
In following patch I also implement procfs stats in this file, which
depends on another config variable (CONFIG_NF_FLOW_TABLE_PROCFS). Should
I put procfs code in another new file that will also be conditionally
compiled?
>
>> +enum nf_ct_sysctl_index {
>> + NF_SYSCTL_FLOW_TABLE_MAX_HW,
>> + NF_SYSCTL_FLOW_TABLE_COUNT_HW,
>> +
>> + __NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL,
>> +};
>> +
>> +#define NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL \
>> + (__NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL + 1)
>> +
>> +static struct ctl_table nf_flow_table_sysctl_table[] = {
>> + [NF_SYSCTL_FLOW_TABLE_MAX_HW] = {
>> + .procname = "nf_flowtable_max_hw",
>> + .data = &nf_ft_hw_max,
>> + .maxlen = sizeof(int),
>> + .mode = 0644,
>> + .proc_handler = proc_dointvec,
>> + },
>> + [NF_SYSCTL_FLOW_TABLE_COUNT_HW] = {
>> + .procname = "nf_flowtable_count_hw",
>> + .maxlen = sizeof(int),
>> + .mode = 0444,
>> + .proc_handler = proc_dointvec,
>> + },
>> + {}
>> +};
>> +
>> +int nf_flow_table_init_sysctl(struct net *net)
>> +{
>> + struct nf_ft_net *fnet = nf_ft_pernet(net);
>> + struct ctl_table *table;
>> +
>> + BUILD_BUG_ON(ARRAY_SIZE(nf_flow_table_sysctl_table) != NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL);
>> +
>> + table = kmemdup(nf_flow_table_sysctl_table, sizeof(nf_flow_table_sysctl_table),
>> + GFP_KERNEL);
>> + if (!table)
>> + return -ENOMEM;
>> +
>> + table[NF_SYSCTL_FLOW_TABLE_COUNT_HW].data = &fnet->count_hw;
>> +
>> + /* Don't allow non-init_net ns to alter global sysctls */
>> + if (!net_eq(&init_net, net))
>> + table[NF_SYSCTL_FLOW_TABLE_MAX_HW].mode = 0444;
>> +
>> + fnet->sysctl_header = register_net_sysctl(net, "net/netfilter/ft", table);
>> + if (!fnet->sysctl_header)
>> + goto out_unregister_netfilter;
>> +
>> + return 0;
>> +
>> +out_unregister_netfilter:
>> + kfree(table);
>> + return -ENOMEM;
>> +}
>> +
>> +void nf_flow_table_fini_sysctl(struct net *net)
>> +{
>> + struct nf_ft_net *fnet = nf_ft_pernet(net);
>> + struct ctl_table *table;
>> +
>> + table = fnet->sysctl_header->ctl_table_arg;
>> + unregister_net_sysctl_table(fnet->sysctl_header);
>> + kfree(table);
>> +}
>> +
>> +#else
>> +int nf_flow_table_init_sysctl(struct net *net)
>> +{
>> + return 0;
>> +}
>> +
>> +void nf_flow_table_fini_sysctl(struct net *net)
>> +{
>> +}
>> +#endif /* CONFIG_SYSCTL */
>> --
>> 2.31.1
>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v2 3/3] netfilter: nf_flow_table: count pending offload workqueue tasks
2022-05-17 11:20 ` Pablo Neira Ayuso
@ 2022-05-17 11:16 ` Vlad Buslov
2022-05-17 12:26 ` Pablo Neira Ayuso
0 siblings, 1 reply; 10+ messages in thread
From: Vlad Buslov @ 2022-05-17 11:16 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, kadlec, fw, ozsh, paulb
On Tue 17 May 2022 at 13:20, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, May 16, 2022 at 10:10:32PM +0300, Vlad Buslov wrote:
> [...]
>> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
>> index ddc54b6d18ee..c8fc5c7ef04a 100644
>> --- a/net/netfilter/Kconfig
>> +++ b/net/netfilter/Kconfig
>> @@ -734,6 +734,14 @@ config NF_FLOW_TABLE
>>
>> To compile it as a module, choose M here.
>>
>> +config NF_FLOW_TABLE_PROCFS
>> + bool "Supply flow table statistics in procfs"
>> + default y
>> + depends on PROC_FS
>> + help
>> + This option enables for the flow table offload statistics
>> + to be shown in procfs under net/netfilter/nf_flowtable.
>
> This belongs to patch 2/3.
>
> Then, use NF_FLOW_TABLE_PROCFS to conditionally add it to
> nf_flow_table if this is enabled in .config? To honor this new Kconfig
> toggle.
>
> I mean instead of:
>
> obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
> nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
> - nf_flow_table_offload.o
> + nf_flow_table_offload.o \
> + nf_flow_table_sysctl.o
>
> this?
>
> nf_flow_table-$(CONFIG_NF_FLOW_TABLE_SYSCTL) += nf_flow_table_sysctl.o
In V2 I have both sysctl and procfs implementations in single file.
As I replied for previous patch in series: Should I split those in two
separate files (nf_flow_table_sysctl.c and nf_flow_table_procfs.c) that
both could be conditionally compiled depending on their respective
configs?
>
>> config NETFILTER_XTABLES
>> tristate "Netfilter Xtables support (required for ip_tables)"
>> default m if NETFILTER_ADVANCED=n
>> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
>> index e2598f98017c..c86dd627ef42 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -662,17 +662,51 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
>> }
>> EXPORT_SYMBOL_GPL(nf_flow_table_free);
>>
>> +static int nf_flow_table_init_net(struct net *net)
>> +{
>> + net->ft.stat = alloc_percpu(struct nf_flow_table_stat);
>
> Missing check for NULL in case alloc_percpu() fails?
>
I might be missing something, but why isn't NULL check in following line
sufficient?
>> + return net->ft.stat ? 0 : -ENOMEM;
>> +}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v2 3/3] netfilter: nf_flow_table: count pending offload workqueue tasks
2022-05-16 19:10 ` [PATCH net-next v2 3/3] netfilter: nf_flow_table: count pending offload workqueue tasks Vlad Buslov
@ 2022-05-17 11:20 ` Pablo Neira Ayuso
2022-05-17 11:16 ` Vlad Buslov
0 siblings, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-17 11:20 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netfilter-devel, kadlec, fw, ozsh, paulb
On Mon, May 16, 2022 at 10:10:32PM +0300, Vlad Buslov wrote:
[...]
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index ddc54b6d18ee..c8fc5c7ef04a 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -734,6 +734,14 @@ config NF_FLOW_TABLE
>
> To compile it as a module, choose M here.
>
> +config NF_FLOW_TABLE_PROCFS
> + bool "Supply flow table statistics in procfs"
> + default y
> + depends on PROC_FS
> + help
> + This option enables for the flow table offload statistics
> + to be shown in procfs under net/netfilter/nf_flowtable.
This belongs to patch 2/3.
Then, use NF_FLOW_TABLE_PROCFS to conditionally add it to
nf_flow_table if this is enabled in .config? To honor this new Kconfig
toggle.
I mean instead of:
obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
- nf_flow_table_offload.o
+ nf_flow_table_offload.o \
+ nf_flow_table_sysctl.o
this?
nf_flow_table-$(CONFIG_NF_FLOW_TABLE_SYSCTL) += nf_flow_table_sysctl.o
> config NETFILTER_XTABLES
> tristate "Netfilter Xtables support (required for ip_tables)"
> default m if NETFILTER_ADVANCED=n
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index e2598f98017c..c86dd627ef42 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -662,17 +662,51 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
> }
> EXPORT_SYMBOL_GPL(nf_flow_table_free);
>
> +static int nf_flow_table_init_net(struct net *net)
> +{
> + net->ft.stat = alloc_percpu(struct nf_flow_table_stat);
Missing check for NULL in case alloc_percpu() fails?
> + return net->ft.stat ? 0 : -ENOMEM;
> +}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v2 2/3] netfilter: nf_flow_table: count and limit hw offloaded entries
2022-05-16 19:10 ` [PATCH net-next v2 2/3] netfilter: nf_flow_table: count and limit hw offloaded entries Vlad Buslov
@ 2022-05-17 11:28 ` Pablo Neira Ayuso
2022-05-17 11:10 ` Vlad Buslov
0 siblings, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-17 11:28 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netfilter-devel, kadlec, fw, ozsh, paulb
On Mon, May 16, 2022 at 10:10:31PM +0300, Vlad Buslov wrote:
> To improve hardware offload debuggability and scalability introduce
> 'nf_flowtable_count_hw' and 'nf_flowtable_max_hw' sysctl entries in new
> dedicated 'net/netfilter/ft' namespace. Add new pernet struct nf_ft_net in
> order to store the counter and sysctl header of new sysctl table.
>
> Count the offloaded flows in workqueue add task handler. Verify that
> offloaded flow total is lower than allowed maximum before calling the
> driver callbacks. To prevent spamming the 'add' workqueue with tasks when
> flows can't be offloaded anymore also check that count is below limit
> before queuing offload work. This doesn't prevent all redundant workqueue
> task since counter can be taken by concurrent work handler after the check
> had been performed but before the offload job is executed but it still
> greatly reduces such occurrences. Note that flows that were not offloaded
> due to counter being larger than the cap can still be offloaded via refresh
> function.
>
> Ensure that flows are accounted correctly by verifying IPS_HW_OFFLOAD_BIT
> value before counting them. This ensures that add/refresh code path
> increments the counter exactly once per flow when setting the bit and
> decrements it only for accounted flows when deleting the flow with the bit
> set.
>
> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
> Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
> ---
>
> Notes:
> Changes V1 -> V2:
>
> - Combine patches that expose offloaded flow count and limit total
> offloaded flows.
>
> - Rework the counting logic to count in workqueue context.
>
> - Create dedicated namespace for flow table sysctls.
>
> .../networking/nf_flowtable-sysctl.rst | 17 ++++
> include/net/netfilter/nf_flow_table.h | 25 ++++++
> net/netfilter/Makefile | 3 +-
> net/netfilter/nf_flow_table_core.c | 55 +++++++++++-
> net/netfilter/nf_flow_table_offload.c | 36 ++++++--
> net/netfilter/nf_flow_table_sysctl.c | 83 +++++++++++++++++++
> 6 files changed, 210 insertions(+), 9 deletions(-)
> create mode 100644 Documentation/networking/nf_flowtable-sysctl.rst
> create mode 100644 net/netfilter/nf_flow_table_sysctl.c
>
> diff --git a/Documentation/networking/nf_flowtable-sysctl.rst b/Documentation/networking/nf_flowtable-sysctl.rst
> new file mode 100644
> index 000000000000..932b4abeca6c
> --- /dev/null
> +++ b/Documentation/networking/nf_flowtable-sysctl.rst
> @@ -0,0 +1,17 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===================================
> +Netfilter Flowtable Sysfs variables
> +===================================
Better document the sysctl entries in the existing nf_conntrack.txt
file rather than this new file?
> +
> +/proc/sys/net/netfilter/ft/nf_flowtable_* Variables:
> +=================================================
> +
> +nf_flowtable_count_hw - INTEGER (read-only)
> + Number of flowtable entries that are currently offloaded to hardware.
> +
> +nf_flowtable_max_hw - INTEGER
> + - 0 - disabled (default)
> + - not 0 - enabled
> +
> + Cap on maximum total amount of flowtable entries offloaded to hardware.
> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> index 64daafd1fc41..e09c29fa034e 100644
> --- a/include/net/netfilter/nf_flow_table.h
> +++ b/include/net/netfilter/nf_flow_table.h
> @@ -335,4 +335,29 @@ static inline __be16 nf_flow_pppoe_proto(const struct sk_buff *skb)
> return 0;
> }
>
> +struct nf_ft_net {
> + atomic_t count_hw;
> +#ifdef CONFIG_SYSCTL
> + struct ctl_table_header *sysctl_header;
> +#endif
> +};
> +
> +extern unsigned int nf_ft_hw_max;
> +extern unsigned int nf_ft_net_id;
> +
> +#include <net/netns/generic.h>
> +
> +static inline struct nf_ft_net *nf_ft_pernet(const struct net *net)
> +{
> + return net_generic(net, nf_ft_net_id);
> +}
> +
> +static inline struct nf_ft_net *nf_ft_pernet_get(struct nf_flowtable *flow_table)
> +{
> + return nf_ft_pernet(read_pnet(&flow_table->net));
> +}
> +
> +int nf_flow_table_init_sysctl(struct net *net);
> +void nf_flow_table_fini_sysctl(struct net *net);
> +
> #endif /* _NF_FLOW_TABLE_H */
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 238b6a620e88..67e281842b61 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -127,7 +127,8 @@ obj-$(CONFIG_NFT_FWD_NETDEV) += nft_fwd_netdev.o
> # flow table infrastructure
> obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
> nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
> - nf_flow_table_offload.o
> + nf_flow_table_offload.o \
> + nf_flow_table_sysctl.o
>
> obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
>
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index 3db256da919b..e2598f98017c 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -277,6 +277,13 @@ static const struct rhashtable_params nf_flow_offload_rhash_params = {
> .automatic_shrinking = true,
> };
>
> +static bool flow_max_hw_entries(struct nf_flowtable *flow_table)
> +{
> + struct nf_ft_net *fnet = nf_ft_pernet_get(flow_table);
> +
> + return nf_ft_hw_max && atomic_read(&fnet->count_hw) >= nf_ft_hw_max;
> +}
> +
> unsigned long flow_offload_get_timeout(struct flow_offload *flow)
> {
> unsigned long timeout = NF_FLOW_TIMEOUT;
> @@ -320,7 +327,8 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
>
> nf_ct_offload_timeout(flow->ct);
>
> - if (nf_flowtable_hw_offload(flow_table)) {
> + if (nf_flowtable_hw_offload(flow_table) &&
> + !flow_max_hw_entries(flow_table)) {
> __set_bit(NF_FLOW_HW, &flow->flags);
> nf_flow_offload_add(flow_table, flow);
> }
> @@ -338,9 +346,11 @@ void flow_offload_refresh(struct nf_flowtable *flow_table,
> if (READ_ONCE(flow->timeout) != timeout)
> WRITE_ONCE(flow->timeout, timeout);
>
> - if (likely(!nf_flowtable_hw_offload(flow_table)))
> + if (likely(!nf_flowtable_hw_offload(flow_table) ||
> + flow_max_hw_entries(flow_table)))
> return;
>
> + set_bit(NF_FLOW_HW, &flow->flags);
> nf_flow_offload_add(flow_table, flow);
> }
> EXPORT_SYMBOL_GPL(flow_offload_refresh);
> @@ -652,14 +662,53 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
> }
> EXPORT_SYMBOL_GPL(nf_flow_table_free);
>
> +static int nf_flow_table_pernet_init(struct net *net)
> +{
> + return nf_flow_table_init_sysctl(net);
> +}
> +
> +static void nf_flow_table_pernet_exit(struct list_head *net_exit_list)
> +{
> + struct net *net;
> +
> + list_for_each_entry(net, net_exit_list, exit_list)
> + nf_flow_table_fini_sysctl(net);
> +}
> +
> +unsigned int nf_ft_net_id __read_mostly;
> +
> +static struct pernet_operations nf_flow_table_net_ops = {
> + .init = nf_flow_table_pernet_init,
> + .exit_batch = nf_flow_table_pernet_exit,
> + .id = &nf_ft_net_id,
> + .size = sizeof(struct nf_ft_net),
> +};
> +
> static int __init nf_flow_table_module_init(void)
> {
> - return nf_flow_table_offload_init();
> + int ret;
> +
> + nf_ft_hw_max = 0;
> +
> + ret = register_pernet_subsys(&nf_flow_table_net_ops);
> + if (ret < 0)
> + return ret;
> +
> + ret = nf_flow_table_offload_init();
> + if (ret)
> + goto out_offload;
> +
> + return 0;
> +
> +out_offload:
> + unregister_pernet_subsys(&nf_flow_table_net_ops);
> + return ret;
> }
>
> static void __exit nf_flow_table_module_exit(void)
> {
> nf_flow_table_offload_exit();
> + unregister_pernet_subsys(&nf_flow_table_net_ops);
> }
>
> module_init(nf_flow_table_module_init);
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index 11b6e1942092..a6e763901eb9 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -903,30 +903,56 @@ static int flow_offload_rule_add(struct flow_offload_work *offload,
> return 0;
> }
>
> +static bool flow_get_max_hw_entries(struct nf_flowtable *flow_table)
> +{
> + struct nf_ft_net *fnet = nf_ft_pernet_get(flow_table);
> + int count_hw = atomic_inc_return(&fnet->count_hw);
> +
> + if (nf_ft_hw_max && count_hw > nf_ft_hw_max) {
> + atomic_dec(&fnet->count_hw);
> + return false;
> + }
> + return true;
> +}
> +
> static void flow_offload_work_add(struct flow_offload_work *offload)
> {
> + struct nf_ft_net *fnet = nf_ft_pernet_get(offload->flowtable);
> struct nf_flow_rule *flow_rule[FLOW_OFFLOAD_DIR_MAX];
> int err;
>
> + if (!flow_get_max_hw_entries(offload->flowtable))
> + return;
> +
> err = nf_flow_offload_alloc(offload, flow_rule);
> if (err < 0)
> - return;
> + goto out_alloc;
>
> err = flow_offload_rule_add(offload, flow_rule);
> if (err < 0)
> - goto out;
> + goto out_add;
>
> - set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
> + if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status))
> + atomic_dec(&fnet->count_hw);
> + nf_flow_offload_destroy(flow_rule);
> + return;
>
> -out:
> +out_add:
> nf_flow_offload_destroy(flow_rule);
> +out_alloc:
> + atomic_dec(&fnet->count_hw);
> }
>
> static void flow_offload_work_del(struct flow_offload_work *offload)
> {
> - clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
> + bool offloaded = test_and_clear_bit(IPS_HW_OFFLOAD_BIT,
> + &offload->flow->ct->status);
> + struct nf_ft_net *fnet = nf_ft_pernet_get(offload->flowtable);
> +
> flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL);
> flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY);
> + if (offloaded)
> + atomic_dec(&fnet->count_hw);
> set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags);
> }
>
> diff --git a/net/netfilter/nf_flow_table_sysctl.c b/net/netfilter/nf_flow_table_sysctl.c
> new file mode 100644
> index 000000000000..ce8c0a2c4bdb
> --- /dev/null
> +++ b/net/netfilter/nf_flow_table_sysctl.c
> @@ -0,0 +1,83 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/kernel.h>
> +#include <net/netfilter/nf_flow_table.h>
> +
> +unsigned int nf_ft_hw_max __read_mostly;
You can move this to nf_flow_table_core_offload.c for the Makefile
idea.
> +#ifdef CONFIG_SYSCTL
If you follow the Kconfig+Makefile approach, then this #ifdef can
likely go away as the entire file would be on/off thing.
> +enum nf_ct_sysctl_index {
> + NF_SYSCTL_FLOW_TABLE_MAX_HW,
> + NF_SYSCTL_FLOW_TABLE_COUNT_HW,
> +
> + __NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL,
> +};
> +
> +#define NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL \
> + (__NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL + 1)
> +
> +static struct ctl_table nf_flow_table_sysctl_table[] = {
> + [NF_SYSCTL_FLOW_TABLE_MAX_HW] = {
> + .procname = "nf_flowtable_max_hw",
> + .data = &nf_ft_hw_max,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> + [NF_SYSCTL_FLOW_TABLE_COUNT_HW] = {
> + .procname = "nf_flowtable_count_hw",
> + .maxlen = sizeof(int),
> + .mode = 0444,
> + .proc_handler = proc_dointvec,
> + },
> + {}
> +};
> +
> +int nf_flow_table_init_sysctl(struct net *net)
> +{
> + struct nf_ft_net *fnet = nf_ft_pernet(net);
> + struct ctl_table *table;
> +
> + BUILD_BUG_ON(ARRAY_SIZE(nf_flow_table_sysctl_table) != NF_SYSCTL_FLOW_TABLE_LAST_SYSCTL);
> +
> + table = kmemdup(nf_flow_table_sysctl_table, sizeof(nf_flow_table_sysctl_table),
> + GFP_KERNEL);
> + if (!table)
> + return -ENOMEM;
> +
> + table[NF_SYSCTL_FLOW_TABLE_COUNT_HW].data = &fnet->count_hw;
> +
> + /* Don't allow non-init_net ns to alter global sysctls */
> + if (!net_eq(&init_net, net))
> + table[NF_SYSCTL_FLOW_TABLE_MAX_HW].mode = 0444;
> +
> + fnet->sysctl_header = register_net_sysctl(net, "net/netfilter/ft", table);
> + if (!fnet->sysctl_header)
> + goto out_unregister_netfilter;
> +
> + return 0;
> +
> +out_unregister_netfilter:
> + kfree(table);
> + return -ENOMEM;
> +}
> +
> +void nf_flow_table_fini_sysctl(struct net *net)
> +{
> + struct nf_ft_net *fnet = nf_ft_pernet(net);
> + struct ctl_table *table;
> +
> + table = fnet->sysctl_header->ctl_table_arg;
> + unregister_net_sysctl_table(fnet->sysctl_header);
> + kfree(table);
> +}
> +
> +#else
> +int nf_flow_table_init_sysctl(struct net *net)
> +{
> + return 0;
> +}
> +
> +void nf_flow_table_fini_sysctl(struct net *net)
> +{
> +}
> +#endif /* CONFIG_SYSCTL */
> --
> 2.31.1
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v2 3/3] netfilter: nf_flow_table: count pending offload workqueue tasks
2022-05-17 11:16 ` Vlad Buslov
@ 2022-05-17 12:26 ` Pablo Neira Ayuso
2022-05-17 15:18 ` Vlad Buslov
0 siblings, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-17 12:26 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netfilter-devel, kadlec, fw, ozsh, paulb
On Tue, May 17, 2022 at 02:16:04PM +0300, Vlad Buslov wrote:
>
> On Tue 17 May 2022 at 13:20, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Mon, May 16, 2022 at 10:10:32PM +0300, Vlad Buslov wrote:
> > [...]
> >> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> >> index ddc54b6d18ee..c8fc5c7ef04a 100644
> >> --- a/net/netfilter/Kconfig
> >> +++ b/net/netfilter/Kconfig
> >> @@ -734,6 +734,14 @@ config NF_FLOW_TABLE
> >>
> >> To compile it as a module, choose M here.
> >>
> >> +config NF_FLOW_TABLE_PROCFS
> >> + bool "Supply flow table statistics in procfs"
> >> + default y
> >> + depends on PROC_FS
> >> + help
> >> + This option enables for the flow table offload statistics
> >> + to be shown in procfs under net/netfilter/nf_flowtable.
> >
> > This belongs to patch 2/3.
> >
> > Then, use NF_FLOW_TABLE_PROCFS to conditionally add it to
> > nf_flow_table if this is enabled in .config? To honor this new Kconfig
> > toggle.
> >
> > I mean instead of:
> >
> > obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
> > nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
> > - nf_flow_table_offload.o
> > + nf_flow_table_offload.o \
> > + nf_flow_table_sysctl.o
> >
> > this?
> >
> > nf_flow_table-$(CONFIG_NF_FLOW_TABLE_SYSCTL) += nf_flow_table_sysctl.o
>
> In V2 I have both sysctl and procfs implementations in single file.
> As I replied for previous patch in series: Should I split those in two
> separate files (nf_flow_table_sysctl.c and nf_flow_table_procfs.c) that
> both could be conditionally compiled depending on their respective
> configs?
Same file is fine.
Probably instead ?
nf_flow_table-$(CONFIG_SYSCTL) += nf_flow_table_sysctl.o
so the #ifdef CONFIG_SYSCTL in nf_flow_table_sysctl.c can go away.
you would need to move:
unsigned int nf_ft_hw_max __read_mostly;
to nf_flow_table_offload.c
Make sense?
> >> config NETFILTER_XTABLES
> >> tristate "Netfilter Xtables support (required for ip_tables)"
> >> default m if NETFILTER_ADVANCED=n
> >> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> >> index e2598f98017c..c86dd627ef42 100644
> >> --- a/net/netfilter/nf_flow_table_core.c
> >> +++ b/net/netfilter/nf_flow_table_core.c
> >> @@ -662,17 +662,51 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
> >> }
> >> EXPORT_SYMBOL_GPL(nf_flow_table_free);
> >>
> >> +static int nf_flow_table_init_net(struct net *net)
> >> +{
> >> + net->ft.stat = alloc_percpu(struct nf_flow_table_stat);
> >
> > Missing check for NULL in case alloc_percpu() fails?
> >
>
> I might be missing something, but why isn't NULL check in following line
> sufficient?
I overlook the return is check for NULL, sorry for the noise.
> >> + return net->ft.stat ? 0 : -ENOMEM;
> >> +}
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net-next v2 3/3] netfilter: nf_flow_table: count pending offload workqueue tasks
2022-05-17 12:26 ` Pablo Neira Ayuso
@ 2022-05-17 15:18 ` Vlad Buslov
0 siblings, 0 replies; 10+ messages in thread
From: Vlad Buslov @ 2022-05-17 15:18 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, kadlec, fw, ozsh, paulb
On Tue 17 May 2022 at 14:26, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Tue, May 17, 2022 at 02:16:04PM +0300, Vlad Buslov wrote:
>>
>> On Tue 17 May 2022 at 13:20, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > On Mon, May 16, 2022 at 10:10:32PM +0300, Vlad Buslov wrote:
>> > [...]
>> >> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
>> >> index ddc54b6d18ee..c8fc5c7ef04a 100644
>> >> --- a/net/netfilter/Kconfig
>> >> +++ b/net/netfilter/Kconfig
>> >> @@ -734,6 +734,14 @@ config NF_FLOW_TABLE
>> >>
>> >> To compile it as a module, choose M here.
>> >>
>> >> +config NF_FLOW_TABLE_PROCFS
>> >> + bool "Supply flow table statistics in procfs"
>> >> + default y
>> >> + depends on PROC_FS
>> >> + help
>> >> + This option enables for the flow table offload statistics
>> >> + to be shown in procfs under net/netfilter/nf_flowtable.
>> >
>> > This belongs to patch 2/3.
>> >
>> > Then, use NF_FLOW_TABLE_PROCFS to conditionally add it to
>> > nf_flow_table if this is enabled in .config? To honor this new Kconfig
>> > toggle.
>> >
>> > I mean instead of:
>> >
>> > obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
>> > nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
>> > - nf_flow_table_offload.o
>> > + nf_flow_table_offload.o \
>> > + nf_flow_table_sysctl.o
>> >
>> > this?
>> >
>> > nf_flow_table-$(CONFIG_NF_FLOW_TABLE_SYSCTL) += nf_flow_table_sysctl.o
>>
>> In V2 I have both sysctl and procfs implementations in single file.
>> As I replied for previous patch in series: Should I split those in two
>> separate files (nf_flow_table_sysctl.c and nf_flow_table_procfs.c) that
>> both could be conditionally compiled depending on their respective
>> configs?
>
> Same file is fine.
>
> Probably instead ?
>
> nf_flow_table-$(CONFIG_SYSCTL) += nf_flow_table_sysctl.o
>
> so the #ifdef CONFIG_SYSCTL in nf_flow_table_sysctl.c can go away.
>
> you would need to move:
>
> unsigned int nf_ft_hw_max __read_mostly;
>
> to nf_flow_table_offload.c
>
> Make sense?
Yep. Will send the V3 soon.
Thanks,
Vlad
[...]
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-05-17 16:20 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-05-16 19:10 [PATCH net-next v2 0/3] Conntrack offload debuggability improvements Vlad Buslov
2022-05-16 19:10 ` [PATCH net-next v2 1/3] net/sched: act_ct: set 'net' pointer when creating new nf_flow_table Vlad Buslov
2022-05-16 19:10 ` [PATCH net-next v2 2/3] netfilter: nf_flow_table: count and limit hw offloaded entries Vlad Buslov
2022-05-17 11:28 ` Pablo Neira Ayuso
2022-05-17 11:10 ` Vlad Buslov
2022-05-16 19:10 ` [PATCH net-next v2 3/3] netfilter: nf_flow_table: count pending offload workqueue tasks Vlad Buslov
2022-05-17 11:20 ` Pablo Neira Ayuso
2022-05-17 11:16 ` Vlad Buslov
2022-05-17 12:26 ` Pablo Neira Ayuso
2022-05-17 15:18 ` Vlad Buslov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.