* [PATCH v2 1/7] netfilter: nf_tables: Store user-defined hook ifname
2024-05-17 13:06 [PATCH v2 0/7] Dynamic hook interface binding Phil Sutter
@ 2024-05-17 13:06 ` Phil Sutter
2024-05-17 13:06 ` [PATCH v2 2/7] netfilter: nf_tables: Relax hook interface binding Phil Sutter
` (6 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2024-05-17 13:06 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
In order to support dynamic interface binding, the name must be stored
separately. Also store the attribute length, it may serve to implement
simple wildcards (eth* for instance).
Also use the stored name when filling hook's NFTA_DEVICE_NAME attribute.
This avoids at least inadvertent changes in stored rulesets if an
interface is renamed at run-time.
Compare hooks by this stored interface name instead of the 'ops.dev'
pointer. Also prerequisite work for dynamic interface binding.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
include/net/netfilter/nf_tables.h | 2 ++
net/netfilter/nf_tables_api.c | 19 +++++++++++--------
2 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 3f1ed467f951..3dec239bdb22 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1183,6 +1183,8 @@ struct nft_hook {
struct list_head list;
struct nf_hook_ops ops;
struct rcu_head rcu;
+ char ifname[IFNAMSIZ];
+ u8 ifnamelen;
};
/**
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 84fa25305b4f..4f64dbac5abc 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1799,15 +1799,16 @@ static int nft_dump_basechain_hook(struct sk_buff *skb, int family,
if (!first)
first = hook;
- if (nla_put_string(skb, NFTA_DEVICE_NAME,
- hook->ops.dev->name))
+ if (nla_put(skb, NFTA_DEVICE_NAME,
+ hook->ifnamelen, hook->ifname))
goto nla_put_failure;
n++;
}
nla_nest_end(skb, nest_devs);
if (n == 1 &&
- nla_put_string(skb, NFTA_HOOK_DEV, first->ops.dev->name))
+ nla_put(skb, NFTA_HOOK_DEV,
+ first->ifnamelen, first->ifname))
goto nla_put_failure;
}
nla_nest_end(skb, nest);
@@ -2118,7 +2119,6 @@ static struct nft_hook *nft_netdev_hook_alloc(struct net *net,
const struct nlattr *attr)
{
struct net_device *dev;
- char ifname[IFNAMSIZ];
struct nft_hook *hook;
int err;
@@ -2128,12 +2128,13 @@ static struct nft_hook *nft_netdev_hook_alloc(struct net *net,
goto err_hook_alloc;
}
- nla_strscpy(ifname, attr, IFNAMSIZ);
+ nla_strscpy(hook->ifname, attr, IFNAMSIZ);
+ hook->ifnamelen = nla_len(attr);
/* nf_tables_netdev_event() is called under rtnl_mutex, this is
* indirectly serializing all the other holders of the commit_mutex with
* the rtnl_mutex.
*/
- dev = __dev_get_by_name(net, ifname);
+ dev = __dev_get_by_name(net, hook->ifname);
if (!dev) {
err = -ENOENT;
goto err_hook_dev;
@@ -2154,7 +2155,8 @@ static struct nft_hook *nft_hook_list_find(struct list_head *hook_list,
struct nft_hook *hook;
list_for_each_entry(hook, hook_list, list) {
- if (this->ops.dev == hook->ops.dev)
+ if (hook->ifnamelen == this->ifnamelen &&
+ !strncmp(hook->ifname, this->ifname, hook->ifnamelen))
return hook;
}
@@ -8908,7 +8910,8 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net,
hook_list = &flowtable->hook_list;
list_for_each_entry_rcu(hook, hook_list, list) {
- if (nla_put_string(skb, NFTA_DEVICE_NAME, hook->ops.dev->name))
+ if (nla_put(skb, NFTA_DEVICE_NAME,
+ hook->ifnamelen, hook->ifname))
goto nla_put_failure;
}
nla_nest_end(skb, nest_devs);
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread* [PATCH v2 2/7] netfilter: nf_tables: Relax hook interface binding
2024-05-17 13:06 [PATCH v2 0/7] Dynamic hook interface binding Phil Sutter
2024-05-17 13:06 ` [PATCH v2 1/7] netfilter: nf_tables: Store user-defined hook ifname Phil Sutter
@ 2024-05-17 13:06 ` Phil Sutter
2024-05-17 13:06 ` [PATCH v2 3/7] netfilter: nf_tables: Report active interfaces to user space Phil Sutter
` (5 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2024-05-17 13:06 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
When creating a new flowtable or netdev-family chain, accept that the
devices to bind to may not exist and proceed to create a stub hook.
Such inactive hooks are identified by 'ops.dev' pointer being NULL,
ignore them for practical purposes.
When reacting upon a vanishing interface, merely deactivate the hook
instead of removing it from the list. Also leave netdev chains in place
even if no active hooks remain. In combination with externally stored
interface names, this stabilizes ruleset dumps with regard to
disappearing interfaces.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
include/net/netfilter/nf_tables.h | 2 -
net/netfilter/nf_tables_api.c | 63 +++++++++++++------------------
net/netfilter/nft_chain_filter.c | 29 +++-----------
3 files changed, 33 insertions(+), 61 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 3dec239bdb22..9cbef71f0462 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1220,8 +1220,6 @@ static inline bool nft_is_base_chain(const struct nft_chain *chain)
return chain->flags & NFT_CHAIN_BASE;
}
-int __nft_release_basechain(struct nft_ctx *ctx);
-
unsigned int nft_do_chain(struct nft_pktinfo *pkt, void *priv);
static inline bool nft_use_inc(u32 *use)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 4f64dbac5abc..35990fbed444 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -282,6 +282,9 @@ static int nft_netdev_register_hooks(struct net *net,
j = 0;
list_for_each_entry(hook, hook_list, list) {
+ if (!hook->ops.dev)
+ continue;
+
err = nf_register_net_hook(net, &hook->ops);
if (err < 0)
goto err_register;
@@ -292,6 +295,9 @@ static int nft_netdev_register_hooks(struct net *net,
err_register:
list_for_each_entry(hook, hook_list, list) {
+ if (!hook->ops.dev)
+ continue;
+
if (j-- <= 0)
break;
@@ -307,7 +313,10 @@ static void nft_netdev_unregister_hooks(struct net *net,
struct nft_hook *hook, *next;
list_for_each_entry_safe(hook, next, hook_list, list) {
- nf_unregister_net_hook(net, &hook->ops);
+ if (hook->ops.dev) {
+ nf_unregister_net_hook(net, &hook->ops);
+ hook->ops.dev = NULL;
+ }
if (release_netdev) {
list_del(&hook->list);
kfree_rcu(hook, rcu);
@@ -2118,7 +2127,6 @@ void nf_tables_chain_destroy(struct nft_ctx *ctx)
static struct nft_hook *nft_netdev_hook_alloc(struct net *net,
const struct nlattr *attr)
{
- struct net_device *dev;
struct nft_hook *hook;
int err;
@@ -2134,17 +2142,10 @@ static struct nft_hook *nft_netdev_hook_alloc(struct net *net,
* indirectly serializing all the other holders of the commit_mutex with
* the rtnl_mutex.
*/
- dev = __dev_get_by_name(net, hook->ifname);
- if (!dev) {
- err = -ENOENT;
- goto err_hook_dev;
- }
- hook->ops.dev = dev;
+ hook->ops.dev = __dev_get_by_name(net, hook->ifname);
return hook;
-err_hook_dev:
- kfree(hook);
err_hook_alloc:
return ERR_PTR(err);
}
@@ -8452,6 +8453,9 @@ static void nft_unregister_flowtable_hook(struct net *net,
struct nft_flowtable *flowtable,
struct nft_hook *hook)
{
+ if (!hook->ops.dev)
+ return;
+
nf_unregister_net_hook(net, &hook->ops);
flowtable->data.type->setup(&flowtable->data, hook->ops.dev,
FLOW_BLOCK_UNBIND);
@@ -8464,7 +8468,8 @@ static void __nft_unregister_flowtable_net_hooks(struct net *net,
struct nft_hook *hook, *next;
list_for_each_entry_safe(hook, next, hook_list, list) {
- nf_unregister_net_hook(net, &hook->ops);
+ if (hook->ops.dev)
+ nf_unregister_net_hook(net, &hook->ops);
if (release_netdev) {
list_del(&hook->list);
kfree_rcu(hook, rcu);
@@ -8488,6 +8493,9 @@ static int nft_register_flowtable_net_hooks(struct net *net,
int err, i = 0;
list_for_each_entry(hook, hook_list, list) {
+ if (!hook->ops.dev)
+ continue;
+
list_for_each_entry(ft, &table->flowtables, list) {
if (!nft_is_active_next(net, ft))
continue;
@@ -8522,6 +8530,9 @@ static int nft_register_flowtable_net_hooks(struct net *net,
err_unregister_net_hooks:
list_for_each_entry_safe(hook, next, hook_list, list) {
+ if (!hook->ops.dev)
+ continue;
+
if (i-- <= 0)
break;
@@ -9117,8 +9128,10 @@ static void nf_tables_flowtable_destroy(struct nft_flowtable *flowtable)
flowtable->data.type->free(&flowtable->data);
list_for_each_entry_safe(hook, next, &flowtable->hook_list, list) {
- flowtable->data.type->setup(&flowtable->data, hook->ops.dev,
- FLOW_BLOCK_UNBIND);
+ if (hook->ops.dev)
+ flowtable->data.type->setup(&flowtable->data,
+ hook->ops.dev,
+ FLOW_BLOCK_UNBIND);
list_del_rcu(&hook->list);
kfree(hook);
}
@@ -9164,8 +9177,7 @@ static void nft_flowtable_event(unsigned long event, struct net_device *dev,
/* flow_offload_netdev_event() cleans up entries for us. */
nft_unregister_flowtable_hook(dev_net(dev), flowtable, hook);
- list_del_rcu(&hook->list);
- kfree_rcu(hook, rcu);
+ hook->ops.dev = NULL;
break;
}
}
@@ -11406,27 +11418,6 @@ int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data,
}
EXPORT_SYMBOL_GPL(nft_data_dump);
-int __nft_release_basechain(struct nft_ctx *ctx)
-{
- struct nft_rule *rule, *nr;
-
- if (WARN_ON(!nft_is_base_chain(ctx->chain)))
- return 0;
-
- nf_tables_unregister_hook(ctx->net, ctx->chain->table, ctx->chain);
- list_for_each_entry_safe(rule, nr, &ctx->chain->rules, list) {
- list_del(&rule->list);
- nft_use_dec(&ctx->chain->use);
- nf_tables_rule_release(ctx, rule);
- }
- nft_chain_del(ctx->chain);
- nft_use_dec(&ctx->table->use);
- nf_tables_chain_destroy(ctx);
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(__nft_release_basechain);
-
static void __nft_release_hook(struct net *net, struct nft_table *table)
{
struct nft_flowtable *flowtable;
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 274b6f7e6bb5..ddb438bc2afd 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -322,35 +322,18 @@ static void nft_netdev_event(unsigned long event, struct net_device *dev,
struct nft_ctx *ctx)
{
struct nft_base_chain *basechain = nft_base_chain(ctx->chain);
- struct nft_hook *hook, *found = NULL;
- int n = 0;
+ struct nft_hook *hook;
if (event != NETDEV_UNREGISTER)
return;
list_for_each_entry(hook, &basechain->hook_list, list) {
- if (hook->ops.dev == dev)
- found = hook;
-
- n++;
- }
- if (!found)
- return;
-
- if (n > 1) {
- nf_unregister_net_hook(ctx->net, &found->ops);
- list_del_rcu(&found->list);
- kfree_rcu(found, rcu);
- return;
+ if (hook->ops.dev == dev) {
+ nf_unregister_net_hook(ctx->net, &hook->ops);
+ hook->ops.dev = NULL;
+ break;
+ }
}
-
- /* UNREGISTER events are also happening on netns exit.
- *
- * Although nf_tables core releases all tables/chains, only this event
- * handler provides guarantee that hook->ops.dev is still accessible,
- * so we cannot skip exiting net namespaces.
- */
- __nft_release_basechain(ctx);
}
static int nf_tables_netdev_event(struct notifier_block *this,
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread* [PATCH v2 3/7] netfilter: nf_tables: Report active interfaces to user space
2024-05-17 13:06 [PATCH v2 0/7] Dynamic hook interface binding Phil Sutter
2024-05-17 13:06 ` [PATCH v2 1/7] netfilter: nf_tables: Store user-defined hook ifname Phil Sutter
2024-05-17 13:06 ` [PATCH v2 2/7] netfilter: nf_tables: Relax hook interface binding Phil Sutter
@ 2024-05-17 13:06 ` Phil Sutter
2024-05-17 13:06 ` [PATCH v2 4/7] netfilter: nf_tables: Dynamic hook interface binding Phil Sutter
` (4 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2024-05-17 13:06 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
Since netdev family chains and flowtables now report the interfaces they
were created for irrespective of their existence, introduce new netlink
attributes holding the currently active set of interfaces.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
include/uapi/linux/netfilter/nf_tables.h | 6 +++++-
net/netfilter/nf_tables_api.c | 25 ++++++++++++++++++++++++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index aa4094ca2444..adcac6ee619d 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -164,6 +164,7 @@ enum nft_list_attributes {
* @NFTA_HOOK_PRIORITY: netfilter hook priority (NLA_U32)
* @NFTA_HOOK_DEV: netdevice name (NLA_STRING)
* @NFTA_HOOK_DEVS: list of netdevices (NLA_NESTED)
+ * @NFTA_HOOK_ACT_DEVS: list of active netdevices (NLA_NESTED)
*/
enum nft_hook_attributes {
NFTA_HOOK_UNSPEC,
@@ -171,6 +172,7 @@ enum nft_hook_attributes {
NFTA_HOOK_PRIORITY,
NFTA_HOOK_DEV,
NFTA_HOOK_DEVS,
+ NFTA_HOOK_ACT_DEVS,
__NFTA_HOOK_MAX
};
#define NFTA_HOOK_MAX (__NFTA_HOOK_MAX - 1)
@@ -1717,13 +1719,15 @@ enum nft_flowtable_attributes {
*
* @NFTA_FLOWTABLE_HOOK_NUM: netfilter hook number (NLA_U32)
* @NFTA_FLOWTABLE_HOOK_PRIORITY: netfilter hook priority (NLA_U32)
- * @NFTA_FLOWTABLE_HOOK_DEVS: input devices this flow table is bound to (NLA_NESTED)
+ * @NFTA_FLOWTABLE_HOOK_DEVS: input devices this flow table is configured for (NLA_NESTED)
+ * @NFTA_FLOWTABLE_HOOK_ACT_DEVS: input devices this flow table is currently bound to (NLA_NESTED)
*/
enum nft_flowtable_hook_attributes {
NFTA_FLOWTABLE_HOOK_UNSPEC,
NFTA_FLOWTABLE_HOOK_NUM,
NFTA_FLOWTABLE_HOOK_PRIORITY,
NFTA_FLOWTABLE_HOOK_DEVS,
+ NFTA_FLOWTABLE_HOOK_ACT_DEVS,
__NFTA_FLOWTABLE_HOOK_MAX
};
#define NFTA_FLOWTABLE_HOOK_MAX (__NFTA_FLOWTABLE_HOOK_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 35990fbed444..87576accc2b2 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1819,6 +1819,18 @@ static int nft_dump_basechain_hook(struct sk_buff *skb, int family,
nla_put(skb, NFTA_HOOK_DEV,
first->ifnamelen, first->ifname))
goto nla_put_failure;
+
+ nest_devs = nla_nest_start_noflag(skb, NFTA_HOOK_ACT_DEVS);
+ if (!nest_devs)
+ goto nla_put_failure;
+
+ list_for_each_entry(hook, hook_list, list) {
+ if (hook->ops.dev &&
+ nla_put_string(skb, NFTA_DEVICE_NAME,
+ hook->ops.dev->name))
+ goto nla_put_failure;
+ }
+ nla_nest_end(skb, nest_devs);
}
nla_nest_end(skb, nest);
@@ -8926,6 +8938,19 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net,
goto nla_put_failure;
}
nla_nest_end(skb, nest_devs);
+
+ nest_devs = nla_nest_start_noflag(skb, NFTA_FLOWTABLE_HOOK_ACT_DEVS);
+ if (!nest_devs)
+ goto nla_put_failure;
+
+ list_for_each_entry_rcu(hook, hook_list, list) {
+ if (hook->ops.dev &&
+ nla_put_string(skb, NFTA_DEVICE_NAME,
+ hook->ops.dev->name))
+ goto nla_put_failure;
+ }
+ nla_nest_end(skb, nest_devs);
+
nla_nest_end(skb, nest);
nlmsg_end(skb, nlh);
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread* [PATCH v2 4/7] netfilter: nf_tables: Dynamic hook interface binding
2024-05-17 13:06 [PATCH v2 0/7] Dynamic hook interface binding Phil Sutter
` (2 preceding siblings ...)
2024-05-17 13:06 ` [PATCH v2 3/7] netfilter: nf_tables: Report active interfaces to user space Phil Sutter
@ 2024-05-17 13:06 ` Phil Sutter
2024-05-17 13:06 ` [PATCH v2 5/7] netfilter: nf_tables: Correctly handle NETDEV_RENAME events Phil Sutter
` (3 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2024-05-17 13:06 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
Upon NETDEV_REGISTER event, search existing flowtables and netdev-family
chains for a matching inactive hook and bind the device.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
net/netfilter/nf_tables_api.c | 76 +++++++++++++++++++++++---------
net/netfilter/nft_chain_filter.c | 40 +++++++++++++++--
2 files changed, 91 insertions(+), 25 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 87576accc2b2..b19f40874c48 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -8460,6 +8460,27 @@ nft_flowtable_type_get(struct net *net, u8 family)
return ERR_PTR(-ENOENT);
}
+static int nft_register_flowtable_hook(struct net *net,
+ struct nft_flowtable *flowtable,
+ struct nft_hook *hook)
+{
+ int err;
+
+ err = flowtable->data.type->setup(&flowtable->data,
+ hook->ops.dev, FLOW_BLOCK_BIND);
+ if (err < 0)
+ return err;
+
+ err = nf_register_net_hook(net, &hook->ops);
+ if (err < 0) {
+ flowtable->data.type->setup(&flowtable->data,
+ hook->ops.dev, FLOW_BLOCK_UNBIND);
+ return err;
+ }
+
+ return 0;
+}
+
/* Only called from error and netdev event paths. */
static void nft_unregister_flowtable_hook(struct net *net,
struct nft_flowtable *flowtable,
@@ -8521,20 +8542,10 @@ static int nft_register_flowtable_net_hooks(struct net *net,
}
}
- err = flowtable->data.type->setup(&flowtable->data,
- hook->ops.dev,
- FLOW_BLOCK_BIND);
+ err = nft_register_flowtable_hook(net, flowtable, hook);
if (err < 0)
goto err_unregister_net_hooks;
- err = nf_register_net_hook(net, &hook->ops);
- if (err < 0) {
- flowtable->data.type->setup(&flowtable->data,
- hook->ops.dev,
- FLOW_BLOCK_UNBIND);
- goto err_unregister_net_hooks;
- }
-
i++;
}
@@ -9191,20 +9202,40 @@ static int nf_tables_fill_gen_info(struct sk_buff *skb, struct net *net,
return -EMSGSIZE;
}
-static void nft_flowtable_event(unsigned long event, struct net_device *dev,
- struct nft_flowtable *flowtable)
+static int nft_flowtable_event(unsigned long event, struct net_device *dev,
+ struct nft_flowtable *flowtable)
{
struct nft_hook *hook;
list_for_each_entry(hook, &flowtable->hook_list, list) {
- if (hook->ops.dev != dev)
- continue;
+ switch (event) {
+ case NETDEV_UNREGISTER:
+ if (hook->ops.dev != dev)
+ break;
- /* flow_offload_netdev_event() cleans up entries for us. */
- nft_unregister_flowtable_hook(dev_net(dev), flowtable, hook);
- hook->ops.dev = NULL;
- break;
+ /* flow_offload_netdev_event() cleans up entries for us. */
+ nft_unregister_flowtable_hook(dev_net(dev),
+ flowtable, hook);
+ hook->ops.dev = NULL;
+ return 1;
+ case NETDEV_REGISTER:
+ if (hook->ops.dev ||
+ strncmp(hook->ifname, dev->name, hook->ifnamelen))
+ break;
+
+ hook->ops.dev = dev;
+ if (!nft_register_flowtable_hook(dev_net(dev),
+ flowtable, hook))
+ return 1;
+
+ printk(KERN_ERR
+ "flowtable %s: Can't hook into device %s\n",
+ flowtable->name, dev->name);
+ hook->ops.dev = NULL;
+ break;
+ }
}
+ return 0;
}
static int nf_tables_flowtable_event(struct notifier_block *this,
@@ -9216,7 +9247,8 @@ static int nf_tables_flowtable_event(struct notifier_block *this,
struct nft_table *table;
struct net *net;
- if (event != NETDEV_UNREGISTER)
+ if (event != NETDEV_UNREGISTER &&
+ event != NETDEV_REGISTER)
return 0;
net = dev_net(dev);
@@ -9224,9 +9256,11 @@ static int nf_tables_flowtable_event(struct notifier_block *this,
mutex_lock(&nft_net->commit_mutex);
list_for_each_entry(table, &nft_net->tables, list) {
list_for_each_entry(flowtable, &table->flowtables, list) {
- nft_flowtable_event(event, dev, flowtable);
+ if (nft_flowtable_event(event, dev, flowtable))
+ goto out_unlock;
}
}
+out_unlock:
mutex_unlock(&nft_net->commit_mutex);
return NOTIFY_DONE;
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index ddb438bc2afd..b2147f8be60c 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -318,19 +318,50 @@ static const struct nft_chain_type nft_chain_filter_netdev = {
},
};
+static int nft_netdev_hook_dev_update(struct nft_hook *hook,
+ struct net_device *dev)
+{
+ int ret = 0;
+
+ if (hook->ops.dev)
+ nf_unregister_net_hook(dev_net(hook->ops.dev), &hook->ops);
+
+ hook->ops.dev = dev;
+
+ if (dev) {
+ ret = nf_register_net_hook(dev_net(dev), &hook->ops);
+ if (ret < 0)
+ hook->ops.dev = NULL;
+ }
+
+ return ret;
+}
+
static void nft_netdev_event(unsigned long event, struct net_device *dev,
struct nft_ctx *ctx)
{
struct nft_base_chain *basechain = nft_base_chain(ctx->chain);
struct nft_hook *hook;
- if (event != NETDEV_UNREGISTER)
+ if (event != NETDEV_UNREGISTER &&
+ event != NETDEV_REGISTER)
return;
list_for_each_entry(hook, &basechain->hook_list, list) {
- if (hook->ops.dev == dev) {
- nf_unregister_net_hook(ctx->net, &hook->ops);
- hook->ops.dev = NULL;
+ switch (event) {
+ case NETDEV_UNREGISTER:
+ if (hook->ops.dev == dev)
+ nft_netdev_hook_dev_update(hook, NULL);
+ break;
+ case NETDEV_REGISTER:
+ if (hook->ops.dev ||
+ strncmp(hook->ifname, dev->name, hook->ifnamelen))
+ break;
+ if (!nft_netdev_hook_dev_update(hook, dev))
+ return;
+
+ printk(KERN_ERR "chain %s: Can't hook into device %s\n",
+ ctx->chain->name, dev->name);
break;
}
}
@@ -349,6 +380,7 @@ static int nf_tables_netdev_event(struct notifier_block *this,
};
if (event != NETDEV_UNREGISTER &&
+ event != NETDEV_REGISTER &&
event != NETDEV_CHANGENAME)
return NOTIFY_DONE;
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread* [PATCH v2 5/7] netfilter: nf_tables: Correctly handle NETDEV_RENAME events
2024-05-17 13:06 [PATCH v2 0/7] Dynamic hook interface binding Phil Sutter
` (3 preceding siblings ...)
2024-05-17 13:06 ` [PATCH v2 4/7] netfilter: nf_tables: Dynamic hook interface binding Phil Sutter
@ 2024-05-17 13:06 ` Phil Sutter
2024-05-17 13:06 ` [PATCH v2 6/7] netfilter: nf_tables: Add notications for hook changes Phil Sutter
` (2 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2024-05-17 13:06 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
Treat a netdev rename like removal and recreation with a different name.
In theory, one could leave hooks in place which still cover the new
name, but this is both unlikely and needlessly complicates the
code.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
net/netfilter/nf_tables_api.c | 10 +++++++---
net/netfilter/nft_chain_filter.c | 9 ++++++---
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index b19f40874c48..b3a5a2878459 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -9247,9 +9247,13 @@ static int nf_tables_flowtable_event(struct notifier_block *this,
struct nft_table *table;
struct net *net;
- if (event != NETDEV_UNREGISTER &&
- event != NETDEV_REGISTER)
- return 0;
+ if (event == NETDEV_CHANGENAME) {
+ nf_tables_flowtable_event(this, NETDEV_UNREGISTER, ptr);
+ event = NETDEV_REGISTER;
+ } else if (event != NETDEV_UNREGISTER &&
+ event != NETDEV_REGISTER) {
+ return NOTIFY_DONE;
+ }
net = dev_net(dev);
nft_net = nft_pernet(net);
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index b2147f8be60c..cc0cf47503f4 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -379,10 +379,13 @@ static int nf_tables_netdev_event(struct notifier_block *this,
.net = dev_net(dev),
};
- if (event != NETDEV_UNREGISTER &&
- event != NETDEV_REGISTER &&
- event != NETDEV_CHANGENAME)
+ if (event == NETDEV_CHANGENAME) {
+ nf_tables_netdev_event(this, NETDEV_UNREGISTER, ptr);
+ event = NETDEV_REGISTER;
+ } else if (event != NETDEV_UNREGISTER &&
+ event != NETDEV_REGISTER) {
return NOTIFY_DONE;
+ }
nft_net = nft_pernet(ctx.net);
mutex_lock(&nft_net->commit_mutex);
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread* [PATCH v2 6/7] netfilter: nf_tables: Add notications for hook changes
2024-05-17 13:06 [PATCH v2 0/7] Dynamic hook interface binding Phil Sutter
` (4 preceding siblings ...)
2024-05-17 13:06 ` [PATCH v2 5/7] netfilter: nf_tables: Correctly handle NETDEV_RENAME events Phil Sutter
@ 2024-05-17 13:06 ` Phil Sutter
2024-05-17 13:06 ` [PATCH v2 7/7] selftests: netfilter: Torture nftables netdev hooks Phil Sutter
2024-06-17 23:10 ` [PATCH v2 0/7] Dynamic hook interface binding Pablo Neira Ayuso
7 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2024-05-17 13:06 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
If netdev hooks are updated due to netdev add/remove events, notify user
space via the usual netlink notification mechanism.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
include/net/netfilter/nf_tables.h | 4 ++++
net/netfilter/nf_tables_api.c | 21 +++++++++++++++++----
net/netfilter/nft_chain_filter.c | 16 +++++++++++++---
3 files changed, 34 insertions(+), 7 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 9cbef71f0462..bfa6cb7b2fd7 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -223,6 +223,8 @@ struct nft_ctx {
bool report;
};
+void nft_commit_notify(struct net *net, u32 portid);
+
enum nft_data_desc_flags {
NFT_DATA_DESC_SETELEM = (1 << 0),
};
@@ -1123,6 +1125,8 @@ int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set,
int nft_set_catchall_validate(const struct nft_ctx *ctx, struct nft_set *set);
int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
void nf_tables_unbind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
+void nf_tables_chain_notify(const struct nft_ctx *ctx, int event,
+ const struct list_head *hook_list);
enum nft_chain_types {
NFT_CHAIN_T_DEFAULT = 0,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index b3a5a2878459..39202166c2a2 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1903,8 +1903,8 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net,
return -1;
}
-static void nf_tables_chain_notify(const struct nft_ctx *ctx, int event,
- const struct list_head *hook_list)
+void nf_tables_chain_notify(const struct nft_ctx *ctx, int event,
+ const struct list_head *hook_list)
{
struct nftables_pernet *nft_net;
struct sk_buff *skb;
@@ -9246,6 +9246,7 @@ static int nf_tables_flowtable_event(struct notifier_block *this,
struct nftables_pernet *nft_net;
struct nft_table *table;
struct net *net;
+ int rc = 0;
if (event == NETDEV_CHANGENAME) {
nf_tables_flowtable_event(this, NETDEV_UNREGISTER, ptr);
@@ -9260,11 +9261,23 @@ static int nf_tables_flowtable_event(struct notifier_block *this,
mutex_lock(&nft_net->commit_mutex);
list_for_each_entry(table, &nft_net->tables, list) {
list_for_each_entry(flowtable, &table->flowtables, list) {
- if (nft_flowtable_event(event, dev, flowtable))
+ rc = nft_flowtable_event(event, dev, flowtable);
+ if (rc)
goto out_unlock;
}
}
out_unlock:
+ if (rc == 1) {
+ struct nft_ctx ctx = {
+ .net = net,
+ .family = flowtable->table->family
+ };
+
+ nf_tables_flowtable_notify(&ctx, flowtable,
+ &flowtable->hook_list,
+ NFT_MSG_NEWFLOWTABLE);
+ nft_commit_notify(ctx.net, ctx.portid);
+ }
mutex_unlock(&nft_net->commit_mutex);
return NOTIFY_DONE;
@@ -10148,7 +10161,7 @@ static void nf_tables_commit_release(struct net *net)
mutex_unlock(&nft_net->commit_mutex);
}
-static void nft_commit_notify(struct net *net, u32 portid)
+void nft_commit_notify(struct net *net, u32 portid)
{
struct nftables_pernet *nft_net = nft_pernet(net);
struct sk_buff *batch_skb = NULL, *nskb, *skb;
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index cc0cf47503f4..f6bea21dab16 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -342,6 +342,7 @@ static void nft_netdev_event(unsigned long event, struct net_device *dev,
{
struct nft_base_chain *basechain = nft_base_chain(ctx->chain);
struct nft_hook *hook;
+ bool notify = false;
if (event != NETDEV_UNREGISTER &&
event != NETDEV_REGISTER)
@@ -350,21 +351,29 @@ static void nft_netdev_event(unsigned long event, struct net_device *dev,
list_for_each_entry(hook, &basechain->hook_list, list) {
switch (event) {
case NETDEV_UNREGISTER:
- if (hook->ops.dev == dev)
+ if (hook->ops.dev == dev) {
nft_netdev_hook_dev_update(hook, NULL);
+ notify = true;
+ }
break;
case NETDEV_REGISTER:
if (hook->ops.dev ||
strncmp(hook->ifname, dev->name, hook->ifnamelen))
break;
- if (!nft_netdev_hook_dev_update(hook, dev))
- return;
+ if (!nft_netdev_hook_dev_update(hook, dev)) {
+ notify = true;
+ goto out_notify;
+ }
printk(KERN_ERR "chain %s: Can't hook into device %s\n",
ctx->chain->name, dev->name);
break;
}
}
+out_notify:
+ if (notify)
+ nf_tables_chain_notify(ctx, NFT_MSG_NEWCHAIN,
+ &basechain->hook_list);
}
static int nf_tables_netdev_event(struct notifier_block *this,
@@ -409,6 +418,7 @@ static int nf_tables_netdev_event(struct notifier_block *this,
nft_netdev_event(event, dev, &ctx);
}
}
+ nft_commit_notify(ctx.net, ctx.portid);
mutex_unlock(&nft_net->commit_mutex);
return NOTIFY_DONE;
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread* [PATCH v2 7/7] selftests: netfilter: Torture nftables netdev hooks
2024-05-17 13:06 [PATCH v2 0/7] Dynamic hook interface binding Phil Sutter
` (5 preceding siblings ...)
2024-05-17 13:06 ` [PATCH v2 6/7] netfilter: nf_tables: Add notications for hook changes Phil Sutter
@ 2024-05-17 13:06 ` Phil Sutter
2024-06-17 23:10 ` [PATCH v2 0/7] Dynamic hook interface binding Pablo Neira Ayuso
7 siblings, 0 replies; 14+ messages in thread
From: Phil Sutter @ 2024-05-17 13:06 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
Add a ruleset which binds to various interface names via netdev-family
chains and flowtables and massage the notifiers by frequently renaming
interfaces to match these names.
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
.../testing/selftests/net/netfilter/Makefile | 1 +
.../net/netfilter/nft_interface_stress.sh | 106 ++++++++++++++++++
2 files changed, 107 insertions(+)
create mode 100755 tools/testing/selftests/net/netfilter/nft_interface_stress.sh
diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index dd9a75a33d28..f95678882819 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -19,6 +19,7 @@ TEST_PROGS += nft_concat_range.sh
TEST_PROGS += nft_conntrack_helper.sh
TEST_PROGS += nft_fib.sh
TEST_PROGS += nft_flowtable.sh
+TEST_PROGS += nft_interface_stress.sh
TEST_PROGS += nft_meta.sh
TEST_PROGS += nft_nat.sh
TEST_PROGS += nft_nat_zones.sh
diff --git a/tools/testing/selftests/net/netfilter/nft_interface_stress.sh b/tools/testing/selftests/net/netfilter/nft_interface_stress.sh
new file mode 100755
index 000000000000..6d739f2e0ab9
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/nft_interface_stress.sh
@@ -0,0 +1,106 @@
+#!/bin/bash -e
+#
+# SPDX-License-Identifier: GPL-2.0
+#
+# Torture nftables' netdevice notifier callbacks and related code by frequent
+# renaming of interfaces which netdev-family chains and flowtables hook into.
+
+source lib.sh
+
+checktool "nft --version" "run test without nft tool"
+checktool "iperf3 --version" "run test without iperf3 tool"
+
+# how many seconds to torture the kernel, default to 80% of max run time
+TEST_RUNTIME=$((kselftest_timeout * 8 / 10))
+
+trap "cleanup_all_ns" EXIT
+
+setup_ns nsc nsr nss
+
+ip -net $nsc link add cr0 type veth peer name rc0 netns $nsr
+ip -net $nsc addr add 10.0.0.1/24 dev cr0
+ip -net $nsc link set cr0 up
+ip -net $nsc route add default via 10.0.0.2
+
+ip -net $nss link add sr0 type veth peer name rs0 netns $nsr
+ip -net $nss addr add 10.1.0.1/24 dev sr0
+ip -net $nss link set sr0 up
+ip -net $nss route add default via 10.1.0.2
+
+ip -net $nsr addr add 10.0.0.2/24 dev rc0
+ip -net $nsr link set rc0 up
+ip -net $nsr addr add 10.1.0.2/24 dev rs0
+ip -net $nsr link set rs0 up
+ip netns exec $nsr sysctl -q net.ipv4.ip_forward=1
+ip netns exec $nsr sysctl -q net.ipv4.conf.all.forwarding=1
+
+{
+ echo "table netdev t {"
+ for ((i = 0; i < 10; i++)); do
+ cat <<-EOF
+ chain chain_rc$i {
+ type filter hook ingress device rc$i priority 0
+ counter
+ }
+ chain chain_rs$i {
+ type filter hook ingress device rs$i priority 0
+ counter
+ }
+ EOF
+ done
+ echo "}"
+ echo "table ip t {"
+ for ((i = 0; i < 10; i++)); do
+ cat <<-EOF
+ flowtable ft_${i} {
+ hook ingress priority 0
+ devices = { rc$i, rs$i }
+ }
+ EOF
+ done
+ echo "chain c {"
+ echo "type filter hook forward priority 0"
+ for ((i = 0; i < 10; i++)); do
+ echo -n "iifname rc$i oifname rs$i "
+ echo "ip protocol tcp counter flow add @ft_${i}"
+ done
+ echo "counter"
+ echo "}"
+ echo "}"
+} | ip netns exec $nsr nft -f - || {
+ echo "SKIP: Could not load nft ruleset"
+ exit $ksft_skip
+}
+
+for ((o=0, n=1; ; o=n, n++, n %= 10)); do
+ ip -net $nsr link set rc$o name rc$n
+ ip -net $nsr link set rs$o name rs$n
+done &
+rename_loop_pid=$!
+
+ip netns exec $nsr nft monitor >/dev/null &
+nft_monitor_pid=$!
+
+ip netns exec $nss iperf3 --server --daemon -1
+summary_expr='s,^\[SUM\] .* \([0-9]\+\) Mbits/sec .* receiver,\1,p'
+rate=$(ip netns exec $nsc iperf3 \
+ --format m -c 10.1.0.1 --time $TEST_RUNTIME \
+ --length 56 --parallel 10 -i 0 | sed -n "$summary_expr")
+
+kill $nft_monitor_pid
+kill $rename_loop_pid
+wait
+
+#ip netns exec $nsr nft list ruleset
+
+[[ $(</proc/sys/kernel/tainted) -eq 0 ]] || {
+ echo "FAIL: Kernel is tainted!"
+ exit $ksft_fail
+}
+
+[[ $rate -gt 0 ]] || {
+ echo "FAIL: Zero throughput in iperf3"
+ exit $ksft_fail
+}
+
+exit $ksft_pass
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread* Re: [PATCH v2 0/7] Dynamic hook interface binding
2024-05-17 13:06 [PATCH v2 0/7] Dynamic hook interface binding Phil Sutter
` (6 preceding siblings ...)
2024-05-17 13:06 ` [PATCH v2 7/7] selftests: netfilter: Torture nftables netdev hooks Phil Sutter
@ 2024-06-17 23:10 ` Pablo Neira Ayuso
2024-06-19 11:27 ` Phil Sutter
7 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2024-06-17 23:10 UTC (permalink / raw)
To: Phil Sutter; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
Hi Phil,
On Fri, May 17, 2024 at 03:06:08PM +0200, Phil Sutter wrote:
> Changes since v1:
> - New patch 6 adding notifications for updated hooks.
> - New patch 7 adding the requested torture test.
>
> Currently, netdev-family chains and flowtables expect their interfaces
> to exist at creation time. In practice, this bites users of virtual
> interfaces if these happen to be created after the nftables service
> starts up and loads the stored ruleset.
>
> Vice-versa, if an interface disappears at run-time (via module unloading
> or 'ip link del'), it also disappears from the ruleset, along with the
> chain and its rules which binds to it. This is at least problematic for
> setups which store the running ruleset during system shutdown.
I'd suggest that you place your patch 2/7 to modify the existing
behaviour in first place.
> This series attempts to solve these problems by effectively making
> netdev hooks name-based: If no matching interface is found at hook
> creation time, it will be inactive until a matching interface appears.
> If a bound interface is renamed, a matching inactive hook is searched
> for it.
>
> Ruleset dumps will stabilize in that regard. To still provide
> information about which existing interfaces a chain/flowtable currently
> binds to, new netlink attributes *_ACT_DEVS are introduced which are
> filled from the active hooks only.
Currently, NFTA_HOOK_DEVS already represents the netdevice that are
active. If one of these devices goes aways, then it is removed from
the basechain and it does not show up in NFTA_HOOK_DEVS anymore.
There are netlink notifications that need to fit into NLMSG_GOODSIZE,
but this adds yet another netlink array attribute.
I think we cannot escape adding new commands such as:
NFT_MSG_NEWDEVICE
NFT_MSG_GETDEVICE
NFT_MSG_DELDEVICE
to populate the basechain/flowtable, those can be used only if the
"subscription" mecanism is used, so older kernels still rely in
NFTA_HOOK_DEVS (older-older kernels actually already deal with
NFTA_HOOK_DEV only...).
NFT_MSG_NEWDEVICE can provide a flag to specify this is a wildcard
or exact matching.
There is also a need for a netlink attribute to specify if this is
adding a device to a chain or flowtable.
With these new commands, the NFT_NETDEVICE_MAX cap can also go away in
newer kernels with a command like this:
nft add device flowtable ip x y { a, b, c }
nft add device chain ip x y { a, b, c }
which expands to one one NFT_MSG_* message for each item.
Then, the nested notation will need to detect what to use depending on
the user input:
table netdev x {
chain y {
type filter hook ingress devices = { tap* } priority 0
}
}
this triggers the new commands path, same if the list of devices goes
over NFT_NETDEVICE_MAX (256).
This can also help incrementally report on new devices that match on a
given "subscription pattern" via new NFT_MSG_NEWDEVICE command for
monitoring purpose.
Maybe a command like:
nft list device chain ip x y
could be used to list the existing devices that are "active" as per
your definition, so:
nft list ruleset
does not show the listing of "active" devices that expand to tap*,
because this is only informational (not really required to restore a
ruleset), I guess the user only wants this to inspect to make sure
what devices are registered to the given tap* pattern.
There is also another case that would need to be handled:
- chain A with device tap0
- chain B with wildcard device tap*
I would expect a "exclude" clause for the wildcard case will come
sooner or later to define a different policy for a specify chain.
The new specific command approach would be extensible in that sense.
> This series is also prep work for a simple wildcard interface binding
> similar to the wildcard interface matching in meta expression. It should
> suffice to turn struct nft_hook::ops into an array of all matching
> interfaces, but the respective code does not exist yet.
I think this series is all about this wildcard interface matching,
which is not coming explicitly.
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v2 0/7] Dynamic hook interface binding
2024-06-17 23:10 ` [PATCH v2 0/7] Dynamic hook interface binding Pablo Neira Ayuso
@ 2024-06-19 11:27 ` Phil Sutter
2024-06-19 14:48 ` Pablo Neira Ayuso
0 siblings, 1 reply; 14+ messages in thread
From: Phil Sutter @ 2024-06-19 11:27 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
On Tue, Jun 18, 2024 at 01:10:21AM +0200, Pablo Neira Ayuso wrote:
> Hi Phil,
>
> On Fri, May 17, 2024 at 03:06:08PM +0200, Phil Sutter wrote:
> > Changes since v1:
> > - New patch 6 adding notifications for updated hooks.
> > - New patch 7 adding the requested torture test.
> >
> > Currently, netdev-family chains and flowtables expect their interfaces
> > to exist at creation time. In practice, this bites users of virtual
> > interfaces if these happen to be created after the nftables service
> > starts up and loads the stored ruleset.
> >
> > Vice-versa, if an interface disappears at run-time (via module unloading
> > or 'ip link del'), it also disappears from the ruleset, along with the
> > chain and its rules which binds to it. This is at least problematic for
> > setups which store the running ruleset during system shutdown.
>
> I'd suggest that you place your patch 2/7 to modify the existing
> behaviour in first place.
I did not to avoid "dead" entries in the hook list (with ops.dev gone,
there's nothing left identifying the hook). But it's only temporary and
maybe cleans up the diffs, will give it a go.
> > This series attempts to solve these problems by effectively making
> > netdev hooks name-based: If no matching interface is found at hook
> > creation time, it will be inactive until a matching interface appears.
> > If a bound interface is renamed, a matching inactive hook is searched
> > for it.
> >
> > Ruleset dumps will stabilize in that regard. To still provide
> > information about which existing interfaces a chain/flowtable currently
> > binds to, new netlink attributes *_ACT_DEVS are introduced which are
> > filled from the active hooks only.
>
> Currently, NFTA_HOOK_DEVS already represents the netdevice that are
> active. If one of these devices goes aways, then it is removed from
> the basechain and it does not show up in NFTA_HOOK_DEVS anymore.
>
> There are netlink notifications that need to fit into NLMSG_GOODSIZE,
> but this adds yet another netlink array attribute.
Hmm. I could introduce NFTA_HOOK_INACTIVE_DEVS which contains only those
entries missing in NFTA_HOOK_DEVS. This shouldn't bloat the dumps too
much (apart from the added overhead) and won't change old user space
behaviour.
> I think we cannot escape adding new commands such as:
>
> NFT_MSG_NEWDEVICE
> NFT_MSG_GETDEVICE
> NFT_MSG_DELDEVICE
>
> to populate the basechain/flowtable, those can be used only if the
> "subscription" mecanism is used, so older kernels still rely in
> NFTA_HOOK_DEVS (older-older kernels actually already deal with
> NFTA_HOOK_DEV only...).
Why can't they co-exist? I.e., NFTA_HOOK_DEV{,S} continues to behave as
before and NEWDEV/DELDEV modify the list of existing chains/flowtables.
An explicit GETDEV to dump currently active interfaces seems reasonable,
especially with wildcard specifiers in mind.
> NFT_MSG_NEWDEVICE can provide a flag to specify this is a wildcard
> or exact matching.
I had the same mechanism we use for wildcard ifname matching in mind
which is to specify a name length which either includes or excludes the
terminating NUL-char. Using strncmp() is sufficient then. Or do you
think more advanced (e.g. "enp*s0") wildcards are useful?
> There is also a need for a netlink attribute to specify if this is
> adding a device to a chain or flowtable.
Since one has to specify the specific chain or flowtable to add to
anyway, one could just add NFTA_DEV_CHAIN and NFTA_DEV_FLOWTABLE
attributes, which are mutually exclusive and together with
NFTA_DEV_TABLE identify the object to manipulate.
> With these new commands, the NFT_NETDEVICE_MAX cap can also go away in
> newer kernels with a command like this:
>
> nft add device flowtable ip x y { a, b, c }
> nft add device chain ip x y { a, b, c }
>
> which expands to one one NFT_MSG_* message for each item.
>
> Then, the nested notation will need to detect what to use depending on
> the user input:
>
> table netdev x {
> chain y {
> type filter hook ingress devices = { tap* } priority 0
> }
> }
>
> this triggers the new commands path, same if the list of devices goes
> over NFT_NETDEVICE_MAX (256).
This is for user space to remain compatible to older kernels, right?
> This can also help incrementally report on new devices that match on a
> given "subscription pattern" via new NFT_MSG_NEWDEVICE command for
> monitoring purpose.
Yes, adding NEWDEV/DELDEV notifications is a nice approach!
> Maybe a command like:
>
> nft list device chain ip x y
>
> could be used to list the existing devices that are "active" as per
> your definition, so:
>
> nft list ruleset
>
> does not show the listing of "active" devices that expand to tap*,
> because this is only informational (not really required to restore a
> ruleset), I guess the user only wants this to inspect to make sure
> what devices are registered to the given tap* pattern.
ACK.
> There is also another case that would need to be handled:
>
> - chain A with device tap0
> - chain B with wildcard device tap*
>
> I would expect a "exclude" clause for the wildcard case will come
> sooner or later to define a different policy for a specify chain.
> The new specific command approach would be extensible in that sense.
As a first implementation, I would just forbid such combinations.
Assuming "tap*" is just "tap" with length 3 and "tap0" is "tap0\0" with
length 5, modifying the duplicate hook check in
nf_tables_parse_netdev_hooks() to perform a strncmp(namea, nameb,
min(lena, lenb)) should suffice.
Another option avoiding maintenance of an exclusion list would be to
just add a new interface to the first chain/flowtable with matching hook
in the list. This should work since 'nft list ruleset' does not sort the
ruleset, so 'add chain t b; add chain t a' is a meaningful difference to
'add chain t a; add chain t b'.
The exclusion list approach either means tracking a list of active
hooks' matching names in each wildcard hook or some preference when
searching a hook for a new interface. All this may become increasingly
complicated by stuff like one hook for 'eth*' and another for 'et*'.
> > This series is also prep work for a simple wildcard interface binding
> > similar to the wildcard interface matching in meta expression. It should
> > suffice to turn struct nft_hook::ops into an array of all matching
> > interfaces, but the respective code does not exist yet.
>
> I think this series is all about this wildcard interface matching,
> which is not coming explicitly.
My motivation for this series is an open ticket complaining about the
inability to define a flowtable for an interface which is created by
NetworkManager (which depends on nftables service for startup).
A related (but not complained so far) issue is in reverse direction: If
the interface vanishes before nftables service saves the ruleset upon
shutdown, the dump will be incomplete.
I consider wildcard interface hooks merely a (nice) side-effect from
fixing the above. Which should not require too much additional work.
Thanks for your review,
Phil
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v2 0/7] Dynamic hook interface binding
2024-06-19 11:27 ` Phil Sutter
@ 2024-06-19 14:48 ` Pablo Neira Ayuso
2024-06-19 15:59 ` Phil Sutter
0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2024-06-19 14:48 UTC (permalink / raw)
To: Phil Sutter, netfilter-devel, Florian Westphal, Thomas Haller
Hi Phil,
On Wed, Jun 19, 2024 at 01:27:06PM +0200, Phil Sutter wrote:
> On Tue, Jun 18, 2024 at 01:10:21AM +0200, Pablo Neira Ayuso wrote:
> > Hi Phil,
> >
> > On Fri, May 17, 2024 at 03:06:08PM +0200, Phil Sutter wrote:
> > > Changes since v1:
> > > - New patch 6 adding notifications for updated hooks.
> > > - New patch 7 adding the requested torture test.
> > >
> > > Currently, netdev-family chains and flowtables expect their interfaces
> > > to exist at creation time. In practice, this bites users of virtual
> > > interfaces if these happen to be created after the nftables service
> > > starts up and loads the stored ruleset.
> > >
> > > Vice-versa, if an interface disappears at run-time (via module unloading
> > > or 'ip link del'), it also disappears from the ruleset, along with the
> > > chain and its rules which binds to it. This is at least problematic for
> > > setups which store the running ruleset during system shutdown.
> >
> > I'd suggest that you place your patch 2/7 to modify the existing
> > behaviour in first place.
>
> I did not to avoid "dead" entries in the hook list (with ops.dev gone,
> there's nothing left identifying the hook). But it's only temporary and
> maybe cleans up the diffs, will give it a go.
OK.
> > > This series attempts to solve these problems by effectively making
> > > netdev hooks name-based: If no matching interface is found at hook
> > > creation time, it will be inactive until a matching interface appears.
> > > If a bound interface is renamed, a matching inactive hook is searched
> > > for it.
> > >
> > > Ruleset dumps will stabilize in that regard. To still provide
> > > information about which existing interfaces a chain/flowtable currently
> > > binds to, new netlink attributes *_ACT_DEVS are introduced which are
> > > filled from the active hooks only.
> >
> > Currently, NFTA_HOOK_DEVS already represents the netdevice that are
> > active. If one of these devices goes aways, then it is removed from
> > the basechain and it does not show up in NFTA_HOOK_DEVS anymore.
> >
> > There are netlink notifications that need to fit into NLMSG_GOODSIZE,
> > but this adds yet another netlink array attribute.
>
> Hmm. I could introduce NFTA_HOOK_INACTIVE_DEVS which contains only those
> entries missing in NFTA_HOOK_DEVS. This shouldn't bloat the dumps too
> much (apart from the added overhead) and won't change old user space
> behaviour.
Not sure. What does NFTA_HOOK_INACTIVE_DEVS contains? Could you
provide an example? Again, if this array gets again too large, there
could be issues with NLMSG_GOODSIZE again for notifications.
I would just display these active devices (in the list of devices that
are attached to this basechain) via the new command _GETDEV that we
are discussing below? These netdevices that match the pattern come and
go, I guess user only wants to make sure they are actually registered
to this hook for diagnostics, showing an exact match, ie. tap0, or
inexact match, ie. tap* should be should when listing the ruleset IMO.
> > I think we cannot escape adding new commands such as:
> >
> > NFT_MSG_NEWDEVICE
> > NFT_MSG_GETDEVICE
> > NFT_MSG_DELDEVICE
> >
> > to populate the basechain/flowtable, those can be used only if the
> > "subscription" mecanism is used, so older kernels still rely in
> > NFTA_HOOK_DEVS (older-older kernels actually already deal with
> > NFTA_HOOK_DEV only...).
>
> Why can't they co-exist? I.e., NFTA_HOOK_DEV{,S} continues to behave as
> before and NEWDEV/DELDEV modify the list of existing chains/flowtables.
> An explicit GETDEV to dump currently active interfaces seems reasonable,
> especially with wildcard specifiers in mind.
yes, NFTA_HOOK_DEVS and these new commands need to coexist, there is
no other way to retain backward compatibility.
> > NFT_MSG_NEWDEVICE can provide a flag to specify this is a wildcard
> > or exact matching.
>
> I had the same mechanism we use for wildcard ifname matching in mind
> which is to specify a name length which either includes or excludes the
> terminating NUL-char. Using strncmp() is sufficient then.
That's fine.
> Or do you think more advanced (e.g. "enp*s0") wildcards are useful?
I prefer future does not bring "enp*s0", but let's agree on an API
that can be extended to accomodate new requirements, that's my only
comment in this regard.
> > There is also a need for a netlink attribute to specify if this is
> > adding a device to a chain or flowtable.
>
> Since one has to specify the specific chain or flowtable to add to
> anyway, one could just add NFTA_DEV_CHAIN and NFTA_DEV_FLOWTABLE
> attributes, which are mutually exclusive and together with
> NFTA_DEV_TABLE identify the object to manipulate.
Right.
> > With these new commands, the NFT_NETDEVICE_MAX cap can also go away in
> > newer kernels with a command like this:
> >
> > nft add device flowtable ip x y { a, b, c }
> > nft add device chain ip x y { a, b, c }
> >
> > which expands to one one NFT_MSG_* message for each item.
> >
> > Then, the nested notation will need to detect what to use depending on
> > the user input:
> >
> > table netdev x {
> > chain y {
> > type filter hook ingress devices = { tap* } priority 0
> > }
> > }
> >
> > this triggers the new commands path, same if the list of devices goes
> > over NFT_NETDEVICE_MAX (256).
>
> This is for user space to remain compatible to older kernels, right?
Yes.
> > This can also help incrementally report on new devices that match on a
> > given "subscription pattern" via new NFT_MSG_NEWDEVICE command for
> > monitoring purpose.
>
> Yes, adding NEWDEV/DELDEV notifications is a nice approach!
This will also help to remove the existing NFT_NETDEVICE_MAX limit.
> > Maybe a command like:
> >
> > nft list device chain ip x y
> >
> > could be used to list the existing devices that are "active" as per
> > your definition, so:
> >
> > nft list ruleset
> >
> > does not show the listing of "active" devices that expand to tap*,
> > because this is only informational (not really required to restore a
> > ruleset), I guess the user only wants this to inspect to make sure
> > what devices are registered to the given tap* pattern.
>
> ACK.
>
> > There is also another case that would need to be handled:
> >
> > - chain A with device tap0
> > - chain B with wildcard device tap*
> >
> > I would expect a "exclude" clause for the wildcard case will come
> > sooner or later to define a different policy for a specify chain.
> > The new specific command approach would be extensible in that sense.
>
> As a first implementation, I would just forbid such combinations.
> Assuming "tap*" is just "tap" with length 3 and "tap0" is "tap0\0" with
> length 5, modifying the duplicate hook check in
> nf_tables_parse_netdev_hooks() to perform a strncmp(namea, nameb,
> min(lena, lenb)) should suffice.
That is fine to ensure a given basechain does not have both tap0 and tap*
> Another option avoiding maintenance of an exclusion list would be to
> just add a new interface to the first chain/flowtable with matching hook
> in the list. This should work since 'nft list ruleset' does not sort the
> ruleset, so 'add chain t b; add chain t a' is a meaningful difference to
> 'add chain t a; add chain t b'.
>
> The exclusion list approach either means tracking a list of active
> hooks' matching names in each wildcard hook or some preference when
> searching a hook for a new interface. All this may become increasingly
> complicated by stuff like one hook for 'eth*' and another for 'et*'.
>
> > > This series is also prep work for a simple wildcard interface binding
> > > similar to the wildcard interface matching in meta expression. It should
> > > suffice to turn struct nft_hook::ops into an array of all matching
> > > interfaces, but the respective code does not exist yet.
> >
> > I think this series is all about this wildcard interface matching,
> > which is not coming explicitly.
>
> My motivation for this series is an open ticket complaining about the
> inability to define a flowtable for an interface which is created by
> NetworkManager (which depends on nftables service for startup).
>
> A related (but not complained so far) issue is in reverse direction: If
> the interface vanishes before nftables service saves the ruleset upon
> shutdown, the dump will be incomplete.
>
> I consider wildcard interface hooks merely a (nice) side-effect from
> fixing the above. Which should not require too much additional work.
OK.
Thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v2 0/7] Dynamic hook interface binding
2024-06-19 14:48 ` Pablo Neira Ayuso
@ 2024-06-19 15:59 ` Phil Sutter
2024-06-20 9:30 ` Pablo Neira Ayuso
2024-06-23 22:12 ` Pablo Neira Ayuso
0 siblings, 2 replies; 14+ messages in thread
From: Phil Sutter @ 2024-06-19 15:59 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, Florian Westphal, Thomas Haller
On Wed, Jun 19, 2024 at 04:48:35PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Jun 19, 2024 at 01:27:06PM +0200, Phil Sutter wrote:
> > On Tue, Jun 18, 2024 at 01:10:21AM +0200, Pablo Neira Ayuso wrote:
> > > Hi Phil,
> > >
> > > On Fri, May 17, 2024 at 03:06:08PM +0200, Phil Sutter wrote:
> > > > Changes since v1:
> > > > - New patch 6 adding notifications for updated hooks.
> > > > - New patch 7 adding the requested torture test.
> > > >
> > > > Currently, netdev-family chains and flowtables expect their interfaces
> > > > to exist at creation time. In practice, this bites users of virtual
> > > > interfaces if these happen to be created after the nftables service
> > > > starts up and loads the stored ruleset.
> > > >
> > > > Vice-versa, if an interface disappears at run-time (via module unloading
> > > > or 'ip link del'), it also disappears from the ruleset, along with the
> > > > chain and its rules which binds to it. This is at least problematic for
> > > > setups which store the running ruleset during system shutdown.
> > >
> > > I'd suggest that you place your patch 2/7 to modify the existing
> > > behaviour in first place.
> >
> > I did not to avoid "dead" entries in the hook list (with ops.dev gone,
> > there's nothing left identifying the hook). But it's only temporary and
> > maybe cleans up the diffs, will give it a go.
>
> OK.
>
> > > > This series attempts to solve these problems by effectively making
> > > > netdev hooks name-based: If no matching interface is found at hook
> > > > creation time, it will be inactive until a matching interface appears.
> > > > If a bound interface is renamed, a matching inactive hook is searched
> > > > for it.
> > > >
> > > > Ruleset dumps will stabilize in that regard. To still provide
> > > > information about which existing interfaces a chain/flowtable currently
> > > > binds to, new netlink attributes *_ACT_DEVS are introduced which are
> > > > filled from the active hooks only.
> > >
> > > Currently, NFTA_HOOK_DEVS already represents the netdevice that are
> > > active. If one of these devices goes aways, then it is removed from
> > > the basechain and it does not show up in NFTA_HOOK_DEVS anymore.
> > >
> > > There are netlink notifications that need to fit into NLMSG_GOODSIZE,
> > > but this adds yet another netlink array attribute.
> >
> > Hmm. I could introduce NFTA_HOOK_INACTIVE_DEVS which contains only those
> > entries missing in NFTA_HOOK_DEVS. This shouldn't bloat the dumps too
> > much (apart from the added overhead) and won't change old user space
> > behaviour.
>
> Not sure. What does NFTA_HOOK_INACTIVE_DEVS contains? Could you
> provide an example? Again, if this array gets again too large, there
> could be issues with NLMSG_GOODSIZE again for notifications.
I assumed your intention was for NFTA_HOOK_DEVS to not change
semantically, i.e. remain to contain only devices which are present at
time of the dump. Then I could introduce INACTIVE_DEVS to contain those
we lost meanwhile. As an example:
1) add netdev chain for devices eth0, eth1, eth2
2) list ruleset:
- HOOK_DEVS = { eth0, eth1, eth2 }
- INACTIVE_DEVS = {}
3) ip link del eth1
4) list ruleset:
- HOOK_DEVS = { eth0, eth2 }
- INACTIVE_DEVS = { eth1 }
This avoids duplicate entries in both lists and thus avoids overhead.
This would fix for the interfaces missing in dumps problem.
Wildcards would appear as-is in either HOOK_DEVS (if there's at least
one matching interface) or INACTIVE_DEVS (if there is none). The actual
list of active interfaces would require a GETDEVICE call.
> I would just display these active devices (in the list of devices that
> are attached to this basechain) via the new command _GETDEV that we
> are discussing below? These netdevices that match the pattern come and
> go, I guess user only wants to make sure they are actually registered
> to this hook for diagnostics, showing an exact match, ie. tap0, or
> inexact match, ie. tap* should be should when listing the ruleset IMO.
OK, let's see if I can sum this up correctly:
1) NFTA_HOOK_DEVS is changed to always reflect what the user specified
2) Interfaces being removed or added trigger NEWDEV/DELDEV notifications
3) Active hooks are dumped by GETDEV netlink request
4) NEWDEV/DELDEV netlink requests/responses added to cover for oversized
chains/flowtables
You're saying we have to use (4) for wildcard interfaces, too. Is this
to keep them away from NFTA_HOOK_DEVS? Because in theory 1-3 are
sufficient for wildcards, too.
> > > I think we cannot escape adding new commands such as:
> > >
> > > NFT_MSG_NEWDEVICE
> > > NFT_MSG_GETDEVICE
> > > NFT_MSG_DELDEVICE
> > >
> > > to populate the basechain/flowtable, those can be used only if the
> > > "subscription" mecanism is used, so older kernels still rely in
> > > NFTA_HOOK_DEVS (older-older kernels actually already deal with
> > > NFTA_HOOK_DEV only...).
> >
> > Why can't they co-exist? I.e., NFTA_HOOK_DEV{,S} continues to behave as
> > before and NEWDEV/DELDEV modify the list of existing chains/flowtables.
> > An explicit GETDEV to dump currently active interfaces seems reasonable,
> > especially with wildcard specifiers in mind.
>
> yes, NFTA_HOOK_DEVS and these new commands need to coexist, there is
> no other way to retain backward compatibility.
>
> > > NFT_MSG_NEWDEVICE can provide a flag to specify this is a wildcard
> > > or exact matching.
> >
> > I had the same mechanism we use for wildcard ifname matching in mind
> > which is to specify a name length which either includes or excludes the
> > terminating NUL-char. Using strncmp() is sufficient then.
>
> That's fine.
>
> > Or do you think more advanced (e.g. "enp*s0") wildcards are useful?
>
> I prefer future does not bring "enp*s0", but let's agree on an API
> that can be extended to accomodate new requirements, that's my only
> comment in this regard.
>
> > > There is also a need for a netlink attribute to specify if this is
> > > adding a device to a chain or flowtable.
> >
> > Since one has to specify the specific chain or flowtable to add to
> > anyway, one could just add NFTA_DEV_CHAIN and NFTA_DEV_FLOWTABLE
> > attributes, which are mutually exclusive and together with
> > NFTA_DEV_TABLE identify the object to manipulate.
>
> Right.
>
> > > With these new commands, the NFT_NETDEVICE_MAX cap can also go away in
> > > newer kernels with a command like this:
> > >
> > > nft add device flowtable ip x y { a, b, c }
> > > nft add device chain ip x y { a, b, c }
> > >
> > > which expands to one one NFT_MSG_* message for each item.
> > >
> > > Then, the nested notation will need to detect what to use depending on
> > > the user input:
> > >
> > > table netdev x {
> > > chain y {
> > > type filter hook ingress devices = { tap* } priority 0
> > > }
> > > }
> > >
> > > this triggers the new commands path, same if the list of devices goes
> > > over NFT_NETDEVICE_MAX (256).
> >
> > This is for user space to remain compatible to older kernels, right?
>
> Yes.
>
> > > This can also help incrementally report on new devices that match on a
> > > given "subscription pattern" via new NFT_MSG_NEWDEVICE command for
> > > monitoring purpose.
> >
> > Yes, adding NEWDEV/DELDEV notifications is a nice approach!
>
> This will also help to remove the existing NFT_NETDEVICE_MAX limit.
>
> > > Maybe a command like:
> > >
> > > nft list device chain ip x y
> > >
> > > could be used to list the existing devices that are "active" as per
> > > your definition, so:
> > >
> > > nft list ruleset
> > >
> > > does not show the listing of "active" devices that expand to tap*,
> > > because this is only informational (not really required to restore a
> > > ruleset), I guess the user only wants this to inspect to make sure
> > > what devices are registered to the given tap* pattern.
> >
> > ACK.
> >
> > > There is also another case that would need to be handled:
> > >
> > > - chain A with device tap0
> > > - chain B with wildcard device tap*
> > >
> > > I would expect a "exclude" clause for the wildcard case will come
> > > sooner or later to define a different policy for a specify chain.
> > > The new specific command approach would be extensible in that sense.
> >
> > As a first implementation, I would just forbid such combinations.
> > Assuming "tap*" is just "tap" with length 3 and "tap0" is "tap0\0" with
> > length 5, modifying the duplicate hook check in
> > nf_tables_parse_netdev_hooks() to perform a strncmp(namea, nameb,
> > min(lena, lenb)) should suffice.
>
> That is fine to ensure a given basechain does not have both tap0 and tap*
Ah, I missed that nf_tables_parse_netdev_hooks() merely searches the
current list for duplicate entries, not all chains'/flowtables' ones.
Hmm. I can create multiple netdev chains attaching to the same
interfaces, but only a single flowtable. Is this intentional?
> > Another option avoiding maintenance of an exclusion list would be to
> > just add a new interface to the first chain/flowtable with matching hook
> > in the list. This should work since 'nft list ruleset' does not sort the
> > ruleset, so 'add chain t b; add chain t a' is a meaningful difference to
> > 'add chain t a; add chain t b'.
> >
> > The exclusion list approach either means tracking a list of active
> > hooks' matching names in each wildcard hook or some preference when
> > searching a hook for a new interface. All this may become increasingly
> > complicated by stuff like one hook for 'eth*' and another for 'et*'.
> >
> > > > This series is also prep work for a simple wildcard interface binding
> > > > similar to the wildcard interface matching in meta expression. It should
> > > > suffice to turn struct nft_hook::ops into an array of all matching
> > > > interfaces, but the respective code does not exist yet.
> > >
> > > I think this series is all about this wildcard interface matching,
> > > which is not coming explicitly.
> >
> > My motivation for this series is an open ticket complaining about the
> > inability to define a flowtable for an interface which is created by
> > NetworkManager (which depends on nftables service for startup).
> >
> > A related (but not complained so far) issue is in reverse direction: If
> > the interface vanishes before nftables service saves the ruleset upon
> > shutdown, the dump will be incomplete.
> >
> > I consider wildcard interface hooks merely a (nice) side-effect from
> > fixing the above. Which should not require too much additional work.
>
> OK.
>
> Thanks.
>
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v2 0/7] Dynamic hook interface binding
2024-06-19 15:59 ` Phil Sutter
@ 2024-06-20 9:30 ` Pablo Neira Ayuso
2024-06-23 22:12 ` Pablo Neira Ayuso
1 sibling, 0 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2024-06-20 9:30 UTC (permalink / raw)
To: Phil Sutter, netfilter-devel, Florian Westphal, Thomas Haller
Hi Phil,
On Wed, Jun 19, 2024 at 05:59:32PM +0200, Phil Sutter wrote:
> On Wed, Jun 19, 2024 at 04:48:35PM +0200, Pablo Neira Ayuso wrote:
> > On Wed, Jun 19, 2024 at 01:27:06PM +0200, Phil Sutter wrote:
> > > On Tue, Jun 18, 2024 at 01:10:21AM +0200, Pablo Neira Ayuso wrote:
> > > > On Fri, May 17, 2024 at 03:06:08PM +0200, Phil Sutter wrote:
[...]
> > > > > This series attempts to solve these problems by effectively making
> > > > > netdev hooks name-based: If no matching interface is found at hook
> > > > > creation time, it will be inactive until a matching interface appears.
> > > > > If a bound interface is renamed, a matching inactive hook is searched
> > > > > for it.
> > > > >
> > > > > Ruleset dumps will stabilize in that regard. To still provide
> > > > > information about which existing interfaces a chain/flowtable currently
> > > > > binds to, new netlink attributes *_ACT_DEVS are introduced which are
> > > > > filled from the active hooks only.
> > > >
> > > > Currently, NFTA_HOOK_DEVS already represents the netdevice that are
> > > > active. If one of these devices goes aways, then it is removed from
> > > > the basechain and it does not show up in NFTA_HOOK_DEVS anymore.
> > > >
> > > > There are netlink notifications that need to fit into NLMSG_GOODSIZE,
> > > > but this adds yet another netlink array attribute.
> > >
> > > Hmm. I could introduce NFTA_HOOK_INACTIVE_DEVS which contains only those
> > > entries missing in NFTA_HOOK_DEVS. This shouldn't bloat the dumps too
> > > much (apart from the added overhead) and won't change old user space
> > > behaviour.
> >
> > Not sure. What does NFTA_HOOK_INACTIVE_DEVS contains? Could you
> > provide an example? Again, if this array gets again too large, there
> > could be issues with NLMSG_GOODSIZE again for notifications.
>
> I assumed your intention was for NFTA_HOOK_DEVS to not change
> semantically, i.e. remain to contain only devices which are present at
> time of the dump. Then I could introduce INACTIVE_DEVS to contain those
> we lost meanwhile. As an example:
>
> 1) add netdev chain for devices eth0, eth1, eth2
> 2) list ruleset:
> - HOOK_DEVS = { eth0, eth1, eth2 }
> - INACTIVE_DEVS = {}
> 3) ip link del eth1
> 4) list ruleset:
> - HOOK_DEVS = { eth0, eth2 }
> - INACTIVE_DEVS = { eth1 }
Hm. I think such list could grow up collecitng devices that could not
ever show up again.
> This avoids duplicate entries in both lists and thus avoids overhead.
> This would fix for the interfaces missing in dumps problem.
>
> Wildcards would appear as-is in either HOOK_DEVS (if there's at least
> one matching interface) or INACTIVE_DEVS (if there is none). The actual
> list of active interfaces would require a GETDEVICE call.
I think wildcards should inconditionally show in HOOK_DEVS, otherwise
this tracking becomes tricky?
I am not sure ACTIVE_DEVS or INACTIVE_DEVS is worth in the basechain
notification, since this is merely for diagnostics, just let this
information be listed in the specific command that allows to inspect
what devices are matching the wildcard (to me this is for debugging
purpose only, because if users says tap* then that is all tap devices
will be registered in such basechain/flowtable).
> > I would just display these active devices (in the list of devices that
> > are attached to this basechain) via the new command _GETDEV that we
> > are discussing below? These netdevices that match the pattern come and
> > go, I guess user only wants to make sure they are actually registered
> > to this hook for diagnostics, showing an exact match, ie. tap0, or
> > inexact match, ie. tap* should be should when listing the ruleset IMO.
>
> OK, let's see if I can sum this up correctly:
>
> 1) NFTA_HOOK_DEVS is changed to always reflect what the user specified
> 2) Interfaces being removed or added trigger NEWDEV/DELDEV notifications
> 3) Active hooks are dumped by GETDEV netlink request
> 4) NEWDEV/DELDEV netlink requests/responses added to cover for oversized
> chains/flowtables
Makes sense.
> You're saying we have to use (4) for wildcard interfaces, too. Is this
> to keep them away from NFTA_HOOK_DEVS? Because in theory 1-3 are
> sufficient for wildcards, too.
As said, I would only expose active devices in GETDEV.
[...]
> > > > There is also another case that would need to be handled:
> > > >
> > > > - chain A with device tap0
> > > > - chain B with wildcard device tap*
> > > >
> > > > I would expect a "exclude" clause for the wildcard case will come
> > > > sooner or later to define a different policy for a specify chain.
> > > > The new specific command approach would be extensible in that sense.
> > >
> > > As a first implementation, I would just forbid such combinations.
> > > Assuming "tap*" is just "tap" with length 3 and "tap0" is "tap0\0" with
> > > length 5, modifying the duplicate hook check in
> > > nf_tables_parse_netdev_hooks() to perform a strncmp(namea, nameb,
> > > min(lena, lenb)) should suffice.
> >
> > That is fine to ensure a given basechain does not have both tap0 and tap*
>
> Ah, I missed that nf_tables_parse_netdev_hooks() merely searches the
> current list for duplicate entries, not all chains'/flowtables' ones.
>
> Hmm. I can create multiple netdev chains attaching to the same
> interfaces, but only a single flowtable. Is this intentional?
For basechain, definitely because one could have a pure hardware
basechain (for offload) while a pure software policy in separated
basechain, placing hardware before software.
Thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v2 0/7] Dynamic hook interface binding
2024-06-19 15:59 ` Phil Sutter
2024-06-20 9:30 ` Pablo Neira Ayuso
@ 2024-06-23 22:12 ` Pablo Neira Ayuso
1 sibling, 0 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2024-06-23 22:12 UTC (permalink / raw)
To: Phil Sutter, netfilter-devel, Florian Westphal, Thomas Haller
Hi again Phil,
On Wed, Jun 19, 2024 at 05:59:32PM +0200, Phil Sutter wrote:
[...]
> I assumed your intention was for NFTA_HOOK_DEVS to not change
> semantically, i.e. remain to contain only devices which are present at
> time of the dump. Then I could introduce INACTIVE_DEVS to contain those
> we lost meanwhile. As an example:
>
> 1) add netdev chain for devices eth0, eth1, eth2
> 2) list ruleset:
> - HOOK_DEVS = { eth0, eth1, eth2 }
> - INACTIVE_DEVS = {}
> 3) ip link del eth1
> 4) list ruleset:
> - HOOK_DEVS = { eth0, eth2 }
> - INACTIVE_DEVS = { eth1 }
>
> This avoids duplicate entries in both lists and thus avoids overhead.
> This would fix for the interfaces missing in dumps problem.
>
> Wildcards would appear as-is in either HOOK_DEVS (if there's at least
> one matching interface) or INACTIVE_DEVS (if there is none). The actual
> list of active interfaces would require a GETDEVICE call.
[...]
> OK, let's see if I can sum this up correctly:
>
> 1) NFTA_HOOK_DEVS is changed to always reflect what the user specified
> 2) Interfaces being removed or added trigger NEWDEV/DELDEV notifications
> 3) Active hooks are dumped by GETDEV netlink request
> 4) NEWDEV/DELDEV netlink requests/responses added to cover for oversized
> chains/flowtables
It should be possible to use:
nft list hooks
to display the flowtable
if user specifies the device, then it displays the ingress patch from
the given device, the flowtable hook could be shown there for
debugging purpose.
Then, there is no need for the ACTIVE_DEVS/INACTIVE_DEVS attribute.
Rewinding a bit: The NEWDEV/DELDEV interface I am proposing has a
downside, which is that userspace has to deal with two interfaces to
add netdevice to basechain/flowtables (currently NEWCHAIN/NEWFLOWTABLE
can be used for this purpose).
Probably I am overdoing a bit with this new interface.
Then, there is the concern with NLMSG_GOODSIZE, but it should be
possible to deliver independent notifications to userspace via
NEWCHAIN/NEWFLOWTABLE including only the netdevices, ie. allocate one
transaction for each new netdevice, so one notification is send per
new netdevice.
In summary: If ACTIVE_DEVS/INACTIVE_DEVS can go away, and nft list
hooks is used instead to check if there is a hook registered for such
device matching the wildcard netdevice, eg. tap*, then your existing
patch can be used as is.
As said, only purpose to know if the device is registered is
debugging, if user requires tap*, then if tap0 exists it is
registered, otherwise it is a kernel bug.
Apologies for this u-turn a bit, this is taking you back to this
original patch of you minus ACTIVE_DEVS/INACTIVE_DEVS and document
that 'nft link hooks' can be use to check if the netdevice got a hook
registered for the flowtable/basechain.
Thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread