From: Florian Westphal <fw@strlen.de>
To: <netfilter-devel@vger.kernel.org>
Cc: Florian Westphal <fw@strlen.de>, Lorenzo Bianconi <lorenzo@kernel.org>
Subject: [PATCH RFC] netfilter: nf_tables: add flowtable map for xdp offload
Date: Thu, 19 Oct 2023 22:25:04 +0200 [thread overview]
Message-ID: <20231019202507.16439-1-fw@strlen.de> (raw)
This adds a small internal mapping table so that a new bpf (xdp) kfunc
can perform lookups in a flowtable.
I have no intent to push this without nft integration of the xdp program,
this RFC is just to get comments on the general direction because there
is a chicken/egg issue:
As-is, xdp program has access to the device pointer, but no way to do a
lookup in a flowtable -- there is no way to obtain the needed struct
without whacky stunts.
This would allow to such lookup just from device address: the bpf
kfunc would call nf_flowtable_by_dev() internally.
Limitation:
A device cannot be added to multiple flowtables, the mapping needs
to be unique.
As for integration with the kernel, there are several options:
1. Auto-add to the dev-xdp table whenever HW offload is requested.
2. Add to the dev-xdp table, but only if the HW offload request fails.
(softfallback).
3. add a dedicated 'xdp offload' flag to UAPI.
3) should not be needed, because userspace already controls this:
to make it work userspace needs to attach the xdp program to the
network device in the first place.
My thinking is to add a xdp-offload flag to the nft grammer only.
Its not needed on nf uapi side and it would tell nft to attach the xdp
flowtable forward program to the devices listed in the flowtable.
Also, packet flow is altered (qdiscs is bypassed), which is a strong
argument against default-usage.
The xdp prog source would be included with nftables.git and nft
would either attach/detach them or ship an extra prog that does this (TBD).
Open questions:
Do we need to support dev-in-multiple flowtables? I would like to
avoid this, this likely means the future "xdp" flag in nftables would
be restricted to "inet" family. Alternative would be to change the key to
'device address plus protocol family', the xdp prog could derive that from the
packet data.
Timeout handling. Should the XDP program even bother to refresh the
flowtable timeout?
It might make more sense to intentionally have packets
flow through the normal path periodically so neigh entries are up to date.
Also note that flow_offload_xdp struct likely needs to have a refcount
or genmask so that it integrates with the two-phase commit protocol on
netfilter side
(i.e., we should allow 're-add' because its needed to make flush+reload
work).
Not SoB, too raw for my taste.
CC: Lorenzo Bianconi <lorenzo@kernel.org>
---
net/netfilter/nf_flow_table_offload.c | 131 +++++++++++++++++++++++++-
1 file changed, 130 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index a010b25076ca..10313d296a8a 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -17,6 +17,92 @@ static struct workqueue_struct *nf_flow_offload_add_wq;
static struct workqueue_struct *nf_flow_offload_del_wq;
static struct workqueue_struct *nf_flow_offload_stats_wq;
+struct flow_offload_xdp {
+ struct hlist_node hnode;
+
+ unsigned long net_device_addr;
+ struct nf_flowtable *ft;
+
+ struct rcu_head rcuhead;
+};
+
+#define NF_XDP_HT_BITS 4
+static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
+static DEFINE_MUTEX(nf_xdp_hashtable_lock);
+
+/* caller must hold rcu read lock */
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
+{
+ unsigned long key = (unsigned long)dev;
+ const struct flow_offload_xdp *cur;
+
+ hash_for_each_possible_rcu(nf_xdp_hashtable, cur, hnode, key) {
+ if (key == cur->net_device_addr)
+ return cur->ft;
+ }
+
+ return NULL;
+}
+
+static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
+ const struct net_device *dev)
+{
+ unsigned long key = (unsigned long)dev;
+ struct flow_offload_xdp *cur;
+ int err = 0;
+
+ mutex_lock(&nf_xdp_hashtable_lock);
+ hash_for_each_possible(nf_xdp_hashtable, cur, hnode, key) {
+ if (key != cur->net_device_addr)
+ continue;
+ err = -EEXIST;
+ break;
+ }
+
+ if (err == 0) {
+ struct flow_offload_xdp *new;
+
+ new = kzalloc(sizeof(*new), GFP_KERNEL);
+ if (new) {
+ new->net_device_addr = key;
+ new->ft = ft;
+
+ hash_add_rcu(nf_xdp_hashtable, &new->hnode, key);
+ } else {
+ err = -ENOMEM;
+ }
+ }
+
+ mutex_unlock(&nf_xdp_hashtable_lock);
+
+ DEBUG_NET_WARN_ON_ONCE(err == 0 && nf_flowtable_by_dev(dev) != ft);
+
+ return err;
+}
+
+static void nf_flowtable_by_dev_remove(const struct net_device *dev)
+{
+ unsigned long key = (unsigned long)dev;
+ struct flow_offload_xdp *cur;
+ bool found = false;
+
+ mutex_lock(&nf_xdp_hashtable_lock);
+
+ hash_for_each_possible(nf_xdp_hashtable, cur, hnode, key) {
+ if (key != cur->net_device_addr)
+ continue;
+
+ hash_del_rcu(&cur->hnode);
+ kfree_rcu(cur, rcuhead);
+ found = true;
+ break;
+ }
+
+ mutex_unlock(&nf_xdp_hashtable_lock);
+
+ WARN_ON_ONCE(!found);
+}
+
struct flow_offload_work {
struct list_head list;
enum flow_cls_command cmd;
@@ -1183,6 +1269,38 @@ static int nf_flow_table_offload_cmd(struct flow_block_offload *bo,
return 0;
}
+static int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
+ struct net_device *dev,
+ enum flow_block_command cmd)
+{
+ switch (cmd) {
+ case FLOW_BLOCK_BIND:
+ return nf_flowtable_by_dev_insert(flowtable, dev);
+ case FLOW_BLOCK_UNBIND:
+ nf_flowtable_by_dev_remove(dev);
+ return 0;
+ }
+
+ WARN_ON_ONCE(1);
+ return 0;
+}
+
+static void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
+ struct net_device *dev,
+ enum flow_block_command cmd)
+{
+ switch (cmd) {
+ case FLOW_BLOCK_BIND:
+ nf_flowtable_by_dev_remove(dev);
+ return;
+ case FLOW_BLOCK_UNBIND:
+ /* We do not re-bind in case hw offload would report error
+ * on *unregister*.
+ */
+ break;
+ }
+}
+
int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
struct net_device *dev,
enum flow_block_command cmd)
@@ -1191,6 +1309,15 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
struct flow_block_offload bo;
int err;
+ /* XXX:
+ *
+ * XDP offload could be made 'never fails', as xdp
+ * frames that don't match are simply passed up to
+ * normal nf hooks (skb sw flowtable), or to stack.
+ */
+ if (nf_flow_offload_xdp_setup(flowtable, dev, cmd))
+ return -EBUSY;
+
if (!nf_flowtable_hw_offload(flowtable))
return 0;
@@ -1200,8 +1327,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
else
err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
&extack);
- if (err < 0)
+ if (err < 0) {
+ nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
return err;
+ }
return nf_flow_table_block_setup(flowtable, &bo, cmd);
}
--
2.41.0
next reply other threads:[~2023-10-19 20:25 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-19 20:25 Florian Westphal [this message]
2023-10-23 10:33 ` [PATCH RFC] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
2023-10-23 11:16 ` Florian Westphal
2023-11-02 8:30 ` Florian Westphal
2023-11-09 11:09 ` Florian Westphal
2023-11-02 10:49 ` Toke Høiland-Jørgensen
2023-11-02 10:54 ` Florian Westphal
2023-11-02 11:07 ` Toke Høiland-Jørgensen
2023-11-02 11:11 ` Florian Westphal
2023-11-02 11:07 ` Florian Westphal
2023-11-02 11:25 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231019202507.16439-1-fw@strlen.de \
--to=fw@strlen.de \
--cc=lorenzo@kernel.org \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).