From: Lorenzo Bianconi <lorenzo@kernel.org>
To: Florian Westphal <fw@strlen.de>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH RFC] netfilter: nf_tables: add flowtable map for xdp offload
Date: Mon, 23 Oct 2023 12:33:02 +0200 [thread overview]
Message-ID: <ZTZL3jpvunApjcTE@lore-desk> (raw)
In-Reply-To: <20231019202507.16439-1-fw@strlen.de>
[-- Attachment #1: Type: text/plain, Size: 7837 bytes --]
> This adds a small internal mapping table so that a new bpf (xdp) kfunc
> can perform lookups in a flowtable.
>
> I have no intent to push this without nft integration of the xdp program,
> this RFC is just to get comments on the general direction because there
> is a chicken/egg issue:
>
> As-is, xdp program has access to the device pointer, but no way to do a
> lookup in a flowtable -- there is no way to obtain the needed struct
> without whacky stunts.
>
> This would allow to such lookup just from device address: the bpf
> kfunc would call nf_flowtable_by_dev() internally.
>
> Limitation:
>
> A device cannot be added to multiple flowtables, the mapping needs
> to be unique.
>
> As for integration with the kernel, there are several options:
>
> 1. Auto-add to the dev-xdp table whenever HW offload is requested.
>
> 2. Add to the dev-xdp table, but only if the HW offload request fails.
> (softfallback).
>
> 3. add a dedicated 'xdp offload' flag to UAPI.
>
> 3) should not be needed, because userspace already controls this:
> to make it work userspace needs to attach the xdp program to the
> network device in the first place.
>
> My thinking is to add a xdp-offload flag to the nft grammer only.
> Its not needed on nf uapi side and it would tell nft to attach the xdp
> flowtable forward program to the devices listed in the flowtable.
>
> Also, packet flow is altered (qdiscs is bypassed), which is a strong
> argument against default-usage.
>
> The xdp prog source would be included with nftables.git and nft
> would either attach/detach them or ship an extra prog that does this (TBD).
Hi Florian,
thx for working on this, I tested this patch with the flowtable lookup kfunc
I am working on (code is available here [0]) and it works properly.
Regards,
Lorenzo
[0] https://github.com/LorenzoBianconi/bpf-next/tree/xdp-flowtable-kfunc
>
> Open questions:
>
> Do we need to support dev-in-multiple flowtables? I would like to
> avoid this, this likely means the future "xdp" flag in nftables would
> be restricted to "inet" family. Alternative would be to change the key to
> 'device address plus protocol family', the xdp prog could derive that from the
> packet data.
>
> Timeout handling. Should the XDP program even bother to refresh the
> flowtable timeout?
I was assuming the flowtable lookup kfunc can take care of it.
>
> It might make more sense to intentionally have packets
> flow through the normal path periodically so neigh entries are up to date.
>
> Also note that flow_offload_xdp struct likely needs to have a refcount
> or genmask so that it integrates with the two-phase commit protocol on
> netfilter side
> (i.e., we should allow 're-add' because its needed to make flush+reload
> work).
>
> Not SoB, too raw for my taste.
>
> CC: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> net/netfilter/nf_flow_table_offload.c | 131 +++++++++++++++++++++++++-
> 1 file changed, 130 insertions(+), 1 deletion(-)
>
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index a010b25076ca..10313d296a8a 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -17,6 +17,92 @@ static struct workqueue_struct *nf_flow_offload_add_wq;
> static struct workqueue_struct *nf_flow_offload_del_wq;
> static struct workqueue_struct *nf_flow_offload_stats_wq;
>
> +struct flow_offload_xdp {
> + struct hlist_node hnode;
> +
> + unsigned long net_device_addr;
> + struct nf_flowtable *ft;
> +
> + struct rcu_head rcuhead;
> +};
> +
> +#define NF_XDP_HT_BITS 4
> +static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
> +static DEFINE_MUTEX(nf_xdp_hashtable_lock);
> +
> +/* caller must hold rcu read lock */
> +struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
> +{
I think this routine needs to be added to some include file (e.g.
include/net/netfilter/nf_flow_table.h)
> + unsigned long key = (unsigned long)dev;
> + const struct flow_offload_xdp *cur;
> +
> + hash_for_each_possible_rcu(nf_xdp_hashtable, cur, hnode, key) {
> + if (key == cur->net_device_addr)
> + return cur->ft;
> + }
> +
> + return NULL;
> +}
> +
> +static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
> + const struct net_device *dev)
> +{
> + unsigned long key = (unsigned long)dev;
> + struct flow_offload_xdp *cur;
> + int err = 0;
> +
> + mutex_lock(&nf_xdp_hashtable_lock);
> + hash_for_each_possible(nf_xdp_hashtable, cur, hnode, key) {
> + if (key != cur->net_device_addr)
> + continue;
> + err = -EEXIST;
> + break;
> + }
> +
> + if (err == 0) {
> + struct flow_offload_xdp *new;
> +
> + new = kzalloc(sizeof(*new), GFP_KERNEL);
> + if (new) {
> + new->net_device_addr = key;
> + new->ft = ft;
> +
> + hash_add_rcu(nf_xdp_hashtable, &new->hnode, key);
> + } else {
> + err = -ENOMEM;
> + }
> + }
> +
> + mutex_unlock(&nf_xdp_hashtable_lock);
> +
> + DEBUG_NET_WARN_ON_ONCE(err == 0 && nf_flowtable_by_dev(dev) != ft);
> +
> + return err;
> +}
> +
> +static void nf_flowtable_by_dev_remove(const struct net_device *dev)
> +{
> + unsigned long key = (unsigned long)dev;
> + struct flow_offload_xdp *cur;
> + bool found = false;
> +
> + mutex_lock(&nf_xdp_hashtable_lock);
> +
> + hash_for_each_possible(nf_xdp_hashtable, cur, hnode, key) {
> + if (key != cur->net_device_addr)
> + continue;
> +
> + hash_del_rcu(&cur->hnode);
> + kfree_rcu(cur, rcuhead);
> + found = true;
> + break;
> + }
> +
> + mutex_unlock(&nf_xdp_hashtable_lock);
> +
> + WARN_ON_ONCE(!found);
> +}
> +
> struct flow_offload_work {
> struct list_head list;
> enum flow_cls_command cmd;
> @@ -1183,6 +1269,38 @@ static int nf_flow_table_offload_cmd(struct flow_block_offload *bo,
> return 0;
> }
>
> +static int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
> + struct net_device *dev,
> + enum flow_block_command cmd)
> +{
> + switch (cmd) {
> + case FLOW_BLOCK_BIND:
> + return nf_flowtable_by_dev_insert(flowtable, dev);
> + case FLOW_BLOCK_UNBIND:
> + nf_flowtable_by_dev_remove(dev);
> + return 0;
> + }
> +
> + WARN_ON_ONCE(1);
> + return 0;
> +}
> +
> +static void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
> + struct net_device *dev,
> + enum flow_block_command cmd)
> +{
> + switch (cmd) {
> + case FLOW_BLOCK_BIND:
> + nf_flowtable_by_dev_remove(dev);
> + return;
> + case FLOW_BLOCK_UNBIND:
> + /* We do not re-bind in case hw offload would report error
> + * on *unregister*.
> + */
> + break;
> + }
> +}
> +
> int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
> struct net_device *dev,
> enum flow_block_command cmd)
> @@ -1191,6 +1309,15 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
> struct flow_block_offload bo;
> int err;
>
> + /* XXX:
> + *
> + * XDP offload could be made 'never fails', as xdp
> + * frames that don't match are simply passed up to
> + * normal nf hooks (skb sw flowtable), or to stack.
> + */
> + if (nf_flow_offload_xdp_setup(flowtable, dev, cmd))
> + return -EBUSY;
> +
> if (!nf_flowtable_hw_offload(flowtable))
> return 0;
>
> @@ -1200,8 +1327,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
> else
> err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
> &extack);
> - if (err < 0)
> + if (err < 0) {
> + nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
> return err;
> + }
>
> return nf_flow_table_block_setup(flowtable, &bo, cmd);
> }
> --
> 2.41.0
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
next prev parent reply other threads:[~2023-10-23 10:33 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-19 20:25 [PATCH RFC] netfilter: nf_tables: add flowtable map for xdp offload Florian Westphal
2023-10-23 10:33 ` Lorenzo Bianconi [this message]
2023-10-23 11:16 ` Florian Westphal
2023-11-02 8:30 ` Florian Westphal
2023-11-09 11:09 ` Florian Westphal
2023-11-02 10:49 ` Toke Høiland-Jørgensen
2023-11-02 10:54 ` Florian Westphal
2023-11-02 11:07 ` Toke Høiland-Jørgensen
2023-11-02 11:11 ` Florian Westphal
2023-11-02 11:07 ` Florian Westphal
2023-11-02 11:25 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZTZL3jpvunApjcTE@lore-desk \
--to=lorenzo@kernel.org \
--cc=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.