From: Lorenzo Bianconi <lorenzo@kernel.org>
To: pablo@netfilter.org
Cc: bpf@vger.kernel.org, kadlec@netfilter.org, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	martin.lau@linux.dev, eddyz87@gmail.com,
	lorenzo.bianconi@redhat.com, toke@redhat.com, fw@strlen.de,
	hawk@kernel.org, horms@kernel.org, donhunte@redhat.com,
	memxor@gmail.com
Subject: Re: [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload
Date: Fri, 14 Jun 2024 00:34:21 +0200
Message-ID: <Zmtz7T99mHi99kI-@lore-desk>
In-Reply-To: <1298eb8587c50a73da315516fbb1ea0305587dd5.1716987534.git.lorenzo@kernel.org>

> From: Florian Westphal <fw@strlen.de>
> 
> This adds a small internal mapping table so that a new bpf (xdp) kfunc
> can perform lookups in a flowtable.
> 
> As-is, an xdp program has access to the device pointer, but has no way
> to do a lookup in a flowtable: there is no way to obtain the needed
> struct nf_flowtable without questionable stunts.
> 
> This mapping makes it possible to obtain an nf_flowtable pointer given
> a net_device structure.
> 
> To keep backward compatibility, the infrastructure still allows the
> user to add a given device to multiple flowtables, but a lookup always
> returns the first mapping that was added, since it assumes the correct
> configuration is a 1:1 mapping between flowtables and net_devices.
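
The lookup semantics described above can be modeled in a few lines of userspace C. The model keeps one hash entry per net_device address, each holding the flowtables bound to that device in insertion order, and a lookup returns the first of them. This is a minimal sketch only. The struct and function names (netdev, flowtable, map_insert(), map_lookup()) are hypothetical, the hash function is not the kernel's hash_min(), and the sketch deliberately omits the RCU and mutex protection the patch relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for net_device and nf_flowtable. */
struct netdev { const char *name; };
struct flowtable { const char *name; };

/* One flowtable bound to a device (plays the role of flow_offload_xdp_ft). */
struct ft_node {
	struct flowtable *ft;
	struct ft_node *next;
};

/* One hashed device entry (plays the role of flow_offload_xdp). */
struct dev_entry {
	uintptr_t key;          /* device address, used as the hash key */
	struct ft_node *fts;    /* bound flowtables, in insertion order */
	struct dev_entry *next; /* collision chain within a bucket */
};

#define HT_BITS 4
#define HT_SIZE (1u << HT_BITS)
static struct dev_entry *table[HT_SIZE];

/* Multiplicative hash down to HT_BITS bits; not the kernel's hash_min(). */
static unsigned int bucket(uintptr_t key)
{
	return (unsigned int)(((unsigned long long)key *
			       0x9E3779B97F4A7C15ull) >> (64 - HT_BITS));
}

static struct dev_entry *find(uintptr_t key)
{
	for (struct dev_entry *e = table[bucket(key)]; e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/* Bind ft to dev; the device entry is created on first use. */
static int map_insert(struct flowtable *ft, const struct netdev *dev)
{
	uintptr_t key = (uintptr_t)dev;
	struct ft_node *n, **pp;
	struct dev_entry *e;

	assert(ft && dev);
	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->ft = ft;

	e = find(key);
	if (!e) {
		e = calloc(1, sizeof(*e));
		if (!e) {
			free(n);
			return -1;
		}
		e->key = key;
		e->next = table[bucket(key)];
		table[bucket(key)] = e;
	}

	/* Append at the tail so the first-bound flowtable stays at the head. */
	for (pp = &e->fts; *pp; pp = &(*pp)->next)
		;
	*pp = n;
	return 0;
}

/* Return the first-bound flowtable for dev, or NULL if none is bound. */
static struct flowtable *map_lookup(const struct netdev *dev)
{
	const struct dev_entry *e = find((uintptr_t)dev);

	return (e && e->fts) ? e->fts->ft : NULL;
}
```

For example, binding flowtable a and then flowtable b to the same device makes map_lookup() return a. That "first added mapping wins" behaviour only matters when the 1:1 assumption is violated.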

Hi Pablo,

do you have any feedback on the nft part? Thanks.

Regards,
Lorenzo

> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/netfilter/nf_flow_table.h |   8 ++
>  net/netfilter/Makefile                |   2 +-
>  net/netfilter/nf_flow_table_offload.c |   6 +-
>  net/netfilter/nf_flow_table_xdp.c     | 163 ++++++++++++++++++++++++++
>  4 files changed, 176 insertions(+), 3 deletions(-)
>  create mode 100644 net/netfilter/nf_flow_table_xdp.c
> 
> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> index 9abb7ee40d72f..688e02b287cc4 100644
> --- a/include/net/netfilter/nf_flow_table.h
> +++ b/include/net/netfilter/nf_flow_table.h
> @@ -305,6 +305,14 @@ struct flow_ports {
>  	__be16 source, dest;
>  };
>  
> +struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev);
> +int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
> +			      struct net_device *dev,
> +			      enum flow_block_command cmd);
> +void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
> +				struct net_device *dev,
> +				enum flow_block_command cmd);
> +
>  unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
>  				     const struct nf_hook_state *state);
>  unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 614815a3ed738..18046872a38aa 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -142,7 +142,7 @@ obj-$(CONFIG_NFT_FWD_NETDEV)	+= nft_fwd_netdev.o
>  # flow table infrastructure
>  obj-$(CONFIG_NF_FLOW_TABLE)	+= nf_flow_table.o
>  nf_flow_table-objs		:= nf_flow_table_core.o nf_flow_table_ip.o \
> -				   nf_flow_table_offload.o
> +				   nf_flow_table_offload.o nf_flow_table_xdp.o
>  nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
>  
>  obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index a010b25076ca0..d9b019c98694b 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -1192,7 +1192,7 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
>  	int err;
>  
>  	if (!nf_flowtable_hw_offload(flowtable))
> -		return 0;
> +		return nf_flow_offload_xdp_setup(flowtable, dev, cmd);
>  
>  	if (dev->netdev_ops->ndo_setup_tc)
>  		err = nf_flow_table_offload_cmd(&bo, flowtable, dev, cmd,
> @@ -1200,8 +1200,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
>  	else
>  		err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
>  						     &extack);
> -	if (err < 0)
> +	if (err < 0) {
> +		nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
>  		return err;
> +	}
>  
>  	return nf_flow_table_block_setup(flowtable, &bo, cmd);
>  }
> diff --git a/net/netfilter/nf_flow_table_xdp.c b/net/netfilter/nf_flow_table_xdp.c
> new file mode 100644
> index 0000000000000..b9bdf27ba9bd3
> --- /dev/null
> +++ b/net/netfilter/nf_flow_table_xdp.c
> @@ -0,0 +1,163 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/netfilter.h>
> +#include <linux/rhashtable.h>
> +#include <linux/netdevice.h>
> +#include <net/flow_offload.h>
> +#include <net/netfilter/nf_flow_table.h>
> +
> +struct flow_offload_xdp_ft {
> +	struct list_head head;
> +	struct nf_flowtable *ft;
> +	struct rcu_head rcuhead;
> +};
> +
> +struct flow_offload_xdp {
> +	struct hlist_node hnode;
> +	unsigned long net_device_addr;
> +	struct list_head head;
> +};
> +
> +#define NF_XDP_HT_BITS	4
> +static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
> +static DEFINE_MUTEX(nf_xdp_hashtable_lock);
> +
> +/* caller must hold rcu read lock */
> +struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
> +{
> +	unsigned long key = (unsigned long)dev;
> +	struct flow_offload_xdp *iter;
> +
> +	hash_for_each_possible_rcu(nf_xdp_hashtable, iter, hnode, key) {
> +		if (key == iter->net_device_addr) {
> +			struct flow_offload_xdp_ft *ft_elem;
> +
> +			/* The user is supposed to insert a given net_device
> +			 * just into a single nf_flowtable so we always return
> +			 * the first element here.
> +			 */
> +			ft_elem = list_first_or_null_rcu(&iter->head,
> +							 struct flow_offload_xdp_ft,
> +							 head);
> +			return ft_elem ? ft_elem->ft : NULL;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
> +				      const struct net_device *dev)
> +{
> +	struct flow_offload_xdp *iter, *elem = NULL;
> +	unsigned long key = (unsigned long)dev;
> +	struct flow_offload_xdp_ft *ft_elem;
> +
> +	ft_elem = kzalloc(sizeof(*ft_elem), GFP_KERNEL_ACCOUNT);
> +	if (!ft_elem)
> +		return -ENOMEM;
> +
> +	ft_elem->ft = ft;
> +
> +	mutex_lock(&nf_xdp_hashtable_lock);
> +
> +	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
> +		if (key == iter->net_device_addr) {
> +			elem = iter;
> +			break;
> +		}
> +	}
> +
> +	if (!elem) {
> +		elem = kzalloc(sizeof(*elem), GFP_KERNEL_ACCOUNT);
> +		if (!elem)
> +			goto err_unlock;
> +
> +		elem->net_device_addr = key;
> +		INIT_LIST_HEAD(&elem->head);
> +		hash_add_rcu(nf_xdp_hashtable, &elem->hnode, key);
> +	}
> +	list_add_tail_rcu(&ft_elem->head, &elem->head);
> +
> +	mutex_unlock(&nf_xdp_hashtable_lock);
> +
> +	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&nf_xdp_hashtable_lock);
> +	kfree(ft_elem);
> +
> +	return -ENOMEM;
> +}
> +
> +static void nf_flowtable_by_dev_remove(struct nf_flowtable *ft,
> +				       const struct net_device *dev)
> +{
> +	struct flow_offload_xdp *iter, *elem = NULL;
> +	unsigned long key = (unsigned long)dev;
> +
> +	mutex_lock(&nf_xdp_hashtable_lock);
> +
> +	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
> +		if (key == iter->net_device_addr) {
> +			elem = iter;
> +			break;
> +		}
> +	}
> +
> +	if (elem) {
> +		struct flow_offload_xdp_ft *ft_elem, *ft_next;
> +
> +		list_for_each_entry_safe(ft_elem, ft_next, &elem->head, head) {
> +			if (ft_elem->ft == ft) {
> +				list_del_rcu(&ft_elem->head);
> +				kfree_rcu(ft_elem, rcuhead);
> +			}
> +		}
> +
> +		if (list_empty(&elem->head))
> +			hash_del_rcu(&elem->hnode);
> +		else
> +			elem = NULL;
> +	}
> +
> +	mutex_unlock(&nf_xdp_hashtable_lock);
> +
> +	if (elem) {
> +		synchronize_rcu();
> +		kfree(elem);
> +	}
> +}
> +
> +int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
> +			      struct net_device *dev,
> +			      enum flow_block_command cmd)
> +{
> +	switch (cmd) {
> +	case FLOW_BLOCK_BIND:
> +		return nf_flowtable_by_dev_insert(flowtable, dev);
> +	case FLOW_BLOCK_UNBIND:
> +		nf_flowtable_by_dev_remove(flowtable, dev);
> +		return 0;
> +	}
> +
> +	WARN_ON_ONCE(1);
> +	return 0;
> +}
> +
> +void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
> +				struct net_device *dev,
> +				enum flow_block_command cmd)
> +{
> +	switch (cmd) {
> +	case FLOW_BLOCK_BIND:
> +		nf_flowtable_by_dev_remove(flowtable, dev);
> +		return;
> +	case FLOW_BLOCK_UNBIND:
> +		/* We do not re-bind in case hw offload would report error
> +		 * on *unregister*.
> +		 */
> +		break;
> +	}
> +}
> -- 
> 2.45.1
> 
> 
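
In the patch, nf_flowtable_by_dev_remove() drops every list node that points at the given flowtable. It frees the per-device entry only once that entry's list is empty. Until then, lookups fall back to the next flowtable still bound to the device. A minimal userspace sketch of that bookkeeping follows. It assumes a single device entry (no hashing) and uses hypothetical names (entry_add_tail(), entry_remove(), entry_first()). Plain free() stands in for the kernel's kfree_rcu(), and the caller-side unhash / synchronize_rcu() / kfree() sequence has no equivalent here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nf_flowtable. */
struct flowtable { int id; };

/* List node (plays the role of flow_offload_xdp_ft). */
struct ft_node {
	struct flowtable *ft;
	struct ft_node *next;
};

/* One device's mapping (plays the role of flow_offload_xdp, minus the hash). */
struct dev_entry {
	struct ft_node *fts;	/* bound flowtables, in insertion order */
};

/* Append ft at the tail, mirroring list_add_tail_rcu() in the patch. */
static int entry_add_tail(struct dev_entry *e, struct flowtable *ft)
{
	struct ft_node *n, **pp;

	assert(e && ft);
	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->ft = ft;
	for (pp = &e->fts; *pp; pp = &(*pp)->next)
		;
	*pp = n;
	return 0;
}

/*
 * Unlink and free every node that refers to ft. Returns true when the
 * entry is left empty, i.e. when the caller should unhash and free it.
 * The kernel frees the nodes with kfree_rcu() because lockless readers
 * may still be walking the list; free() is only safe here because this
 * model has no concurrent readers.
 */
static bool entry_remove(struct dev_entry *e, struct flowtable *ft)
{
	struct ft_node **pp = &e->fts;

	while (*pp) {
		struct ft_node *n = *pp;

		if (n->ft == ft) {
			*pp = n->next;
			free(n);
		} else {
			pp = &n->next;
		}
	}
	return e->fts == NULL;
}

/* First-bound flowtable, or NULL: what a lookup would return. */
static struct flowtable *entry_first(const struct dev_entry *e)
{
	return e->fts ? e->fts->ft : NULL;
}
```

With a and b bound in that order, removing a makes entry_first() return b and entry_remove() report false. Removing b afterwards empties the entry, and entry_remove() returns true.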

Thread overview: 10+ messages
2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
2024-06-13 22:34   ` Lorenzo Bianconi [this message]
2024-05-29 13:04 ` [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
2024-05-29 21:53   ` Alexei Starovoitov
2024-05-29 13:04 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
2024-06-13 16:06   ` Daniel Borkmann
2024-06-13 16:54     ` Daniel Xu
2024-06-13 22:11       ` Lorenzo Bianconi
2024-06-14 15:19 ` [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Pablo Neira Ayuso