Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next v6 0/5] devlink: Introduce PCI PF, VF ports and attributes
From: Jakub Kicinski @ 2019-07-09  5:40 UTC (permalink / raw)
  To: Parav Pandit; +Cc: netdev, jiri, saeedm
In-Reply-To: <20190709041739.44292-1-parav@mellanox.com>

On Mon,  8 Jul 2019 23:17:34 -0500, Parav Pandit wrote:
> This patchset carry forwards the work initiated in [1] and discussion
> futher concluded at [2].
> 
> To improve visibility of representor netdevice, its association with
> PF or VF, physical port, two new devlink port flavours are added as
> PCI PF and PCI VF ports.
> 
> A sample eswitch view can be seen below, which will be futher extended to
> mdev subdevices of a PCI function in future.
> 
> Patch-1 moves physical port's attribute to new structure
> Patch-2 enhances netlink response to consider port flavour
> Patch-3,4 extends devlink port attributes and port flavour
> Patch-5 extends mlx5 driver to register devlink ports for PF, VF and
> physical link.

The coding leaves something to be desired:

1) flavour handling in devlink_nl_port_attrs_put() really calls for a
   switch statement,
2) devlink_port_attrs_.*set() can take a pointer to flavour specific
   structure instead of attr structure for setting the parameters,
3) the "ret" variable there is unnecessary,
4) there is inconsistency in whether there is an empty line between
   if (ret) return; after __devlink_port_attrs_set() and attr setting,
5) /* Associated PCI VF for of the PCI PF for this port. */ doesn't
   read great;
6) mlx5 functions should preferably have an appropriate prefix - f.e.
   register_devlink_port() or is_devlink_port_supported().

But I'll leave it to Jiri and Dave to decide if its worth a respin :)
Functionally I think this is okay.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
From: Jakub Kicinski @ 2019-07-09  6:13 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf
In-Reply-To: <156261310104.31108.4569969631798277807.stgit@ubuntu3-kvm1>

On Mon, 08 Jul 2019 19:13:29 +0000, John Fastabend wrote:
> Resolve a series of splats discovered by syzbot and an unhash
> TLS issue noted by Eric Dumazet.
> 
> The main issues revolved around interaction between TLS and
> sockmap tear down. TLS and sockmap could both reset sk->prot
> ops creating a condition where a close or unhash op could be
> called forever. A rare race condition resulting from a missing
> rcu sync operation was causing a use after free. Then on the
> TLS side dropping the sock lock and re-acquiring it during the
> close op could hang. Finally, sockmap must be deployed before
> tls for current stack assumptions to be met. This is enforced
> now. A feature series can enable it.
> 
> To fix this first refactor TLS code so the lock is held for the
> entire teardown operation. Then add an unhash callback to ensure
> TLS can not transition from ESTABLISHED to LISTEN state. This
> transition is a similar bug to the one found and fixed previously
> in sockmap. Then apply three fixes to sockmap to fix up races
> on tear down around map free and close. Finally, if sockmap
> is destroyed before TLS we add a new ULP op update to inform
> the TLS stack it should not call sockmap ops. This last one
> appears to be the most commonly found issue from syzbot.

Looks like strparser is not done'd for offload?

About patch 6 - I was recently wondering about the "impossible" syzbot
report where context is not freed and my conclusion was that there
can be someone sitting at lock_sock() in tcp_close() already by the
time we start installing the ULP, so TLS's close will never get called.
The entire replacing of callbacks business is really shaky :(

Perhaps I'm rumbling, I will take a close look after I get some sleep :)

^ permalink raw reply

* Re: [PATCH net-next v6 0/5] devlink: Introduce PCI PF, VF ports and attributes
From: Jiri Pirko @ 2019-07-09  6:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Parav Pandit, netdev, jiri, saeedm
In-Reply-To: <20190708224012.0280846c@cakuba.netronome.com>

Tue, Jul 09, 2019 at 07:40:12AM CEST, jakub.kicinski@netronome.com wrote:
>On Mon,  8 Jul 2019 23:17:34 -0500, Parav Pandit wrote:
>> This patchset carry forwards the work initiated in [1] and discussion
>> futher concluded at [2].
>> 
>> To improve visibility of representor netdevice, its association with
>> PF or VF, physical port, two new devlink port flavours are added as
>> PCI PF and PCI VF ports.
>> 
>> A sample eswitch view can be seen below, which will be futher extended to
>> mdev subdevices of a PCI function in future.
>> 
>> Patch-1 moves physical port's attribute to new structure
>> Patch-2 enhances netlink response to consider port flavour
>> Patch-3,4 extends devlink port attributes and port flavour
>> Patch-5 extends mlx5 driver to register devlink ports for PF, VF and
>> physical link.
>
>The coding leaves something to be desired:
>
>1) flavour handling in devlink_nl_port_attrs_put() really calls for a
>   switch statement,
>2) devlink_port_attrs_.*set() can take a pointer to flavour specific
>   structure instead of attr structure for setting the parameters,
>3) the "ret" variable there is unnecessary,
>4) there is inconsistency in whether there is an empty line between
>   if (ret) return; after __devlink_port_attrs_set() and attr setting,
>5) /* Associated PCI VF for of the PCI PF for this port. */ doesn't
>   read great;
>6) mlx5 functions should preferably have an appropriate prefix - f.e.
>   register_devlink_port() or is_devlink_port_supported().
>
>But I'll leave it to Jiri and Dave to decide if its worth a respin :)
>Functionally I think this is okay.
>

I'm happy with the set as it is right now. Anyway, if you want your
concerns to be addresses, you should write them to the appropriate code.
This list is hard to follow.


>Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* RE: [PATCH net-next v6 0/5] devlink: Introduce PCI PF, VF ports and attributes
From: Parav Pandit @ 2019-07-09  6:20 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev@vger.kernel.org, Jiri Pirko, Saeed Mahameed
In-Reply-To: <20190708224012.0280846c@cakuba.netronome.com>

Hi Jakub,

> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Tuesday, July 9, 2019 11:10 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: netdev@vger.kernel.org; Jiri Pirko <jiri@mellanox.com>; Saeed Mahameed
> <saeedm@mellanox.com>
> Subject: Re: [PATCH net-next v6 0/5] devlink: Introduce PCI PF, VF ports and
> attributes
> 
> On Mon,  8 Jul 2019 23:17:34 -0500, Parav Pandit wrote:
> > This patchset carry forwards the work initiated in [1] and discussion
> > futher concluded at [2].
> >
> > To improve visibility of representor netdevice, its association with
> > PF or VF, physical port, two new devlink port flavours are added as
> > PCI PF and PCI VF ports.
> >
> > A sample eswitch view can be seen below, which will be futher extended
> > to mdev subdevices of a PCI function in future.
> >
> > Patch-1 moves physical port's attribute to new structure
> > Patch-2 enhances netlink response to consider port flavour
> > Patch-3,4 extends devlink port attributes and port flavour
> > Patch-5 extends mlx5 driver to register devlink ports for PF, VF and
> > physical link.
> 
> The coding leaves something to be desired:
> 
> 1) flavour handling in devlink_nl_port_attrs_put() really calls for a
>    switch statement,
> 2) devlink_port_attrs_.*set() can take a pointer to flavour specific
>    structure instead of attr structure for setting the parameters,
> 3) the "ret" variable there is unnecessary,
> 4) there is inconsistency in whether there is an empty line between
>    if (ret) return; after __devlink_port_attrs_set() and attr setting,
> 5) /* Associated PCI VF for of the PCI PF for this port. */ doesn't
>    read great;
> 6) mlx5 functions should preferably have an appropriate prefix - f.e.
>    register_devlink_port() or is_devlink_port_supported().
> 
Those two static helper functions doesn't need mlx5_ prefix.
ndo ops are prefixed appropriately.

> But I'll leave it to Jiri and Dave to decide if its worth a respin :) Functionally I
> think this is okay.
> 
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [PATCH net-next,v3 11/11] netfilter: nf_tables: add hardware offload support
From: Jiri Pirko @ 2019-07-09  6:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Pablo Neira Ayuso, netdev, davem, thomas.lendacky, f.fainelli,
	ariel.elior, michael.chan, madalin.bucur, yisen.zhuang,
	salil.mehta, jeffrey.t.kirsher, tariqt, saeedm, jiri, idosch,
	peppe.cavallaro, grygorii.strashko, andrew, vivien.didelot,
	alexandre.torgue, joabreu, linux-net-drivers, ogerlitz,
	Manish.Chopra, marcelo.leitner, mkubecek, venkatkumar.duvvuru,
	maxime.chevallier, cphealy, netfilter-devel
In-Reply-To: <20190708184437.4d29648a@cakuba.netronome.com>

Tue, Jul 09, 2019 at 03:44:37AM CEST, jakub.kicinski@netronome.com wrote:
>On Mon,  8 Jul 2019 18:06:13 +0200, Pablo Neira Ayuso wrote:
>> This patch adds hardware offload support for nftables through the
>> existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER
>> classifier and the flow rule API. This hardware offload support is
>> available for the NFPROTO_NETDEV family and the ingress hook.
>> 
>> Each nftables expression has a new ->offload interface, that is used to
>> populate the flow rule object that is attached to the transaction
>> object.
>> 
>> There is a new per-table NFT_TABLE_F_HW flag, that is set on to offload
>> an entire table, including all of its chains.
>> 
>> This patch supports for basic metadata (layer 3 and 4 protocol numbers),
>> 5-tuple payload matching and the accept/drop actions; this also includes
>> basechain hardware offload only.
>> 
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>
>Any particular reason to not fence this off with a device feature
>(ethtool -k)?  Then you wouldn't need that per-driver list abomination
>until drivers start advertising it..  IDK if we want the per-device
>offload enable flags or not in general, it seems like a good idea in
>general for admin to be able to disable offload per device 🤷
>
>> +static int nft_flow_offload_rule(struct nft_trans *trans,
>> +				 enum tc_fl_command command)
>> +{
>> +	struct nft_flow_rule *flow = nft_trans_flow_rule(trans);
>> +	struct nft_rule *rule = nft_trans_rule(trans);
>> +	struct tc_cls_flower_offload cls_flower = {};
>> +	struct nft_base_chain *basechain;
>> +	struct netlink_ext_ack extack;
>> +	__be16 proto = ETH_P_ALL;
>> +
>> +	if (!nft_is_base_chain(trans->ctx.chain))
>> +		return -EOPNOTSUPP;
>> +
>> +	basechain = nft_base_chain(trans->ctx.chain);
>> +
>> +	if (flow)
>> +		proto = flow->proto;
>> +
>> +	nft_flow_offload_common_init(&cls_flower.common, proto, &extack);
>> +	cls_flower.command = command;
>> +	cls_flower.cookie = (unsigned long) rule;
>> +	if (flow)
>> +		cls_flower.rule = flow->rule;
>> +
>> +	return nft_setup_cb_call(basechain, TC_SETUP_CLSFLOWER, &cls_flower);
>> +}
>
>Are we 100% okay with using TC cls_flower structures and defines in nft
>code?

Yeah, your right. Should be renamed and moved to "flow offload" as well.

^ permalink raw reply

* Re: [PATCH net-next v2 4/4] bnxt_en: add page_pool support
From: Ilias Apalodimas @ 2019-07-09  6:27 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, gospo, netdev, hawk, ast
In-Reply-To: <1562622784-29918-5-git-send-email-michael.chan@broadcom.com>

Hi Andy, Michael,

On Mon, Jul 08, 2019 at 05:53:04PM -0400, Michael Chan wrote:
> From: Andy Gospodarek <gospo@broadcom.com>
> 
> This removes contention over page allocation for XDP_REDIRECT actions by
> adding page_pool support per queue for the driver.  The performance for
> XDP_REDIRECT actions scales linearly with the number of cores performing
> redirect actions when using the page pools instead of the standard page
> allocator.
> 
> v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas.
> 
> Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/Kconfig         |  1 +
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 47 +++++++++++++++++++++++----
>  drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  3 ++
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  3 +-
>  4 files changed, 47 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
> index 2e4a8c7..e9017ca 100644
> --- a/drivers/net/ethernet/broadcom/Kconfig
> +++ b/drivers/net/ethernet/broadcom/Kconfig
> @@ -199,6 +199,7 @@ config BNXT
>  	select FW_LOADER
>  	select LIBCRC32C
>  	select NET_DEVLINK
> +	select PAGE_POOL
>  	---help---
>  	  This driver supports Broadcom NetXtreme-C/E 10/25/40/50 gigabit
>  	  Ethernet cards.  To compile this driver as a module, choose M here:
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index d8f0846..d25bb38 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -54,6 +54,7 @@
>  #include <net/pkt_cls.h>
>  #include <linux/hwmon.h>
>  #include <linux/hwmon-sysfs.h>
> +#include <net/page_pool.h>
>  
>  #include "bnxt_hsi.h"
>  #include "bnxt.h"
> @@ -668,19 +669,20 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
>  }
>  
>  static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
> +					 struct bnxt_rx_ring_info *rxr,
>  					 gfp_t gfp)
>  {
>  	struct device *dev = &bp->pdev->dev;
>  	struct page *page;
>  
> -	page = alloc_page(gfp);
> +	page = page_pool_dev_alloc_pages(rxr->page_pool);
>  	if (!page)
>  		return NULL;
>  
>  	*mapping = dma_map_page_attrs(dev, page, 0, PAGE_SIZE, bp->rx_dir,
>  				      DMA_ATTR_WEAK_ORDERING);
>  	if (dma_mapping_error(dev, *mapping)) {
> -		__free_page(page);
> +		page_pool_recycle_direct(rxr->page_pool, page);
>  		return NULL;
>  	}
>  	*mapping += bp->rx_dma_offset;
> @@ -716,7 +718,8 @@ int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
>  	dma_addr_t mapping;
>  
>  	if (BNXT_RX_PAGE_MODE(bp)) {
> -		struct page *page = __bnxt_alloc_rx_page(bp, &mapping, gfp);
> +		struct page *page =
> +			__bnxt_alloc_rx_page(bp, &mapping, rxr, gfp);
>  
>  		if (!page)
>  			return -ENOMEM;
> @@ -2360,7 +2363,7 @@ static void bnxt_free_rx_skbs(struct bnxt *bp)
>  				dma_unmap_page_attrs(&pdev->dev, mapping,
>  						     PAGE_SIZE, bp->rx_dir,
>  						     DMA_ATTR_WEAK_ORDERING);
> -				__free_page(data);
> +				page_pool_recycle_direct(rxr->page_pool, data);
>  			} else {
>  				dma_unmap_single_attrs(&pdev->dev, mapping,
>  						       bp->rx_buf_use_size,
> @@ -2497,6 +2500,8 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
>  		if (xdp_rxq_info_is_reg(&rxr->xdp_rxq))
>  			xdp_rxq_info_unreg(&rxr->xdp_rxq);
>  
> +		rxr->page_pool = NULL;
> +
>  		kfree(rxr->rx_tpa);
>  		rxr->rx_tpa = NULL;
>  
> @@ -2511,6 +2516,26 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
>  	}
>  }
>  
> +static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> +				   struct bnxt_rx_ring_info *rxr)
> +{
> +	struct page_pool_params pp = { 0 };
> +
> +	pp.pool_size = bp->rx_ring_size;
> +	pp.nid = dev_to_node(&bp->pdev->dev);
> +	pp.dev = &bp->pdev->dev;
> +	pp.dma_dir = DMA_BIDIRECTIONAL;
> +
> +	rxr->page_pool = page_pool_create(&pp);
> +	if (IS_ERR(rxr->page_pool)) {
> +		int err = PTR_ERR(rxr->page_pool);
> +
> +		rxr->page_pool = NULL;
> +		return err;
> +	}
> +	return 0;
> +}
> +
>  static int bnxt_alloc_rx_rings(struct bnxt *bp)
>  {
>  	int i, rc, agg_rings = 0, tpa_rings = 0;
> @@ -2530,14 +2555,24 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
>  
>  		ring = &rxr->rx_ring_struct;
>  
> +		rc = bnxt_alloc_rx_page_pool(bp, rxr);
> +		if (rc)
> +			return rc;
> +
>  		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
> -		if (rc < 0)
> +		if (rc < 0) {
> +			page_pool_free(rxr->page_pool);

commit 1da4bbeffe41ba318812d7590955faee8636668b has been merged. Can you please
change this to page_pool_destroy(). Please note that this jhas to change for the
'normal' shutdown path as well (calling page_pool_destroy after
xdp_rxq_info_unreg)

> +			rxr->page_pool = NULL;
>  			return rc;
> +		}
>  
>  		rc = xdp_rxq_info_reg_mem_model(&rxr->xdp_rxq,
> -						MEM_TYPE_PAGE_SHARED, NULL);
> +						MEM_TYPE_PAGE_POOL,
> +						rxr->page_pool);
>  		if (rc) {
>  			xdp_rxq_info_unreg(&rxr->xdp_rxq);
> +			page_pool_free(rxr->page_pool);

ditto

> +			rxr->page_pool = NULL;
>  			return rc;
>  		}
>  
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> index 8ac51fa..16694b7 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> @@ -26,6 +26,8 @@
>  #include <net/xdp.h>
>  #include <linux/dim.h>
>  
> +struct page_pool;
> +
>  struct tx_bd {
>  	__le32 tx_bd_len_flags_type;
>  	#define TX_BD_TYPE					(0x3f << 0)
> @@ -799,6 +801,7 @@ struct bnxt_rx_ring_info {
>  	struct bnxt_ring_struct	rx_ring_struct;
>  	struct bnxt_ring_struct	rx_agg_ring_struct;
>  	struct xdp_rxq_info	xdp_rxq;
> +	struct page_pool	*page_pool;
>  };
>  
>  struct bnxt_cp_ring_info {
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index 12489d2..c6f6f20 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -15,6 +15,7 @@
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
>  #include <linux/filter.h>
> +#include <net/page_pool.h>
>  #include "bnxt_hsi.h"
>  #include "bnxt.h"
>  #include "bnxt_xdp.h"
> @@ -191,7 +192,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
>  
>  		if (xdp_do_redirect(bp->dev, &xdp, xdp_prog)) {
>  			trace_xdp_exception(bp->dev, xdp_prog, act);
> -			__free_page(page);
> +			page_pool_recycle_direct(rxr->page_pool, page);
>  			return true;
>  		}
>  
> -- 
> 2.5.1
> 

Thanks and sorry for the inconvenience :(
/Ilias

^ permalink raw reply

* Re: [PATCH net-next v2 0/4] bnxt_en: Add XDP_REDIRECT support.
From: Ilias Apalodimas @ 2019-07-09  6:30 UTC (permalink / raw)
  To: David Miller; +Cc: michael.chan, gospo, netdev, hawk, ast
In-Reply-To: <20190708.152020.327516269485719584.davem@davemloft.net>

Hi David, 

On Mon, Jul 08, 2019 at 03:20:20PM -0700, David Miller wrote:
> From: Michael Chan <michael.chan@broadcom.com>
> Date: Mon,  8 Jul 2019 17:53:00 -0400
> 
> > This patch series adds XDP_REDIRECT support by Andy Gospodarek.
> 
> Series applied, thanks everyone.

We need a fix on this after merging Ivans patch
commit 1da4bbeffe41ba318812d7590955faee8636668b

page_pool_destroy needs to be explicitely called when shutting down the
interface as it's not automatically called from xdp_rxq_info_unreg()

Thanks
/Ilias

^ permalink raw reply

* Re: [RFC v2] vhost: introduce mdev based hardware vhost backend
From: Tiwei Bie @ 2019-07-09  6:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Alex Williamson, mst, maxime.coquelin, linux-kernel, kvm,
	virtualization, netdev, dan.daly, cunming.liang, zhihong.wang,
	idos, Rob Miller, Ariel Adam
In-Reply-To: <deae5ede-57e9-41e6-ea42-d84e07ca480a@redhat.com>

On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:
> On 2019/7/8 下午2:16, Tiwei Bie wrote:
> > On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:
> > > On Thu, 4 Jul 2019 14:21:34 +0800
> > > Tiwei Bie <tiwei.bie@intel.com> wrote:
> > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > > > Details about this can be found here:
> > > > > > > > > > 
> > > > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > > > 
> > > > > > > > > > What's new in this version
> > > > > > > > > > ==========================
> > > > > > > > > > 
> > > > > > > > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > > > > > > > some comments from here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > > > 
> > > > > > > > > > Below is the updated device interface:
> > > > > > > > > > 
> > > > > > > > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > > > > > can be used to notify the device.
> > > > > > > > > > 
> > > > > > > > > > 1. CONFIG_REGION
> > > > > > > > > > 
> > > > > > > > > > The region described by CONFIG_REGION is the main control interface.
> > > > > > > > > > Messages will be written to or read from this region.
> > > > > > > > > > 
> > > > > > > > > > The message type is determined by the `request` field in message
> > > > > > > > > > header. The message size is encoded in the message header too.
> > > > > > > > > > The message format looks like this:
> > > > > > > > > > 
> > > > > > > > > > struct vhost_vfio_op {
> > > > > > > > > > 	__u64 request;
> > > > > > > > > > 	__u32 flags;
> > > > > > > > > > 	/* Flag values: */
> > > > > > > > > >      #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > > > > > 	__u32 size;
> > > > > > > > > > 	union {
> > > > > > > > > > 		__u64 u64;
> > > > > > > > > > 		struct vhost_vring_state state;
> > > > > > > > > > 		struct vhost_vring_addr addr;
> > > > > > > > > > 	} payload;
> > > > > > > > > > };
> > > > > > > > > > 
> > > > > > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > > > > > requests in above structure.
> > > > > > > > > Still a comments like V1. What's the advantage of inventing a new protocol?
> > > > > > > > I'm trying to make it work in VFIO's way..
> > > > > > > > > I believe either of the following should be better:
> > > > > > > > > 
> > > > > > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > > > extend it with e.g notify region. The advantages is that all exist userspace
> > > > > > > > > program could be reused without modification (or minimal modification). And
> > > > > > > > > vhost API hides lots of details that is not necessary to be understood by
> > > > > > > > > application (e.g in the case of container).
> > > > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > > > using the existing vfio_mdev) for mdev device?
> > > > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > > > > Right, either way, these ioctls have to be and just need to be
> > > > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > > > also need to consider is that which file descriptor the userspace
> > > > > > will do the ioctl() on. So I'm wondering do you mean let the
> > > > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > > > device?
> > > > > Yes.
> > > > Got it! I'm not sure what's Alex opinion on this. If we all
> > > > agree with this, I can do it in this way.
> > > > 
> > > > > Is there any other way btw?
> > > > Just a quick thought.. Maybe totally a bad idea. I was thinking
> > > > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > > > fd. So I was wondering whether it's possible to allow binding
> > > > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > > > devices. The new mdev driver, vhost_mdev, can provide similar
> > > > ways to let userspace open the mdev device and do the vhost ioctls
> > > > on it. To distinguish with the vfio_mdev compatible mdev devices,
> > > > the device API of the new vhost_mdev compatible mdev devices
> > > > might be e.g. "vhost-net" for net?
> > > > 
> > > > So in VFIO case, the device will be for passthru directly. And
> > > > in VHOST case, the device can be used to accelerate the existing
> > > > virtualized devices.
> > > > 
> > > > How do you think?
> > > VFIO really can't prevent vendor specific ioctls on the device file
> > > descriptor for mdevs, but a) we'd want to be sure the ioctl address
> > > space can't collide with ioctls we'd use for vfio defined purposes and
> > > b) maybe the VFIO user API isn't what you want in the first place if
> > > you intend to mostly/entirely ignore the defined ioctl set and replace
> > > them with your own.  In the case of the latter, you're also not getting
> > > the advantages of the existing VFIO userspace code, so why expose a
> > > VFIO device at all.
> > Yeah, I totally agree.
> 
> 
> I guess the original idea is to reuse the VFIO DMA/IOMMU API for this. Then
> we have the chance to reuse vfio codes in qemu for dealing with e.g vIOMMU.

Yeah, you are right. We have several choices here:

#1. We expose a VFIO device, so we can reuse the VFIO container/group
    based DMA API and potentially reuse a lot of VFIO code in QEMU.

    But in this case, we have two choices for the VFIO device interface
    (i.e. the interface on top of VFIO device fd):

    A) we may invent a new vhost protocol (as demonstrated by the code
       in this RFC) on VFIO device fd to make it work in VFIO's way,
       i.e. regions and irqs.

    B) Or as you proposed, instead of inventing a new vhost protocol,
       we can reuse most existing vhost ioctls on the VFIO device fd
       directly. There should be no conflicts between the VFIO ioctls
       (type is 0x3B) and VHOST ioctls (type is 0xAF) currently.

#2. Instead of exposing a VFIO device, we may expose a VHOST device.
    And we will introduce a new mdev driver vhost-mdev to do this.
    It would be natural to reuse the existing kernel vhost interface
    (ioctls) on it as much as possible. But we will need to invent
    some APIs for DMA programming (reusing VHOST_SET_MEM_TABLE is a
    choice, but it's too heavy and doesn't support vIOMMU by itself).

I'm not sure which one is the best choice we all want..
Which one (#1/A, #1/B, or #2) would you prefer?

> 
> 
> > 
> > > The mdev interface does provide a general interface for creating and
> > > managing virtual devices, vfio-mdev is just one driver on the mdev
> > > bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
> > > out vfio-isms from the interface, aiui, with the intent of implementing
> > > another mdev bus driver for using the devices within the kernel.
> > Great to know this! I found below series after some searching:
> > 
> > https://lkml.org/lkml/2019/3/8/821
> > 
> > In above series, the new mlx5_core mdev driver will do the probe
> > by calling mlx5_get_core_dev() first on the parent device of the
> > mdev device. In vhost_mdev, maybe we can also keep track of all
> > the compatible mdev devices and use this info to do the probe.
> 
> 
> I don't get why this is needed. My understanding is if we want to go this
> way, there're actually two parts. 1) Vhost mdev that implements the device
> managements and vhost ioctl. 2) Vhost it self, which can accept mdev fd as
> it backend through VHOST_NET_SET_BACKEND.

I think with vhost-mdev (or with vfio-mdev if we agree to do vhost
ioctls on vfio device fd directly), we don't need to open /dev/vhost-net
(and there is no VHOST_NET_SET_BACKEND needed) at all. Either way,
after getting the fd of the mdev, we just need to do vhost ioctls
on it directly.

> 
> 
> > But we also need a way to allow vfio_mdev driver to distinguish
> > and reject the incompatible mdev devices.
> 
> 
> One issue for this series is that it doesn't consider DMA isolation at all.
> 
> 
> > 
> > > It
> > > seems like this vhost-mdev driver might be similar, using mdev but not
> > > necessarily vfio-mdev to expose devices.  Thanks,
> > Yeah, I also think so!
> 
> 
> I've cced some driver developers for their inputs. I think we need a sample
> parent drivers in the next version for us to understand the full picture.
> 
> 
> Thanks
> 
> 
> > 
> > Thanks!
> > Tiwei
> > 
> > > Alex

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Leon Romanovsky @ 2019-07-09  6:43 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Doug Ledford, Jason Gunthorpe, David Miller, Networking,
	Linux Next Mailing List, Linux Kernel Mailing List,
	Bernard Metzler
In-Reply-To: <20190709135636.4d36e19f@canb.auug.org.au>

On Tue, Jul 09, 2019 at 01:56:36PM +1000, Stephen Rothwell wrote:
> Hi all,
>
> After merging the net-next tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> drivers/infiniband/sw/siw/siw_cm.c: In function 'siw_create_listen':
> drivers/infiniband/sw/siw/siw_cm.c:1978:3: error: implicit declaration of function 'for_ifa'; did you mean 'fork_idle'? [-Werror=implicit-function-declaration]
>    for_ifa(in_dev)
>    ^~~~~~~
>    fork_idle
> drivers/infiniband/sw/siw/siw_cm.c:1978:18: error: expected ';' before '{' token
>    for_ifa(in_dev)
>                   ^
>                   ;
>    {
>    ~
>
> Caused by commit
>
>   6c52fdc244b5 ("rdma/siw: connection management")
>
> from the rdma tree.  I don't know why this didn't fail after I mereged
> that tree.

I had the same question, because I have this fix for a couple of days already.

From 56c9e15ec670af580daa8c3ffde9503af3042d67 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro@mellanox.com>
Date: Sun, 7 Jul 2019 10:43:42 +0300
Subject: [PATCH] Fixup to build SIW issue

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/sw/siw/siw_cm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index 8e618cb7261f..c883bf514341 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -1954,6 +1954,7 @@ static void siw_drop_listeners(struct iw_cm_id *id)
 int siw_create_listen(struct iw_cm_id *id, int backlog)
 {
 	struct net_device *dev = to_siw_dev(id->device)->netdev;
+	const struct in_ifaddr *ifa;
 	int rv = 0, listeners = 0;

 	siw_dbg(id->device, "id 0x%p: backlog %d\n", id, backlog);
@@ -1975,8 +1976,7 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
 			id, &s_laddr.sin_addr, ntohs(s_laddr.sin_port),
 			&s_raddr->sin_addr, ntohs(s_raddr->sin_port));

-		for_ifa(in_dev)
-		{
+		in_dev_for_each_ifa_rcu(ifa, in_dev) {
 			if (ipv4_is_zeronet(s_laddr.sin_addr.s_addr) ||
 			    s_laddr.sin_addr.s_addr == ifa->ifa_address) {
 				s_laddr.sin_addr.s_addr = ifa->ifa_address;
@@ -1988,7 +1988,6 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
 					listeners++;
 			}
 		}
-		endfor_ifa(in_dev);
 		in_dev_put(in_dev);
 	} else if (id->local_addr.ss_family == AF_INET6) {
 		struct inet6_dev *in6_dev = in6_dev_get(dev);
--
2.21.0


>
> I have marked that driver as depending on BROKEN for today.
>
> --
> Cheers,
> Stephen Rothwell



^ permalink raw reply related

* Re: [PATCH net-next 11/16] qlge: Remove qlge_bq.len & size
From: Benjamin Poirier @ 2019-07-09  6:52 UTC (permalink / raw)
  To: Manish Chopra; +Cc: GR-Linux-NIC-Dev, netdev@vger.kernel.org
In-Reply-To: <DM6PR18MB2697E84A8DD54483832AD202ABFD0@DM6PR18MB2697.namprd18.prod.outlook.com>

On 2019/06/27 10:47, Manish Chopra wrote:
> > 
> > -	for (i = 0; i < qdev->rx_ring_count; i++) {
> > +	for (i = 0; i < qdev->rss_ring_count; i++) {
> >  		struct rx_ring *rx_ring = &qdev->rx_ring[i];
> > 
> > -		if (rx_ring->lbq.queue)
> > -			ql_free_lbq_buffers(qdev, rx_ring);
> > -		if (rx_ring->sbq.queue)
> > -			ql_free_sbq_buffers(qdev, rx_ring);
> > +		ql_free_lbq_buffers(qdev, rx_ring);
> > +		ql_free_sbq_buffers(qdev, rx_ring);
> >  	}
> >  }
> > 
> 
> Seems irrelevant change as per what this patch is supposed to do exactly.

Sure, I've removed that hunk.

^ permalink raw reply

* Re: [PATCH v3 net-next 19/19] ionic: Add basic devlink interface
From: Jiri Pirko @ 2019-07-09  6:56 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev
In-Reply-To: <6f9ebbca-4f13-b046-477c-678489e6ffbf@pensando.io>

Tue, Jul 09, 2019 at 12:58:00AM CEST, snelson@pensando.io wrote:
>On 7/8/19 1:03 PM, Jiri Pirko wrote:
>> Mon, Jul 08, 2019 at 09:58:09PM CEST, snelson@pensando.io wrote:
>> > On 7/8/19 12:34 PM, Jiri Pirko wrote:
>> > > Mon, Jul 08, 2019 at 09:25:32PM CEST, snelson@pensando.io wrote:
>> > > > 
>> > > > +
>> > > > +static const struct devlink_ops ionic_dl_ops = {
>> > > > +	.info_get	= ionic_dl_info_get,
>> > > > +};
>> > > > +
>> > > > +int ionic_devlink_register(struct ionic *ionic)
>> > > > +{
>> > > > +	struct devlink *dl;
>> > > > +	struct ionic **ip;
>> > > > +	int err;
>> > > > +
>> > > > +	dl = devlink_alloc(&ionic_dl_ops, sizeof(struct ionic *));
>> > > Oups. Something is wrong with your flow. The devlink alloc is allocating
>> > > the structure that holds private data (per-device data) for you. This is
>> > > misuse :/
>> > > 
>> > > You are missing one parent device struct apparently.
>> > > 
>> > > Oh, I think I see something like it. The unused "struct ionic_devlink".
>> > If I'm not mistaken, the alloc is only allocating enough for a pointer, not
>> > the whole per device struct, and a few lines down from here the pointer to
>> > the new devlink struct is assigned to ionic->dl.  This was based on what I
>> > found in the qed driver's qed_devlink_register(), and it all seems to work.
>> I'm not saying your code won't work. What I say is that you should have
>> a struct for device that would be allocated by devlink_alloc()
>
>Is there a particular reason why?  I appreciate that devlink_alloc() can give
>you this device specific space, just as alloc_etherdev_mq() can, but is there

Yes. Devlink manipulates with the whole device. However,
alloc_etherdev_mq() allocates only net_device. These are 2 different
things. devlink port relates 1:1 to net_device. However, devlink
instance can have multiple ports. What I say is do it correctly.


>a specific reason why this should be used instead of setting up simply a
>pointer to a space that has already been allocated?  There are several
>drivers that are using it the way I've setup here, which happened to be the
>first examples I followed - are they doing something different that makes
>this valid for them?

Nope. I'll look at that and fix.


>
>> 
>> The ionic struct should be associated with devlink_port. That you are
>> missing too.
>
>We don't support any of devlink_port features at this point, just the simple
>device information.

No problem, you can still register devlink_port. You don't have to do
much in order to do so.


>
>sln
>
>> 
>> 
>> > That unused struct ionic_devlink does need to go away, it was superfluous
>> > after working out a better typecast off of devlink_priv().
>> > 
>> > I'll remove the unused struct ionic_devlink, but I think the rest is okay.
>> > 
>> > sln
>> > 
>> > > 
>> > > > +	if (!dl) {
>> > > > +		dev_warn(ionic->dev, "devlink_alloc failed");
>> > > > +		return -ENOMEM;
>> > > > +	}
>> > > > +
>> > > > +	ip = (struct ionic **)devlink_priv(dl);
>> > > > +	*ip = ionic;
>> > > > +	ionic->dl = dl;
>> > > > +
>> > > > +	err = devlink_register(dl, ionic->dev);
>> > > > +	if (err) {
>> > > > +		dev_warn(ionic->dev, "devlink_register failed: %d\n", err);
>> > > > +		goto err_dl_free;
>> > > > +	}
>> > > > +
>> > > > +	return 0;
>> > > > +
>> > > > +err_dl_free:
>> > > > +	ionic->dl = NULL;
>> > > > +	devlink_free(dl);
>> > > > +	return err;
>> > > > +}
>> > > > +
>> > > > +void ionic_devlink_unregister(struct ionic *ionic)
>> > > > +{
>> > > > +	if (!ionic->dl)
>> > > > +		return;
>> > > > +
>> > > > +	devlink_unregister(ionic->dl);
>> > > > +	devlink_free(ionic->dl);
>> > > > +}
>> > > > diff --git a/drivers/net/ethernet/pensando/ionic/ionic_devlink.h b/drivers/net/ethernet/pensando/ionic/ionic_devlink.h
>> > > > new file mode 100644
>> > > > index 000000000000..35528884e29f
>> > > > --- /dev/null
>> > > > +++ b/drivers/net/ethernet/pensando/ionic/ionic_devlink.h
>> > > > @@ -0,0 +1,12 @@
>> > > > +/* SPDX-License-Identifier: GPL-2.0 */
>> > > > +/* Copyright(c) 2017 - 2019 Pensando Systems, Inc */
>> > > > +
>> > > > +#ifndef _IONIC_DEVLINK_H_
>> > > > +#define _IONIC_DEVLINK_H_
>> > > > +
>> > > > +#include <net/devlink.h>
>> > > > +
>> > > > +int ionic_devlink_register(struct ionic *ionic);
>> > > > +void ionic_devlink_unregister(struct ionic *ionic);
>> > > > +
>> > > > +#endif /* _IONIC_DEVLINK_H_ */
>> > > > -- 
>> > > > 2.17.1
>> > > > 
>

^ permalink raw reply

* Re: [PATCH net-next v5 1/4] net/sched: Introduce action ct
From: Paul Blakey @ 2019-07-09  7:00 UTC (permalink / raw)
  To: Florian Westphal, Marcelo Ricardo Leitner
  Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	netdev@vger.kernel.org, David Miller, Aaron Conole, Zhike Wang,
	Rony Efraim, nst-kernel@redhat.com, John Hurley, Simon Horman,
	Justin Pettit
In-Reply-To: <20190708152805.dul3kgu4csr64fqk@breakpoint.cc>


On 7/8/2019 6:28 PM, Florian Westphal wrote:
> Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
>>> +	} else { /* NFPROTO_IPV6 */
>>> +		enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
>>> +
>>> +		memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
>>> +		err = nf_ct_frag6_gather(net, skb, user);
>> This doesn't build without IPv6 enabled.
>> ERROR: "nf_ct_frag6_gather" [net/sched/act_ct.ko] undefined!
>>
>> We need to (copy and pasted):
>>
>> @@ -179,7 +179,9 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
>>                  local_bh_enable();
>>                  if (err && err != -EINPROGRESS)
>>                          goto out_free;
>> -       } else { /* NFPROTO_IPV6 */
>> +       }
>> +#if IS_ENABLED(IPV6)
>> +       else { /* NFPROTO_IPV6 */
>>                  enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
> Good catch, but it should be
> #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> just like ovs conntrack.c ,



Thanks guys.


^ permalink raw reply

* Re: [PATCH net-next iproute2 2/3] tc: Introduce tc ct action
From: Paul Blakey @ 2019-07-09  6:58 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	netdev@vger.kernel.org, David Miller, Aaron Conole, Zhike Wang,
	Justin Pettit, John Hurley, Rony Efraim, nst-kernel@redhat.com,
	Simon Horman
In-Reply-To: <20190708175446.GL3449@localhost.localdomain>


On 7/8/2019 8:54 PM, Marcelo Ricardo Leitner wrote:
> On Sun, Jul 07, 2019 at 11:53:47AM +0300, Paul Blakey wrote:
>> New tc action to send packets to conntrack module, commit
>> them, and set a zone, labels, mark, and nat on the connection.
>>
>> It can also clear the packet's conntrack state by using clear.
>>
>> Usage:
>>     ct clear
>>     ct commit [force] [zone] [mark] [label] [nat]
> Isn't the 'commit' also optional? More like
>      ct [commit [force]] [zone] [mark] [label] [nat]
>
>>     ct [nat] [zone]
>>
>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>> Acked-by: Roi Dayan <roid@mellanox.com>
>> ---
> ...
>> +static void
>> +usage(void)
>> +{
>> +	fprintf(stderr,
>> +		"Usage: ct clear\n"
>> +		"	ct commit [force] [zone ZONE] [mark MASKED_MARK] [label MASKED_LABEL] [nat NAT_SPEC]\n"
> Ditto here then.


In commit msg and here, it means there is multiple modes of operation. I 
think it's easier to split those.

"ct clear" to clear it , not other options can be added here.

"ct commit  [force].... " sends to conntrack and commit a connection, 
and only for commit can you specify force mark  label, and nat with 
nat_spec....

and the last one, "ct [nat] [zone ZONE]" is to just send the packet to 
conntrack on some zone [optional], restore nat [optional].


>
>> +		"	ct [nat] [zone ZONE]\n"
>> +		"Where: ZONE is the conntrack zone table number\n"
>> +		"	NAT_SPEC is {src|dst} addr addr1[-addr2] [port port1[-port2]]\n"
>> +		"\n");
>> +	exit(-1);
>> +}
> ...
>
> The validation below doesn't enforce that commit must be there for
> such case.
which case? commit is optional. the above are the three valid patterns.
>> +static int
>> +parse_ct(struct action_util *a, int *argc_p, char ***argv_p, int tca_id,
>> +		struct nlmsghdr *n)
>> +{
>> +	struct tc_ct sel = {};
>> +	char **argv = *argv_p;
>> +	struct rtattr *tail;
>> +	int argc = *argc_p;
>> +	int ct_action = 0;
>> +	int ret;
>> +
>> +	tail = addattr_nest(n, MAX_MSG, tca_id);
>> +
>> +	if (argc && matches(*argv, "ct") == 0)
>> +		NEXT_ARG_FWD();
>> +
>> +	while (argc > 0) {
>> +		if (matches(*argv, "zone") == 0) {
>> +			NEXT_ARG();
>> +
>> +			if (ct_parse_u16(*argv,
>> +					 TCA_CT_ZONE, TCA_CT_UNSPEC, n)) {
>> +				fprintf(stderr, "ct: Illegal \"zone\"\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "nat") == 0) {
>> +			ct_action |= TCA_CT_ACT_NAT;
>> +
>> +			NEXT_ARG();
>> +			if (matches(*argv, "src") == 0)
>> +				ct_action |= TCA_CT_ACT_NAT_SRC;
>> +			else if (matches(*argv, "dst") == 0)
>> +				ct_action |= TCA_CT_ACT_NAT_DST;
>> +			else
>> +				continue;
>> +
>> +			NEXT_ARG();
>> +			if (matches(*argv, "addr") != 0)
>> +				usage();
>> +
>> +			NEXT_ARG();
>> +			ret = ct_parse_nat_addr_range(*argv, n);
>> +			if (ret) {
>> +				fprintf(stderr, "ct: Illegal nat address range\n");
>> +				return -1;
>> +			}
>> +
>> +			NEXT_ARG_FWD();
>> +			if (matches(*argv, "port") != 0)
>> +				continue;
>> +
>> +			NEXT_ARG();
>> +			ret = ct_parse_nat_port_range(*argv, n);
>> +			if (ret) {
>> +				fprintf(stderr, "ct: Illegal nat port range\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "clear") == 0) {
>> +			ct_action |= TCA_CT_ACT_CLEAR;
>> +		} else if (matches(*argv, "commit") == 0) {
>> +			ct_action |= TCA_CT_ACT_COMMIT;
>> +		} else if (matches(*argv, "force") == 0) {
>> +			ct_action |= TCA_CT_ACT_FORCE;
>> +		} else if (matches(*argv, "index") == 0) {
>> +			NEXT_ARG();
>> +			if (get_u32(&sel.index, *argv, 10)) {
>> +				fprintf(stderr, "ct: Illegal \"index\"\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "mark") == 0) {
>> +			NEXT_ARG();
>> +
>> +			ret = ct_parse_mark(*argv, n);
>> +			if (ret) {
>> +				fprintf(stderr, "ct: Illegal \"mark\"\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "label") == 0) {
>> +			NEXT_ARG();
>> +
>> +			ret = ct_parse_labels(*argv, n);
>> +			if (ret) {
>> +				fprintf(stderr, "ct: Illegal \"label\"\n");
>> +				return -1;
>> +			}
>> +		} else if (matches(*argv, "help") == 0) {
>> +			usage();
>> +		} else {
>> +			break;
>> +		}
>> +		NEXT_ARG_FWD();
>> +	}
>> +
>> +	if (ct_action & TCA_CT_ACT_CLEAR &&
>> +	    ct_action & ~TCA_CT_ACT_CLEAR) {
>> +		fprintf(stderr, "ct: clear can only be used alone\n");
>> +		return -1;
>> +	}
>> +
>> +	if (ct_action & TCA_CT_ACT_NAT_SRC &&
>> +	    ct_action & TCA_CT_ACT_NAT_DST) {
>> +		fprintf(stderr, "ct: src and dst nat can't be used together\n");
>> +		return -1;
>> +	}
>> +
>> +	if ((ct_action & TCA_CT_ACT_COMMIT) &&
>> +	    (ct_action & TCA_CT_ACT_NAT) &&
>> +	    !(ct_action & (TCA_CT_ACT_NAT_SRC | TCA_CT_ACT_NAT_DST))) {
>> +		fprintf(stderr, "ct: commit and nat must set src or dst\n");
>> +		return -1;
>> +	}
>> +
>> +	if (!(ct_action & TCA_CT_ACT_COMMIT) &&
>> +	    (ct_action & (TCA_CT_ACT_NAT_SRC | TCA_CT_ACT_NAT_DST))) {
>> +		fprintf(stderr, "ct: src or dst is only valid if commit is set\n");
>> +		return -1;
>> +	}
>> +
>> +	parse_action_control_dflt(&argc, &argv, &sel.action, false,
>> +				  TC_ACT_PIPE);
>> +	NEXT_ARG_FWD();
>> +
>> +	addattr16(n, MAX_MSG, TCA_CT_ACTION, ct_action);
>> +	addattr_l(n, MAX_MSG, TCA_CT_PARMS, &sel, sizeof(sel));
>> +	addattr_nest_end(n, tail);
>> +
>> +	*argc_p = argc;
>> +	*argv_p = argv;
>> +	return 0;
>> +}
> ...

^ permalink raw reply

* [PATCH net-next] netfilter: nft_meta: Fix build error
From: YueHaibing @ 2019-07-09  7:01 UTC (permalink / raw)
  To: davem, pablo, kadlec, fw, roopa, nikolay, wenxu
  Cc: linux-kernel, netdev, bridge, coreteam, netfilter-devel,
	YueHaibing

If NFT_BRIDGE_META is y and NF_TABLES is m, building fails:

net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_init':
nft_meta_bridge.c:(.text+0xd0): undefined reference to `nft_parse_register'
nft_meta_bridge.c:(.text+0xec): undefined reference to `nft_validate_register_store'

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 30e103fe24de ("netfilter: nft_meta: move bridge meta keys into nft_meta_bridge")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 net/bridge/netfilter/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bridge/netfilter/Kconfig b/net/bridge/netfilter/Kconfig
index fbc7085..ca3214f 100644
--- a/net/bridge/netfilter/Kconfig
+++ b/net/bridge/netfilter/Kconfig
@@ -12,6 +12,7 @@ if NF_TABLES_BRIDGE
 
 config NFT_BRIDGE_META
 	tristate "Netfilter nf_table bridge meta support"
+	depends on NF_TABLES
 	help
 	  Add support for bridge dedicated meta key.
 
-- 
2.7.4



^ permalink raw reply related

* Re: [PATCH v5 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Leon Romanovsky @ 2019-07-09  7:02 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: ariel.elior, jgg, dledford, galpress, linux-rdma, davem, netdev
In-Reply-To: <20190708091503.14723-2-michal.kalderon@marvell.com>

On Mon, Jul 08, 2019 at 12:14:58PM +0300, Michal Kalderon wrote:
> Create some common API's for adding entries to a xa_mmap.
> Searching for an entry and freeing one.
>
> The code was copied from the efa driver almost as is, just renamed
> function to be generic and not efa specific.
>
> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> ---
>  drivers/infiniband/core/device.c      |   1 +
>  drivers/infiniband/core/rdma_core.c   |   1 +
>  drivers/infiniband/core/uverbs_cmd.c  |   1 +
>  drivers/infiniband/core/uverbs_main.c | 105 ++++++++++++++++++++++++++++++++++
>  include/rdma/ib_verbs.h               |  32 +++++++++++
>  5 files changed, 140 insertions(+)
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 8a6ccb936dfe..a830c2c5d691 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
>  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
>  	SET_DEVICE_OP(dev_ops, mmap);
> +	SET_DEVICE_OP(dev_ops, mmap_free);
>  	SET_DEVICE_OP(dev_ops, modify_ah);
>  	SET_DEVICE_OP(dev_ops, modify_cq);
>  	SET_DEVICE_OP(dev_ops, modify_device);
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index ccf4d069c25c..7166741834c8 100644
> --- a/drivers/infiniband/core/rdma_core.c
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
>  	rdma_restrack_del(&ucontext->res);
>
>  	ib_dev->ops.dealloc_ucontext(ucontext);
> +	rdma_user_mmap_entries_remove_free(ucontext);
>  	kfree(ucontext);
>
>  	ufile->ucontext = NULL;
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 7ddd0e5bc6b3..44c0600245e4 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
>
>  	mutex_init(&ucontext->per_mm_list_lock);
>  	INIT_LIST_HEAD(&ucontext->per_mm_list);
> +	xa_init(&ucontext->mmap_xa);
>
>  	ret = get_unused_fd_flags(O_CLOEXEC);
>  	if (ret < 0)
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 11c13c1381cf..37507cc27e8c 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -965,6 +965,111 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>  }
>  EXPORT_SYMBOL(rdma_user_mmap_io);
>
> +static inline u64
> +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
> +{
> +	return (u64)entry->mmap_page << PAGE_SHIFT;
> +}
> +
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u64 mmap_page;
> +
> +	mmap_page = key >> PAGE_SHIFT;
> +	if (mmap_page > U32_MAX)
> +		return NULL;
> +
> +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> +	if (!entry || rdma_user_mmap_get_key(entry) != key ||
> +	    entry->length != len)
> +		return NULL;
> +
> +	ibdev_dbg(ucontext->device,
> +		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
> +		  entry->obj, key, entry->address, entry->length);
> +
> +	return entry;
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_get);

Please add function description in kernel doc format for all newly EXPORT_SYMBOL()
functions you introduced in RDMA/core.

> +
> +/*
> + * Note this locking scheme cannot support removal of entries, except during
> + * ucontext destruction when the core code guarentees no concurrency.
> + */
> +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> +				u64 address, u64 length, u8 mmap_flag)
> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u32 next_mmap_page;
> +	int err;
> +
> +	entry = kmalloc(sizeof(*entry), GFP_KERNEL);

It is worth to use kzalloc and not kmalloc.

Thanks

^ permalink raw reply

* Re: [PATCH nf-next 1/3] netfilter: nf_nat_proto: add nf_nat_bridge_ops support
From: kbuild test robot @ 2019-07-09  7:04 UTC (permalink / raw)
  To: wenxu; +Cc: kbuild-all, pablo, fw, netfilter-devel, netdev
In-Reply-To: <1562574567-8293-1-git-send-email-wenxu@ucloud.cn>

[-- Attachment #1: Type: text/plain, Size: 2475 bytes --]

Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on nf-next/master]

url:    https://github.com/0day-ci/linux/commits/wenxu-ucloud-cn/netfilter-nf_nat_proto-add-nf_nat_bridge_ops-support/20190709-070304
base:   https://kernel.googlesource.com/pub/scm/linux/kernel/git/pablo/nf-next.git master
config: x86_64-randconfig-s1-07091425 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-9) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/netfilter/nf_nat_proto.c: In function 'nf_nat_bridge_in':
>> net/netfilter/nf_nat_proto.c:1048:10: error: implicit declaration of function 'nf_nat_ipv6_in'; did you mean 'nf_nat_ipv4_in'? [-Werror=implicit-function-declaration]
      return nf_nat_ipv6_in(priv, skb, state);
             ^~~~~~~~~~~~~~
             nf_nat_ipv4_in
   net/netfilter/nf_nat_proto.c: In function 'nf_nat_bridge_out':
>> net/netfilter/nf_nat_proto.c:1062:10: error: implicit declaration of function 'nf_nat_ipv6_out'; did you mean 'nf_nat_ipv4_out'? [-Werror=implicit-function-declaration]
      return nf_nat_ipv6_out(priv, skb, state);
             ^~~~~~~~~~~~~~~
             nf_nat_ipv4_out
   cc1: some warnings being treated as errors

vim +1048 net/netfilter/nf_nat_proto.c

  1038	
  1039	#if defined(CONFIG_NF_TABLES_BRIDGE) && IS_ENABLED(CONFIG_NFT_NAT)
  1040	static unsigned int
  1041	nf_nat_bridge_in(void *priv, struct sk_buff *skb,
  1042			 const struct nf_hook_state *state)
  1043	{
  1044		switch (skb->protocol) {
  1045		case htons(ETH_P_IP):
  1046			return nf_nat_ipv4_in(priv, skb, state);
  1047		case htons(ETH_P_IPV6):
> 1048			return nf_nat_ipv6_in(priv, skb, state);
  1049		default:
  1050			return NF_ACCEPT;
  1051		}
  1052	}
  1053	
  1054	static unsigned int
  1055	nf_nat_bridge_out(void *priv, struct sk_buff *skb,
  1056			  const struct nf_hook_state *state)
  1057	{
  1058		switch (skb->protocol) {
  1059		case htons(ETH_P_IP):
  1060			return nf_nat_ipv4_out(priv, skb, state);
  1061		case htons(ETH_P_IPV6):
> 1062			return nf_nat_ipv6_out(priv, skb, state);
  1063		default:
  1064			return NF_ACCEPT;
  1065		}
  1066	}
  1067	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34307 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 12/16] net/mlx5e: Split open/close ICOSQ into stages
From: Jiri Pirko @ 2019-07-09  7:23 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, netdev, Eran Ben Elisha, ayal, jiri,
	Saeed Mahameed, moshe
In-Reply-To: <1562500388-16847-13-git-send-email-tariqt@mellanox.com>

Sun, Jul 07, 2019 at 01:53:04PM CEST, tariqt@mellanox.com wrote:
>From: Aya Levin <ayal@mellanox.com>
>
>Align ICOSQ open/close behaviour with RQ and SQ. Split open flow into
>open and activate where open handles creation and activate enables the
>queue. Do a symmetric thing in close flow: split into close and
>deactivate.
>
>Signed-off-by: Aya Levin <ayal@mellanox.com>
>Signed-off-by: Tariq Toukan <tariqt@mellanox.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next v3 3/3] net: stmmac: Introducing support for Page Pool
From: Ilias Apalodimas @ 2019-07-09  7:23 UTC (permalink / raw)
  To: David Miller
  Cc: Jose.Abreu, linux-kernel, netdev, linux-stm32, linux-arm-kernel,
	Joao.Pinto, peppe.cavallaro, alexandre.torgue, brouer, arnd
In-Reply-To: <20190708.141515.1767939731073284700.davem@davemloft.net>

Hello, 

> From: Jose Abreu <Jose.Abreu@synopsys.com>
> Date: Mon, 8 Jul 2019 16:08:07 +0000
> 
> > From: Ilias Apalodimas <ilias.apalodimas@linaro.org> | Date: Fri, Jul 
> > 05, 2019 at 16:24:53
> > 
> >> Well ideally we'd like to get the change in before the merge window ourselves,
> >> since we dont want to remove->re-add the same function in stable kernels. If
> >> that doesn't go in i am fine fixing it in the next merge window i guess, since
> >> it offers substantial speedups
> > 
> > I think the series is marked as "Changes Requested" in patchwork. What's 
> > the status of this ?
> 
> That means I expect a respin based upon feedback or similar.  If Ilias and
> you agreed to put this series in as-is, my apologies and just resend the
> series with appropriate ACK and Review tags added.

The patch from Ivan did get merged, can you change the free call to
page_pool_destroy and re-spin? You can add my acked-by

Thanks
/Ilias

^ permalink raw reply

* Re: [PATCH] tipc: ensure skb->lock is initialised
From: Eric Dumazet @ 2019-07-09  7:30 UTC (permalink / raw)
  To: Chris Packham, Eric Dumazet, jon.maloy@ericsson.com,
	ying.xue@windriver.com, davem@davemloft.net
  Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <87fd2150548041219fc42bce80b63c9c@svr-chch-ex1.atlnz.lc>



On 7/8/19 11:13 PM, Chris Packham wrote:
> On 9/07/19 8:43 AM, Chris Packham wrote:
>> On 8/07/19 8:18 PM, Eric Dumazet wrote:
>>>
>>>
>>> On 7/8/19 12:53 AM, Chris Packham wrote:
>>>> tipc_named_node_up() creates a skb list. It passes the list to
>>>> tipc_node_xmit() which has some code paths that can call
>>>> skb_queue_purge() which relies on the list->lock being initialised.
>>>> Ensure tipc_named_node_up() uses skb_queue_head_init() so that the lock
>>>> is explicitly initialised.
>>>>
>>>> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
>>>
>>> I would rather change the faulty skb_queue_purge() to __skb_queue_purge()
>>>
>>
>> Makes sense. I'll look at that for v2.
>>
> 
> Actually maybe not. tipc_rcast_xmit(), tipc_node_xmit_skb(), 
> tipc_send_group_msg(), __tipc_sendmsg(), __tipc_sendstream(), and 
> tipc_sk_timeout() all use skb_queue_head_init(). So my original change 
> brings tipc_named_node_up() into line with them.
> 
> I think it should be safe for tipc_node_xmit() to use 
> __skb_queue_purge() since all the callers seem to have exclusive access 
> to the list of skbs. It still seems that the callers should all use 
> skb_queue_head_init() for consistency.
> 

No, tipc does not use the list lock (it relies on the socket lock)
 and therefore should consistently use __skb_queue_head_init()
instead of skb_queue_head_init()

Using a spinlock to protect a local list makes no sense really,
it spreads false sense of correctness.

tipc_link_xmit() for example never acquires the spinlock,
yet uses skb_peek() and __skb_dequeue()

It is time to clean all this mess.




^ permalink raw reply

* [PATCH net-next v6 0/4] net/sched: Introduce tc connection tracking
From: Paul Blakey @ 2019-07-09  7:30 UTC (permalink / raw)
  To: Jiri Pirko, Paul Blakey, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	Marcelo Ricardo Leitner, netdev, David Miller, Aaron Conole,
	Zhike Wang
  Cc: Rony Efraim, nst-kernel, John Hurley, Simon Horman, Justin Pettit

Hi,

This patch series add connection tracking capabilities in tc sw datapath.
It does so via a new tc action, called act_ct, and new tc flower classifier matching
on conntrack state, mark and label.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0

The pattern used in the design here closely resembles OvS, as the plan is to also offload
OvS conntrack rules to tc. OvS datapath rules uses it's recirculation mechanism to send
specific packets to conntrack, and return with the new conntrack state (ct_state) on some other recirc_id
to be matched again (we use goto chain for this).

This results in the following OvS datapath rules:

recirc_id(0),in_port(ens1f0_0),ct_state(-trk),... actions:ct(zone=2),recirc(2)
recirc_id(2),in_port(ens1f0_0),ct_state(+new+trk),ct_mark(0xbb),... actions:ct(commit,zone=2,nat(src=5.5.5.7),mark=0xbb),ens1f0_1
recirc_id(2),in_port(ens1f0_0),ct_state(+est+trk),ct_mark(0xbb),... actions:ct(zone=2,nat),ens1f0_1

recirc_id(1),in_port(ens1f0_1),ct_state(-trk),... actions:ct(zone=2),recirc(1)
recirc_id(1),in_port(ens1f0_1),ct_state(+est+trk),... actions:ct(zone=2,nat),ens1f0_0

Changelog:
	See individual patches.

Paul Blakey (4):
  net/sched: Introduce action ct
  net/flow_dissector: add connection tracking dissection
  net/sched: cls_flower: Add matching on conntrack info
  tc-tests: Add tc action ct tests

 include/linux/skbuff.h                             |  10 +
 include/net/flow_dissector.h                       |  15 +
 include/net/flow_offload.h                         |   5 +
 include/net/tc_act/tc_ct.h                         |  63 ++
 include/uapi/linux/pkt_cls.h                       |  17 +
 include/uapi/linux/tc_act/tc_ct.h                  |  41 +
 net/core/flow_dissector.c                          |  44 +
 net/sched/Kconfig                                  |  11 +
 net/sched/Makefile                                 |   1 +
 net/sched/act_ct.c                                 | 984 +++++++++++++++++++++
 net/sched/cls_api.c                                |   5 +
 net/sched/cls_flower.c                             | 127 ++-
 .../selftests/tc-testing/tc-tests/actions/ct.json  | 314 +++++++
 13 files changed, 1632 insertions(+), 5 deletions(-)
 create mode 100644 include/net/tc_act/tc_ct.h
 create mode 100644 include/uapi/linux/tc_act/tc_ct.h
 create mode 100644 net/sched/act_ct.c
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/ct.json

-- 
1.8.3.1


^ permalink raw reply

* [PATCH net-next v6 2/4] net/flow_dissector: add connection tracking dissection
From: Paul Blakey @ 2019-07-09  7:30 UTC (permalink / raw)
  To: Jiri Pirko, Paul Blakey, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	Marcelo Ricardo Leitner, netdev, David Miller, Aaron Conole,
	Zhike Wang
  Cc: Rony Efraim, nst-kernel, John Hurley, Simon Horman, Justin Pettit
In-Reply-To: <1562657451-20819-1-git-send-email-paulb@mellanox.com>

Retreives connection tracking zone, mark, label, and state from
a SKB.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/skbuff.h       | 10 ++++++++++
 include/net/flow_dissector.h | 15 +++++++++++++++
 net/core/flow_dissector.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b5d427b..1dcb674 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1324,6 +1324,16 @@ void skb_flow_dissect_meta(const struct sk_buff *skb,
 			   struct flow_dissector *flow_dissector,
 			   void *target_container);
 
+/* Gets a skb connection tracking info, ctinfo map should be a
+ * a map of mapsize to translate enum ip_conntrack_info states
+ * to user states.
+ */
+void
+skb_flow_dissect_ct(const struct sk_buff *skb,
+		    struct flow_dissector *flow_dissector,
+		    void *target_container,
+		    u16 *ctinfo_map,
+		    size_t mapsize);
 void
 skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 			     struct flow_dissector *flow_dissector,
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 02478e4..90bd210 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -208,6 +208,20 @@ struct flow_dissector_key_meta {
 	int ingress_ifindex;
 };
 
+/**
+ * struct flow_dissector_key_ct:
+ * @ct_state: conntrack state after converting with map
+ * @ct_mark: conttrack mark
+ * @ct_zone: conntrack zone
+ * @ct_labels: conntrack labels
+ */
+struct flow_dissector_key_ct {
+	u16	ct_state;
+	u16	ct_zone;
+	u32	ct_mark;
+	u32	ct_labels[4];
+};
+
 enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
 	FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
@@ -234,6 +248,7 @@ enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_ENC_IP, /* struct flow_dissector_key_ip */
 	FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */
 	FLOW_DISSECTOR_KEY_META, /* struct flow_dissector_key_meta */
+	FLOW_DISSECTOR_KEY_CT, /* struct flow_dissector_key_ct */
 
 	FLOW_DISSECTOR_KEY_MAX,
 };
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 01ad60b..3e6fedb 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -27,6 +27,10 @@
 #include <scsi/fc/fc_fcoe.h>
 #include <uapi/linux/batadv_packet.h>
 #include <linux/bpf.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_labels.h>
+#endif
 
 static DEFINE_MUTEX(flow_dissector_mutex);
 
@@ -232,6 +236,46 @@ void skb_flow_dissect_meta(const struct sk_buff *skb,
 }
 
 void
+skb_flow_dissect_ct(const struct sk_buff *skb,
+		    struct flow_dissector *flow_dissector,
+		    void *target_container,
+		    u16 *ctinfo_map,
+		    size_t mapsize)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	struct flow_dissector_key_ct *key;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn_labels *cl;
+	struct nf_conn *ct;
+
+	if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_CT))
+		return;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct)
+		return;
+
+	key = skb_flow_dissector_target(flow_dissector,
+					FLOW_DISSECTOR_KEY_CT,
+					target_container);
+
+	if (ctinfo < mapsize)
+		key->ct_state = ctinfo_map[ctinfo];
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES)
+	key->ct_zone = ct->zone.id;
+#endif
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
+	key->ct_mark = ct->mark;
+#endif
+
+	cl = nf_ct_labels_find(ct);
+	if (cl)
+		memcpy(key->ct_labels, cl->bits, sizeof(key->ct_labels));
+#endif /* CONFIG_NF_CONNTRACK */
+}
+EXPORT_SYMBOL(skb_flow_dissect_ct);
+
+void
 skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 			     struct flow_dissector *flow_dissector,
 			     void *target_container)
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next v6 3/4] net/sched: cls_flower: Add matching on conntrack info
From: Paul Blakey @ 2019-07-09  7:30 UTC (permalink / raw)
  To: Jiri Pirko, Paul Blakey, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	Marcelo Ricardo Leitner, netdev, David Miller, Aaron Conole,
	Zhike Wang
  Cc: Rony Efraim, nst-kernel, John Hurley, Simon Horman, Justin Pettit
In-Reply-To: <1562657451-20819-1-git-send-email-paulb@mellanox.com>

New matches for conntrack mark, label, zone, and state.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  16 ++++++
 net/sched/cls_flower.c       | 127 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 138 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 7c9d701..6992df1 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -536,12 +536,28 @@ enum {
 	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
 	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
 
+	TCA_FLOWER_KEY_CT_STATE,	/* u16 */
+	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u16 */
+	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
+	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
+	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
+	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
+	TCA_FLOWER_KEY_CT_LABELS,	/* u128 */
+	TCA_FLOWER_KEY_CT_LABELS_MASK,	/* u128 */
+
 	__TCA_FLOWER_MAX,
 };
 
 #define TCA_FLOWER_MAX (__TCA_FLOWER_MAX - 1)
 
 enum {
+	TCA_FLOWER_KEY_CT_FLAGS_NEW = 1 << 0, /* Beginning of a new connection. */
+	TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing connection. */
+	TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established connection. */
+	TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */
+};
+
+enum {
 	TCA_FLOWER_KEY_ENC_OPTS_UNSPEC,
 	TCA_FLOWER_KEY_ENC_OPTS_GENEVE, /* Nested
 					 * TCA_FLOWER_KEY_ENC_OPT_GENEVE_
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index ce2e9b1..d90ec2c 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -26,6 +26,8 @@
 #include <net/dst.h>
 #include <net/dst_metadata.h>
 
+#include <uapi/linux/netfilter/nf_conntrack_common.h>
+
 struct fl_flow_key {
 	struct flow_dissector_key_meta meta;
 	struct flow_dissector_key_control control;
@@ -54,6 +56,7 @@ struct fl_flow_key {
 	struct flow_dissector_key_enc_opts enc_opts;
 	struct flow_dissector_key_ports tp_min;
 	struct flow_dissector_key_ports tp_max;
+	struct flow_dissector_key_ct ct;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
 
 struct fl_flow_mask_range {
@@ -272,14 +275,27 @@ static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
 	return __fl_lookup(mask, mkey);
 }
 
+static u16 fl_ct_info_to_flower_map[] = {
+	[IP_CT_ESTABLISHED] =		TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
+					TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED,
+	[IP_CT_RELATED] =		TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
+					TCA_FLOWER_KEY_CT_FLAGS_RELATED,
+	[IP_CT_ESTABLISHED_REPLY] =	TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
+					TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED,
+	[IP_CT_RELATED_REPLY] =		TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
+					TCA_FLOWER_KEY_CT_FLAGS_RELATED,
+	[IP_CT_NEW] =			TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
+					TCA_FLOWER_KEY_CT_FLAGS_NEW,
+};
+
 static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		       struct tcf_result *res)
 {
 	struct cls_fl_head *head = rcu_dereference_bh(tp->root);
-	struct cls_fl_filter *f;
-	struct fl_flow_mask *mask;
-	struct fl_flow_key skb_key;
 	struct fl_flow_key skb_mkey;
+	struct fl_flow_key skb_key;
+	struct fl_flow_mask *mask;
+	struct cls_fl_filter *f;
 
 	list_for_each_entry_rcu(mask, &head->masks, list) {
 		fl_clear_masked_range(&skb_key, mask);
@@ -290,6 +306,9 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		 */
 		skb_key.basic.n_proto = skb->protocol;
 		skb_flow_dissect_tunnel_info(skb, &mask->dissector, &skb_key);
+		skb_flow_dissect_ct(skb, &mask->dissector, &skb_key,
+				    fl_ct_info_to_flower_map,
+				    ARRAY_SIZE(fl_ct_info_to_flower_map));
 		skb_flow_dissect(skb, &mask->dissector, &skb_key, 0);
 
 		fl_set_masked_key(&skb_mkey, &skb_key, mask);
@@ -704,6 +723,16 @@ static void *fl_get(struct tcf_proto *tp, u32 handle)
 	[TCA_FLOWER_KEY_ENC_IP_TTL_MASK] = { .type = NLA_U8 },
 	[TCA_FLOWER_KEY_ENC_OPTS]	= { .type = NLA_NESTED },
 	[TCA_FLOWER_KEY_ENC_OPTS_MASK]	= { .type = NLA_NESTED },
+	[TCA_FLOWER_KEY_CT_STATE]	= { .type = NLA_U16 },
+	[TCA_FLOWER_KEY_CT_STATE_MASK]	= { .type = NLA_U16 },
+	[TCA_FLOWER_KEY_CT_ZONE]	= { .type = NLA_U16 },
+	[TCA_FLOWER_KEY_CT_ZONE_MASK]	= { .type = NLA_U16 },
+	[TCA_FLOWER_KEY_CT_MARK]	= { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_CT_MARK_MASK]	= { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_CT_LABELS]	= { .type = NLA_BINARY,
+					    .len = 128 / BITS_PER_BYTE },
+	[TCA_FLOWER_KEY_CT_LABELS_MASK]	= { .type = NLA_BINARY,
+					    .len = 128 / BITS_PER_BYTE },
 };
 
 static const struct nla_policy
@@ -725,11 +754,11 @@ static void fl_set_key_val(struct nlattr **tb,
 {
 	if (!tb[val_type])
 		return;
-	memcpy(val, nla_data(tb[val_type]), len);
+	nla_memcpy(val, tb[val_type], len);
 	if (mask_type == TCA_FLOWER_UNSPEC || !tb[mask_type])
 		memset(mask, 0xff, len);
 	else
-		memcpy(mask, nla_data(tb[mask_type]), len);
+		nla_memcpy(mask, tb[mask_type], len);
 }
 
 static int fl_set_key_port_range(struct nlattr **tb, struct fl_flow_key *key,
@@ -1015,6 +1044,51 @@ static int fl_set_enc_opt(struct nlattr **tb, struct fl_flow_key *key,
 	return 0;
 }
 
+static int fl_set_key_ct(struct nlattr **tb,
+			 struct flow_dissector_key_ct *key,
+			 struct flow_dissector_key_ct *mask,
+			 struct netlink_ext_ack *extack)
+{
+	if (tb[TCA_FLOWER_KEY_CT_STATE]) {
+		if (!IS_ENABLED(CONFIG_NF_CONNTRACK)) {
+			NL_SET_ERR_MSG(extack, "Conntrack isn't enabled");
+			return -EOPNOTSUPP;
+		}
+		fl_set_key_val(tb, &key->ct_state, TCA_FLOWER_KEY_CT_STATE,
+			       &mask->ct_state, TCA_FLOWER_KEY_CT_STATE_MASK,
+			       sizeof(key->ct_state));
+	}
+	if (tb[TCA_FLOWER_KEY_CT_ZONE]) {
+		if (!IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES)) {
+			NL_SET_ERR_MSG(extack, "Conntrack zones isn't enabled");
+			return -EOPNOTSUPP;
+		}
+		fl_set_key_val(tb, &key->ct_zone, TCA_FLOWER_KEY_CT_ZONE,
+			       &mask->ct_zone, TCA_FLOWER_KEY_CT_ZONE_MASK,
+			       sizeof(key->ct_zone));
+	}
+	if (tb[TCA_FLOWER_KEY_CT_MARK]) {
+		if (!IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)) {
+			NL_SET_ERR_MSG(extack, "Conntrack mark isn't enabled");
+			return -EOPNOTSUPP;
+		}
+		fl_set_key_val(tb, &key->ct_mark, TCA_FLOWER_KEY_CT_MARK,
+			       &mask->ct_mark, TCA_FLOWER_KEY_CT_MARK_MASK,
+			       sizeof(key->ct_mark));
+	}
+	if (tb[TCA_FLOWER_KEY_CT_LABELS]) {
+		if (!IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS)) {
+			NL_SET_ERR_MSG(extack, "Conntrack labels aren't enabled");
+			return -EOPNOTSUPP;
+		}
+		fl_set_key_val(tb, key->ct_labels, TCA_FLOWER_KEY_CT_LABELS,
+			       mask->ct_labels, TCA_FLOWER_KEY_CT_LABELS_MASK,
+			       sizeof(key->ct_labels));
+	}
+
+	return 0;
+}
+
 static int fl_set_key(struct net *net, struct nlattr **tb,
 		      struct fl_flow_key *key, struct fl_flow_key *mask,
 		      struct netlink_ext_ack *extack)
@@ -1224,6 +1298,10 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 			return ret;
 	}
 
+	ret = fl_set_key_ct(tb, &key->ct, &mask->ct, extack);
+	if (ret)
+		return ret;
+
 	if (tb[TCA_FLOWER_KEY_FLAGS])
 		ret = fl_set_key_flags(tb, &key->control.flags, &mask->control.flags);
 
@@ -1324,6 +1402,8 @@ static void fl_init_dissector(struct flow_dissector *dissector,
 			     FLOW_DISSECTOR_KEY_ENC_IP, enc_ip);
 	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
 			     FLOW_DISSECTOR_KEY_ENC_OPTS, enc_opts);
+	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
+			     FLOW_DISSECTOR_KEY_CT, ct);
 
 	skb_flow_dissector_init(dissector, keys, cnt);
 }
@@ -2078,6 +2158,40 @@ static int fl_dump_key_geneve_opt(struct sk_buff *skb,
 	return -EMSGSIZE;
 }
 
+static int fl_dump_key_ct(struct sk_buff *skb,
+			  struct flow_dissector_key_ct *key,
+			  struct flow_dissector_key_ct *mask)
+{
+	if (IS_ENABLED(CONFIG_NF_CONNTRACK) &&
+	    fl_dump_key_val(skb, &key->ct_state, TCA_FLOWER_KEY_CT_STATE,
+			    &mask->ct_state, TCA_FLOWER_KEY_CT_STATE_MASK,
+			    sizeof(key->ct_state)))
+		goto nla_put_failure;
+
+	if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
+	    fl_dump_key_val(skb, &key->ct_zone, TCA_FLOWER_KEY_CT_ZONE,
+			    &mask->ct_zone, TCA_FLOWER_KEY_CT_ZONE_MASK,
+			    sizeof(key->ct_zone)))
+		goto nla_put_failure;
+
+	if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
+	    fl_dump_key_val(skb, &key->ct_mark, TCA_FLOWER_KEY_CT_MARK,
+			    &mask->ct_mark, TCA_FLOWER_KEY_CT_MARK_MASK,
+			    sizeof(key->ct_mark)))
+		goto nla_put_failure;
+
+	if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) &&
+	    fl_dump_key_val(skb, &key->ct_labels, TCA_FLOWER_KEY_CT_LABELS,
+			    &mask->ct_labels, TCA_FLOWER_KEY_CT_LABELS_MASK,
+			    sizeof(key->ct_labels)))
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int fl_dump_key_options(struct sk_buff *skb, int enc_opt_type,
 			       struct flow_dissector_key_enc_opts *enc_opts)
 {
@@ -2311,6 +2425,9 @@ static int fl_dump_key(struct sk_buff *skb, struct net *net,
 	    fl_dump_key_enc_opt(skb, &key->enc_opts, &mask->enc_opts))
 		goto nla_put_failure;
 
+	if (fl_dump_key_ct(skb, &key->ct, &mask->ct))
+		goto nla_put_failure;
+
 	if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
 		goto nla_put_failure;
 
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next v6 1/4] net/sched: Introduce action ct
From: Paul Blakey @ 2019-07-09  7:30 UTC (permalink / raw)
  To: Jiri Pirko, Paul Blakey, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	Marcelo Ricardo Leitner, netdev, David Miller, Aaron Conole,
	Zhike Wang
  Cc: Rony Efraim, nst-kernel, John Hurley, Simon Horman, Justin Pettit
In-Reply-To: <1562657451-20819-1-git-send-email-paulb@mellanox.com>

Allow sending a packet to conntrack module for connection tracking.

The packet will be marked with conntrack connection's state, and
any metadata such as conntrack mark and label. This state metadata
can later be matched against with tc classifers, for example with the
flower classifier as below.

In addition to committing new connections the user can optionally
specific a zone to track within, set a mark/label and configure nat
with an address range and port range.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>

Changelog:
V5->V6:
	Added CONFIG_NF_DEFRAG_IPV6 in handle fragments ipv6 case
V4->V5:
	Reordered nf_conntrack_put() in tcf_ct_skb_nfct_cached()
V3->V4:
	Added strict_start_type for act_ct policy
V2->V3:
	Fixed david's comments: Removed extra newline after rcu in tcf_ct_params , and indent of break in act_ct.c
V1->V2:
	Fixed parsing of ranges TCA_CT_NAT_IPV6_MAX as 'else' case overwritten ipv4 max
	Refactored NAT_PORT_MIN_MAX range handling as well
	Added ipv4/ipv6 defragmentation
	Removed extra skb pull push of nw offset in exectute nat
	Refactored tcf_ct_skb_network_trim after pull
	Removed TCA_ACT_CT define
---
 include/net/flow_offload.h        |   5 +
 include/net/tc_act/tc_ct.h        |  63 +++
 include/uapi/linux/pkt_cls.h      |   1 +
 include/uapi/linux/tc_act/tc_ct.h |  41 ++
 net/sched/Kconfig                 |  11 +
 net/sched/Makefile                |   1 +
 net/sched/act_ct.c                | 984 ++++++++++++++++++++++++++++++++++++++
 net/sched/cls_api.c               |   5 +
 8 files changed, 1111 insertions(+)
 create mode 100644 include/net/tc_act/tc_ct.h
 create mode 100644 include/uapi/linux/tc_act/tc_ct.h
 create mode 100644 net/sched/act_ct.c

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 36127c1..a09e256 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -129,6 +129,7 @@ enum flow_action_id {
 	FLOW_ACTION_QUEUE,
 	FLOW_ACTION_SAMPLE,
 	FLOW_ACTION_POLICE,
+	FLOW_ACTION_CT,
 };
 
 /* This is mirroring enum pedit_header_type definition for easy mapping between
@@ -178,6 +179,10 @@ struct flow_action_entry {
 			s64			burst;
 			u64			rate_bytes_ps;
 		} police;
+		struct {				/* FLOW_ACTION_CT */
+			int action;
+			u16 zone;
+		} ct;
 	};
 };
 
diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
new file mode 100644
index 0000000..bdc20ab
--- /dev/null
+++ b/include/net/tc_act/tc_ct.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CT_H
+#define __NET_TC_CT_H
+
+#include <net/act_api.h>
+#include <uapi/linux/tc_act/tc_ct.h>
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_nat.h>
+#include <net/netfilter/nf_conntrack_labels.h>
+
+struct tcf_ct_params {
+	struct nf_conn *tmpl;
+	u16 zone;
+
+	u32 mark;
+	u32 mark_mask;
+
+	u32 labels[NF_CT_LABELS_MAX_SIZE / sizeof(u32)];
+	u32 labels_mask[NF_CT_LABELS_MAX_SIZE / sizeof(u32)];
+
+	struct nf_nat_range2 range;
+	bool ipv4_range;
+
+	u16 ct_action;
+
+	struct rcu_head rcu;
+};
+
+struct tcf_ct {
+	struct tc_action common;
+	struct tcf_ct_params __rcu *params;
+};
+
+#define to_ct(a) ((struct tcf_ct *)a)
+#define to_ct_params(a) ((struct tcf_ct_params *) \
+			 rtnl_dereference((to_ct(a)->params)))
+
+static inline uint16_t tcf_ct_zone(const struct tc_action *a)
+{
+	return to_ct_params(a)->zone;
+}
+
+static inline int tcf_ct_action(const struct tc_action *a)
+{
+	return to_ct_params(a)->ct_action;
+}
+
+#else
+static inline uint16_t tcf_ct_zone(const struct tc_action *a) { return 0; }
+static inline int tcf_ct_action(const struct tc_action *a) { return 0; }
+#endif /* CONFIG_NF_CONNTRACK */
+
+static inline bool is_tcf_ct(const struct tc_action *a)
+{
+#if defined(CONFIG_NET_CLS_ACT) && IS_ENABLED(CONFIG_NF_CONNTRACK)
+	if (a->ops && a->ops->id == TCA_ID_CT)
+		return true;
+#endif
+	return false;
+}
+
+#endif /* __NET_TC_CT_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 8cc6b67..7c9d701 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -106,6 +106,7 @@ enum tca_id {
 	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
 	/* other actions go here */
 	TCA_ID_CTINFO,
+	TCA_ID_CT,
 	__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
new file mode 100644
index 0000000..5fb1d7a
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CT_H
+#define __UAPI_TC_CT_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+enum {
+	TCA_CT_UNSPEC,
+	TCA_CT_PARMS,
+	TCA_CT_TM,
+	TCA_CT_ACTION,		/* u16 */
+	TCA_CT_ZONE,		/* u16 */
+	TCA_CT_MARK,		/* u32 */
+	TCA_CT_MARK_MASK,	/* u32 */
+	TCA_CT_LABELS,		/* u128 */
+	TCA_CT_LABELS_MASK,	/* u128 */
+	TCA_CT_NAT_IPV4_MIN,	/* be32 */
+	TCA_CT_NAT_IPV4_MAX,	/* be32 */
+	TCA_CT_NAT_IPV6_MIN,	/* struct in6_addr */
+	TCA_CT_NAT_IPV6_MAX,	/* struct in6_addr */
+	TCA_CT_NAT_PORT_MIN,	/* be16 */
+	TCA_CT_NAT_PORT_MAX,	/* be16 */
+	TCA_CT_PAD,
+	__TCA_CT_MAX
+};
+
+#define TCA_CT_MAX (__TCA_CT_MAX - 1)
+
+#define TCA_CT_ACT_COMMIT	(1 << 0)
+#define TCA_CT_ACT_FORCE	(1 << 1)
+#define TCA_CT_ACT_CLEAR	(1 << 2)
+#define TCA_CT_ACT_NAT		(1 << 3)
+#define TCA_CT_ACT_NAT_SRC	(1 << 4)
+#define TCA_CT_ACT_NAT_DST	(1 << 5)
+
+struct tc_ct {
+	tc_gen;
+};
+
+#endif /* __UAPI_TC_CT_H */
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 360fdd3..ff90a71 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -929,6 +929,17 @@ config NET_ACT_TUNNEL_KEY
 	  To compile this code as a module, choose M here: the
 	  module will be called act_tunnel_key.
 
+config NET_ACT_CT
+        tristate "connection tracking tc action"
+        depends on NET_CLS_ACT && NF_CONNTRACK
+        help
+	  Say Y here to allow sending the packets to conntrack module.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_ct.
+
 config NET_IFE_SKBMARK
         tristate "Support to encoding decoding skb mark on IFE action"
         depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index d54bfcb..23d2202 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -28,6 +28,7 @@ obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)	+= act_meta_skbprio.o
 obj-$(CONFIG_NET_IFE_SKBTCINDEX)	+= act_meta_skbtcindex.o
 obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
+obj-$(CONFIG_NET_ACT_CT)	+= act_ct.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB)	+= sch_htb.o
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
new file mode 100644
index 0000000..b501ce0
--- /dev/null
+++ b/net/sched/act_ct.c
@@ -0,0 +1,984 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* -
+ * net/sched/act_ct.c  Connection Tracking action
+ *
+ * Authors:   Paul Blakey <paulb@mellanox.com>
+ *            Yossi Kuperman <yossiku@mellanox.com>
+ *            Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/pkt_cls.h>
+#include <net/act_api.h>
+#include <net/ip.h>
+#include <net/ipv6_frag.h>
+#include <uapi/linux/tc_act/tc_ct.h>
+#include <net/tc_act/tc_ct.h>
+
+#include <linux/netfilter/nf_nat.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+#include <net/netfilter/nf_conntrack_helper.h>
+#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
+
+static struct tc_action_ops act_ct_ops;
+static unsigned int ct_net_id;
+
+struct tc_ct_action_net {
+	struct tc_action_net tn; /* Must be first */
+	bool labels;
+};
+
+/* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
+static bool tcf_ct_skb_nfct_cached(struct net *net, struct sk_buff *skb,
+				   u16 zone_id, bool force)
+{
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct)
+		return false;
+	if (!net_eq(net, read_pnet(&ct->ct_net)))
+		return false;
+	if (nf_ct_zone(ct)->id != zone_id)
+		return false;
+
+	/* Force conntrack entry direction. */
+	if (force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+		if (nf_ct_is_confirmed(ct))
+			nf_ct_kill(ct);
+
+		nf_conntrack_put(&ct->ct_general);
+		nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
+
+		return false;
+	}
+
+	return true;
+}
+
+/* Trim the skb to the length specified by the IP/IPv6 header,
+ * removing any trailing lower-layer padding. This prepares the skb
+ * for higher-layer processing that assumes skb->len excludes padding
+ * (such as nf_ip_checksum). The caller needs to pull the skb to the
+ * network header, and ensure ip_hdr/ipv6_hdr points to valid data.
+ */
+static int tcf_ct_skb_network_trim(struct sk_buff *skb, int family)
+{
+	unsigned int len;
+	int err;
+
+	switch (family) {
+	case NFPROTO_IPV4:
+		len = ntohs(ip_hdr(skb)->tot_len);
+		break;
+	case NFPROTO_IPV6:
+		len = sizeof(struct ipv6hdr)
+			+ ntohs(ipv6_hdr(skb)->payload_len);
+		break;
+	default:
+		len = skb->len;
+	}
+
+	err = pskb_trim_rcsum(skb, len);
+
+	return err;
+}
+
+static u8 tcf_ct_skb_nf_family(struct sk_buff *skb)
+{
+	u8 family = NFPROTO_UNSPEC;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		family = NFPROTO_IPV4;
+		break;
+	case htons(ETH_P_IPV6):
+		family = NFPROTO_IPV6;
+		break;
+	default:
+		break;
+	}
+
+	return family;
+}
+
+static int tcf_ct_ipv4_is_fragment(struct sk_buff *skb, bool *frag)
+{
+	unsigned int len;
+
+	len =  skb_network_offset(skb) + sizeof(struct iphdr);
+	if (unlikely(skb->len < len))
+		return -EINVAL;
+	if (unlikely(!pskb_may_pull(skb, len)))
+		return -ENOMEM;
+
+	*frag = ip_is_fragment(ip_hdr(skb));
+	return 0;
+}
+
+static int tcf_ct_ipv6_is_fragment(struct sk_buff *skb, bool *frag)
+{
+	unsigned int flags = 0, len, payload_ofs = 0;
+	unsigned short frag_off;
+	int nexthdr;
+
+	len =  skb_network_offset(skb) + sizeof(struct ipv6hdr);
+	if (unlikely(skb->len < len))
+		return -EINVAL;
+	if (unlikely(!pskb_may_pull(skb, len)))
+		return -ENOMEM;
+
+	nexthdr = ipv6_find_hdr(skb, &payload_ofs, -1, &frag_off, &flags);
+	if (unlikely(nexthdr < 0))
+		return -EPROTO;
+
+	*frag = flags & IP6_FH_F_FRAG;
+	return 0;
+}
+
+static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
+				   u8 family, u16 zone)
+{
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+	int err = 0;
+	bool frag;
+
+	/* Previously seen (loopback)? Ignore. */
+	ct = nf_ct_get(skb, &ctinfo);
+	if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED)
+		return 0;
+
+	if (family == NFPROTO_IPV4)
+		err = tcf_ct_ipv4_is_fragment(skb, &frag);
+	else
+		err = tcf_ct_ipv6_is_fragment(skb, &frag);
+	if (err || !frag)
+		return err;
+
+	skb_get(skb);
+
+	if (family == NFPROTO_IPV4) {
+		enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
+
+		memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+		local_bh_disable();
+		err = ip_defrag(net, skb, user);
+		local_bh_enable();
+		if (err && err != -EINPROGRESS)
+			goto out_free;
+	} else { /* NFPROTO_IPV6 */
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+		enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
+
+		memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
+		err = nf_ct_frag6_gather(net, skb, user);
+		if (err && err != -EINPROGRESS)
+			goto out_free;
+#else
+		err = -EOPNOTSUPP;
+		goto out_free;
+#endif
+	}
+
+	skb_clear_hash(skb);
+	skb->ignore_df = 1;
+	return err;
+
+out_free:
+	kfree_skb(skb);
+	return err;
+}
+
+static void tcf_ct_params_free(struct rcu_head *head)
+{
+	struct tcf_ct_params *params = container_of(head,
+						    struct tcf_ct_params, rcu);
+
+	if (params->tmpl)
+		nf_conntrack_put(&params->tmpl->ct_general);
+	kfree(params);
+}
+
+#if IS_ENABLED(CONFIG_NF_NAT)
+/* Modelled after nf_nat_ipv[46]_fn().
+ * range is only used for new, uninitialized NAT state.
+ * Returns either NF_ACCEPT or NF_DROP.
+ */
+static int ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
+			  enum ip_conntrack_info ctinfo,
+			  const struct nf_nat_range2 *range,
+			  enum nf_nat_manip_type maniptype)
+{
+	int hooknum, err = NF_ACCEPT;
+
+	/* See HOOK2MANIP(). */
+	if (maniptype == NF_NAT_MANIP_SRC)
+		hooknum = NF_INET_LOCAL_IN; /* Source NAT */
+	else
+		hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
+
+	switch (ctinfo) {
+	case IP_CT_RELATED:
+	case IP_CT_RELATED_REPLY:
+		if (skb->protocol == htons(ETH_P_IP) &&
+		    ip_hdr(skb)->protocol == IPPROTO_ICMP) {
+			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
+							   hooknum))
+				err = NF_DROP;
+			goto out;
+		} else if (IS_ENABLED(CONFIG_IPV6) &&
+			   skb->protocol == htons(ETH_P_IPV6)) {
+			__be16 frag_off;
+			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+			int hdrlen = ipv6_skip_exthdr(skb,
+						      sizeof(struct ipv6hdr),
+						      &nexthdr, &frag_off);
+
+			if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
+				if (!nf_nat_icmpv6_reply_translation(skb, ct,
+								     ctinfo,
+								     hooknum,
+								     hdrlen))
+					err = NF_DROP;
+				goto out;
+			}
+		}
+		/* Non-ICMP, fall thru to initialize if needed. */
+		/* fall through */
+	case IP_CT_NEW:
+		/* Seen it before?  This can happen for loopback, retrans,
+		 * or local packets.
+		 */
+		if (!nf_nat_initialized(ct, maniptype)) {
+			/* Initialize according to the NAT action. */
+			err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
+				/* Action is set up to establish a new
+				 * mapping.
+				 */
+				? nf_nat_setup_info(ct, range, maniptype)
+				: nf_nat_alloc_null_binding(ct, hooknum);
+			if (err != NF_ACCEPT)
+				goto out;
+		}
+		break;
+
+	case IP_CT_ESTABLISHED:
+	case IP_CT_ESTABLISHED_REPLY:
+		break;
+
+	default:
+		err = NF_DROP;
+		goto out;
+	}
+
+	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+out:
+	return err;
+}
+#endif /* CONFIG_NF_NAT */
+
+static void tcf_ct_act_set_mark(struct nf_conn *ct, u32 mark, u32 mask)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
+	u32 new_mark;
+
+	if (!mask)
+		return;
+
+	new_mark = mark | (ct->mark & ~(mask));
+	if (ct->mark != new_mark) {
+		ct->mark = new_mark;
+		if (nf_ct_is_confirmed(ct))
+			nf_conntrack_event_cache(IPCT_MARK, ct);
+	}
+#endif
+}
+
+static void tcf_ct_act_set_labels(struct nf_conn *ct,
+				  u32 *labels,
+				  u32 *labels_m)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS)
+	size_t labels_sz = FIELD_SIZEOF(struct tcf_ct_params, labels);
+
+	if (!memchr_inv(labels_m, 0, labels_sz))
+		return;
+
+	nf_connlabels_replace(ct, labels, labels_m, 4);
+#endif
+}
+
+static int tcf_ct_act_nat(struct sk_buff *skb,
+			  struct nf_conn *ct,
+			  enum ip_conntrack_info ctinfo,
+			  int ct_action,
+			  struct nf_nat_range2 *range,
+			  bool commit)
+{
+#if IS_ENABLED(CONFIG_NF_NAT)
+	enum nf_nat_manip_type maniptype;
+
+	if (!(ct_action & TCA_CT_ACT_NAT))
+		return NF_ACCEPT;
+
+	/* Add NAT extension if not confirmed yet. */
+	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
+		return NF_DROP;   /* Can't NAT. */
+
+	if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
+	    (ctinfo != IP_CT_RELATED || commit)) {
+		/* NAT an established or related connection like before. */
+		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
+			/* This is the REPLY direction for a connection
+			 * for which NAT was applied in the forward
+			 * direction.  Do the reverse NAT.
+			 */
+			maniptype = ct->status & IPS_SRC_NAT
+				? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
+		else
+			maniptype = ct->status & IPS_SRC_NAT
+				? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
+	} else if (ct_action & TCA_CT_ACT_NAT_SRC) {
+		maniptype = NF_NAT_MANIP_SRC;
+	} else if (ct_action & TCA_CT_ACT_NAT_DST) {
+		maniptype = NF_NAT_MANIP_DST;
+	} else {
+		return NF_ACCEPT;
+	}
+
+	return ct_nat_execute(skb, ct, ctinfo, range, maniptype);
+#else
+	return NF_ACCEPT;
+#endif
+}
+
+static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
+		      struct tcf_result *res)
+{
+	struct net *net = dev_net(skb->dev);
+	bool cached, commit, clear, force;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_ct *c = to_ct(a);
+	struct nf_conn *tmpl = NULL;
+	struct nf_hook_state state;
+	int nh_ofs, err, retval;
+	struct tcf_ct_params *p;
+	struct nf_conn *ct;
+	u8 family;
+
+	p = rcu_dereference_bh(c->params);
+
+	retval = READ_ONCE(c->tcf_action);
+	commit = p->ct_action & TCA_CT_ACT_COMMIT;
+	clear = p->ct_action & TCA_CT_ACT_CLEAR;
+	force = p->ct_action & TCA_CT_ACT_FORCE;
+	tmpl = p->tmpl;
+
+	if (clear) {
+		ct = nf_ct_get(skb, &ctinfo);
+		if (ct) {
+			nf_conntrack_put(&ct->ct_general);
+			nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
+		}
+
+		goto out;
+	}
+
+	family = tcf_ct_skb_nf_family(skb);
+	if (family == NFPROTO_UNSPEC)
+		goto drop;
+
+	/* The conntrack module expects to be working at L3.
+	 * We also try to pull the IPv4/6 header to linear area
+	 */
+	nh_ofs = skb_network_offset(skb);
+	skb_pull_rcsum(skb, nh_ofs);
+	err = tcf_ct_handle_fragments(net, skb, family, p->zone);
+	if (err == -EINPROGRESS) {
+		retval = TC_ACT_STOLEN;
+		goto out;
+	}
+	if (err)
+		goto drop;
+
+	err = tcf_ct_skb_network_trim(skb, family);
+	if (err)
+		goto drop;
+
+	/* If we are recirculating packets to match on ct fields and
+	 * committing with a separate ct action, then we don't need to
+	 * actually run the packet through conntrack twice unless it's for a
+	 * different zone.
+	 */
+	cached = tcf_ct_skb_nfct_cached(net, skb, p->zone, force);
+	if (!cached) {
+		/* Associate skb with specified zone. */
+		if (tmpl) {
+			ct = nf_ct_get(skb, &ctinfo);
+			if (skb_nfct(skb))
+				nf_conntrack_put(skb_nfct(skb));
+			nf_conntrack_get(&tmpl->ct_general);
+			nf_ct_set(skb, tmpl, IP_CT_NEW);
+		}
+
+		state.hook = NF_INET_PRE_ROUTING;
+		state.net = net;
+		state.pf = family;
+		err = nf_conntrack_in(skb, &state);
+		if (err != NF_ACCEPT)
+			goto out_push;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct)
+		goto out_push;
+	nf_ct_deliver_cached_events(ct);
+
+	err = tcf_ct_act_nat(skb, ct, ctinfo, p->ct_action, &p->range, commit);
+	if (err != NF_ACCEPT)
+		goto drop;
+
+	if (commit) {
+		tcf_ct_act_set_mark(ct, p->mark, p->mark_mask);
+		tcf_ct_act_set_labels(ct, p->labels, p->labels_mask);
+
+		/* This will take care of sending queued events
+		 * even if the connection is already confirmed.
+		 */
+		nf_conntrack_confirm(skb);
+	}
+
+out_push:
+	skb_push_rcsum(skb, nh_ofs);
+
+out:
+	bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), skb);
+	return retval;
+
+drop:
+	qstats_drop_inc(this_cpu_ptr(a->cpu_qstats));
+	return TC_ACT_SHOT;
+}
+
+static const struct nla_policy ct_policy[TCA_CT_MAX + 1] = {
+	[TCA_CT_UNSPEC] = { .strict_start_type = TCA_CT_UNSPEC + 1 },
+	[TCA_CT_ACTION] = { .type = NLA_U16 },
+	[TCA_CT_PARMS] = { .type = NLA_EXACT_LEN, .len = sizeof(struct tc_ct) },
+	[TCA_CT_ZONE] = { .type = NLA_U16 },
+	[TCA_CT_MARK] = { .type = NLA_U32 },
+	[TCA_CT_MARK_MASK] = { .type = NLA_U32 },
+	[TCA_CT_LABELS] = { .type = NLA_BINARY,
+			    .len = 128 / BITS_PER_BYTE },
+	[TCA_CT_LABELS_MASK] = { .type = NLA_BINARY,
+				 .len = 128 / BITS_PER_BYTE },
+	[TCA_CT_NAT_IPV4_MIN] = { .type = NLA_U32 },
+	[TCA_CT_NAT_IPV4_MAX] = { .type = NLA_U32 },
+	[TCA_CT_NAT_IPV6_MIN] = { .type = NLA_EXACT_LEN,
+				  .len = sizeof(struct in6_addr) },
+	[TCA_CT_NAT_IPV6_MAX] = { .type = NLA_EXACT_LEN,
+				   .len = sizeof(struct in6_addr) },
+	[TCA_CT_NAT_PORT_MIN] = { .type = NLA_U16 },
+	[TCA_CT_NAT_PORT_MAX] = { .type = NLA_U16 },
+};
+
+static int tcf_ct_fill_params_nat(struct tcf_ct_params *p,
+				  struct tc_ct *parm,
+				  struct nlattr **tb,
+				  struct netlink_ext_ack *extack)
+{
+	struct nf_nat_range2 *range;
+
+	if (!(p->ct_action & TCA_CT_ACT_NAT))
+		return 0;
+
+	if (!IS_ENABLED(CONFIG_NF_NAT)) {
+		NL_SET_ERR_MSG_MOD(extack, "Netfilter nat isn't enabled in kernel");
+		return -EOPNOTSUPP;
+	}
+
+	if (!(p->ct_action & (TCA_CT_ACT_NAT_SRC | TCA_CT_ACT_NAT_DST)))
+		return 0;
+
+	if ((p->ct_action & TCA_CT_ACT_NAT_SRC) &&
+	    (p->ct_action & TCA_CT_ACT_NAT_DST)) {
+		NL_SET_ERR_MSG_MOD(extack, "dnat and snat can't be enabled at the same time");
+		return -EOPNOTSUPP;
+	}
+
+	range = &p->range;
+	if (tb[TCA_CT_NAT_IPV4_MIN]) {
+		struct nlattr *max_attr = tb[TCA_CT_NAT_IPV4_MAX];
+
+		p->ipv4_range = true;
+		range->flags |= NF_NAT_RANGE_MAP_IPS;
+		range->min_addr.ip =
+			nla_get_in_addr(tb[TCA_CT_NAT_IPV4_MIN]);
+
+		range->max_addr.ip = max_attr ?
+				     nla_get_in_addr(max_attr) :
+				     range->min_addr.ip;
+	} else if (tb[TCA_CT_NAT_IPV6_MIN]) {
+		struct nlattr *max_attr = tb[TCA_CT_NAT_IPV6_MAX];
+
+		p->ipv4_range = false;
+		range->flags |= NF_NAT_RANGE_MAP_IPS;
+		range->min_addr.in6 =
+			nla_get_in6_addr(tb[TCA_CT_NAT_IPV6_MIN]);
+
+		range->max_addr.in6 = max_attr ?
+				      nla_get_in6_addr(max_attr) :
+				      range->min_addr.in6;
+	}
+
+	if (tb[TCA_CT_NAT_PORT_MIN]) {
+		range->flags |= NF_NAT_RANGE_PROTO_SPECIFIED;
+		range->min_proto.all = nla_get_be16(tb[TCA_CT_NAT_PORT_MIN]);
+
+		range->max_proto.all = tb[TCA_CT_NAT_PORT_MAX] ?
+				       nla_get_be16(tb[TCA_CT_NAT_PORT_MAX]) :
+				       range->min_proto.all;
+	}
+
+	return 0;
+}
+
+static void tcf_ct_set_key_val(struct nlattr **tb,
+			       void *val, int val_type,
+			       void *mask, int mask_type,
+			       int len)
+{
+	if (!tb[val_type])
+		return;
+	nla_memcpy(val, tb[val_type], len);
+
+	if (!mask)
+		return;
+
+	if (mask_type == TCA_CT_UNSPEC || !tb[mask_type])
+		memset(mask, 0xff, len);
+	else
+		nla_memcpy(mask, tb[mask_type], len);
+}
+
+static int tcf_ct_fill_params(struct net *net,
+			      struct tcf_ct_params *p,
+			      struct tc_ct *parm,
+			      struct nlattr **tb,
+			      struct netlink_ext_ack *extack)
+{
+	struct tc_ct_action_net *tn = net_generic(net, ct_net_id);
+	struct nf_conntrack_zone zone;
+	struct nf_conn *tmpl;
+	int err;
+
+	p->zone = NF_CT_DEFAULT_ZONE_ID;
+
+	tcf_ct_set_key_val(tb,
+			   &p->ct_action, TCA_CT_ACTION,
+			   NULL, TCA_CT_UNSPEC,
+			   sizeof(p->ct_action));
+
+	if (p->ct_action & TCA_CT_ACT_CLEAR)
+		return 0;
+
+	err = tcf_ct_fill_params_nat(p, parm, tb, extack);
+	if (err)
+		return err;
+
+	if (tb[TCA_CT_MARK]) {
+		if (!IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)) {
+			NL_SET_ERR_MSG_MOD(extack, "Conntrack mark isn't enabled.");
+			return -EOPNOTSUPP;
+		}
+		tcf_ct_set_key_val(tb,
+				   &p->mark, TCA_CT_MARK,
+				   &p->mark_mask, TCA_CT_MARK_MASK,
+				   sizeof(p->mark));
+	}
+
+	if (tb[TCA_CT_LABELS]) {
+		if (!IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS)) {
+			NL_SET_ERR_MSG_MOD(extack, "Conntrack labels isn't enabled.");
+			return -EOPNOTSUPP;
+		}
+
+		if (!tn->labels) {
+			NL_SET_ERR_MSG_MOD(extack, "Failed to set connlabel length");
+			return -EOPNOTSUPP;
+		}
+		tcf_ct_set_key_val(tb,
+				   p->labels, TCA_CT_LABELS,
+				   p->labels_mask, TCA_CT_LABELS_MASK,
+				   sizeof(p->labels));
+	}
+
+	if (tb[TCA_CT_ZONE]) {
+		if (!IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES)) {
+			NL_SET_ERR_MSG_MOD(extack, "Conntrack zones isn't enabled.");
+			return -EOPNOTSUPP;
+		}
+
+		tcf_ct_set_key_val(tb,
+				   &p->zone, TCA_CT_ZONE,
+				   NULL, TCA_CT_UNSPEC,
+				   sizeof(p->zone));
+	}
+
+	if (p->zone == NF_CT_DEFAULT_ZONE_ID)
+		return 0;
+
+	nf_ct_zone_init(&zone, p->zone, NF_CT_DEFAULT_ZONE_DIR, 0);
+	tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);
+	if (!tmpl) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to allocate conntrack template");
+		return -ENOMEM;
+	}
+	__set_bit(IPS_CONFIRMED_BIT, &tmpl->status);
+	nf_conntrack_get(&tmpl->ct_general);
+	p->tmpl = tmpl;
+
+	return 0;
+}
+
+static int tcf_ct_init(struct net *net, struct nlattr *nla,
+		       struct nlattr *est, struct tc_action **a,
+		       int replace, int bind, bool rtnl_held,
+		       struct tcf_proto *tp,
+		       struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ct_net_id);
+	struct tcf_ct_params *params = NULL;
+	struct nlattr *tb[TCA_CT_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tc_ct *parm;
+	struct tcf_ct *c;
+	int err, res = 0;
+
+	if (!nla) {
+		NL_SET_ERR_MSG_MOD(extack, "Ct requires attributes to be passed");
+		return -EINVAL;
+	}
+
+	err = nla_parse_nested(tb, TCA_CT_MAX, nla, ct_policy, extack);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_CT_PARMS]) {
+		NL_SET_ERR_MSG_MOD(extack, "Missing required ct parameters");
+		return -EINVAL;
+	}
+	parm = nla_data(tb[TCA_CT_PARMS]);
+
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+
+	if (!err) {
+		err = tcf_idr_create(tn, parm->index, est, a,
+				     &act_ct_ops, bind, true);
+		if (err) {
+			tcf_idr_cleanup(tn, parm->index);
+			return err;
+		}
+		res = ACT_P_CREATED;
+	} else {
+		if (bind)
+			return 0;
+
+		if (!replace) {
+			tcf_idr_release(*a, bind);
+			return -EEXIST;
+		}
+	}
+	err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
+	if (err < 0)
+		goto cleanup;
+
+	c = to_ct(*a);
+
+	params = kzalloc(sizeof(*params), GFP_KERNEL);
+	if (unlikely(!params)) {
+		err = -ENOMEM;
+		goto cleanup;
+	}
+
+	err = tcf_ct_fill_params(net, params, parm, tb, extack);
+	if (err)
+		goto cleanup;
+
+	spin_lock_bh(&c->tcf_lock);
+	goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+	rcu_swap_protected(c->params, params, lockdep_is_held(&c->tcf_lock));
+	spin_unlock_bh(&c->tcf_lock);
+
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+	if (params)
+		kfree_rcu(params, rcu);
+	if (res == ACT_P_CREATED)
+		tcf_idr_insert(tn, *a);
+
+	return res;
+
+cleanup:
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+	kfree(params);
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static void tcf_ct_cleanup(struct tc_action *a)
+{
+	struct tcf_ct_params *params;
+	struct tcf_ct *c = to_ct(a);
+
+	params = rcu_dereference_protected(c->params, 1);
+	if (params)
+		call_rcu(&params->rcu, tcf_ct_params_free);
+}
+
+static int tcf_ct_dump_key_val(struct sk_buff *skb,
+			       void *val, int val_type,
+			       void *mask, int mask_type,
+			       int len)
+{
+	int err;
+
+	if (mask && !memchr_inv(mask, 0, len))
+		return 0;
+
+	err = nla_put(skb, val_type, len, val);
+	if (err)
+		return err;
+
+	if (mask_type != TCA_CT_UNSPEC) {
+		err = nla_put(skb, mask_type, len, mask);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int tcf_ct_dump_nat(struct sk_buff *skb, struct tcf_ct_params *p)
+{
+	struct nf_nat_range2 *range = &p->range;
+
+	if (!(p->ct_action & TCA_CT_ACT_NAT))
+		return 0;
+
+	if (!(p->ct_action & (TCA_CT_ACT_NAT_SRC | TCA_CT_ACT_NAT_DST)))
+		return 0;
+
+	if (range->flags & NF_NAT_RANGE_MAP_IPS) {
+		if (p->ipv4_range) {
+			if (nla_put_in_addr(skb, TCA_CT_NAT_IPV4_MIN,
+					    range->min_addr.ip))
+				return -1;
+			if (nla_put_in_addr(skb, TCA_CT_NAT_IPV4_MAX,
+					    range->max_addr.ip))
+				return -1;
+		} else {
+			if (nla_put_in6_addr(skb, TCA_CT_NAT_IPV6_MIN,
+					     &range->min_addr.in6))
+				return -1;
+			if (nla_put_in6_addr(skb, TCA_CT_NAT_IPV6_MAX,
+					     &range->max_addr.in6))
+				return -1;
+		}
+	}
+
+	if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) {
+		if (nla_put_be16(skb, TCA_CT_NAT_PORT_MIN,
+				 range->min_proto.all))
+			return -1;
+		if (nla_put_be16(skb, TCA_CT_NAT_PORT_MAX,
+				 range->max_proto.all))
+			return -1;
+	}
+
+	return 0;
+}
+
+static inline int tcf_ct_dump(struct sk_buff *skb, struct tc_action *a,
+			      int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_ct *c = to_ct(a);
+	struct tcf_ct_params *p;
+
+	struct tc_ct opt = {
+		.index   = c->tcf_index,
+		.refcnt  = refcount_read(&c->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&c->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+
+	spin_lock_bh(&c->tcf_lock);
+	p = rcu_dereference_protected(c->params,
+				      lockdep_is_held(&c->tcf_lock));
+	opt.action = c->tcf_action;
+
+	if (tcf_ct_dump_key_val(skb,
+				&p->ct_action, TCA_CT_ACTION,
+				NULL, TCA_CT_UNSPEC,
+				sizeof(p->ct_action)))
+		goto nla_put_failure;
+
+	if (p->ct_action & TCA_CT_ACT_CLEAR)
+		goto skip_dump;
+
+	if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
+	    tcf_ct_dump_key_val(skb,
+				&p->mark, TCA_CT_MARK,
+				&p->mark_mask, TCA_CT_MARK_MASK,
+				sizeof(p->mark)))
+		goto nla_put_failure;
+
+	if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) &&
+	    tcf_ct_dump_key_val(skb,
+				p->labels, TCA_CT_LABELS,
+				p->labels_mask, TCA_CT_LABELS_MASK,
+				sizeof(p->labels)))
+		goto nla_put_failure;
+
+	if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
+	    tcf_ct_dump_key_val(skb,
+				&p->zone, TCA_CT_ZONE,
+				NULL, TCA_CT_UNSPEC,
+				sizeof(p->zone)))
+		goto nla_put_failure;
+
+	if (tcf_ct_dump_nat(skb, p))
+		goto nla_put_failure;
+
+skip_dump:
+	if (nla_put(skb, TCA_CT_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	tcf_tm_dump(&t, &c->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CT_TM, sizeof(t), &t, TCA_CT_PAD))
+		goto nla_put_failure;
+	spin_unlock_bh(&c->tcf_lock);
+
+	return skb->len;
+nla_put_failure:
+	spin_unlock_bh(&c->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_ct_walker(struct net *net, struct sk_buff *skb,
+			 struct netlink_callback *cb, int type,
+			 const struct tc_action_ops *ops,
+			 struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ct_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_ct_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, ct_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static void tcf_stats_update(struct tc_action *a, u64 bytes, u32 packets,
+			     u64 lastuse, bool hw)
+{
+	struct tcf_ct *c = to_ct(a);
+
+	_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
+
+	if (hw)
+		_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats_hw),
+				   bytes, packets);
+	c->tcf_tm.lastuse = max_t(u64, c->tcf_tm.lastuse, lastuse);
+}
+
+static struct tc_action_ops act_ct_ops = {
+	.kind		=	"ct",
+	.id		=	TCA_ID_CT,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_ct_act,
+	.dump		=	tcf_ct_dump,
+	.init		=	tcf_ct_init,
+	.cleanup	=	tcf_ct_cleanup,
+	.walk		=	tcf_ct_walker,
+	.lookup		=	tcf_ct_search,
+	.stats_update	=	tcf_stats_update,
+	.size		=	sizeof(struct tcf_ct),
+};
+
+static __net_init int ct_init_net(struct net *net)
+{
+	unsigned int n_bits = FIELD_SIZEOF(struct tcf_ct_params, labels) * 8;
+	struct tc_ct_action_net *tn = net_generic(net, ct_net_id);
+
+	if (nf_connlabels_get(net, n_bits - 1)) {
+		tn->labels = false;
+		pr_err("act_ct: Failed to set connlabels length");
+	} else {
+		tn->labels = true;
+	}
+
+	return tc_action_net_init(&tn->tn, &act_ct_ops);
+}
+
+static void __net_exit ct_exit_net(struct list_head *net_list)
+{
+	struct net *net;
+
+	rtnl_lock();
+	list_for_each_entry(net, net_list, exit_list) {
+		struct tc_ct_action_net *tn = net_generic(net, ct_net_id);
+
+		if (tn->labels)
+			nf_connlabels_put(net);
+	}
+	rtnl_unlock();
+
+	tc_action_net_exit(net_list, ct_net_id);
+}
+
+static struct pernet_operations ct_net_ops = {
+	.init = ct_init_net,
+	.exit_batch = ct_exit_net,
+	.id   = &ct_net_id,
+	.size = sizeof(struct tc_ct_action_net),
+};
+
+static int __init ct_init_module(void)
+{
+	return tcf_register_action(&act_ct_ops, &ct_net_ops);
+}
+
+static void __exit ct_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ct_ops, &ct_net_ops);
+}
+
+module_init(ct_init_module);
+module_exit(ct_cleanup_module);
+MODULE_AUTHOR("Paul Blakey <paulb@mellanox.com>");
+MODULE_AUTHOR("Yossi Kuperman <yossiku@mellanox.com>");
+MODULE_AUTHOR("Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>");
+MODULE_DESCRIPTION("Connection tracking action");
+MODULE_LICENSE("GPL v2");
+
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index ad36bbc..4a7331c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -35,6 +35,7 @@
 #include <net/tc_act/tc_police.h>
 #include <net/tc_act/tc_sample.h>
 #include <net/tc_act/tc_skbedit.h>
+#include <net/tc_act/tc_ct.h>
 
 extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
 
@@ -3266,6 +3267,10 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 			entry->police.burst = tcf_police_tcfp_burst(act);
 			entry->police.rate_bytes_ps =
 				tcf_police_rate_bytes_ps(act);
+		} else if (is_tcf_ct(act)) {
+			entry->id = FLOW_ACTION_CT;
+			entry->ct.action = tcf_ct_action(act);
+			entry->ct.zone = tcf_ct_zone(act);
 		} else {
 			goto err_out;
 		}
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next v6 4/4] tc-tests: Add tc action ct tests
From: Paul Blakey @ 2019-07-09  7:30 UTC (permalink / raw)
  To: Jiri Pirko, Paul Blakey, Roi Dayan, Yossi Kuperman, Oz Shlomo,
	Marcelo Ricardo Leitner, netdev, David Miller, Aaron Conole,
	Zhike Wang
  Cc: Rony Efraim, nst-kernel, John Hurley, Simon Horman, Justin Pettit,
	Marcelo Ricardo Leitner
In-Reply-To: <1562657451-20819-1-git-send-email-paulb@mellanox.com>

Add 13 tests ensuring the command line is doing what is supposed to do.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 .../selftests/tc-testing/tc-tests/actions/ct.json  | 314 +++++++++++++++++++++
 1 file changed, 314 insertions(+)
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/ct.json

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json b/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json
new file mode 100644
index 0000000..62b82fe
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json
@@ -0,0 +1,314 @@
+[
+    {
+        "id": "696a",
+        "name": "Add simple ct action",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct index 42",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct zone 0 pipe.*index 42 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "9f20",
+        "name": "Add ct clear action",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct clear index 42",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct clear pipe.*index 42 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "5bea",
+        "name": "Try ct with zone",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct zone 404 index 42",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct zone 404 pipe.*index 42 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "d5d6",
+        "name": "Try ct with zone, commit",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct zone 404 commit index 42",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct commit zone 404 pipe.*index 42 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "029f",
+        "name": "Try ct with zone, commit, mark",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct zone 404 commit mark 0x42 index 42",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct commit mark 66 zone 404 pipe.*index 42 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "a58d",
+        "name": "Try ct with zone, commit, mark, nat",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct zone 404 commit mark 0x42 nat src addr 5.5.5.7 index 42",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct commit mark 66 zone 404 nat src addr 5.5.5.7 pipe.*index 42 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "901b",
+        "name": "Try ct with full nat ipv4 range syntax",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct commit nat src addr 5.5.5.7-5.5.6.0 port 1000-2000 index 44",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct commit zone 0 nat src addr 5.5.5.7-5.5.6.0 port 1000-2000 pipe.*index 44 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "072b",
+        "name": "Try ct with full nat ipv6 syntax",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct commit nat src addr 2001::1 port 1000-2000 index 44",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct commit zone 0 nat src addr 2001::1 port 1000-2000 pipe.*index 44 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "3420",
+        "name": "Try ct with full nat ipv6 range syntax",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct commit nat src addr 2001::1-2001::10 port 1000-2000 index 44",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct commit zone 0 nat src addr 2001::1-2001::10 port 1000-2000 pipe.*index 44 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "4470",
+        "name": "Try ct with full nat ipv6 range syntax + force",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct commit force nat src addr 2001::1-2001::10 port 1000-2000 index 44",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct commit force zone 0 nat src addr 2001::1-2001::10 port 1000-2000 pipe.*index 44 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "5d88",
+        "name": "Try ct with label",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct label 123123 index 44",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct zone 0 label 12312300000000000000000000000000 pipe.*index 44 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "04d4",
+        "name": "Try ct with label with mask",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct label 12312300000000000000000000000001/ffffffff000000000000000000000001 index 44",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct zone 0 label 12312300000000000000000000000001/ffffffff000000000000000000000001 pipe.*index 44 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    },
+    {
+        "id": "9751",
+        "name": "Try ct with mark + mask",
+        "category": [
+            "actions",
+            "ct"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action ct",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action ct mark 0x42/0xf0 index 42",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action ct",
+        "matchPattern": "action order [0-9]*: ct mark 66/0xf0 zone 0 pipe.*index 42 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action ct"
+        ]
+    }
+]
-- 
1.8.3.1


^ permalink raw reply related

* RE: [PATCH net-next v3 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-09  7:38 UTC (permalink / raw)
  To: Ilias Apalodimas, David Miller
  Cc: Jose.Abreu@synopsys.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, Joao.Pinto@synopsys.com,
	peppe.cavallaro@st.com, alexandre.torgue@st.com,
	brouer@redhat.com, arnd@arndb.de
In-Reply-To: <20190709072356.GA4599@apalos>

From: Ilias Apalodimas <ilias.apalodimas@linaro.org> | Date: Tue, Jul 09, 2019 at 08:23:56

> The patch from Ivan did get merged, can you change the free call to
> page_pool_destroy and re-spin? You can add my acked-by

Yes, I will re-spin then. Thanks!

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox