Netdev List

Netdev List
 help / color / mirror / Atom feed

* [RFC PATCH net-next 0/3] net: batched receive in GRO path
From: Edward Cree @ 2019-07-09 19:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet

This series listifies part of GRO processing, in a manner which allows those
 packets which are not GROed (i.e. for which dev_gro_receive returns
 GRO_NORMAL) to be passed on to the listified regular receive path.
dev_gro_receive() itself is not listified, nor the per-protocol GRO
 callback, since GRO's need to hold packets on lists under napi->gro_hash
 makes keeping the packets on other lists awkward, and since the GRO control
 block state of held skbs can refer only to one 'new' skb at a time.
Instead, when napi_frags_finish() handles a GRO_NORMAL result, stash the skb
 onto a list in the napi struct, which is received at the end of the napi
 poll or when its length exceeds the (new) sysctl net.core.gro_normal_batch.
Unlike my previous design ([1]), this does not require changes in drivers,
 and also does not prevent the re-use of napi->skb after GRO_MERGED_FREE or
 GRO_DROP.

Performance figures with this series, collected on a back-to-back pair of
 Solarflare sfn8522-r2 NICs with 120-second NetPerf tests.  In the stats,
 sample size n for old and new code is 6 runs each; p is from a Welch t-test.
Tests were run both with GRO enabled and disabled, the latter simulating
 uncoalesceable packets (e.g. due to IP or TCP options).  The receive side
 (which was the device under test) had the NetPerf process pinned to one CPU,
 and the device interrupts pinned to a second CPU.  CPU utilisation figures
 (used in cases of line-rate performance) are summed across all CPUs.
Where not specified (as batch=), net.core.gro_normal_batch was set to 8.
The net-next baseline used for these tests was commit 7d30a7f6424e.
TCP 4 streams, GRO on: all results line rate (9.415Gbps)
net-next: 210.3% cpu
after #1: 181.5% cpu (-13.7%, p=0.031 vs net-next)
after #3: 191.7% cpu (- 8.9%, p=0.102 vs net-next)
TCP 4 streams, GRO off:
after #1: 7.785 Gbps
after #3: 8.387 Gbps (+ 7.7%, p=0.215 vs #1, but note *)
TCP 1 stream, GRO on: all results line rate & ~200% cpu.
TCP 1 stream, GRO off:
after #1: 6.444 Gbps
after #3: 7.363 Gbps (+14.3%, p=0.003 vs #1)
batch=16: 7.199 Gbps
batch= 4: 7.354 Gbps
batch= 0: 5.899 Gbps
TCP 100 RR, GRO off:
net-next: 995.083 us
after #1: 969.167 us (- 2.6%, p=0.204 vs net-next)
after #3: 976.433 us (- 1.9%, p=0.254 vs net-next)

(*) These tests produced a mixture of line-rate and below-line-rate results,
 meaning that statistically speaking the results were 'censored' by the
 upper bound, and were thus not normally distributed, making a Welch t-test
 mathematically invalid.  I therefore also calculated estimators according
 to [2], which gave the following:
after #1: 8.155 Gbps
after #3: 8.716 Gbps (+ 6.9%, p=0.291 vs #1)
(though my procedure for determining ν wasn't mathematically well-founded
 either, so take that p-value with a grain of salt).

Conclusion:
* Patch #1 is a fairly unambiguous improvement.
* Patch #3 has no statistically significant effect when GRO is active.
* Any effect of patch #3 on latency is within statistical noise.
* When GRO is inactive, patch #3 improves bandwidth, though for multiple
  streams the effect is smaller (possibly owing to the line-rate limit).
* The optimal batch size for this setup appears to be around 8.
* Setting the batch size to zero gives worse performance than before the
  patch; perhaps a static key is needed?
* Drivers which, unlike sfc, pass UDP traffic to GRO would expect to see a
  benefit from gaining access to batching.

Notes for future thought: in principle if we passed the napi pointer to
 napi_gro_complete(), it could add its superframe skb to napi->rx_list,
 rather than immediately netif_receive_skb_internal()ing it.  Without that
 I'm not sure if there's a possibility of OoO between normal and GROed SKBs
 on the same flow.

[1]: http://patchwork.ozlabs.org/cover/997844/
[2]: Cohen 1959, doi: 10.1080/00401706.1959.10489859

Edward Cree (3):
  sfc: don't score irq moderation points for GRO
  sfc: falcon: don't score irq moderation points for GRO
  net: use listified RX for handling GRO_NORMAL skbs

 drivers/net/ethernet/sfc/falcon/rx.c |  5 +----
 drivers/net/ethernet/sfc/rx.c        |  5 +----
 include/linux/netdevice.h            |  3 +++
 net/core/dev.c                       | 32 ++++++++++++++++++++++++++--
 net/core/sysctl_net_core.c           |  8 +++++++
 5 files changed, 43 insertions(+), 10 deletions(-)

^ permalink raw reply

* [RFC PATCH net-next 1/3] sfc: don't score irq moderation points for GRO
From: Edward Cree @ 2019-07-09 19:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <7920e85c-439e-0622-46f8-0602cf37e306@solarflare.com>

We already scored points when handling the RX event, no-one else does this,
 and looking at the history it appears this was originally meant to only
 score on merges, not on GRO_NORMAL.  Moreover, it gets in the way of
 changing GRO to not immediately pass GRO_NORMAL skbs to the stack.
Performance testing with four TCP streams received on a single CPU (where
 throughput was line rate of 9.4Gbps in all tests) showed a 13.7% reduction
 in RX CPU usage (n=6, p=0.03).

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/rx.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index d5db045535d3..85ec07f5a674 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -412,7 +412,6 @@ efx_rx_packet_gro(struct efx_channel *channel, struct efx_rx_buffer *rx_buf,
 		  unsigned int n_frags, u8 *eh)
 {
 	struct napi_struct *napi = &channel->napi_str;
-	gro_result_t gro_result;
 	struct efx_nic *efx = channel->efx;
 	struct sk_buff *skb;
 
@@ -449,9 +448,7 @@ efx_rx_packet_gro(struct efx_channel *channel, struct efx_rx_buffer *rx_buf,
 
 	skb_record_rx_queue(skb, channel->rx_queue.core_index);
 
-	gro_result = napi_gro_frags(napi);
-	if (gro_result != GRO_DROP)
-		channel->irq_mod_score += 2;
+	napi_gro_frags(napi);
 }
 
 /* Allocate and construct an SKB around page fragments */


^ permalink raw reply related

* [RFC PATCH net-next 2/3] sfc: falcon: don't score irq moderation points for GRO
From: Edward Cree @ 2019-07-09 19:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <7920e85c-439e-0622-46f8-0602cf37e306@solarflare.com>

Same rationale as for sfc, except that this wasn't performance-tested.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/falcon/rx.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/falcon/rx.c b/drivers/net/ethernet/sfc/falcon/rx.c
index fd850d3d8ec0..05ea3523890a 100644
--- a/drivers/net/ethernet/sfc/falcon/rx.c
+++ b/drivers/net/ethernet/sfc/falcon/rx.c
@@ -424,7 +424,6 @@ ef4_rx_packet_gro(struct ef4_channel *channel, struct ef4_rx_buffer *rx_buf,
 		  unsigned int n_frags, u8 *eh)
 {
 	struct napi_struct *napi = &channel->napi_str;
-	gro_result_t gro_result;
 	struct ef4_nic *efx = channel->efx;
 	struct sk_buff *skb;
 
@@ -460,9 +459,7 @@ ef4_rx_packet_gro(struct ef4_channel *channel, struct ef4_rx_buffer *rx_buf,
 
 	skb_record_rx_queue(skb, channel->rx_queue.core_index);
 
-	gro_result = napi_gro_frags(napi);
-	if (gro_result != GRO_DROP)
-		channel->irq_mod_score += 2;
+	napi_gro_frags(napi);
 }
 
 /* Allocate and construct an SKB around page fragments */


^ permalink raw reply related

* [RFC PATCH net-next 3/3] net: use listified RX for handling GRO_NORMAL skbs
From: Edward Cree @ 2019-07-09 19:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <7920e85c-439e-0622-46f8-0602cf37e306@solarflare.com>

When GRO decides not to coalesce a packet, in napi_frags_finish(), instead
 of passing it to the stack immediately, place it on a list in the napi
 struct.  Then, at flush time (napi_complete_done() or napi_poll()), call
 netif_receive_skb_list_internal() on the list.  We'd like to do that in
 napi_gro_flush(), but it's not called if !napi->gro_bitmask, so we have to
 do it in the callers instead.  (There are a handful of drivers that call
 napi_gro_flush() themselves, but it's not clear why, or whether this will
 affect them.)
Because a full 64 packets is an inefficiently large batch, also consume the
 list whenever it exceeds gro_normal_batch, a new net/core sysctl that
 defaults to 8.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/netdevice.h  |  3 +++
 net/core/dev.c             | 32 ++++++++++++++++++++++++++++++--
 net/core/sysctl_net_core.c |  8 ++++++++
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88292953aa6f..55ac223553f8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -332,6 +332,8 @@ struct napi_struct {
 	struct net_device	*dev;
 	struct gro_list		gro_hash[GRO_HASH_BUCKETS];
 	struct sk_buff		*skb;
+	struct list_head	rx_list; /* Pending GRO_NORMAL skbs */
+	int			rx_count; /* length of rx_list */
 	struct hrtimer		timer;
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
@@ -4239,6 +4241,7 @@ extern int		dev_weight_rx_bias;
 extern int		dev_weight_tx_bias;
 extern int		dev_rx_weight;
 extern int		dev_tx_weight;
+extern int		gro_normal_batch;
 
 bool netdev_has_upper_dev(struct net_device *dev, struct net_device *upper_dev);
 struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index fc676b2610e3..4b6f2ec67fbc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3963,6 +3963,8 @@ int dev_weight_rx_bias __read_mostly = 1;  /* bias for backlog weight */
 int dev_weight_tx_bias __read_mostly = 1;  /* bias for output_queue quota */
 int dev_rx_weight __read_mostly = 64;
 int dev_tx_weight __read_mostly = 64;
+/* Maximum number of GRO_NORMAL skbs to batch up for list-RX */
+int gro_normal_batch __read_mostly = 8;
 
 /* Called with irq disabled */
 static inline void ____napi_schedule(struct softnet_data *sd,
@@ -5742,6 +5744,26 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi)
 }
 EXPORT_SYMBOL(napi_get_frags);
 
+/* Pass the currently batched GRO_NORMAL SKBs up to the stack. */
+static void gro_normal_list(struct napi_struct *napi)
+{
+	if (!napi->rx_count)
+		return;
+	netif_receive_skb_list_internal(&napi->rx_list);
+	INIT_LIST_HEAD(&napi->rx_list);
+	napi->rx_count = 0;
+}
+
+/* Queue one GRO_NORMAL SKB up for list processing.  If batch size exceeded,
+ * pass the whole batch up to the stack.
+ */
+static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb)
+{
+	list_add_tail(&skb->list, &napi->rx_list);
+	if (++napi->rx_count >= gro_normal_batch)
+		gro_normal_list(napi);
+}
+
 static gro_result_t napi_frags_finish(struct napi_struct *napi,
 				      struct sk_buff *skb,
 				      gro_result_t ret)
@@ -5751,8 +5773,8 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi,
 	case GRO_HELD:
 		__skb_push(skb, ETH_HLEN);
 		skb->protocol = eth_type_trans(skb, skb->dev);
-		if (ret == GRO_NORMAL && netif_receive_skb_internal(skb))
-			ret = GRO_DROP;
+		if (ret == GRO_NORMAL)
+			gro_normal_one(napi, skb);
 		break;
 
 	case GRO_DROP:
@@ -6029,6 +6051,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 				 NAPIF_STATE_IN_BUSY_POLL)))
 		return false;
 
+	gro_normal_list(n);
+
 	if (n->gro_bitmask) {
 		unsigned long timeout = 0;
 
@@ -6267,6 +6291,8 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 	napi->timer.function = napi_watchdog;
 	init_gro_hash(napi);
 	napi->skb = NULL;
+	INIT_LIST_HEAD(&napi->rx_list);
+	napi->rx_count = 0;
 	napi->poll = poll;
 	if (weight > NAPI_POLL_WEIGHT)
 		netdev_err_once(dev, "%s() called with weight %d\n", __func__,
@@ -6363,6 +6389,8 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 		goto out_unlock;
 	}
 
+	gro_normal_list(n);
+
 	if (n->gro_bitmask) {
 		/* flush too old packets
 		 * If HZ < 1000, flush all packets.
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index f9204719aeee..dba52f53eace 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -569,6 +569,14 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_do_static_key,
 	},
+	{
+		.procname	= "gro_normal_batch",
+		.data		= &gro_normal_batch,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+	},
 	{ }
 };
 

^ permalink raw reply related

* [RFC PATCH linux-next] net: ethernet: ti: davinci_cpdma: cpdma_chan_split_pool() can be static
From: kbuild test robot @ 2019-07-09 19:32 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: kbuild-all, Grygorii Strashko, Andrew Lunn, Ilias Apalodimas,
	linux-omap, netdev, linux-kernel
In-Reply-To: <201907100329.bQl3UgMB%lkp@intel.com>


Fixes: 962fb618909e ("net: ethernet: ti: davinci_cpdma: allow desc split while down")
Signed-off-by: kbuild test robot <lkp@intel.com>
---
 davinci_cpdma.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 0ca2a1a..1f6065b 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -722,7 +722,7 @@ static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
  * cpdma_chan_split_pool - Splits ctrl pool between all channels.
  * Has to be called under ctlr lock
  */
-int cpdma_chan_split_pool(struct cpdma_ctlr *ctlr)
+static int cpdma_chan_split_pool(struct cpdma_ctlr *ctlr)
 {
 	int tx_per_ch_desc = 0, rx_per_ch_desc = 0;
 	int free_rx_num = 0, free_tx_num = 0;

^ permalink raw reply related

* [linux-next:master 13028/13492] drivers/net/ethernet/ti/davinci_cpdma.c:725:5: sparse: sparse: symbol 'cpdma_chan_split_pool' was not declared. Should it be static?
From: kbuild test robot @ 2019-07-09 19:32 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: kbuild-all, Grygorii Strashko, Andrew Lunn, Ilias Apalodimas,
	linux-omap, netdev, linux-kernel

tree:   https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git master
head:   4608a726c66807c27bc7d91bdf8a288254e29985
commit: 962fb618909ef64e0c89af5b79ba0fed910b78e3 [13028/13492] net: ethernet: ti: davinci_cpdma: allow desc split while down
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
        git checkout 962fb618909ef64e0c89af5b79ba0fed910b78e3
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/ti/davinci_cpdma.c:725:5: sparse: sparse: symbol 'cpdma_chan_split_pool' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* RE: [RESEND PATCH iproute2 net-next] devlink: Introduce PCI PF and VF port flavour and attribute
From: Parav Pandit @ 2019-07-09 19:49 UTC (permalink / raw)
  To: Parav Pandit, netdev@vger.kernel.org
  Cc: Saeed Mahameed, jakub.kicinski@netronome.com, Jiri Pirko
In-Reply-To: <20190701183017.25407-1-parav@mellanox.com>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Parav Pandit
> Sent: Tuesday, July 2, 2019 12:00 AM
> To: netdev@vger.kernel.org
> Cc: Saeed Mahameed <saeedm@mellanox.com>;
> jakub.kicinski@netronome.com; Jiri Pirko <jiri@mellanox.com>; Parav Pandit
> <parav@mellanox.com>
> Subject: [RESEND PATCH iproute2 net-next] devlink: Introduce PCI PF and VF
> port flavour and attribute
> 
> Introduce PCI PF and VF port flavour and port attributes such as PF number
> and VF number.
> 
> $ devlink port show
> pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
> pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
> pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
> 
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> ---
>  devlink/devlink.c            | 23 +++++++++++++++++++++++
>  include/uapi/linux/devlink.h | 11 +++++++++++
>  2 files changed, 34 insertions(+)
>
I will resend this patch with updated kernel commit id for the uapi, possibly once unrelated patch [1] is merged, just to avoid merge conflict.

[1] https://patchwork.ozlabs.org/patch/1129927/


^ permalink raw reply

* Re: [PATCH] vhost: fix null pointer dereference in vhost_del_umem_range
From: David Miller @ 2019-07-09 19:58 UTC (permalink / raw)
  To: kda; +Cc: mst, jasowang, kvm, netdev
In-Reply-To: <20190709114251.24662-1-dkirjanov@suse.com>

From: Denis Kirjanov <kda@linux-powerpc.org>
Date: Tue,  9 Jul 2019 13:42:51 +0200

> @@ -962,7 +962,8 @@ static void vhost_del_umem_range(struct vhost_umem *umem,
>  
>  	while ((node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
>  							   start, end)))
> -		vhost_umem_free(umem, node);
> +		if (node)
> +			vhost_umem_free(umem, node);

If 'node' is NULL we will not be in the body of the loop as per
the while() condition.

How did you test this?

^ permalink raw reply

* Re: [PATCH] net: netsec: start using buffers if page_pool registration succeeded
From: David Miller @ 2019-07-09 20:00 UTC (permalink / raw)
  To: ilias.apalodimas; +Cc: netdev, jaswinder.singh
In-Reply-To: <1562675753-26160-1-git-send-email-ilias.apalodimas@linaro.org>

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Tue,  9 Jul 2019 15:35:53 +0300

> The current driver starts using page_pool buffers before calling
> xdp_rxq_info_reg_mem_model(). Start using the buffers after the
> registration succeeded, so we won't have to call
> page_pool_request_shutdown() in case of failure
> 
> Fixes: 5c67bf0ec4d0 ("net: netsec: Use page_pool API")
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH bpf-next v3] virtio_net: add XDP meta data support
From: Daniel Borkmann @ 2019-07-09 20:03 UTC (permalink / raw)
  To: Jason Wang, Yuya Kusakabe
  Cc: ast, davem, hawk, jakub.kicinski, john.fastabend, kafai, mst,
	netdev, songliubraving, yhs
In-Reply-To: <eb955137-11d5-13b2-683a-6a2e8425d792@redhat.com>

On 07/09/2019 05:04 AM, Jason Wang wrote:
> On 2019/7/9 上午6:38, Daniel Borkmann wrote:
>> On 07/02/2019 04:11 PM, Yuya Kusakabe wrote:
>>> On 7/2/19 5:33 PM, Jason Wang wrote:
>>>> On 2019/7/2 下午4:16, Yuya Kusakabe wrote:
>>>>> This adds XDP meta data support to both receive_small() and
>>>>> receive_mergeable().
>>>>>
>>>>> Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access")
>>>>> Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com>
>>>>> ---
>>>>> v3:
>>>>>    - fix preserve the vnet header in receive_small().
>>>>> v2:
>>>>>    - keep copy untouched in page_to_skb().
>>>>>    - preserve the vnet header in receive_small().
>>>>>    - fix indentation.
>>>>> ---
>>>>>    drivers/net/virtio_net.c | 45 +++++++++++++++++++++++++++-------------
>>>>>    1 file changed, 31 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index 4f3de0ac8b0b..03a1ae6fe267 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -371,7 +371,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>>>                       struct receive_queue *rq,
>>>>>                       struct page *page, unsigned int offset,
>>>>>                       unsigned int len, unsigned int truesize,
>>>>> -                   bool hdr_valid)
>>>>> +                   bool hdr_valid, unsigned int metasize)
>>>>>    {
>>>>>        struct sk_buff *skb;
>>>>>        struct virtio_net_hdr_mrg_rxbuf *hdr;
>>>>> @@ -393,7 +393,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>>>        else
>>>>>            hdr_padded_len = sizeof(struct padded_vnet_hdr);
>>>>>    -    if (hdr_valid)
>>>>> +    if (hdr_valid && !metasize)
>>>>>            memcpy(hdr, p, hdr_len);
>>>>>          len -= hdr_len;
>>>>> @@ -405,6 +405,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>>>            copy = skb_tailroom(skb);
>>>>>        skb_put_data(skb, p, copy);
>>>>>    +    if (metasize) {
>>>>> +        __skb_pull(skb, metasize);
>>>>> +        skb_metadata_set(skb, metasize);
>>>>> +    }
>>>>> +
>>>>>        len -= copy;
>>>>>        offset += copy;
>>>>>    @@ -644,6 +649,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>>>        unsigned int delta = 0;
>>>>>        struct page *xdp_page;
>>>>>        int err;
>>>>> +    unsigned int metasize = 0;
>>>>>          len -= vi->hdr_len;
>>>>>        stats->bytes += len;
>>>>> @@ -683,10 +689,13 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>>>              xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
>>>>>            xdp.data = xdp.data_hard_start + xdp_headroom;
>>>>> -        xdp_set_data_meta_invalid(&xdp);
>>>>>            xdp.data_end = xdp.data + len;
>>>>> +        xdp.data_meta = xdp.data;
>>>>>            xdp.rxq = &rq->xdp_rxq;
>>>>>            orig_data = xdp.data;
>>>>> +        /* Copy the vnet header to the front of data_hard_start to avoid
>>>>> +         * overwriting by XDP meta data */
>>>>> +        memcpy(xdp.data_hard_start - vi->hdr_len, xdp.data - vi->hdr_len, vi->hdr_len);
>> I'm not fully sure if I'm following this one correctly, probably just missing
>> something. Isn't the vnet header based on how we set up xdp.data_hard_start
>> earlier already in front of it? Wouldn't we copy invalid data from xdp.data -
>> vi->hdr_len into the vnet header at that point (given there can be up to 256
>> bytes of headroom between the two)? If it's relative to xdp.data and headroom
>> is >0, then BPF prog could otherwise mangle this; something doesn't add up to
>> me here. Could you clarify? Thx
> 
> Vnet headr sits just in front of xdp.data not xdp.data_hard_start. So it could be overwrote by metadata, that's why we need a copy here.

For the current code, you can adjust the xdp.data with a positive/negative offset
already via bpf_xdp_adjust_head() helper. If vnet headr sits just in front of
xdp.data, couldn't this be overridden today as well then? Anyway, just wondering
how this is handled differently?

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH v2 0/4] Fix hang of Armada 8040 SoC in orion-mdio
From: David Miller @ 2019-07-09 20:03 UTC (permalink / raw)
  To: josua; +Cc: netdev, josua.mayer
In-Reply-To: <20190709130101.5160-1-josua@solid-run.com>

From: josua@solid-run.com
Date: Tue,  9 Jul 2019 15:00:57 +0200

> From: Josua Mayer <josua.mayer@jm0.eu>
> 
> With a modular kernel as configured by Debian a hang was observed with
> the Armada 8040 SoC in the Clearfog GT and Macchiatobin boards.
> 
> The 8040 SoC actually requires four clocks to be enabled for the mdio
> interface to function. All 4 clocks are already specified in
> armada-cp110.dtsi. It has however been missed that the orion-mdio driver
> only supports enabling up to three clocks.
> 
> This patch-set allows the orion-mdio driver to handle four clocks and
> adds a warning when more clocks are specified to prevent this particular
> oversight in the future.
> 
> Changes since v1:
> - fixed condition for priting the warning (Andrew Lunn)
> - rephrased commit description for deferred probing (Andrew Lunn)
> - fixed compiler warnings (kbuild test robot)

Series applied, thanks Josua.

^ permalink raw reply

* RE: [PATCH] tipc: ensure skb->lock is initialised
From: Jon Maloy @ 2019-07-09 20:15 UTC (permalink / raw)
  To: Eric Dumazet, Chris Packham, ying.xue@windriver.com,
	davem@davemloft.net
  Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <ef9a2ec1-1413-e8f9-1193-d53cf8ee52ba@gmail.com>

> -----Original Message-----
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Sent: 9-Jul-19 09:46
> To: Jon Maloy <jon.maloy@ericsson.com>; Eric Dumazet
> <eric.dumazet@gmail.com>; Chris Packham
> <Chris.Packham@alliedtelesis.co.nz>; ying.xue@windriver.com;
> davem@davemloft.net
> Cc: netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] tipc: ensure skb->lock is initialised
> 
> 
> 
> On 7/9/19 3:25 PM, Jon Maloy wrote:

[...]

> > TIPC is using the list lock at message reception within the scope of
> tipc_sk_rcv()/tipc_skb_peek_port(), so it is fundamental that the lock always
> is correctly initialized.
> 
> Where is the lock acquired, why was it only acquired by queue purge and not
> normal dequeues ???

It is acquired twice:
- First, in tipc_sk_rcv()->tipc_skb_peek_port(), to peek into one or more queue members and read their destination port number.
- Second, in tipc_sk_rcv()->tipc_sk_enqueue()->tipc_skb_dequeue() to unlink a list member from the queue.

> >>
> > [...]
> >>
> >> tipc_link_xmit() for example never acquires the spinlock, yet uses
> >> skb_peek() and __skb_dequeue()
> >
> >
> > You should look at tipc_node_xmit instead. Node local messages are
> > sent directly to tipc_sk_rcv(), and never go through tipc_link_xmit()
> 
> tipc_node_xmit() calls tipc_link_xmit() eventually, right ?

No. 
tipc_link_xmit() is called only for messages with a non-local destination.  Otherwise, tipc_node_xmit() sends node local messages directly to the destination socket via tipc_sk_rcv().
The argument 'xmitq' becomes 'inputq' in tipc_sk_rcv() and 'list' in tipc_skb_peek_port(), since those functions don't distinguish between local and node external incoming messages.

> 
> Please show me where the head->lock is acquired, and why it needed.

The argument  'inputq'  to tipc_sk_rcv() may come from two sources: 
1) As an aggregated member of each tipc_node::tipc_link_entry. This queue is needed to guarantee sequential delivery of messages from the node/link layer to the socket layer. In this case, there may be buffers for multiple destination sockets in the same queue, and we may have multiple concurrent tipc_sk_rcv() jobs working that queue. So, the lock is needed both for adding (in  link.c::tipc_data_input()), peeking and removing buffers.

2) The case you have been looking at, where it is created as 'xmitq' on the stack by a local socket.  Here, the lock is not strictly needed, as you have observed. But to reduce code duplication we have chosen to let the code in tipc_sk_rcv() handle both types of queues uniformly, i.e., as if they all contain buffers with potentially multiple destination sockets, worked on by multiple concurrent calls. This requires that the lock is initialized even for this type of queue. We have seen no measurable performance difference between this 'generic' reception algorithm and a tailor-made ditto for local messages only,  while the amount of saved code is significant.

> 
> If this is mandatory, then more fixes are needed than just initializing the lock
> for lockdep purposes.

It is not only for lockdep purposes, -it is essential.  But please provide details about where you see that more fixes are needed.

BR
///jon

^ permalink raw reply

* Re: [PATCH] [net-next] net: dsa: vsc73xx: fix NET_DSA and OF dependencies
From: David Miller @ 2019-07-09 20:20 UTC (permalink / raw)
  To: arnd
  Cc: andrew, vivien.didelot, f.fainelli, paweldembicki, linus.walleij,
	netdev, linux-kernel
In-Reply-To: <20190709185626.3275510-1-arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Tue,  9 Jul 2019 20:55:55 +0200

> The restructuring of the driver got the dependencies wrong: without
> CONFIG_NET_DSA we get this build failure:
> 
> WARNING: unmet direct dependencies detected for NET_DSA_VITESSE_VSC73XX
>   Depends on [n]: NETDEVICES [=y] && HAVE_NET_DSA [=y] && OF [=y] && NET_DSA [=n]
>   Selected by [m]:
>   - NET_DSA_VITESSE_VSC73XX_PLATFORM [=m] && NETDEVICES [=y] && HAVE_NET_DSA [=y] && HAS_IOMEM [=y]
> 
> ERROR: "dsa_unregister_switch" [drivers/net/dsa/vitesse-vsc73xx-core.ko] undefined!
> ERROR: "dsa_switch_alloc" [drivers/net/dsa/vitesse-vsc73xx-core.ko] undefined!
> ERROR: "dsa_register_switch" [drivers/net/dsa/vitesse-vsc73xx-core.ko] undefined!
> 
> Add the appropriate dependencies.
> 
> Fixes: 95711cd5f0b4 ("net: dsa: vsc73xx: Split vsc73xx driver")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Applied, thanks Arnd.

^ permalink raw reply

* Re: WARNING: refcount bug in nr_insert_socket
From: syzbot @ 2019-07-09 20:22 UTC (permalink / raw)
  To: davem, linux-hams, linux-kernel, netdev, ralf, syzkaller-bugs,
	xiyou.wangcong
In-Reply-To: <0000000000000595ea058d411c35@google.com>

syzbot has bisected this bug to:

commit c8c8218ec5af5d2598381883acbefbf604e56b5e
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jun 27 21:30:58 2019 +0000

     netrom: fix a memory leak in nr_rx_frame()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1677f227a00000
start commit:   4608a726 Add linux-next specific files for 20190709
git tree:       linux-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=1577f227a00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1177f227a00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=7a02e36d356a9a17
dashboard link: https://syzkaller.appspot.com/bug?extid=ec1fd464d849d91c3665
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16b47be8600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15172e7ba00000

Reported-by: syzbot+ec1fd464d849d91c3665@syzkaller.appspotmail.com
Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* Re: [PATCH v5 bpf-next 0/4] capture integers in BTF type info for map defs
From: Edward Cree @ 2019-07-09 20:27 UTC (permalink / raw)
  To: Daniel Borkmann, Andrii Nakryiko, andrii.nakryiko, kernel-team,
	ast, netdev, bpf
In-Reply-To: <86f8f511-655c-bf9e-8d78-f2e3f65efdb9@iogearbox.net>

On 05/07/2019 22:15, Daniel Borkmann wrote:
> On 07/05/2019 05:50 PM, Andrii Nakryiko wrote:
>> This patch set implements an update to how BTF-defined maps are specified. The
>> change is in how integer attributes, e.g., type, max_entries, map_flags, are
>> specified: now they are captured as part of map definition struct's BTF type
>> information (using array dimension), eliminating the need for compile-time
>> data initialization and keeping all the metadata in one place.
>>
>> All existing selftests that were using BTF-defined maps are updated, along
>> with some other selftests, that were switched to new syntax.
BTW is this changing the BTF format spec, and if so why isn't it accompanied by
 a patch to Documentation/bpf/btf.rst?  It looks like that doc still talks about
 BPF_ANNOTATE_KV_PAIR, which seems to be long gone.

-Ed

^ permalink raw reply

* Re: [PATCH iproute2-next 2/3] tc: add mpls actions
From: David Ahern @ 2019-07-09 20:36 UTC (permalink / raw)
  To: Stephen Hemminger, John Hurley
  Cc: netdev, davem, jiri, xiyou.wangcong, willemdebruijn.kernel,
	simon.horman, jakub.kicinski, oss-drivers
In-Reply-To: <20190709100051.65bd159d@hermes.lan>

On 7/9/19 11:00 AM, Stephen Hemminger wrote:
> On Tue,  9 Jul 2019 16:59:31 +0100
> John Hurley <john.hurley@netronome.com> wrote:
> 
>> 	if (!tb[TCA_MPLS_PARMS]) {
>> +		print_string(PRINT_FP, NULL, "%s", "[NULL mpls parameters]");
> 
> This is an error message please just use fprintf(stderr instead
> 

skbedit, nat as 2 examples (and the only 2 I checked) do the print_string.

^ permalink raw reply

* Re: [PATCH iproute2-next 2/3] tc: add mpls actions
From: David Ahern @ 2019-07-09 20:39 UTC (permalink / raw)
  To: John Hurley, netdev
  Cc: davem, jiri, xiyou.wangcong, willemdebruijn.kernel, simon.horman,
	jakub.kicinski, oss-drivers
In-Reply-To: <1562687972-23549-3-git-send-email-john.hurley@netronome.com>


On 7/9/19 9:59 AM, John Hurley wrote:
> +static void explain(void)
> +{
> +	fprintf(stderr,
> +		"Usage: mpls pop [ protocol MPLS_PROTO ]\n"
> +		"       mpls push [ protocol MPLS_PROTO ] [ label MPLS_LABEL ] [ tc MPLS_TC ] [ ttl MPLS_TTL ] [ bos MPLS_BOS ] [CONTROL]\n"

that makes for a very long line to the user. Break at the ttl option and
make a newline:

mpls push [ protocol MPLS_PROTO ] [ label MPLS_LABEL ] [ tc MPLS_TC ]
          [ ttl MPLS_TTL ] [ bos MPLS_BOS ] [CONTROL]


> +		"       mpls modify [ label MPLS_LABEL ] [ tc MPLS_TC ] [ ttl MPLS_TTL ] [CONTROL]\n"
> +		"	for pop MPLS_PROTO is next header of packet - e.g. ip or mpls_uc\n"
> +		"       for push MPLS_PROTO is one of mpls_uc or mpls_mc\n"
> +		"            with default: mpls_uc\n"
> +		"       CONTROL := reclassify | pipe | drop | continue | pass |\n"
> +		"                  goto chain <CHAIN_INDEX>\n");
> +}
> +


...

> +static int print_mpls(struct action_util *au, FILE *f, struct rtattr *arg)
> +{
> +	struct rtattr *tb[TCA_MPLS_MAX + 1];
> +	struct tc_mpls *parm;
> +	SPRINT_BUF(b1);
> +	__u32 val;
> +
> +	if (!arg)
> +		return -1;
> +
> +	parse_rtattr_nested(tb, TCA_MPLS_MAX, arg);
> +
> +	if (!tb[TCA_MPLS_PARMS]) {
> +		print_string(PRINT_FP, NULL, "%s", "[NULL mpls parameters]");
> +		return -1;
> +	}
> +	parm = RTA_DATA(tb[TCA_MPLS_PARMS]);
> +
> +	print_string(PRINT_ANY, "kind", "%s ", "mpls");
> +	print_string(PRINT_ANY, "mpls_action", " %s",
> +		     action_names[parm->m_action]);
> +
> +	switch (parm->m_action) {
> +	case TCA_MPLS_ACT_POP:
> +		if (tb[TCA_MPLS_PROTO]) {
> +			__u16 proto;
> +
> +			proto = rta_getattr_u16(tb[TCA_MPLS_PROTO]);
> +			print_string(PRINT_ANY, "protocol", " protocol %s",
> +				     ll_proto_n2a(proto, b1, sizeof(b1)));
> +		}
> +		break;
> +	case TCA_MPLS_ACT_PUSH:
> +		if (tb[TCA_MPLS_PROTO]) {
> +			__u16 proto;
> +
> +			proto = rta_getattr_u16(tb[TCA_MPLS_PROTO]);
> +			print_string(PRINT_ANY, "protocol", " protocol %s",
> +				     ll_proto_n2a(proto, b1, sizeof(b1)));
> +		}
> +		/* Fallthrough */
> +	case TCA_MPLS_ACT_MODIFY:
> +		if (tb[TCA_MPLS_LABEL]) {
> +			val = rta_getattr_u32(tb[TCA_MPLS_LABEL]);
> +			print_uint(PRINT_ANY, "label", " label %u", val);
> +		}
> +		if (tb[TCA_MPLS_TC]) {
> +			val = rta_getattr_u8(tb[TCA_MPLS_TC]);
> +			print_uint(PRINT_ANY, "tc", " tc %u", val);
> +		}
> +		if (tb[TCA_MPLS_BOS]) {
> +			val = rta_getattr_u8(tb[TCA_MPLS_BOS]);
> +			print_uint(PRINT_ANY, "bos", " bos %u", val);
> +		}
> +		if (tb[TCA_MPLS_TTL]) {
> +			val = rta_getattr_u8(tb[TCA_MPLS_TTL]);
> +			print_uint(PRINT_ANY, "ttl", " ttl %u", val);
> +		}
> +		break;
> +	}
> +	print_action_control(f, " ", parm->action, "");
> +
> +	print_uint(PRINT_ANY, "index", "\n\t index %u", parm->index);
> +	print_int(PRINT_ANY, "ref", " ref %d", parm->refcnt);
> +	print_int(PRINT_ANY, "bind", " bind %d", parm->bindcnt);
> +
> +	if (show_stats) {
> +		if (tb[TCA_MPLS_TM]) {
> +			struct tcf_t *tm = RTA_DATA(tb[TCA_MPLS_TM]);
> +
> +			print_tm(f, tm);
> +		}
> +	}
> +
> +	print_string(PRINT_FP, NULL, "%s", "\n");

s/"\n"/_SL_/ ?


^ permalink raw reply

* [PATCH iproute2] utils: don't match empty strings as prefixes
From: Matteo Croce @ 2019-07-09 20:40 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David Ahern

iproute has an utility function which checks if a string is a prefix for
another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
instead of 'address'.

This routine unfortunately considers an empty string as prefix
of any pattern, leading to undefined behaviour when an empty
argument is passed to ip:

    # ip ''
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever

    # tc ''
    qdisc noqueue 0: dev lo root refcnt 2

    # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
    # ip addr show dev dummy0
    6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
        inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
           valid_lft forever preferred_lft forever

Rewrite matches() so it takes care of an empty input, and doesn't
scan the input strings three times: the actual implementation
does 2 strlen and a memcpy to accomplish the same task.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 include/utils.h |  2 +-
 lib/utils.c     | 14 +++++++++-----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 927fdc17..f4d12abb 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -198,7 +198,7 @@ int nodev(const char *dev);
 int check_ifname(const char *);
 int get_ifname(char *, const char *);
 const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
-int matches(const char *arg, const char *pattern);
+int matches(const char *prefix, const char *string);
 int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
 int inet_addr_match_rta(const inet_prefix *m, const struct rtattr *rta);
 
diff --git a/lib/utils.c b/lib/utils.c
index be0f11b0..73ce19bb 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -887,13 +887,17 @@ const char *get_ifname_rta(int ifindex, const struct rtattr *rta)
 	return name;
 }
 
-int matches(const char *cmd, const char *pattern)
+/* Check if 'prefix' is a non empty prefix of 'string' */
+int matches(const char *prefix, const char *string)
 {
-	int len = strlen(cmd);
+	if (!*prefix)
+		return 1;
+	while(*string && *prefix == *string) {
+		prefix++;
+		string++;
+	}
 
-	if (len > strlen(pattern))
-		return -1;
-	return memcmp(pattern, cmd, len);
+	return *prefix;
 }
 
 int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits)
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH mlx5-next 4/5] net/mlx5: Introduce TLS TX offload hardware bits and structures
From: Saeed Mahameed @ 2019-07-09 20:54 UTC (permalink / raw)
  To: saeedm@dev.mellanox.co.il, leon@kernel.org
  Cc: Eran Ben Elisha, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, Tariq Toukan
In-Reply-To: <20190704182113.GG7212@mtr-leonro.mtl.com>

On Thu, 2019-07-04 at 21:21 +0300, Leon Romanovsky wrote:
> On Thu, Jul 04, 2019 at 01:21:04PM -0400, Saeed Mahameed wrote:
> > On Thu, Jul 4, 2019 at 1:15 PM Leon Romanovsky <leon@kernel.org>
> > wrote:
> > > On Thu, Jul 04, 2019 at 01:06:58PM -0400, Saeed Mahameed wrote:
> > > > On Wed, Jul 3, 2019 at 5:27 AM <leon@kernel.org> wrote:
> > > > > On Wed, Jul 03, 2019 at 07:39:32AM +0000, Saeed Mahameed
> > > > > wrote:
> > > > > > From: Eran Ben Elisha <eranbe@mellanox.com>
> > > > > > 
> > > > > > Add TLS offload related IFC structs, layouts and
> > > > > > enumerations.
> > > > > > 
> > > > > > Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> > > > > > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > > > > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > > > > > ---
> > > > > >  include/linux/mlx5/device.h   |  14 +++++
> > > > > >  include/linux/mlx5/mlx5_ifc.h | 104
> > > > > > ++++++++++++++++++++++++++++++++--
> > > > > >  2 files changed, 114 insertions(+), 4 deletions(-)
> > > > > 
> > > > > <...>
> > > > > 
> > > > > > @@ -2725,7 +2739,8 @@ struct mlx5_ifc_traffic_counter_bits
> > > > > > {
> > > > > > 
> > > > > >  struct mlx5_ifc_tisc_bits {
> > > > > >       u8         strict_lag_tx_port_affinity[0x1];
> > > > > > -     u8         reserved_at_1[0x3];
> > > > > > +     u8         tls_en[0x1];
> > > > > > +     u8         reserved_at_1[0x2];
> > > > > 
> > > > > It should be reserved_at_2.
> > > > > 
> > > > 
> > > > it should be at_1.
> > > 
> > > Why? See mlx5_ifc_flow_table_prop_layout_bits,
> > > mlx5_ifc_roce_cap_bits, e.t.c.
> > > 
> > 
> > they are all at_1 .. so i don't really understand what you want
> > from me,
> > Leon the code is good, please double check you comments..
> 
> Saeed,
> 
> reserved_at_1 should be renamed to be reserved_at_2.
> 
> strict_lag_tx_port_affinity[0x1] + tls_en[0x1] = 0x2
> 

Ok now it is clear, i trusted the developer on this one :)
anyway you have to admit that you mislead me with your examples:
mx5_ifc_flow_table_prop_layout_bits and mlx5_ifc_roce_cap_bits, they
both are fine so i though this was fine too.

I will fix it up.

Thanks,
Saeed.

> > > Thanks
> > > 
> > > > > Thanks

^ permalink raw reply

* [PATCH net-next,v4 00/11] netfilter: add hardware offload infrastructure
From: Pablo Neira Ayuso @ 2019-07-09 20:55 UTC (permalink / raw)
  To: netdev
  Cc: davem, thomas.lendacky, f.fainelli, ariel.elior, michael.chan,
	madalin.bucur, yisen.zhuang, salil.mehta, jeffrey.t.kirsher,
	tariqt, saeedm, jiri, idosch, jakub.kicinski, peppe.cavallaro,
	grygorii.strashko, andrew, vivien.didelot, alexandre.torgue,
	joabreu, linux-net-drivers, ogerlitz, Manish.Chopra,
	marcelo.leitner, mkubecek, venkatkumar.duvvuru, maxime.chevallier,
	cphealy, phil, netfilter-devel

Hi,

This patchset adds support for Netfilter hardware offloads.

This patchset reuses the existing block infrastructure, the
netdev_ops->ndo_setup_tc() interface, TC_SETUP_CLSFLOWER classifier and
the flow rule API.

Patch #1 adds flow_block_cb_setup_simple(), most drivers do the same thing
         to set up flow blocks, to reduce the number of changes, consolidate
         codebase. Use _simple() postfix as requested by Jakub Kicinski.
         This new function resides in net/core/flow_offload.c

Patch #2 renames TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND.

Patch #3 renames TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*.

Patch #4 adds flow_block_cb_alloc() and flow_block_cb_free() helper
         functions, this is the first patch of the flow block API.

Patch #5 adds the helper to deal with list operations in the flow block API.
         This includes flow_block_cb_lookup(), flow_block_cb_add() and
	 flow_block_cb_remove().

Patch #6 adds flow_block_cb_priv(), flow_block_cb_incref() and
         flow_block_cb_decref() which completes the flow block API.

Patch #7 updates the cls_api to use the flow block API from the new
         tcf_block_setup(). This infrastructure transports these objects
         via list (through the tc_block_offload object) back to the core
	 for registration.

            CLS_API                           DRIVER
        TC_SETUP_BLOCK    ---------->  setup flow_block_cb object &
                                 it adds object to flow_block_offload->cb_list
                                                |
            CLS_API     <-----------------------'
           registers                     list with flow blocks
         flow_block_cb &                   travels back to
       calls ->reoffload               the core for registration

         drivers allocate and sets up (configure the blocks), then
	 registration happens from the core (cls_api and netfilter).

Patch #8 updates drivers to use the flow block API.

Patch #9 removes the tcf block callback API, which is replaced by the
         flow block API.

Patch #10 adds the flow_block_cb_is_busy() helper to check if the block
	  is already used by a subsystem. This helper is invoked from
	  drivers. Once drivers are updated to support for multiple
	  subsystems, they can remove this check.

Patch #11 rename tc structure and definitions for the block bind/unbind
	  path.

Patch #12 introduces basic netfilter hardware offload infrastructure
          for the ingress chain. This includes 5-tuple exact matching
          and accept / drop rule actions. Only basechains are supported
          at this stage, no .reoffload callback is implemented either.
          Default policy to "accept" is only supported for now.

        table netdev filter {
                chain ingress {
                        type filter hook ingress device eth0 priority 0; flags offload;

                        ip daddr 192.168.0.10 tcp dport 22 drop
                }
        }

This patchset reuses the existing tcf block callback API and it places it
in the flow block callback API in net/core/flow_offload.c.

This series aims to address Jakub and Jiri's feedback, please see specific
patches in this batch for changelog in this v4.

Please, apply. Thank you very much.

P.S: yes, Phil, I still believe there is a chance.

Pablo Neira Ayuso (12):
  net: flow_offload: add flow_block_cb_setup_simple()
  net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
  net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
  net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
  net: flow_offload: add list handling functions
  net: flow_offload: add flow_block_cb_{priv,incref,decref}()
  net: sched: use flow block API
  drivers: net: use flow block API
  net: sched: remove tcf block API
  net: flow_offload: add flow_block_cb_is_busy() and use it
  net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
  netfilter: nf_tables: add hardware offload support

 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  27 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c       |  18 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h       |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c      |  29 +--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |  35 +--
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   |  22 +-
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h   |   6 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  49 ++--
 drivers/net/ethernet/intel/iavf/iavf_main.c        |  58 ++---
 drivers/net/ethernet/intel/igb/igb_main.c          |  43 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |  30 +--
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun.c    |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun.h    |   8 +-
 .../ethernet/mellanox/mlx5/core/en/tc_tun_geneve.c |  18 +-
 .../ethernet/mellanox/mlx5/core/en/tc_tun_gre.c    |   4 +-
 .../ethernet/mellanox/mlx5/core/en/tc_tun_vxlan.c  |  10 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  38 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  94 ++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  34 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |   6 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 116 +++++----
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  10 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c  |  34 +--
 drivers/net/ethernet/mscc/ocelot_ace.h             |   4 +-
 drivers/net/ethernet/mscc/ocelot_flower.c          |  70 +++---
 drivers/net/ethernet/mscc/ocelot_tc.c              |  47 ++--
 drivers/net/ethernet/netronome/nfp/abm/cls.c       |  22 +-
 drivers/net/ethernet/netronome/nfp/abm/main.h      |   2 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.c      |  30 +--
 drivers/net/ethernet/netronome/nfp/flower/action.c |  14 +-
 drivers/net/ethernet/netronome/nfp/flower/main.h   |   6 +-
 drivers/net/ethernet/netronome/nfp/flower/match.c  |  44 ++--
 .../net/ethernet/netronome/nfp/flower/metadata.c   |   2 +-
 .../net/ethernet/netronome/nfp/flower/offload.c    | 116 +++++----
 drivers/net/ethernet/qlogic/qede/qede.h            |   2 +-
 drivers/net/ethernet/qlogic/qede/qede_filter.c     |   2 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c       |  32 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  23 +-
 drivers/net/netdevsim/netdev.c                     |  29 +--
 include/net/flow_offload.h                         |  96 ++++++++
 include/net/netfilter/nf_tables.h                  |  14 ++
 include/net/netfilter/nf_tables_offload.h          |  76 ++++++
 include/net/pkt_cls.h                              | 129 +---------
 include/uapi/linux/netfilter/nf_tables.h           |   2 +
 net/core/flow_offload.c                            | 118 +++++++++
 net/dsa/slave.c                                    |  33 ++-
 net/netfilter/Makefile                             |   2 +-
 net/netfilter/nf_tables_api.c                      |  39 ++-
 net/netfilter/nf_tables_offload.c                  | 267 +++++++++++++++++++++
 net/netfilter/nft_cmp.c                            |  53 ++++
 net/netfilter/nft_immediate.c                      |  31 +++
 net/netfilter/nft_meta.c                           |  27 +++
 net/netfilter/nft_payload.c                        | 187 +++++++++++++++
 net/sched/cls_api.c                                | 211 ++++++++--------
 net/sched/cls_flower.c                             |  24 +-
 net/sched/sch_ingress.c                            |   6 +-
 56 files changed, 1579 insertions(+), 880 deletions(-)
 create mode 100644 include/net/netfilter/nf_tables_offload.h
 create mode 100644 net/netfilter/nf_tables_offload.c

-- 
2.11.0



^ permalink raw reply

* [PATCH net-next,v4 02/12] net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
From: Pablo Neira Ayuso @ 2019-07-09 20:55 UTC (permalink / raw)
  To: netdev
  Cc: davem, thomas.lendacky, f.fainelli, ariel.elior, michael.chan,
	madalin.bucur, yisen.zhuang, salil.mehta, jeffrey.t.kirsher,
	tariqt, saeedm, jiri, idosch, jakub.kicinski, peppe.cavallaro,
	grygorii.strashko, andrew, vivien.didelot, alexandre.torgue,
	joabreu, linux-net-drivers, ogerlitz, Manish.Chopra,
	marcelo.leitner, mkubecek, venkatkumar.duvvuru, maxime.chevallier,
	cphealy, phil, netfilter-devel
In-Reply-To: <20190709205550.3160-1-pablo@netfilter.org>

Rename from TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND and remove
temporary tc_block_command alias.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
v4: no changes.

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  4 ++--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |  4 ++--
 drivers/net/ethernet/mscc/ocelot_tc.c              |  4 ++--
 .../net/ethernet/netronome/nfp/flower/offload.c    |  8 ++++----
 include/net/flow_offload.h                         |  4 ++--
 include/net/pkt_cls.h                              |  1 -
 net/core/flow_offload.c                            |  4 ++--
 net/dsa/slave.c                                    |  4 ++--
 net/sched/cls_api.c                                | 22 +++++++++++-----------
 9 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 5b5c4ecf4214..a13139a5a5ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -704,7 +704,7 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 		return -EOPNOTSUPP;
 
 	switch (f->command) {
-	case TC_BLOCK_BIND:
+	case FLOW_BLOCK_BIND:
 		indr_priv = mlx5e_rep_indr_block_priv_lookup(rpriv, netdev);
 		if (indr_priv)
 			return -EEXIST;
@@ -727,7 +727,7 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 		}
 
 		return err;
-	case TC_BLOCK_UNBIND:
+	case FLOW_BLOCK_UNBIND:
 		indr_priv = mlx5e_rep_indr_block_priv_lookup(rpriv, netdev);
 		if (!indr_priv)
 			return -ENOENT;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index ce285fbeebd3..9cf61a9d8291 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1679,7 +1679,7 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 	}
 
 	switch (f->command) {
-	case TC_BLOCK_BIND:
+	case FLOW_BLOCK_BIND:
 		err = tcf_block_cb_register(f->block, cb, mlxsw_sp_port,
 					    mlxsw_sp_port, f->extack);
 		if (err)
@@ -1692,7 +1692,7 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 			return err;
 		}
 		return 0;
-	case TC_BLOCK_UNBIND:
+	case FLOW_BLOCK_UNBIND:
 		mlxsw_sp_setup_tc_block_flower_unbind(mlxsw_sp_port,
 						      f->block, ingress);
 		tcf_block_cb_unregister(f->block, cb, mlxsw_sp_port);
diff --git a/drivers/net/ethernet/mscc/ocelot_tc.c b/drivers/net/ethernet/mscc/ocelot_tc.c
index 72084306240d..c84942ef8e7b 100644
--- a/drivers/net/ethernet/mscc/ocelot_tc.c
+++ b/drivers/net/ethernet/mscc/ocelot_tc.c
@@ -147,14 +147,14 @@ static int ocelot_setup_tc_block(struct ocelot_port *port,
 	}
 
 	switch (f->command) {
-	case TC_BLOCK_BIND:
+	case FLOW_BLOCK_BIND:
 		ret = tcf_block_cb_register(f->block, cb, port,
 					    port, f->extack);
 		if (ret)
 			return ret;
 
 		return ocelot_setup_tc_block_flower_bind(port, f);
-	case TC_BLOCK_UNBIND:
+	case FLOW_BLOCK_UNBIND:
 		ocelot_setup_tc_block_flower_unbind(port, f);
 		tcf_block_cb_unregister(f->block, cb, port);
 		return 0;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 6dbe947269c3..7c94f5142076 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1315,11 +1315,11 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 	repr_priv->block_shared = tcf_block_shared(f->block);
 
 	switch (f->command) {
-	case TC_BLOCK_BIND:
+	case FLOW_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
 					     nfp_flower_setup_tc_block_cb,
 					     repr, repr, f->extack);
-	case TC_BLOCK_UNBIND:
+	case FLOW_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block,
 					nfp_flower_setup_tc_block_cb,
 					repr);
@@ -1395,7 +1395,7 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
 		return -EOPNOTSUPP;
 
 	switch (f->command) {
-	case TC_BLOCK_BIND:
+	case FLOW_BLOCK_BIND:
 		cb_priv = kmalloc(sizeof(*cb_priv), GFP_KERNEL);
 		if (!cb_priv)
 			return -ENOMEM;
@@ -1413,7 +1413,7 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
 		}
 
 		return err;
-	case TC_BLOCK_UNBIND:
+	case FLOW_BLOCK_UNBIND:
 		cb_priv = nfp_flower_indr_block_cb_priv_lookup(app, netdev);
 		if (!cb_priv)
 			return -ENOENT;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 6afc6009c6ab..965b0f1133b3 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -234,8 +234,8 @@ static inline void flow_stats_update(struct flow_stats *flow_stats,
 }
 
 enum flow_block_command {
-	TC_BLOCK_BIND,
-	TC_BLOCK_UNBIND,
+	FLOW_BLOCK_BIND,
+	FLOW_BLOCK_UNBIND,
 };
 
 enum flow_block_binder_type {
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index b6c306fa9541..1a96f469164f 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -27,7 +27,6 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops);
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
 #define tc_block_offload flow_block_offload
-#define tc_block_command flow_block_command
 #define tcf_block_binder_type flow_block_binder_type
 
 struct tcf_block_ext_info {
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index e31c0fdb6b01..593e73f7593a 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -178,10 +178,10 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
 	f->driver_block_list = driver_block_list;
 
 	switch (f->command) {
-	case TC_BLOCK_BIND:
+	case FLOW_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, cb, cb_ident, cb_priv,
 					     f->extack);
-	case TC_BLOCK_UNBIND:
+	case FLOW_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, cb, cb_ident);
 		return 0;
 	default:
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 99673f6b07f6..58a71ee0747a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -955,9 +955,9 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 		return -EOPNOTSUPP;
 
 	switch (f->command) {
-	case TC_BLOCK_BIND:
+	case FLOW_BLOCK_BIND:
 		return tcf_block_cb_register(f->block, cb, dev, dev, f->extack);
-	case TC_BLOCK_UNBIND:
+	case FLOW_BLOCK_UNBIND:
 		tcf_block_cb_unregister(f->block, cb, dev);
 		return 0;
 	default:
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index ad36bbcc583e..5597bed80d9c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -674,7 +674,7 @@ static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb)
 
 static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
 				  struct tc_indr_block_cb *indr_block_cb,
-				  enum tc_block_command command)
+				  enum flow_block_command command)
 {
 	struct tc_block_offload bo = {
 		.command	= command,
@@ -705,7 +705,7 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
 	if (err)
 		goto err_dev_put;
 
-	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, TC_BLOCK_BIND);
+	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_BIND);
 	return 0;
 
 err_dev_put:
@@ -742,7 +742,7 @@ void __tc_indr_block_cb_unregister(struct net_device *dev,
 		return;
 
 	/* Send unbind message if required to free any block cbs. */
-	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, TC_BLOCK_UNBIND);
+	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_UNBIND);
 	tc_indr_block_cb_del(indr_block_cb);
 	tc_indr_block_dev_put(indr_dev);
 }
@@ -759,7 +759,7 @@ EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister);
 
 static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 			       struct tcf_block_ext_info *ei,
-			       enum tc_block_command command,
+			       enum flow_block_command command,
 			       struct netlink_ext_ack *extack)
 {
 	struct tc_indr_block_cb *indr_block_cb;
@@ -775,7 +775,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 	if (!indr_dev)
 		return;
 
-	indr_dev->block = command == TC_BLOCK_BIND ? block : NULL;
+	indr_dev->block = command == FLOW_BLOCK_BIND ? block : NULL;
 
 	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
 		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
@@ -790,7 +790,7 @@ static bool tcf_block_offload_in_use(struct tcf_block *block)
 static int tcf_block_offload_cmd(struct tcf_block *block,
 				 struct net_device *dev,
 				 struct tcf_block_ext_info *ei,
-				 enum tc_block_command command,
+				 enum flow_block_command command,
 				 struct netlink_ext_ack *extack)
 {
 	struct tc_block_offload bo = {};
@@ -820,20 +820,20 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
 		return -EOPNOTSUPP;
 	}
 
-	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND, extack);
+	err = tcf_block_offload_cmd(block, dev, ei, FLOW_BLOCK_BIND, extack);
 	if (err == -EOPNOTSUPP)
 		goto no_offload_dev_inc;
 	if (err)
 		return err;
 
-	tc_indr_block_call(block, dev, ei, TC_BLOCK_BIND, extack);
+	tc_indr_block_call(block, dev, ei, FLOW_BLOCK_BIND, extack);
 	return 0;
 
 no_offload_dev_inc:
 	if (tcf_block_offload_in_use(block))
 		return -EOPNOTSUPP;
 	block->nooffloaddevcnt++;
-	tc_indr_block_call(block, dev, ei, TC_BLOCK_BIND, extack);
+	tc_indr_block_call(block, dev, ei, FLOW_BLOCK_BIND, extack);
 	return 0;
 }
 
@@ -843,11 +843,11 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 	struct net_device *dev = q->dev_queue->dev;
 	int err;
 
-	tc_indr_block_call(block, dev, ei, TC_BLOCK_UNBIND, NULL);
+	tc_indr_block_call(block, dev, ei, FLOW_BLOCK_UNBIND, NULL);
 
 	if (!dev->netdev_ops->ndo_setup_tc)
 		goto no_offload_dev_dec;
-	err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_UNBIND, NULL);
+	err = tcf_block_offload_cmd(block, dev, ei, FLOW_BLOCK_UNBIND, NULL);
 	if (err == -EOPNOTSUPP)
 		goto no_offload_dev_dec;
 	return;
-- 
2.11.0



^ permalink raw reply related

* [PATCH net-next,v4 01/12] net: flow_offload: add flow_block_cb_setup_simple()
From: Pablo Neira Ayuso @ 2019-07-09 20:55 UTC (permalink / raw)
  To: netdev
  Cc: davem, thomas.lendacky, f.fainelli, ariel.elior, michael.chan,
	madalin.bucur, yisen.zhuang, salil.mehta, jeffrey.t.kirsher,
	tariqt, saeedm, jiri, idosch, jakub.kicinski, peppe.cavallaro,
	grygorii.strashko, andrew, vivien.didelot, alexandre.torgue,
	joabreu, linux-net-drivers, ogerlitz, Manish.Chopra,
	marcelo.leitner, mkubecek, venkatkumar.duvvuru, maxime.chevallier,
	cphealy, phil, netfilter-devel
In-Reply-To: <20190709205550.3160-1-pablo@netfilter.org>

Most drivers do the same thing to set up the flow block callbacks, this
patch adds a helper function to do this.

This preparation patch reduces the number of changes to adapt the
existing drivers to use the flow block callback API.

This new helper function takes a flow block list per-driver, which is
set to NULL until this driver list is used.

This patch also introduces the flow_block_command and
flow_block_binder_type enumerations, which are renamed to use
FLOW_BLOCK_* in follow up patches.

There are three definitions (aliases) in order to reduce the number of
updates in this patch, which go away once drivers are fully adapted to
use this flow block API.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
v4: no changes.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 26 ++++-------------
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c     | 28 ++++--------------
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c   | 26 ++++-------------
 drivers/net/ethernet/intel/i40e/i40e_main.c       | 26 ++++-------------
 drivers/net/ethernet/intel/iavf/iavf_main.c       | 35 ++++-------------------
 drivers/net/ethernet/intel/igb/igb_main.c         | 24 +++-------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 27 ++++-------------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 27 ++++-------------
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 26 ++++-------------
 drivers/net/ethernet/netronome/nfp/abm/cls.c      | 17 ++---------
 drivers/net/ethernet/netronome/nfp/bpf/main.c     | 29 ++++---------------
 drivers/net/ethernet/qlogic/qede/qede_main.c      | 23 ++-------------
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 22 ++------------
 drivers/net/netdevsim/netdev.c                    | 26 ++++-------------
 include/net/flow_offload.h                        | 27 +++++++++++++++++
 include/net/pkt_cls.h                             | 20 ++-----------
 net/core/flow_offload.c                           | 25 ++++++++++++++++
 17 files changed, 117 insertions(+), 317 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b7b62273c955..d2f8c3ed6c73 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -9847,32 +9847,16 @@ static int bnxt_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 	}
 }
 
-static int bnxt_setup_tc_block(struct net_device *dev,
-			       struct tc_block_offload *f)
-{
-	struct bnxt *bp = netdev_priv(dev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, bnxt_setup_tc_block_cb,
-					     bp, bp, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, bnxt_setup_tc_block_cb, bp);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int bnxt_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			 void *type_data)
 {
+	struct bnxt *bp = netdev_priv(dev);
+
 	switch (type) {
 	case TC_SETUP_BLOCK:
-		return bnxt_setup_tc_block(dev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  bnxt_setup_tc_block_cb,
+						  bp, bp, true);
 	case TC_SETUP_QDISC_MQPRIO: {
 		struct tc_mqprio_qopt *mqprio = type_data;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
index f760921389a3..89398ff011d4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
@@ -161,34 +161,16 @@ static int bnxt_vf_rep_setup_tc_block_cb(enum tc_setup_type type,
 	}
 }
 
-static int bnxt_vf_rep_setup_tc_block(struct net_device *dev,
-				      struct tc_block_offload *f)
-{
-	struct bnxt_vf_rep *vf_rep = netdev_priv(dev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block,
-					     bnxt_vf_rep_setup_tc_block_cb,
-					     vf_rep, vf_rep, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block,
-					bnxt_vf_rep_setup_tc_block_cb, vf_rep);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int bnxt_vf_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
 				void *type_data)
 {
+	struct bnxt_vf_rep *vf_rep = netdev_priv(dev);
+
 	switch (type) {
 	case TC_SETUP_BLOCK:
-		return bnxt_vf_rep_setup_tc_block(dev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  bnxt_vf_rep_setup_tc_block_cb,
+						  vf_rep, vf_rep, true);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index b08efc48d42f..9a486282a32e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3190,32 +3190,16 @@ static int cxgb_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 	}
 }
 
-static int cxgb_setup_tc_block(struct net_device *dev,
-			       struct tc_block_offload *f)
-{
-	struct port_info *pi = netdev2pinfo(dev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, cxgb_setup_tc_block_cb,
-					     pi, dev, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, cxgb_setup_tc_block_cb, pi);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int cxgb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			 void *type_data)
 {
+	struct port_info *pi = netdev2pinfo(dev);
+
 	switch (type) {
 	case TC_SETUP_BLOCK:
-		return cxgb_setup_tc_block(dev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  cxgb_setup_tc_block_cb,
+						  pi, dev, true);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5361c08328f7..52f0f14d4207 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8177,34 +8177,18 @@ static int i40e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 	}
 }
 
-static int i40e_setup_tc_block(struct net_device *dev,
-			       struct tc_block_offload *f)
-{
-	struct i40e_netdev_priv *np = netdev_priv(dev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, i40e_setup_tc_block_cb,
-					     np, np, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, i40e_setup_tc_block_cb, np);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int __i40e_setup_tc(struct net_device *netdev, enum tc_setup_type type,
 			   void *type_data)
 {
+	struct i40e_netdev_priv *np = netdev_priv(netdev);
+
 	switch (type) {
 	case TC_SETUP_QDISC_MQPRIO:
 		return i40e_setup_tc(netdev, type_data);
 	case TC_SETUP_BLOCK:
-		return i40e_setup_tc_block(netdev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  i40e_setup_tc_block_cb,
+						  np, np, true);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 881561b36083..fd0e2bcc75e5 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -3114,35 +3114,6 @@ static int iavf_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 }
 
 /**
- * iavf_setup_tc_block - register callbacks for tc
- * @netdev: network interface device structure
- * @f: tc offload data
- *
- * This function registers block callbacks for tc
- * offloads
- **/
-static int iavf_setup_tc_block(struct net_device *dev,
-			       struct tc_block_offload *f)
-{
-	struct iavf_adapter *adapter = netdev_priv(dev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, iavf_setup_tc_block_cb,
-					     adapter, adapter, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, iavf_setup_tc_block_cb,
-					adapter);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
-/**
  * iavf_setup_tc - configure multiple traffic classes
  * @netdev: network interface device structure
  * @type: type of offload
@@ -3156,11 +3127,15 @@ static int iavf_setup_tc_block(struct net_device *dev,
 static int iavf_setup_tc(struct net_device *netdev, enum tc_setup_type type,
 			 void *type_data)
 {
+	struct iavf_adapter *adapter = netdev_priv(netdev);
+
 	switch (type) {
 	case TC_SETUP_QDISC_MQPRIO:
 		return __iavf_setup_tc(netdev, type_data);
 	case TC_SETUP_BLOCK:
-		return iavf_setup_tc_block(netdev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  iavf_setup_tc_block_cb,
+						  adapter, adapter, true);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index f66dae72fe37..836f9e1a136c 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2783,25 +2783,6 @@ static int igb_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 	}
 }
 
-static int igb_setup_tc_block(struct igb_adapter *adapter,
-			      struct tc_block_offload *f)
-{
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, igb_setup_tc_block_cb,
-					     adapter, adapter, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, igb_setup_tc_block_cb,
-					adapter);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int igb_offload_txtime(struct igb_adapter *adapter,
 			      struct tc_etf_qopt_offload *qopt)
 {
@@ -2834,7 +2815,10 @@ static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 	case TC_SETUP_QDISC_CBS:
 		return igb_offload_cbs(adapter, type_data);
 	case TC_SETUP_BLOCK:
-		return igb_setup_tc_block(adapter, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  igb_setup_tc_block_cb,
+						  adapter, adapter, true);
+
 	case TC_SETUP_QDISC_ETF:
 		return igb_offload_txtime(adapter, type_data);
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b613e72c8ee4..b098f5be9c0d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9607,27 +9607,6 @@ static int ixgbe_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 	}
 }
 
-static int ixgbe_setup_tc_block(struct net_device *dev,
-				struct tc_block_offload *f)
-{
-	struct ixgbe_adapter *adapter = netdev_priv(dev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, ixgbe_setup_tc_block_cb,
-					     adapter, adapter, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, ixgbe_setup_tc_block_cb,
-					adapter);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int ixgbe_setup_tc_mqprio(struct net_device *dev,
 				 struct tc_mqprio_qopt *mqprio)
 {
@@ -9638,9 +9617,13 @@ static int ixgbe_setup_tc_mqprio(struct net_device *dev,
 static int __ixgbe_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			    void *type_data)
 {
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
 	switch (type) {
 	case TC_SETUP_BLOCK:
-		return ixgbe_setup_tc_block(dev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  ixgbe_setup_tc_block_cb,
+						  adapter, adapter, true);
 	case TC_SETUP_QDISC_MQPRIO:
 		return ixgbe_setup_tc_mqprio(dev, type_data);
 	default:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 83194d56434d..395890fea358 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3457,36 +3457,19 @@ static int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 		return -EOPNOTSUPP;
 	}
 }
-
-static int mlx5e_setup_tc_block(struct net_device *dev,
-				struct tc_block_offload *f)
-{
-	struct mlx5e_priv *priv = netdev_priv(dev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, mlx5e_setup_tc_block_cb,
-					     priv, priv, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, mlx5e_setup_tc_block_cb,
-					priv);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
 #endif
 
 static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			  void *type_data)
 {
+	struct mlx5e_priv *priv = netdev_priv(dev);
+
 	switch (type) {
 #ifdef CONFIG_MLX5_ESWITCH
 	case TC_SETUP_BLOCK:
-		return mlx5e_setup_tc_block(dev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  mlx5e_setup_tc_block_cb,
+						  priv, priv, true);
 #endif
 	case TC_SETUP_QDISC_MQPRIO:
 		return mlx5e_setup_tc_mqprio(dev, type_data);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 529f8e4b32c6..5b5c4ecf4214 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1178,32 +1178,16 @@ static int mlx5e_rep_setup_tc_cb(enum tc_setup_type type, void *type_data,
 	}
 }
 
-static int mlx5e_rep_setup_tc_block(struct net_device *dev,
-				    struct tc_block_offload *f)
-{
-	struct mlx5e_priv *priv = netdev_priv(dev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, mlx5e_rep_setup_tc_cb,
-					     priv, priv, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, mlx5e_rep_setup_tc_cb, priv);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			      void *type_data)
 {
+	struct mlx5e_priv *priv = netdev_priv(dev);
+
 	switch (type) {
 	case TC_SETUP_BLOCK:
-		return mlx5e_rep_setup_tc_block(dev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  mlx5e_rep_setup_tc_cb,
+						  priv, priv, true);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/netronome/nfp/abm/cls.c b/drivers/net/ethernet/netronome/nfp/abm/cls.c
index ff3913085665..29fb45734962 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/cls.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/cls.c
@@ -265,19 +265,6 @@ static int nfp_abm_setup_tc_block_cb(enum tc_setup_type type,
 int nfp_abm_setup_cls_block(struct net_device *netdev, struct nfp_repr *repr,
 			    struct tc_block_offload *f)
 {
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block,
-					     nfp_abm_setup_tc_block_cb,
-					     repr, repr, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, nfp_abm_setup_tc_block_cb,
-					repr);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
+	return flow_block_cb_setup_simple(f, NULL, nfp_abm_setup_tc_block_cb,
+					  repr, repr, true);
 }
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 9c136da25221..0c93c84a188a 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -160,35 +160,16 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
 	return 0;
 }
 
-static int nfp_bpf_setup_tc_block(struct net_device *netdev,
-				  struct tc_block_offload *f)
-{
-	struct nfp_net *nn = netdev_priv(netdev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block,
-					     nfp_bpf_setup_tc_block_cb,
-					     nn, nn, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block,
-					nfp_bpf_setup_tc_block_cb,
-					nn);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int nfp_bpf_setup_tc(struct nfp_app *app, struct net_device *netdev,
 			    enum tc_setup_type type, void *type_data)
 {
+	struct nfp_net *nn = netdev_priv(netdev);
+
 	switch (type) {
 	case TC_SETUP_BLOCK:
-		return nfp_bpf_setup_tc_block(netdev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  nfp_bpf_setup_tc_block_cb,
+						  nn, nn, true);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index d4a29660751d..cba97ed3dd56 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -579,25 +579,6 @@ static int qede_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 	}
 }
 
-static int qede_setup_tc_block(struct qede_dev *edev,
-			       struct tc_block_offload *f)
-{
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block,
-					     qede_setup_tc_block_cb,
-					     edev, edev, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, qede_setup_tc_block_cb, edev);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int
 qede_setup_tc_offload(struct net_device *dev, enum tc_setup_type type,
 		      void *type_data)
@@ -607,7 +588,9 @@ qede_setup_tc_offload(struct net_device *dev, enum tc_setup_type type,
 
 	switch (type) {
 	case TC_SETUP_BLOCK:
-		return qede_setup_tc_block(edev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  qede_setup_tc_block_cb,
+						  edev, edev, true);
 	case TC_SETUP_QDISC_MQPRIO:
 		mqprio = type_data;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 3425d4dda03d..0040f10bbef9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3853,24 +3853,6 @@ static int stmmac_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
 	return ret;
 }
 
-static int stmmac_setup_tc_block(struct stmmac_priv *priv,
-				 struct tc_block_offload *f)
-{
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, stmmac_setup_tc_block_cb,
-				priv, priv, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, stmmac_setup_tc_block_cb, priv);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int stmmac_setup_tc(struct net_device *ndev, enum tc_setup_type type,
 			   void *type_data)
 {
@@ -3878,7 +3860,9 @@ static int stmmac_setup_tc(struct net_device *ndev, enum tc_setup_type type,
 
 	switch (type) {
 	case TC_SETUP_BLOCK:
-		return stmmac_setup_tc_block(priv, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  stmmac_setup_tc_block_cb,
+						  priv, priv, true);
 	case TC_SETUP_QDISC_CBS:
 		return stmmac_tc_setup_cbs(priv, priv, type_data);
 	default:
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index e5c8aa08e1cd..920dc79e9dc9 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -78,26 +78,6 @@ nsim_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv)
 	return nsim_bpf_setup_tc_block_cb(type, type_data, cb_priv);
 }
 
-static int
-nsim_setup_tc_block(struct net_device *dev, struct tc_block_offload *f)
-{
-	struct netdevsim *ns = netdev_priv(dev);
-
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
-		return -EOPNOTSUPP;
-
-	switch (f->command) {
-	case TC_BLOCK_BIND:
-		return tcf_block_cb_register(f->block, nsim_setup_tc_block_cb,
-					     ns, ns, f->extack);
-	case TC_BLOCK_UNBIND:
-		tcf_block_cb_unregister(f->block, nsim_setup_tc_block_cb, ns);
-		return 0;
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 static int nsim_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
 {
 	struct netdevsim *ns = netdev_priv(dev);
@@ -226,9 +206,13 @@ static int nsim_set_vf_link_state(struct net_device *dev, int vf, int state)
 static int
 nsim_setup_tc(struct net_device *dev, enum tc_setup_type type, void *type_data)
 {
+	struct netdevsim *ns = netdev_priv(dev);
+
 	switch (type) {
 	case TC_SETUP_BLOCK:
-		return nsim_setup_tc_block(dev, type_data);
+		return flow_block_cb_setup_simple(type_data, NULL,
+						  nsim_setup_tc_block_cb,
+						  ns, ns, true);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 36127c1858a4..6afc6009c6ab 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -3,6 +3,7 @@
 
 #include <linux/kernel.h>
 #include <net/flow_dissector.h>
+#include <net/sch_generic.h>
 
 struct flow_match {
 	struct flow_dissector	*dissector;
@@ -232,4 +233,30 @@ static inline void flow_stats_update(struct flow_stats *flow_stats,
 	flow_stats->lastused	= max_t(u64, flow_stats->lastused, lastused);
 }
 
+enum flow_block_command {
+	TC_BLOCK_BIND,
+	TC_BLOCK_UNBIND,
+};
+
+enum flow_block_binder_type {
+	TCF_BLOCK_BINDER_TYPE_UNSPEC,
+	TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
+	TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
+};
+
+struct tcf_block;
+struct netlink_ext_ack;
+
+struct flow_block_offload {
+	enum flow_block_command command;
+	enum flow_block_binder_type binder_type;
+	struct tcf_block *block;
+	struct list_head *driver_block_list;
+	struct netlink_ext_ack *extack;
+};
+
+int flow_block_cb_setup_simple(struct flow_block_offload *f,
+			       struct list_head *driver_list, tc_setup_cb_t *cb,
+			       void *cb_ident, void *cb_priv, bool ingress_only);
+
 #endif /* _NET_FLOW_OFFLOAD_H */
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 1a7596ba0dbe..b6c306fa9541 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -26,11 +26,9 @@ struct tcf_walker {
 int register_tcf_proto_ops(struct tcf_proto_ops *ops);
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
-enum tcf_block_binder_type {
-	TCF_BLOCK_BINDER_TYPE_UNSPEC,
-	TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
-	TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
-};
+#define tc_block_offload flow_block_offload
+#define tc_block_command flow_block_command
+#define tcf_block_binder_type flow_block_binder_type
 
 struct tcf_block_ext_info {
 	enum tcf_block_binder_type binder_type;
@@ -610,18 +608,6 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 		     void *type_data, bool err_stop);
 unsigned int tcf_exts_num_actions(struct tcf_exts *exts);
 
-enum tc_block_command {
-	TC_BLOCK_BIND,
-	TC_BLOCK_UNBIND,
-};
-
-struct tc_block_offload {
-	enum tc_block_command command;
-	enum tcf_block_binder_type binder_type;
-	struct tcf_block *block;
-	struct netlink_ext_ack *extack;
-};
-
 struct tc_cls_common_offload {
 	u32 chain_index;
 	__be16 protocol;
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index f52fe0bc4017..e31c0fdb6b01 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -2,6 +2,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <net/flow_offload.h>
+#include <net/pkt_cls.h>
 
 struct flow_rule *flow_rule_alloc(unsigned int num_actions)
 {
@@ -164,3 +165,27 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 	FLOW_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_ENC_OPTS, out);
 }
 EXPORT_SYMBOL(flow_rule_match_enc_opts);
+
+int flow_block_cb_setup_simple(struct flow_block_offload *f,
+			       struct list_head *driver_block_list,
+			       tc_setup_cb_t *cb, void *cb_ident, void *cb_priv,
+			       bool ingress_only)
+{
+	if (ingress_only &&
+	    f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+		return -EOPNOTSUPP;
+
+	f->driver_block_list = driver_block_list;
+
+	switch (f->command) {
+	case TC_BLOCK_BIND:
+		return tcf_block_cb_register(f->block, cb, cb_ident, cb_priv,
+					     f->extack);
+	case TC_BLOCK_UNBIND:
+		tcf_block_cb_unregister(f->block, cb, cb_ident);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+EXPORT_SYMBOL(flow_block_cb_setup_simple);
-- 
2.11.0



^ permalink raw reply related

* [PATCH net-next,v4 03/12] net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
From: Pablo Neira Ayuso @ 2019-07-09 20:55 UTC (permalink / raw)
  To: netdev
  Cc: davem, thomas.lendacky, f.fainelli, ariel.elior, michael.chan,
	madalin.bucur, yisen.zhuang, salil.mehta, jeffrey.t.kirsher,
	tariqt, saeedm, jiri, idosch, jakub.kicinski, peppe.cavallaro,
	grygorii.strashko, andrew, vivien.didelot, alexandre.torgue,
	joabreu, linux-net-drivers, ogerlitz, Manish.Chopra,
	marcelo.leitner, mkubecek, venkatkumar.duvvuru, maxime.chevallier,
	cphealy, phil, netfilter-devel
In-Reply-To: <20190709205550.3160-1-pablo@netfilter.org>

Rename from TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* and
remove temporary tcf_block_binder_type alias.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
v4: no changes.

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c    |  2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c      |  4 ++--
 drivers/net/ethernet/mscc/ocelot_flower.c           |  2 +-
 drivers/net/ethernet/mscc/ocelot_tc.c               |  4 ++--
 drivers/net/ethernet/netronome/nfp/flower/offload.c |  6 +++---
 include/net/flow_offload.h                          |  6 +++---
 include/net/pkt_cls.h                               |  3 +--
 net/core/flow_offload.c                             |  2 +-
 net/dsa/slave.c                                     |  4 ++--
 net/sched/cls_api.c                                 | 14 +++++++-------
 net/sched/sch_ingress.c                             |  6 +++---
 11 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index a13139a5a5ca..f5c430973b35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -700,7 +700,7 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 	struct mlx5e_rep_indr_block_priv *indr_priv;
 	int err = 0;
 
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+	if (f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
 	switch (f->command) {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 9cf61a9d8291..a178d082f061 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1668,10 +1668,10 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 	bool ingress;
 	int err;
 
-	if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) {
+	if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS) {
 		cb = mlxsw_sp_setup_tc_block_cb_matchall_ig;
 		ingress = true;
-	} else if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS) {
+	} else if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS) {
 		cb = mlxsw_sp_setup_tc_block_cb_matchall_eg;
 		ingress = false;
 	} else {
diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c
index 8778dee5a471..b682f08a93b4 100644
--- a/drivers/net/ethernet/mscc/ocelot_flower.c
+++ b/drivers/net/ethernet/mscc/ocelot_flower.c
@@ -306,7 +306,7 @@ int ocelot_setup_tc_block_flower_bind(struct ocelot_port *port,
 	struct tcf_block_cb *block_cb;
 	int ret;
 
-	if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
+	if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
 		return -EOPNOTSUPP;
 
 	block_cb = tcf_block_cb_lookup(f->block,
diff --git a/drivers/net/ethernet/mscc/ocelot_tc.c b/drivers/net/ethernet/mscc/ocelot_tc.c
index c84942ef8e7b..58a0b5f8850c 100644
--- a/drivers/net/ethernet/mscc/ocelot_tc.c
+++ b/drivers/net/ethernet/mscc/ocelot_tc.c
@@ -137,10 +137,10 @@ static int ocelot_setup_tc_block(struct ocelot_port *port,
 	netdev_dbg(port->dev, "tc_block command %d, binder_type %d\n",
 		   f->command, f->binder_type);
 
-	if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) {
+	if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS) {
 		cb = ocelot_setup_tc_block_cb_ig;
 		port->tc.block_shared = tcf_block_shared(f->block);
-	} else if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS) {
+	} else if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS) {
 		cb = ocelot_setup_tc_block_cb_eg;
 	} else {
 		return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 7c94f5142076..46041e509150 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1308,7 +1308,7 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 	struct nfp_repr *repr = netdev_priv(netdev);
 	struct nfp_flower_repr_priv *repr_priv;
 
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+	if (f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
 	repr_priv = repr->app_priv;
@@ -1389,8 +1389,8 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
 	struct nfp_flower_priv *priv = app->priv;
 	int err;
 
-	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS &&
-	    !(f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS &&
+	if (f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS &&
+	    !(f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS &&
 	      nfp_flower_internal_port_can_offload(app, netdev)))
 		return -EOPNOTSUPP;
 
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 965b0f1133b3..0d7d1e6d6f2a 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -239,9 +239,9 @@ enum flow_block_command {
 };
 
 enum flow_block_binder_type {
-	TCF_BLOCK_BINDER_TYPE_UNSPEC,
-	TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
-	TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
+	FLOW_BLOCK_BINDER_TYPE_UNSPEC,
+	FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
+	FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
 };
 
 struct tcf_block;
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 1a96f469164f..e4499526fde8 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -27,10 +27,9 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops);
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
 #define tc_block_offload flow_block_offload
-#define tcf_block_binder_type flow_block_binder_type
 
 struct tcf_block_ext_info {
-	enum tcf_block_binder_type binder_type;
+	enum flow_block_binder_type binder_type;
 	tcf_chain_head_change_t *chain_head_change;
 	void *chain_head_change_priv;
 	u32 block_index;
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 593e73f7593a..6d8187e8effc 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -172,7 +172,7 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
 			       bool ingress_only)
 {
 	if (ingress_only &&
-	    f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+	    f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
 	f->driver_block_list = driver_block_list;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 58a71ee0747a..9b5e202c255e 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -947,9 +947,9 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 {
 	tc_setup_cb_t *cb;
 
-	if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+	if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		cb = dsa_slave_setup_tc_block_cb_ig;
-	else if (f->binder_type == TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
+	else if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
 		cb = dsa_slave_setup_tc_block_cb_eg;
 	else
 		return -EOPNOTSUPP;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 5597bed80d9c..fa0c451aca59 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -678,7 +678,7 @@ static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
 {
 	struct tc_block_offload bo = {
 		.command	= command,
-		.binder_type	= TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
+		.binder_type	= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
 		.block		= indr_dev->block,
 	};
 
@@ -1340,17 +1340,17 @@ static void tcf_block_release(struct Qdisc *q, struct tcf_block *block,
 struct tcf_block_owner_item {
 	struct list_head list;
 	struct Qdisc *q;
-	enum tcf_block_binder_type binder_type;
+	enum flow_block_binder_type binder_type;
 };
 
 static void
 tcf_block_owner_netif_keep_dst(struct tcf_block *block,
 			       struct Qdisc *q,
-			       enum tcf_block_binder_type binder_type)
+			       enum flow_block_binder_type binder_type)
 {
 	if (block->keep_dst &&
-	    binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS &&
-	    binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
+	    binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS &&
+	    binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
 		netif_keep_dst(qdisc_dev(q));
 }
 
@@ -1367,7 +1367,7 @@ EXPORT_SYMBOL(tcf_block_netif_keep_dst);
 
 static int tcf_block_owner_add(struct tcf_block *block,
 			       struct Qdisc *q,
-			       enum tcf_block_binder_type binder_type)
+			       enum flow_block_binder_type binder_type)
 {
 	struct tcf_block_owner_item *item;
 
@@ -1382,7 +1382,7 @@ static int tcf_block_owner_add(struct tcf_block *block,
 
 static void tcf_block_owner_del(struct tcf_block *block,
 				struct Qdisc *q,
-				enum tcf_block_binder_type binder_type)
+				enum flow_block_binder_type binder_type)
 {
 	struct tcf_block_owner_item *item;
 
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 599730f804d7..bf56aa519797 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -83,7 +83,7 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
 
 	mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
 
-	q->block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	q->block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
 	q->block_info.chain_head_change = clsact_chain_head_change;
 	q->block_info.chain_head_change_priv = &q->miniqp;
 
@@ -217,7 +217,7 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 
 	mini_qdisc_pair_init(&q->miniqp_ingress, sch, &dev->miniq_ingress);
 
-	q->ingress_block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	q->ingress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
 	q->ingress_block_info.chain_head_change = clsact_chain_head_change;
 	q->ingress_block_info.chain_head_change_priv = &q->miniqp_ingress;
 
@@ -228,7 +228,7 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 
 	mini_qdisc_pair_init(&q->miniqp_egress, sch, &dev->miniq_egress);
 
-	q->egress_block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS;
+	q->egress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS;
 	q->egress_block_info.chain_head_change = clsact_chain_head_change;
 	q->egress_block_info.chain_head_change_priv = &q->miniqp_egress;
 
-- 
2.11.0



^ permalink raw reply related

* [PATCH net-next,v4 04/12] net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
From: Pablo Neira Ayuso @ 2019-07-09 20:55 UTC (permalink / raw)
  To: netdev
  Cc: davem, thomas.lendacky, f.fainelli, ariel.elior, michael.chan,
	madalin.bucur, yisen.zhuang, salil.mehta, jeffrey.t.kirsher,
	tariqt, saeedm, jiri, idosch, jakub.kicinski, peppe.cavallaro,
	grygorii.strashko, andrew, vivien.didelot, alexandre.torgue,
	joabreu, linux-net-drivers, ogerlitz, Manish.Chopra,
	marcelo.leitner, mkubecek, venkatkumar.duvvuru, maxime.chevallier,
	cphealy, phil, netfilter-devel
In-Reply-To: <20190709205550.3160-1-pablo@netfilter.org>

Add a new helper function to allocate flow_block_cb objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
v4: no changes.

 include/net/flow_offload.h | 14 ++++++++++++++
 net/core/flow_offload.c    | 28 ++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 0d7d1e6d6f2a..bcc4e2fef6ba 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -255,6 +255,20 @@ struct flow_block_offload {
 	struct netlink_ext_ack *extack;
 };
 
+struct flow_block_cb {
+	struct list_head	list;
+	tc_setup_cb_t		*cb;
+	void			*cb_ident;
+	void			*cb_priv;
+	void			(*release)(void *cb_priv);
+	unsigned int		refcnt;
+};
+
+struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
+					  void *cb_ident, void *cb_priv,
+					  void (*release)(void *cb_priv));
+void flow_block_cb_free(struct flow_block_cb *block_cb);
+
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
 			       struct list_head *driver_list, tc_setup_cb_t *cb,
 			       void *cb_ident, void *cb_priv, bool ingress_only);
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 6d8187e8effc..d08148cb6953 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -166,6 +166,34 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 }
 EXPORT_SYMBOL(flow_rule_match_enc_opts);
 
+struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
+					  void *cb_ident, void *cb_priv,
+					  void (*release)(void *cb_priv))
+{
+	struct flow_block_cb *block_cb;
+
+	block_cb = kzalloc(sizeof(*block_cb), GFP_KERNEL);
+	if (!block_cb)
+		return ERR_PTR(-ENOMEM);
+
+	block_cb->cb = cb;
+	block_cb->cb_ident = cb_ident;
+	block_cb->cb_priv = cb_priv;
+	block_cb->release = release;
+
+	return block_cb;
+}
+EXPORT_SYMBOL(flow_block_cb_alloc);
+
+void flow_block_cb_free(struct flow_block_cb *block_cb)
+{
+	if (block_cb->release)
+		block_cb->release(block_cb->cb_priv);
+
+	kfree(block_cb);
+}
+EXPORT_SYMBOL(flow_block_cb_free);
+
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
 			       struct list_head *driver_block_list,
 			       tc_setup_cb_t *cb, void *cb_ident, void *cb_priv,
-- 
2.11.0



^ permalink raw reply related

* [PATCH net-next,v4 05/12] net: flow_offload: add list handling functions
From: Pablo Neira Ayuso @ 2019-07-09 20:55 UTC (permalink / raw)
  To: netdev
  Cc: davem, thomas.lendacky, f.fainelli, ariel.elior, michael.chan,
	madalin.bucur, yisen.zhuang, salil.mehta, jeffrey.t.kirsher,
	tariqt, saeedm, jiri, idosch, jakub.kicinski, peppe.cavallaro,
	grygorii.strashko, andrew, vivien.didelot, alexandre.torgue,
	joabreu, linux-net-drivers, ogerlitz, Manish.Chopra,
	marcelo.leitner, mkubecek, venkatkumar.duvvuru, maxime.chevallier,
	cphealy, phil, netfilter-devel
In-Reply-To: <20190709205550.3160-1-pablo@netfilter.org>

This patch adds the list handling functions for the flow block API:

* flow_block_cb_lookup() allows drivers to look up for existing flow blocks.
* flow_block_cb_add() adds a flow block to the per driver list to be registered
  by the core.
* flow_block_cb_remove() to remove a flow block from the list of existing
  flow blocks per driver and to request the core to unregister this.

The flow block API also annotates the netns this flow block belongs to.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
v4: missing "per driver" reference in flow_block_cb_add() description - Jiri Pirko.
    Move per-driver list to struct flow_block_offload - Jiri Pirko.

 include/net/flow_offload.h | 19 +++++++++++++++++++
 net/core/flow_offload.c    | 17 +++++++++++++++++
 net/sched/cls_api.c        |  3 +++
 3 files changed, 39 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index bcc4e2fef6ba..52901e12c913 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -251,12 +251,16 @@ struct flow_block_offload {
 	enum flow_block_command command;
 	enum flow_block_binder_type binder_type;
 	struct tcf_block *block;
+	struct net *net;
+	struct list_head cb_list;
 	struct list_head *driver_block_list;
 	struct netlink_ext_ack *extack;
 };
 
 struct flow_block_cb {
+	struct list_head	driver_list;
 	struct list_head	list;
+	struct net		*net;
 	tc_setup_cb_t		*cb;
 	void			*cb_ident;
 	void			*cb_priv;
@@ -269,6 +273,21 @@ struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
 					  void (*release)(void *cb_priv));
 void flow_block_cb_free(struct flow_block_cb *block_cb);
 
+struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *offload,
+					   tc_setup_cb_t *cb, void *cb_ident);
+
+static inline void flow_block_cb_add(struct flow_block_cb *block_cb,
+				     struct flow_block_offload *offload)
+{
+	list_add_tail(&block_cb->list, &offload->cb_list);
+}
+
+static inline void flow_block_cb_remove(struct flow_block_cb *block_cb,
+					struct flow_block_offload *offload)
+{
+	list_move(&block_cb->list, &offload->cb_list);
+}
+
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
 			       struct list_head *driver_list, tc_setup_cb_t *cb,
 			       void *cb_ident, void *cb_priv, bool ingress_only);
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index d08148cb6953..c81a7e0c5e04 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -176,6 +176,7 @@ struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
 	if (!block_cb)
 		return ERR_PTR(-ENOMEM);
 
+	block_cb->net = net;
 	block_cb->cb = cb;
 	block_cb->cb_ident = cb_ident;
 	block_cb->cb_priv = cb_priv;
@@ -194,6 +195,22 @@ void flow_block_cb_free(struct flow_block_cb *block_cb)
 }
 EXPORT_SYMBOL(flow_block_cb_free);
 
+struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
+					   tc_setup_cb_t *cb, void *cb_ident)
+{
+	struct flow_block_cb *block_cb;
+
+	list_for_each_entry(block_cb, f->driver_block_list, driver_list) {
+		if (block_cb->net == f->net &&
+		    block_cb->cb == cb &&
+		    block_cb->cb_ident == cb_ident)
+			return block_cb;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL(flow_block_cb_lookup);
+
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
 			       struct list_head *driver_block_list,
 			       tc_setup_cb_t *cb, void *cb_ident, void *cb_priv,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index fa0c451aca59..72761b43ae41 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -679,6 +679,7 @@ static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
 	struct tc_block_offload bo = {
 		.command	= command,
 		.binder_type	= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
+		.net		= dev_net(indr_dev->dev),
 		.block		= indr_dev->block,
 	};
 
@@ -767,6 +768,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 	struct tc_block_offload bo = {
 		.command	= command,
 		.binder_type	= ei->binder_type,
+		.net		= dev_net(dev),
 		.block		= block,
 		.extack		= extack,
 	};
@@ -795,6 +797,7 @@ static int tcf_block_offload_cmd(struct tcf_block *block,
 {
 	struct tc_block_offload bo = {};
 
+	bo.net = dev_net(dev);
 	bo.command = command;
 	bo.binder_type = ei->binder_type;
 	bo.block = block;
-- 
2.11.0



^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox