netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances
@ 2017-07-10 18:51 Jiri Pirko
  2017-07-10 18:51 ` [patch net-next RFC 1/4] net: sched: introduce support for multiple filter chain pointers registration Jiri Pirko
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Jiri Pirko @ 2017-07-10 18:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, edumazet, stephen, jbenc, mlxsw,
	andrew, vivien.didelot, f.fainelli, john.fastabend,
	alexander.h.duyck, daniel, ogerlitz, mrv

From: Jiri Pirko <jiri@mellanox.com>

Currently the filters added to qdiscs are independent. So for example if you
have 2 netdevices and you create ingress qdisc on both and you want to add
identical filter rules both, you need to add them twice. This patchset
makes this easier and mainly saves resources allowing to share all filters
within a qdisc - I call it a "filter block". Also this helps to save
resources when we do offload to hw for example to expensive TCAM.

So back to the example. First, we create 2 qdiscs. Both will share
block number 22. "22" is just an identification. If we don't pass any
block number, a new one will be generated by kernel:

$ tc qdisc add dev ens7 ingress block 22
                                ^^^^^^^^
$ tc qdisc add dev ens8 ingress block 22
                                ^^^^^^^^

Now if we list the qdiscs, we will see the block index in the output:
qdisc fq_codel 0: dev ens7 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 9014 bytes 99 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22 
                                              ^^^^^^^^
 Sent 4592 bytes 58 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc fq_codel 0: dev ens8 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 17022 bytes 307 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22 
                                              ^^^^^^^^
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 


Now we can add filter to any of qdiscs sharing the same block:

$ tc filter add dev ens7 parent ffff: protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop


We will see the same output if we list filters for ens7 and ens8, including stats:

$ tc -s filter show dev ens7 root
filter parent ffff: protocol ip pref 25 flower 
filter parent ffff: protocol ip pref 25 flower handle 0x1 
  eth_type ipv4
  dst_ip 192.168.1.0/24
        action order 1: gact action drop
         random type none pass val 0
         index 3 ref 1 bind 1 installed 10201 sec used 10150 sec
        Action statistics:
        Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0 

$ tc -s filter show dev ens8 root
filter dev ens7 parent ffff: protocol ip pref 25 flower 
filter dev ens7 parent ffff: protocol ip pref 25 flower handle 0x1 
  eth_type ipv4
  dst_ip 192.168.1.0/24
        action order 1: gact action drop
         random type none pass val 0
         index 3 ref 1 bind 1 installed 10202 sec used 10152 sec
        Action statistics:
        Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0 


Issues:
- tp->q is set by the device used to add the filter. That has to be resolved.
  Impacts the dump (as you can see above)

Jiri Pirko (4):
  net: sched: introduce support for multiple filter chain pointers
    registration
  net: sched: intruduce qdisc_net helper
  net: sched: introduce shared filter blocks infrastructure
  net: sched: allow ingress and clsact qdiscs to share filter blocks

 include/net/pkt_cls.h          |  22 +++-
 include/net/pkt_sched.h        |   7 ++
 include/net/sch_generic.h      |   4 +-
 include/uapi/linux/pkt_sched.h |  12 +++
 net/sched/cls_api.c            | 240 +++++++++++++++++++++++++++++++++++++----
 net/sched/sch_ingress.c        | 107 ++++++++++++++++--
 6 files changed, 362 insertions(+), 30 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [patch net-next RFC 1/4] net: sched: introduce support for multiple filter chain pointers registration
  2017-07-10 18:51 [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Jiri Pirko
@ 2017-07-10 18:51 ` Jiri Pirko
  2017-07-10 18:51 ` [patch net-next RFC 2/4] net: sched: intruduce qdisc_net helper Jiri Pirko
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Jiri Pirko @ 2017-07-10 18:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, edumazet, stephen, jbenc, mlxsw,
	andrew, vivien.didelot, f.fainelli, john.fastabend,
	alexander.h.duyck, daniel, ogerlitz, mrv

From: Jiri Pirko <jiri@mellanox.com>

So far, there was possible only to register a single filter chain
pointer to a block->chain[0]. However, when the blocks will get
shareable, we need to allow multiple filter chain pointers registration.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  2 +-
 net/sched/cls_api.c       | 66 ++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 1c123e2..7396de8 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -257,7 +257,7 @@ struct qdisc_skb_cb {
 
 struct tcf_chain {
 	struct tcf_proto __rcu *filter_chain;
-	struct tcf_proto __rcu **p_filter_chain;
+	struct list_head filter_chain_list;
 	struct list_head list;
 	struct tcf_block *block;
 	u32 index; /* chain index */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 39da0c5..411f5577 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -186,6 +186,11 @@ static void tcf_proto_destroy(struct tcf_proto *tp)
 	kfree_rcu(tp, rcu);
 }
 
+struct tfc_filter_chain_list_item {
+	struct list_head list;
+	struct tcf_proto __rcu **p_filter_chain;
+};
+
 static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 					  u32 chain_index)
 {
@@ -194,6 +199,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
 	if (!chain)
 		return NULL;
+	INIT_LIST_HEAD(&chain->filter_chain_list);
 	list_add_tail(&chain->list, &block->chain_list);
 	chain->block = block;
 	chain->index = chain_index;
@@ -203,10 +209,11 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 
 static void tcf_chain_flush(struct tcf_chain *chain)
 {
+	struct tfc_filter_chain_list_item *item;
 	struct tcf_proto *tp;
 
-	if (*chain->p_filter_chain)
-		RCU_INIT_POINTER(*chain->p_filter_chain, NULL);
+	list_for_each_entry(item, &chain->filter_chain_list, list)
+		RCU_INIT_POINTER(*item->p_filter_chain, NULL);
 	while ((tp = rtnl_dereference(chain->filter_chain)) != NULL) {
 		RCU_INIT_POINTER(chain->filter_chain, tp->next);
 		tcf_proto_destroy(tp);
@@ -248,11 +255,40 @@ void tcf_chain_put(struct tcf_chain *chain)
 }
 EXPORT_SYMBOL(tcf_chain_put);
 
+static int
+tcf_chain_filter_chain_ptr_add(struct tcf_chain *chain,
+			       struct tcf_proto __rcu **p_filter_chain)
+{
+	struct tfc_filter_chain_list_item *item;
+
+	item = kmalloc(sizeof(*item), GFP_KERNEL);
+	if (!item)
+		return -ENOMEM;
+	item->p_filter_chain = p_filter_chain;
+	list_add(&item->list, &chain->filter_chain_list);
+	return 0;
+}
+
 static void
-tcf_chain_filter_chain_ptr_set(struct tcf_chain *chain,
+tcf_chain_filter_chain_ptr_del(struct tcf_chain *chain,
 			       struct tcf_proto __rcu **p_filter_chain)
 {
-	chain->p_filter_chain = p_filter_chain;
+	struct tfc_filter_chain_list_item *item;
+
+	list_for_each_entry(item, &chain->filter_chain_list, list) {
+		if (!p_filter_chain ||
+		    item->p_filter_chain == p_filter_chain) {
+			list_del(&item->list);
+			kfree(item);
+			return;
+		}
+	}
+	WARN_ON(1);
+}
+
+static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
+{
+	return list_first_entry(&block->chain_list, struct tcf_chain, list);
 }
 
 int tcf_block_get(struct tcf_block **p_block,
@@ -271,7 +307,7 @@ int tcf_block_get(struct tcf_block **p_block,
 		err = -ENOMEM;
 		goto err_chain_create;
 	}
-	tcf_chain_filter_chain_ptr_set(chain, p_filter_chain);
+	tcf_chain_filter_chain_ptr_add(chain, p_filter_chain);
 	*p_block = block;
 	return 0;
 
@@ -288,6 +324,8 @@ void tcf_block_put(struct tcf_block *block)
 	if (!block)
 		return;
 
+	tcf_chain_filter_chain_ptr_del(tcf_block_chain_zero(block), NULL);
+
 	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
 		tcf_chain_destroy(chain);
 	kfree(block);
@@ -362,9 +400,13 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 				struct tcf_chain_info *chain_info,
 				struct tcf_proto *tp)
 {
-	if (chain->p_filter_chain &&
-	    *chain_info->pprev == chain->filter_chain)
-		rcu_assign_pointer(*chain->p_filter_chain, tp);
+	if (*chain_info->pprev == chain->filter_chain) {
+		struct tfc_filter_chain_list_item *item;
+
+		list_for_each_entry(item, &chain->filter_chain_list, list)
+			rcu_assign_pointer(*item->p_filter_chain, tp);
+	}
+
 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
 }
@@ -375,8 +417,12 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 {
 	struct tcf_proto *next = rtnl_dereference(chain_info->next);
 
-	if (chain->p_filter_chain && tp == chain->filter_chain)
-		RCU_INIT_POINTER(*chain->p_filter_chain, next);
+	if (tp == chain->filter_chain) {
+		struct tfc_filter_chain_list_item *item;
+
+		list_for_each_entry(item, &chain->filter_chain_list, list)
+			RCU_INIT_POINTER(*item->p_filter_chain, next);
+	}
 	RCU_INIT_POINTER(*chain_info->pprev, next);
 }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [patch net-next RFC 2/4] net: sched: intruduce qdisc_net helper
  2017-07-10 18:51 [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Jiri Pirko
  2017-07-10 18:51 ` [patch net-next RFC 1/4] net: sched: introduce support for multiple filter chain pointers registration Jiri Pirko
@ 2017-07-10 18:51 ` Jiri Pirko
  2017-07-10 18:51 ` [patch net-next RFC 3/4] net: sched: introduce shared filter blocks infrastructure Jiri Pirko
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Jiri Pirko @ 2017-07-10 18:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, edumazet, stephen, jbenc, mlxsw,
	andrew, vivien.didelot, f.fainelli, john.fastabend,
	alexander.h.duyck, daniel, ogerlitz, mrv

From: Jiri Pirko <jiri@mellanox.com>

There is going to be need to be able to obtain net structure for
a qdisc. So introduce helper to do it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_sched.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 2579c20..da85ad0 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -4,7 +4,9 @@
 #include <linux/jiffies.h>
 #include <linux/ktime.h>
 #include <linux/if_vlan.h>
+#include <linux/netdevice.h>
 #include <net/sch_generic.h>
+#include <net/net_namespace.h>
 
 #define DEFAULT_TX_QUEUE_LEN	1000
 
@@ -132,4 +134,9 @@ static inline unsigned int psched_mtu(const struct net_device *dev)
 	return dev->mtu + dev->hard_header_len;
 }
 
+static inline struct net *qdisc_net(struct Qdisc *q)
+{
+	return dev_net(q->dev_queue->dev);
+}
+
 #endif
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [patch net-next RFC 3/4] net: sched: introduce shared filter blocks infrastructure
  2017-07-10 18:51 [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Jiri Pirko
  2017-07-10 18:51 ` [patch net-next RFC 1/4] net: sched: introduce support for multiple filter chain pointers registration Jiri Pirko
  2017-07-10 18:51 ` [patch net-next RFC 2/4] net: sched: intruduce qdisc_net helper Jiri Pirko
@ 2017-07-10 18:51 ` Jiri Pirko
  2017-07-10 18:51 ` [patch net-next RFC 4/4] net: sched: allow ingress and clsact qdiscs to share filter blocks Jiri Pirko
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Jiri Pirko @ 2017-07-10 18:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, edumazet, stephen, jbenc, mlxsw,
	andrew, vivien.didelot, f.fainelli, john.fastabend,
	alexander.h.duyck, daniel, ogerlitz, mrv

From: Jiri Pirko <jiri@mellanox.com>

Allow qdiscs to share filter blocks among them. Each qdisc type has to
use block get/put modifications that enable sharing. Shared blocks are
tracked within each net namespace and identified by u32 value. This
value is auto-generated in case user did not pass it from userspace. If
user passes value that is not used, new block is created. If user passes
value that is already used, the existing block will be re-used.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h     |  22 +++++-
 include/net/sch_generic.h |   2 +
 net/sched/cls_api.c       | 180 ++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 189 insertions(+), 15 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 537d0a0..4381cbc 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -23,7 +23,12 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
 void tcf_chain_put(struct tcf_chain *chain);
 int tcf_block_get(struct tcf_block **p_block,
 		  struct tcf_proto __rcu **p_filter_chain);
+int tcf_block_get_shared(struct tcf_block **p_block,
+			 struct net *net, u32 block_index,
+			 struct tcf_proto __rcu **p_filter_chain);
 void tcf_block_put(struct tcf_block *block);
+void tcf_block_put_shared(struct tcf_block *block, struct net *net,
+			  struct tcf_proto __rcu **p_filter_chain);
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode);
 
@@ -35,7 +40,22 @@ int tcf_block_get(struct tcf_block **p_block,
 	return 0;
 }
 
-static inline void tcf_block_put(struct tcf_block *block)
+static inline
+int tcf_block_get_shared(struct tcf_block **p_block,
+			 struct net *net, u32 block_index,
+			 struct tcf_proto __rcu **p_filter_chain)
+{
+	return 0;
+}
+
+static inline
+void tcf_block_put(struct tcf_block *block)
+{
+}
+
+static inline
+void tcf_block_put_shared(struct tcf_block *block, struct net *net,
+			  struct tcf_proto __rcu **p_filter_chain)
 {
 }
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7396de8..cbc7313 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -266,6 +266,8 @@ struct tcf_chain {
 
 struct tcf_block {
 	struct list_head chain_list;
+	u32 index; /* block index for shared blocks */
+	unsigned int refcnt;
 };
 
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 411f5577..098b9a2 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -25,6 +25,7 @@
 #include <linux/kmod.h>
 #include <linux/err.h>
 #include <linux/slab.h>
+#include <linux/idr.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/netlink.h>
@@ -286,52 +287,175 @@ tcf_chain_filter_chain_ptr_del(struct tcf_chain *chain,
 	WARN_ON(1);
 }
 
-static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
+struct tcf_net {
+	struct idr idr;
+};
+
+static unsigned int tcf_net_id;
+
+static int tcf_block_insert(struct tcf_block *block, struct net *net,
+			    u32 block_index)
 {
-	return list_first_entry(&block->chain_list, struct tcf_chain, list);
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+	int idr_start;
+	int idr_end;
+	int index;
+
+	if (block_index >= INT_MAX)
+		return -EINVAL;
+	idr_start = block_index ? block_index : 1;
+	idr_end = block_index ? block_index + 1 : INT_MAX;
+
+	index = idr_alloc(&tn->idr, block, idr_start, idr_end, GFP_KERNEL);
+	if (index < 0)
+		return index;
+	block->index = index;
+	return 0;
 }
 
-int tcf_block_get(struct tcf_block **p_block,
-		  struct tcf_proto __rcu **p_filter_chain)
+static void tcf_block_remove(struct tcf_block *block, struct net *net)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	idr_remove(&tn->idr, block->index);
+}
+
+static struct tcf_block *tcf_block_create(void)
 {
-	struct tcf_block *block = kzalloc(sizeof(*block), GFP_KERNEL);
+	struct tcf_block *block;
 	struct tcf_chain *chain;
 	int err;
 
+	block = kzalloc(sizeof(*block), GFP_KERNEL);
 	if (!block)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	INIT_LIST_HEAD(&block->chain_list);
+	block->refcnt = 1;
+
 	/* Create chain 0 by default, it has to be always present. */
 	chain = tcf_chain_create(block, 0);
 	if (!chain) {
 		err = -ENOMEM;
 		goto err_chain_create;
 	}
-	tcf_chain_filter_chain_ptr_add(chain, p_filter_chain);
-	*p_block = block;
-	return 0;
+	return block;
 
 err_chain_create:
 	kfree(block);
+	return ERR_PTR(err);
+}
+
+static void tcf_block_destroy(struct tcf_block *block)
+{
+	struct tcf_chain *chain, *tmp;
+
+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
+		tcf_chain_destroy(chain);
+	kfree(block);
+}
+
+static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	return idr_find(&tn->idr, block_index);
+}
+
+static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
+{
+	return list_first_entry(&block->chain_list, struct tcf_chain, list);
+}
+
+static int __tcf_block_get(struct tcf_block **p_block, bool shared,
+			   struct net *net, u32 block_index,
+			   struct tcf_proto __rcu **p_filter_chain)
+{
+	struct tcf_block *block = NULL;
+	bool created = false;
+	int err;
+
+	if (shared) {
+		block = tcf_block_lookup(net, block_index);
+		if (block)
+			block->refcnt++;
+	}
+
+	if (!block) {
+		block = tcf_block_create();
+		if (IS_ERR(block))
+			return PTR_ERR(block);
+		created = true;
+		if (shared) {
+			err = tcf_block_insert(block, net, block_index);
+			if (err)
+				goto err_block_insert;
+		}
+	}
+
+	err = tcf_chain_filter_chain_ptr_add(tcf_block_chain_zero(block),
+					     p_filter_chain);
+	if (err)
+		goto err_chain_filter_chain_ptr_add;
+
+	*p_block = block;
+	return 0;
+
+err_chain_filter_chain_ptr_add:
+	if (created) {
+		if (shared)
+			tcf_block_remove(block, net);
+err_block_insert:
+		tcf_block_destroy(block);
+	} else {
+		block->refcnt--;
+	}
 	return err;
 }
+
+int tcf_block_get(struct tcf_block **p_block,
+		  struct tcf_proto __rcu **p_filter_chain)
+{
+	return __tcf_block_get(p_block, false, NULL, 0, p_filter_chain);
+}
 EXPORT_SYMBOL(tcf_block_get);
 
-void tcf_block_put(struct tcf_block *block)
+int tcf_block_get_shared(struct tcf_block **p_block,
+			 struct net *net, u32 block_index,
+			 struct tcf_proto __rcu **p_filter_chain)
 {
-	struct tcf_chain *chain, *tmp;
+	return __tcf_block_get(p_block, true, net, block_index, p_filter_chain);
+}
+EXPORT_SYMBOL(tcf_block_get_shared);
 
+static void __tcf_block_put(struct tcf_block *block,
+			    bool shared, struct net *net,
+			    struct tcf_proto __rcu **p_filter_chain)
+{
 	if (!block)
 		return;
 
 	tcf_chain_filter_chain_ptr_del(tcf_block_chain_zero(block), NULL);
 
-	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
-		tcf_chain_destroy(chain);
-	kfree(block);
+	if (--block->refcnt == 0) {
+		if (shared)
+			tcf_block_remove(block, net);
+		tcf_block_destroy(block);
+	}
+}
+
+void tcf_block_put(struct tcf_block *block)
+{
+	__tcf_block_put(block, false, NULL, NULL);
 }
 EXPORT_SYMBOL(tcf_block_put);
 
+void tcf_block_put_shared(struct tcf_block *block, struct net *net,
+			  struct tcf_proto __rcu **p_filter_chain)
+{
+	__tcf_block_put(block, true, net, p_filter_chain);
+}
+EXPORT_SYMBOL(tcf_block_put_shared);
+
 /* Main classifier routine: scans classifier chain attached
  * to this qdisc, (optionally) tests for protocol and asks
  * specific classifiers.
@@ -1035,8 +1159,36 @@ int tcf_exts_get_dev(struct net_device *dev, struct tcf_exts *exts,
 }
 EXPORT_SYMBOL(tcf_exts_get_dev);
 
+static __net_init int tcf_net_init(struct net *net)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	idr_init(&tn->idr);
+	return 0;
+}
+
+static void __net_exit tcf_net_exit(struct net *net)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	idr_destroy(&tn->idr);
+}
+
+static struct pernet_operations tcf_net_ops = {
+	.init = tcf_net_init,
+	.exit = tcf_net_exit,
+	.id   = &tcf_net_id,
+	.size = sizeof(struct tcf_net),
+};
+
 static int __init tc_filter_init(void)
 {
+	int err;
+
+	err = register_pernet_subsys(&tcf_net_ops);
+	if (err)
+		return err;
+
 	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL, NULL);
 	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_ctl_tfilter, NULL, NULL);
 	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_ctl_tfilter,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [patch net-next RFC 4/4] net: sched: allow ingress and clsact qdiscs to share filter blocks
  2017-07-10 18:51 [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (2 preceding siblings ...)
  2017-07-10 18:51 ` [patch net-next RFC 3/4] net: sched: introduce shared filter blocks infrastructure Jiri Pirko
@ 2017-07-10 18:51 ` Jiri Pirko
  2017-07-10 18:52 ` [patch iproute2/net-next RFC] tc: Implement filter block sharing to ingress and clsact qdiscs Jiri Pirko
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Jiri Pirko @ 2017-07-10 18:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, edumazet, stephen, jbenc, mlxsw,
	andrew, vivien.didelot, f.fainelli, john.fastabend,
	alexander.h.duyck, daniel, ogerlitz, mrv

From: Jiri Pirko <jiri@mellanox.com>

Benefit from the previously introduced shared filter blocks
infrastructure and allow ingress and clsact qdisc instances to share
filter blocks. The block index is coming from userspace as qdisc option.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/pkt_sched.h |  12 +++++
 net/sched/sch_ingress.c        | 107 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 112 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf55..a684087 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -871,4 +871,16 @@ struct tc_pie_xstats {
 	__u32 maxq;             /* maximum queue size */
 	__u32 ecn_mark;         /* packets marked with ecn*/
 };
+
+/* Ingress/clsact */
+
+enum {
+	TCA_CLSACT_UNSPEC,
+	TCA_CLSACT_INGRESS_BLOCK,
+	TCA_CLSACT_EGRESS_BLOCK,
+	__TCA_CLSACT_MAX
+};
+
+#define TCA_CLSACT_MAX	(__TCA_CLSACT_MAX - 1)
+
 #endif
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index d8a9beb..f18b257 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -58,13 +58,42 @@ static struct tcf_block *ingress_tcf_block(struct Qdisc *sch, unsigned long cl)
 	return q->block;
 }
 
+static const struct nla_policy ingress_policy[TCA_CLSACT_MAX + 1] = {
+	[TCA_CLSACT_INGRESS_BLOCK]	= { .type = NLA_U32 },
+};
+
+static int ingress_parse_opt(struct nlattr *opt, u32 *p_ingress_block_index)
+{
+	struct nlattr *tb[TCA_CLSACT_MAX + 1];
+	int err;
+
+	*p_ingress_block_index = 0;
+
+	if (!opt)
+		return 0;
+	err = nla_parse_nested(tb, TCA_CLSACT_MAX, opt, ingress_policy, NULL);
+	if (err)
+		return err;
+
+	if (tb[TCA_CLSACT_INGRESS_BLOCK])
+		*p_ingress_block_index =
+			nla_get_u32(tb[TCA_CLSACT_INGRESS_BLOCK]);
+	return 0;
+}
+
 static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct ingress_sched_data *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	u32 ingress_block_index;
 	int err;
 
-	err = tcf_block_get(&q->block, &dev->ingress_cl_list);
+	err = ingress_parse_opt(opt, &ingress_block_index);
+	if (err)
+		return err;
+
+	err = tcf_block_get_shared(&q->block, qdisc_net(sch),
+				   ingress_block_index, &dev->ingress_cl_list);
 	if (err)
 		return err;
 
@@ -77,18 +106,22 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
 static void ingress_destroy(struct Qdisc *sch)
 {
 	struct ingress_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
 
-	tcf_block_put(q->block);
+	tcf_block_put_shared(q->block, qdisc_net(sch), &dev->ingress_cl_list);
 	net_dec_ingress_queue();
 }
 
 static int ingress_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct ingress_sched_data *q = qdisc_priv(sch);
 	struct nlattr *nest;
 
 	nest = nla_nest_start(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;
+	if (nla_put_u32(skb, TCA_CLSACT_INGRESS_BLOCK, q->block->index))
+		goto nla_put_failure;
 
 	return nla_nest_end(skb, nest);
 
@@ -159,17 +192,54 @@ static struct tcf_block *clsact_tcf_block(struct Qdisc *sch, unsigned long cl)
 	}
 }
 
+static const struct nla_policy clsact_policy[TCA_CLSACT_MAX + 1] = {
+	[TCA_CLSACT_INGRESS_BLOCK]	= { .type = NLA_U32 },
+	[TCA_CLSACT_EGRESS_BLOCK]	= { .type = NLA_U32 },
+};
+
+static int clsact_parse_opt(struct nlattr *opt, u32 *p_ingress_block_index,
+			    u32 *p_egress_block_index)
+{
+	struct nlattr *tb[TCA_CLSACT_MAX + 1];
+	int err;
+
+	*p_ingress_block_index = 0;
+	*p_egress_block_index = 0;
+
+	if (!opt)
+		return 0;
+	err = nla_parse_nested(tb, TCA_CLSACT_MAX, opt, clsact_policy, NULL);
+	if (err)
+		return err;
+
+	if (tb[TCA_CLSACT_INGRESS_BLOCK])
+		*p_ingress_block_index =
+			nla_get_u32(tb[TCA_CLSACT_INGRESS_BLOCK]);
+	if (tb[TCA_CLSACT_EGRESS_BLOCK])
+		*p_egress_block_index =
+			nla_get_u32(tb[TCA_CLSACT_EGRESS_BLOCK]);
+	return 0;
+}
+
 static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct clsact_sched_data *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	u32 ingress_block_index;
+	u32 egress_block_index;
 	int err;
 
-	err = tcf_block_get(&q->ingress_block, &dev->ingress_cl_list);
+	err = clsact_parse_opt(opt, &ingress_block_index, &egress_block_index);
 	if (err)
 		return err;
 
-	err = tcf_block_get(&q->egress_block, &dev->egress_cl_list);
+	err = tcf_block_get_shared(&q->ingress_block, qdisc_net(sch),
+				   ingress_block_index, &dev->ingress_cl_list);
+	if (err)
+		return err;
+
+	err = tcf_block_get_shared(&q->egress_block, qdisc_net(sch),
+				   egress_block_index, &dev->egress_cl_list);
 	if (err)
 		return err;
 
@@ -184,14 +254,37 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
 static void clsact_destroy(struct Qdisc *sch)
 {
 	struct clsact_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
 
-	tcf_block_put(q->egress_block);
-	tcf_block_put(q->ingress_block);
+	tcf_block_put_shared(q->egress_block, qdisc_net(sch),
+			     &dev->ingress_cl_list);
+	tcf_block_put_shared(q->ingress_block, qdisc_net(sch),
+			     &dev->egress_cl_list);
 
 	net_dec_ingress_queue();
 	net_dec_egress_queue();
 }
 
+static int clsact_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct clsact_sched_data *q = qdisc_priv(sch);
+	struct nlattr *nest;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (!nest)
+		goto nla_put_failure;
+	if (nla_put_u32(skb, TCA_CLSACT_INGRESS_BLOCK, q->ingress_block->index))
+		goto nla_put_failure;
+	if (nla_put_u32(skb, TCA_CLSACT_EGRESS_BLOCK, q->egress_block->index))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, nest);
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
 static const struct Qdisc_class_ops clsact_class_ops = {
 	.leaf		=	ingress_leaf,
 	.get		=	clsact_get,
@@ -209,7 +302,7 @@ static struct Qdisc_ops clsact_qdisc_ops __read_mostly = {
 	.priv_size	=	sizeof(struct clsact_sched_data),
 	.init		=	clsact_init,
 	.destroy	=	clsact_destroy,
-	.dump		=	ingress_dump,
+	.dump		=	clsact_dump,
 	.owner		=	THIS_MODULE,
 };
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [patch iproute2/net-next RFC] tc: Implement filter block sharing to ingress and clsact qdiscs
  2017-07-10 18:51 [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (3 preceding siblings ...)
  2017-07-10 18:51 ` [patch net-next RFC 4/4] net: sched: allow ingress and clsact qdiscs to share filter blocks Jiri Pirko
@ 2017-07-10 18:52 ` Jiri Pirko
  2017-07-11  6:57 ` [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Or Gerlitz
  2017-07-11 12:15 ` Jamal Hadi Salim
  6 siblings, 0 replies; 11+ messages in thread
From: Jiri Pirko @ 2017-07-10 18:52 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, edumazet, stephen, jbenc, mlxsw,
	andrew, vivien.didelot, f.fainelli, john.fastabend,
	alexander.h.duyck, daniel, ogerlitz, mrv

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/pkt_sched.h | 12 +++++++++++
 tc/q_clsact.c             | 54 ++++++++++++++++++++++++++++++++++++++++++-----
 tc/q_ingress.c            | 32 +++++++++++++++++++++++++---
 3 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 099bf55..a684087 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -871,4 +871,16 @@ struct tc_pie_xstats {
 	__u32 maxq;             /* maximum queue size */
 	__u32 ecn_mark;         /* packets marked with ecn*/
 };
+
+/* Ingress/clsact */
+
+enum {
+	TCA_CLSACT_UNSPEC,
+	TCA_CLSACT_INGRESS_BLOCK,
+	TCA_CLSACT_EGRESS_BLOCK,
+	__TCA_CLSACT_MAX
+};
+
+#define TCA_CLSACT_MAX	(__TCA_CLSACT_MAX - 1)
+
 #endif
diff --git a/tc/q_clsact.c b/tc/q_clsact.c
index e2a1a71..0ecaa63 100644
--- a/tc/q_clsact.c
+++ b/tc/q_clsact.c
@@ -6,23 +6,67 @@
 
 static void explain(void)
 {
-	fprintf(stderr, "Usage: ... clsact\n");
+	fprintf(stderr, "Usage: ... clsact [ingress_block BLOCK_INDEX] [egress_block BLOCK_INDEX]\n");
 }
 
 static int clsact_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 			    struct nlmsghdr *n)
 {
-	if (argc > 0) {
-		fprintf(stderr, "What is \"%s\"?\n", *argv);
-		explain();
-		return -1;
+	struct rtattr *tail;
+	unsigned int ingress_block;
+	unsigned int egress_block;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "ingress_block") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&ingress_block, *argv, 0)) {
+				fprintf(stderr, "Illegal \"ingress_block\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "egress_block") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&egress_block, *argv, 0)) {
+				fprintf(stderr, "Illegal \"egress_block\"\n");
+				return -1;
+			}
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		NEXT_ARG_FWD();
 	}
 
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	if (ingress_block)
+		addattr32(n, 1024, TCA_CLSACT_INGRESS_BLOCK, ingress_block);
+	if (egress_block)
+		addattr32(n, 1024, TCA_CLSACT_EGRESS_BLOCK, egress_block);
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
 	return 0;
 }
 
 static int clsact_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 {
+	struct rtattr *tb[TCA_CLSACT_MAX + 1];
+	unsigned int block;
+
+	if (!opt)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CLSACT_MAX, opt);
+
+	if (tb[TCA_CLSACT_INGRESS_BLOCK] &&
+	    RTA_PAYLOAD(tb[TCA_CLSACT_INGRESS_BLOCK]) >= sizeof(__u32)) {
+		block = rta_getattr_u32(tb[TCA_CLSACT_INGRESS_BLOCK]);
+		fprintf(f, "ingress_block %u ", block);
+	}
+	if (tb[TCA_CLSACT_EGRESS_BLOCK] &&
+	    RTA_PAYLOAD(tb[TCA_CLSACT_EGRESS_BLOCK]) >= sizeof(__u32)) {
+		block = rta_getattr_u32(tb[TCA_CLSACT_EGRESS_BLOCK]);
+		fprintf(f, "egress_block %u ", block);
+	}
 	return 0;
 }
 
diff --git a/tc/q_ingress.c b/tc/q_ingress.c
index 31699a8..b973e1b 100644
--- a/tc/q_ingress.c
+++ b/tc/q_ingress.c
@@ -17,30 +17,56 @@
 
 static void explain(void)
 {
-	fprintf(stderr, "Usage: ... ingress\n");
+	fprintf(stderr, "Usage: ... ingress [block BLOCK_INDEX]\n");
 }
 
 static int ingress_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 			     struct nlmsghdr *n)
 {
+	struct rtattr *tail;
+	unsigned int block;
+
 	while (argc > 0) {
 		if (strcmp(*argv, "handle") == 0) {
 			NEXT_ARG();
-			argc--; argv++;
+		} else if (strcmp(*argv, "block") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&block, *argv, 0)) {
+				fprintf(stderr, "Illegal \"block\"\n");
+				return -1;
+			}
 		} else {
 			fprintf(stderr, "What is \"%s\"?\n", *argv);
 			explain();
 			return -1;
 		}
+		NEXT_ARG_FWD();
 	}
 
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	if (block)
+		addattr32(n, 1024, TCA_CLSACT_INGRESS_BLOCK, block);
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
 	return 0;
 }
 
 static int ingress_print_opt(struct qdisc_util *qu, FILE *f,
 			     struct rtattr *opt)
 {
-	fprintf(f, "---------------- ");
+	struct rtattr *tb[TCA_CLSACT_MAX + 1];
+	unsigned int block;
+
+	if (!opt)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CLSACT_MAX, opt);
+
+	if (tb[TCA_CLSACT_INGRESS_BLOCK] &&
+	    RTA_PAYLOAD(tb[TCA_CLSACT_INGRESS_BLOCK]) >= sizeof(__u32)) {
+		block = rta_getattr_u32(tb[TCA_CLSACT_INGRESS_BLOCK]);
+		fprintf(f, "block %u ", block);
+	}
 	return 0;
 }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances
  2017-07-10 18:51 [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (4 preceding siblings ...)
  2017-07-10 18:52 ` [patch iproute2/net-next RFC] tc: Implement filter block sharing to ingress and clsact qdiscs Jiri Pirko
@ 2017-07-11  6:57 ` Or Gerlitz
  2017-07-11  7:02   ` Jiri Pirko
  2017-07-11 12:15 ` Jamal Hadi Salim
  6 siblings, 1 reply; 11+ messages in thread
From: Or Gerlitz @ 2017-07-11  6:57 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Linux Netdev List, David Miller, Jamal Hadi Salim, Cong Wang,
	Eric Dumazet, Stephen Hemminger, Jiri Benc, mlxsw, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, John Fastabend, Alexander Duyck,
	Daniel Borkmann, Or Gerlitz, Roman Mashak, Paul Blakey,
	Mark Bloch

On Mon, Jul 10, 2017 at 9:51 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Currently the filters added to qdiscs are independent. So for example if you
> have 2 netdevices and you create ingress qdisc on both and you want to add
> identical filter rules both, you need to add them twice. This patchset
> makes this easier and mainly saves resources allowing to share all filters
> within a qdisc - I call it a "filter block". Also this helps to save
> resources when we do offload to hw for example to expensive TCAM.

[...]


> Now we can add filter to any of qdiscs sharing the same block:
> $ tc filter add dev ens7 parent ffff: protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop

> We will see the same output if we list filters for ens7 and ens8, including stats:

yeah, but the stats belong to the action, not the filter, right? so you create
here actually a shared action? or you also introduced in that series stats
for filters, I am confused...

Or.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances
  2017-07-11  6:57 ` [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Or Gerlitz
@ 2017-07-11  7:02   ` Jiri Pirko
  0 siblings, 0 replies; 11+ messages in thread
From: Jiri Pirko @ 2017-07-11  7:02 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Linux Netdev List, David Miller, Jamal Hadi Salim, Cong Wang,
	Eric Dumazet, Stephen Hemminger, Jiri Benc, mlxsw, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, John Fastabend, Alexander Duyck,
	Daniel Borkmann, Or Gerlitz, Roman Mashak, Paul Blakey,
	Mark Bloch

Tue, Jul 11, 2017 at 08:57:30AM CEST, gerlitz.or@gmail.com wrote:
>On Mon, Jul 10, 2017 at 9:51 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Currently the filters added to qdiscs are independent. So for example if you
>> have 2 netdevices and you create ingress qdisc on both and you want to add
>> identical filter rules both, you need to add them twice. This patchset
>> makes this easier and mainly saves resources allowing to share all filters
>> within a qdisc - I call it a "filter block". Also this helps to save
>> resources when we do offload to hw for example to expensive TCAM.
>
>[...]
>
>
>> Now we can add filter to any of qdiscs sharing the same block:
>> $ tc filter add dev ens7 parent ffff: protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>
>> We will see the same output if we list filters for ens7 and ens8, including stats:
>
>yeah, but the stats belong to the action, not the filter, right? so you create
>here actually a shared action? or you also introduced in that series stats
>for filters, I am confused...

Filters are shared along with all that belongs to them, so including
actions.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances
  2017-07-10 18:51 [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Jiri Pirko
                   ` (5 preceding siblings ...)
  2017-07-11  6:57 ` [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Or Gerlitz
@ 2017-07-11 12:15 ` Jamal Hadi Salim
  2017-07-11 12:34   ` Jiri Pirko
  6 siblings, 1 reply; 11+ messages in thread
From: Jamal Hadi Salim @ 2017-07-11 12:15 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, xiyou.wangcong, edumazet, stephen, jbenc, mlxsw, andrew,
	vivien.didelot, f.fainelli, john.fastabend, alexander.h.duyck,
	daniel, ogerlitz, mrv

Hi Jiri,

Commenting on generalities - will comment on code later:

On 17-07-10 02:51 PM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Currently the filters added to qdiscs are independent. So for example if you
> have 2 netdevices and you create ingress qdisc on both and you want to add
> identical filter rules both, you need to add them twice. This patchset
> makes this easier and mainly saves resources allowing to share all filters
> within a qdisc - I call it a "filter block". Also this helps to save
> resources when we do offload to hw for example to expensive TCAM.
> 
> So back to the example. First, we create 2 qdiscs. Both will share
> block number 22. "22" is just an identification. If we don't pass any
> block number, a new one will be generated by kernel:
> 
> $ tc qdisc add dev ens7 ingress block 22
>                                  ^^^^^^^^
> $ tc qdisc add dev ens8 ingress block 22
>

Above makes intuitive sense.


                                   ^^^^^^^^
> 
> Now if we list the qdiscs, we will see the block index in the output:
> qdisc fq_codel 0: dev ens7 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
>   Sent 9014 bytes 99 pkt (dropped 0, overlimits 0 requeues 0)
>   backlog 0b 0p requeues 0
>    maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>    new_flows_len 0 old_flows_len 0
> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>                                                ^^^^^^^^
>   Sent 4592 bytes 58 pkt (dropped 0, overlimits 0 requeues 0)
>   backlog 0b 0p requeues 0
> qdisc fq_codel 0: dev ens8 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
>   Sent 17022 bytes 307 pkt (dropped 0, overlimits 0 requeues 0)
>   backlog 0b 0p requeues 0
>    maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>    new_flows_len 0 old_flows_len 0
> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>                                                ^^^^^^^^
>   Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>   backlog 0b 0p requeues 0
>

So does this.

> 
> Now we can add filter to any of qdiscs sharing the same block:
> 
> $ tc filter add dev ens7 parent ffff: protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>

So for backward compat - this also makes sense. But:
it does make sense to create new syntax for adding
filters and actions:

tc filter add block 22 protocol ip pref 25 flower \
   dst_ip 192.168.0.0/16 action drop

Coordinates of the filter block before were:

<ifindex>, <parent>, [handle]

You should be able to abuse struct tcmsg ifindex to represent block #
as long as you set parent to be something meaningful that is
identified "block coordinate" via TC_H_XXX (pick something safe not
in use by ingress or egress; look at: uapi/linux/pkt_sched.h)

> 
> We will see the same output if we list filters for ens7 and ens8, including stats:
> 
> $ tc -s filter show dev ens7 root
> filter parent ffff: protocol ip pref 25 flower
> filter parent ffff: protocol ip pref 25 flower handle 0x1
>    eth_type ipv4
>    dst_ip 192.168.1.0/24
>          action order 1: gact action drop
>           random type none pass val 0
>           index 3 ref 1 bind 1 installed 10201 sec used 10150 sec
>          Action statistics:
>          Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0)
>          backlog 0b 0p requeues 0
> 
> $ tc -s filter show dev ens8 root
> filter dev ens7 parent ffff: protocol ip pref 25 flower
> filter dev ens7 parent ffff: protocol ip pref 25 flower handle 0x1
>    eth_type ipv4
>    dst_ip 192.168.1.0/24
>          action order 1: gact action drop
>           random type none pass val 0
>           index 3 ref 1 bind 1 installed 10202 sec used 10152 sec
>          Action statistics:
>          Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0)
>          backlog 0b 0p requeues 0
> 
> 
> Issues:
> - tp->q is set by the device used to add the filter. That has to be resolved.
>    Impacts the dump (as you can see above)
> 

I think you have more problems if the dump above is reality;->
You added to ingress and this is showing egress.

To complete the thought, dump is:

  tc -s filter show block 22

cheers,
jamal

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances
  2017-07-11 12:15 ` Jamal Hadi Salim
@ 2017-07-11 12:34   ` Jiri Pirko
  2017-07-14  7:33     ` Jamal Hadi Salim
  0 siblings, 1 reply; 11+ messages in thread
From: Jiri Pirko @ 2017-07-11 12:34 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, xiyou.wangcong, edumazet, stephen, jbenc, mlxsw,
	andrew, vivien.didelot, f.fainelli, john.fastabend,
	alexander.h.duyck, daniel, ogerlitz, mrv

Tue, Jul 11, 2017 at 02:15:27PM CEST, jhs@mojatatu.com wrote:
>Hi Jiri,
>
>Commenting on generalities - will comment on code later:
>
>On 17-07-10 02:51 PM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Currently the filters added to qdiscs are independent. So for example if you
>> have 2 netdevices and you create ingress qdisc on both and you want to add
>> identical filter rules both, you need to add them twice. This patchset
>> makes this easier and mainly saves resources allowing to share all filters
>> within a qdisc - I call it a "filter block". Also this helps to save
>> resources when we do offload to hw for example to expensive TCAM.
>> 
>> So back to the example. First, we create 2 qdiscs. Both will share
>> block number 22. "22" is just an identification. If we don't pass any
>> block number, a new one will be generated by kernel:
>> 
>> $ tc qdisc add dev ens7 ingress block 22
>>                                  ^^^^^^^^
>> $ tc qdisc add dev ens8 ingress block 22
>> 
>
>Above makes intuitive sense.
>
>
>                                  ^^^^^^^^
>> 
>> Now if we list the qdiscs, we will see the block index in the output:
>> qdisc fq_codel 0: dev ens7 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
>>   Sent 9014 bytes 99 pkt (dropped 0, overlimits 0 requeues 0)
>>   backlog 0b 0p requeues 0
>>    maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>    new_flows_len 0 old_flows_len 0
>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>>                                                ^^^^^^^^
>>   Sent 4592 bytes 58 pkt (dropped 0, overlimits 0 requeues 0)
>>   backlog 0b 0p requeues 0
>> qdisc fq_codel 0: dev ens8 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
>>   Sent 17022 bytes 307 pkt (dropped 0, overlimits 0 requeues 0)
>>   backlog 0b 0p requeues 0
>>    maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>    new_flows_len 0 old_flows_len 0
>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>>                                                ^^^^^^^^
>>   Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>   backlog 0b 0p requeues 0
>> 
>
>So does this.
>
>> 
>> Now we can add filter to any of qdiscs sharing the same block:
>> 
>> $ tc filter add dev ens7 parent ffff: protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>> 
>
>So for backward compat - this also makes sense. But:
>it does make sense to create new syntax for adding
>filters and actions:
>
>tc filter add block 22 protocol ip pref 25 flower \
>  dst_ip 192.168.0.0/16 action drop

Was thinking about that. Decided to pass on this now. This should be
addressed by follow-up anyway.


>
>Coordinates of the filter block before were:
>
><ifindex>, <parent>, [handle]
>
>You should be able to abuse struct tcmsg ifindex to represent block #
>as long as you set parent to be something meaningful that is
>identified "block coordinate" via TC_H_XXX (pick something safe not
>in use by ingress or egress; look at: uapi/linux/pkt_sched.h)

Not sure about this. I have take closer look. In general, I don't like
to abuse anything :)


>
>> 
>> We will see the same output if we list filters for ens7 and ens8, including stats:
>> 
>> $ tc -s filter show dev ens7 root
>> filter parent ffff: protocol ip pref 25 flower
>> filter parent ffff: protocol ip pref 25 flower handle 0x1
>>    eth_type ipv4
>>    dst_ip 192.168.1.0/24
>>          action order 1: gact action drop
>>           random type none pass val 0
>>           index 3 ref 1 bind 1 installed 10201 sec used 10150 sec
>>          Action statistics:
>>          Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0)
>>          backlog 0b 0p requeues 0
>> 
>> $ tc -s filter show dev ens8 root
>> filter dev ens7 parent ffff: protocol ip pref 25 flower
>> filter dev ens7 parent ffff: protocol ip pref 25 flower handle 0x1
>>    eth_type ipv4
>>    dst_ip 192.168.1.0/24
>>          action order 1: gact action drop
>>           random type none pass val 0
>>           index 3 ref 1 bind 1 installed 10202 sec used 10152 sec
>>          Action statistics:
>>          Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0)
>>          backlog 0b 0p requeues 0
>> 
>> 
>> Issues:
>> - tp->q is set by the device used to add the filter. That has to be resolved.
>>    Impacts the dump (as you can see above)
>> 
>
>I think you have more problems if the dump above is reality;->
>You added to ingress and this is showing egress.

Howcome? I only don't see "dev x" on ens7. That is the only difference,


>
>To complete the thought, dump is:
>
> tc -s filter show block 22

Understood. Again, this should be addressed in follow-up.


>
>cheers,
>jamal
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances
  2017-07-11 12:34   ` Jiri Pirko
@ 2017-07-14  7:33     ` Jamal Hadi Salim
  0 siblings, 0 replies; 11+ messages in thread
From: Jamal Hadi Salim @ 2017-07-14  7:33 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, xiyou.wangcong, edumazet, stephen, jbenc, mlxsw,
	andrew, vivien.didelot, f.fainelli, john.fastabend,
	alexander.h.duyck, daniel, ogerlitz, mrv

On 17-07-11 08:34 AM, Jiri Pirko wrote:
> Tue, Jul 11, 2017 at 02:15:27PM CEST, jhs@mojatatu.com wrote:
>> Hi Jiri,
>>

[..]
>>> Now we can add filter to any of qdiscs sharing the same block:
>>>
>>> $ tc filter add dev ens7 parent ffff: protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>
>>
>> So for backward compat - this also makes sense. But:
>> it does make sense to create new syntax for adding
>> filters and actions:
>>
>> tc filter add block 22 protocol ip pref 25 flower \
>>   dst_ip 192.168.0.0/16 action drop
> 
> Was thinking about that. Decided to pass on this now. This should be
> addressed by follow-up anyway.
> 

Ok, thanks. I feel it is very important.

> 
>>
>> Coordinates of the filter block before were:
>>
>> <ifindex>, <parent>, [handle]
>>
>> You should be able to abuse struct tcmsg ifindex to represent block #
>> as long as you set parent to be something meaningful that is
>> identified "block coordinate" via TC_H_XXX (pick something safe not
>> in use by ingress or egress; look at: uapi/linux/pkt_sched.h)
> 
> Not sure about this. I have take closer look. In general, I don't like
> to abuse anything :)
> 

I should have said it without "ab" i.e "use" ;->
The purpose of those coordinates is to describe the "location"
of what you call the "block" now.
We used to have egress, then ingress "blocks" tied to a port.
then Daniel added clsact above but still ingress + egress
And it used to be tied to a port and you just changed it to be
port independent etc.
It is a natural evolution.


>>> We will see the same output if we list filters for ens7 and ens8, including stats:

[..]
>>> $ tc -s filter show dev ens8 root
>>> filter dev ens7 parent ffff: protocol ip pref 25 flower
>>> filter dev ens7 parent ffff: protocol ip pref 25 flower handle 0x1
>>>     eth_type ipv4
>>>     dst_ip 192.168.1.0/24
>>>           action order 1: gact action drop
>>>            random type none pass val 0
>>>            index 3 ref 1 bind 1 installed 10202 sec used 10152 sec
>>>           Action statistics:
>>>           Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0)
>>>           backlog 0b 0p requeues 0
>>>
>>>
>>> Issues:
>>> - tp->q is set by the device used to add the filter. That has to be resolved.
>>>     Impacts the dump (as you can see above)
>>>
>>
>> I think you have more problems if the dump above is reality;->
>> You added to ingress and this is showing egress.
> 
> Howcome? I only don't see "dev x" on ens7. That is the only difference,
> 

I meant you are displaying egress qdisc filters(root) but you added them
to ingress (ffff:). I would have expected the output you showed had you
said:
tc -s filter show dev ens8 parent ffff:


cheers,
jamal

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-07-14  7:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-10 18:51 [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Jiri Pirko
2017-07-10 18:51 ` [patch net-next RFC 1/4] net: sched: introduce support for multiple filter chain pointers registration Jiri Pirko
2017-07-10 18:51 ` [patch net-next RFC 2/4] net: sched: intruduce qdisc_net helper Jiri Pirko
2017-07-10 18:51 ` [patch net-next RFC 3/4] net: sched: introduce shared filter blocks infrastructure Jiri Pirko
2017-07-10 18:51 ` [patch net-next RFC 4/4] net: sched: allow ingress and clsact qdiscs to share filter blocks Jiri Pirko
2017-07-10 18:52 ` [patch iproute2/net-next RFC] tc: Implement filter block sharing to ingress and clsact qdiscs Jiri Pirko
2017-07-11  6:57 ` [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances Or Gerlitz
2017-07-11  7:02   ` Jiri Pirko
2017-07-11 12:15 ` Jamal Hadi Salim
2017-07-11 12:34   ` Jiri Pirko
2017-07-14  7:33     ` Jamal Hadi Salim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).