netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next 0/7] net_sched: act: lockless operation
@ 2015-07-06  6:41 Eric Dumazet
  2015-07-06  6:41 ` [PATCH v2 net-next 1/7] net: sched: extend percpu stats helpers Eric Dumazet
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Eric Dumazet @ 2015-07-06  6:41 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexei Starovoitov, Jamal Hadi Salim, John Fastabend,
	Eric Dumazet, Eric Dumazet

As mentioned by Alexei last week in Budapest, it is a bit weird
to take a spinlock in order to drop a packet in a tc filter...

Lets add percpu infra for tc actions and use it for gact & mirred.

Before changes, my host with 8 RX queues was handling 5 Mpps with gact,
and more than 11 Mpps after.

Mirred change is not yet visible if ifb+qdisc is used, as ifb is
not yet multi queue enabled, but is a step forward.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>

Eric Dumazet (7):
  net: sched: extend percpu stats helpers
  net: sched: add percpu stats to actions
  net_sched: act_gact: make tcfg_pval non zero
  net_sched: act_gact: use a separate packet counters for gact_determ()
  net_sched: act_gact: read tcfg_ptype once
  net_sched: act_gact: remove spinlock in fast path
  net_sched: act_mirred: remove spinlock in fast path

 include/net/act_api.h          | 15 ++++++++++-
 include/net/sch_generic.h      | 31 ++++++++++++++--------
 include/net/tc_act/tc_gact.h   |  7 ++---
 include/net/tc_act/tc_mirred.h |  2 +-
 net/core/dev.c                 |  4 +--
 net/sched/act_api.c            | 44 ++++++++++++++++++++++++--------
 net/sched/act_bpf.c            |  2 +-
 net/sched/act_connmark.c       |  3 ++-
 net/sched/act_csum.c           |  3 ++-
 net/sched/act_gact.c           | 44 ++++++++++++++++++--------------
 net/sched/act_ipt.c            |  2 +-
 net/sched/act_mirred.c         | 58 ++++++++++++++++++++++--------------------
 net/sched/act_nat.c            |  3 ++-
 net/sched/act_pedit.c          |  3 ++-
 net/sched/act_simple.c         |  3 ++-
 net/sched/act_skbedit.c        |  3 ++-
 net/sched/act_vlan.c           |  3 ++-
 17 files changed, 148 insertions(+), 82 deletions(-)

-- 
2.4.3.573.g4eafbef

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 net-next 1/7] net: sched: extend percpu stats helpers
  2015-07-06  6:41 [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
@ 2015-07-06  6:41 ` Eric Dumazet
  2015-07-06  6:41 ` [PATCH v2 net-next 2/7] net: sched: add percpu stats to actions Eric Dumazet
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2015-07-06  6:41 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexei Starovoitov, Jamal Hadi Salim, John Fastabend,
	Eric Dumazet, Eric Dumazet

qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.

We want to add percpu stats for tc action, so this patch add common
helpers.

qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/sch_generic.h | 31 +++++++++++++++++++++----------
 net/core/dev.c            |  4 ++--
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 2738f6f87908..2eab08c38e32 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -513,17 +513,20 @@ static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
 	bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
 }
 
-static inline void qdisc_bstats_update_cpu(struct Qdisc *sch,
-					   const struct sk_buff *skb)
+static inline void bstats_cpu_update(struct gnet_stats_basic_cpu *bstats,
+				     const struct sk_buff *skb)
 {
-	struct gnet_stats_basic_cpu *bstats =
-				this_cpu_ptr(sch->cpu_bstats);
-
 	u64_stats_update_begin(&bstats->syncp);
 	bstats_update(&bstats->bstats, skb);
 	u64_stats_update_end(&bstats->syncp);
 }
 
+static inline void qdisc_bstats_cpu_update(struct Qdisc *sch,
+					   const struct sk_buff *skb)
+{
+	bstats_cpu_update(this_cpu_ptr(sch->cpu_bstats), skb);
+}
+
 static inline void qdisc_bstats_update(struct Qdisc *sch,
 				       const struct sk_buff *skb)
 {
@@ -547,16 +550,24 @@ static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
 	sch->qstats.drops += count;
 }
 
-static inline void qdisc_qstats_drop(struct Qdisc *sch)
+static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
 {
-	sch->qstats.drops++;
+	qstats->drops++;
 }
 
-static inline void qdisc_qstats_drop_cpu(struct Qdisc *sch)
+static inline void qstats_overlimit_inc(struct gnet_stats_queue *qstats)
 {
-	struct gnet_stats_queue *qstats = this_cpu_ptr(sch->cpu_qstats);
+	qstats->overlimits++;
+}
 
-	qstats->drops++;
+static inline void qdisc_qstats_drop(struct Qdisc *sch)
+{
+	qstats_drop_inc(&sch->qstats);
+}
+
+static inline void qdisc_qstats_cpu_drop(struct Qdisc *sch)
+{
+	qstats_drop_inc(this_cpu_ptr(sch->cpu_qstats));
 }
 
 static inline void qdisc_qstats_overlimit(struct Qdisc *sch)
diff --git a/net/core/dev.c b/net/core/dev.c
index 6778a9999d52..e0d270143fc7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3646,7 +3646,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
-	qdisc_bstats_update_cpu(cl->q, skb);
+	qdisc_bstats_cpu_update(cl->q, skb);
 
 	switch (tc_classify(skb, cl, &cl_res)) {
 	case TC_ACT_OK:
@@ -3654,7 +3654,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
-		qdisc_qstats_drop_cpu(cl->q);
+		qdisc_qstats_cpu_drop(cl->q);
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 		kfree_skb(skb);
-- 
2.4.3.573.g4eafbef

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 net-next 2/7] net: sched: add percpu stats to actions
  2015-07-06  6:41 [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
  2015-07-06  6:41 ` [PATCH v2 net-next 1/7] net: sched: extend percpu stats helpers Eric Dumazet
@ 2015-07-06  6:41 ` Eric Dumazet
  2015-07-06  6:41 ` [PATCH v2 net-next 3/7] net_sched: act_gact: make tcfg_pval non zero Eric Dumazet
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2015-07-06  6:41 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexei Starovoitov, Jamal Hadi Salim, John Fastabend,
	Eric Dumazet, Eric Dumazet

Reuse existing percpu infrastructure John Fastabend added for qdisc.

This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/act_api.h    |  4 +++-
 net/sched/act_api.c      | 44 ++++++++++++++++++++++++++++++++++----------
 net/sched/act_bpf.c      |  2 +-
 net/sched/act_connmark.c |  3 ++-
 net/sched/act_csum.c     |  3 ++-
 net/sched/act_gact.c     |  3 ++-
 net/sched/act_ipt.c      |  2 +-
 net/sched/act_mirred.c   |  3 ++-
 net/sched/act_nat.c      |  3 ++-
 net/sched/act_pedit.c    |  3 ++-
 net/sched/act_simple.c   |  3 ++-
 net/sched/act_skbedit.c  |  3 ++-
 net/sched/act_vlan.c     |  3 ++-
 13 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 3ee4c92afd1b..db2063ffd181 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -21,6 +21,8 @@ struct tcf_common {
 	struct gnet_stats_rate_est64	tcfc_rate_est;
 	spinlock_t			tcfc_lock;
 	struct rcu_head			tcfc_rcu;
+	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+	struct gnet_stats_queue __percpu *cpu_qstats;
 };
 #define tcf_head	common.tcfc_head
 #define tcf_index	common.tcfc_index
@@ -103,7 +105,7 @@ int tcf_hash_release(struct tc_action *a, int bind);
 u32 tcf_hash_new_index(struct tcf_hashinfo *hinfo);
 int tcf_hash_check(u32 index, struct tc_action *a, int bind);
 int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
-		    int size, int bind);
+		    int size, int bind, bool cpustats);
 void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est);
 void tcf_hash_insert(struct tc_action *a);
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index af427a3dbcba..074a32f466f8 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -27,6 +27,15 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
+static void free_tcf(struct rcu_head *head)
+{
+	struct tcf_common *p = container_of(head, struct tcf_common, tcfc_rcu);
+
+	free_percpu(p->cpu_bstats);
+	free_percpu(p->cpu_qstats);
+	kfree(p);
+}
+
 void tcf_hash_destroy(struct tc_action *a)
 {
 	struct tcf_common *p = a->priv;
@@ -41,7 +50,7 @@ void tcf_hash_destroy(struct tc_action *a)
 	 * gen_estimator est_timer() might access p->tcfc_lock
 	 * or bstats, wait a RCU grace period before freeing p
 	 */
-	kfree_rcu(p, tcfc_rcu);
+	call_rcu(&p->tcfc_rcu, free_tcf);
 }
 EXPORT_SYMBOL(tcf_hash_destroy);
 
@@ -230,15 +239,16 @@ void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est)
 	if (est)
 		gen_kill_estimator(&pc->tcfc_bstats,
 				   &pc->tcfc_rate_est);
-	kfree_rcu(pc, tcfc_rcu);
+	call_rcu(&pc->tcfc_rcu, free_tcf);
 }
 EXPORT_SYMBOL(tcf_hash_cleanup);
 
 int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
-		    int size, int bind)
+		    int size, int bind, bool cpustats)
 {
 	struct tcf_hashinfo *hinfo = a->ops->hinfo;
 	struct tcf_common *p = kzalloc(size, GFP_KERNEL);
+	int err = -ENOMEM;
 
 	if (unlikely(!p))
 		return -ENOMEM;
@@ -246,18 +256,32 @@ int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
 	if (bind)
 		p->tcfc_bindcnt = 1;
 
+	if (cpustats) {
+		p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+		if (!p->cpu_bstats) {
+err1:
+			kfree(p);
+			return err;
+		}
+		p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+		if (!p->cpu_qstats) {
+err2:
+			free_percpu(p->cpu_bstats);
+			goto err1;
+		}
+	}
 	spin_lock_init(&p->tcfc_lock);
 	INIT_HLIST_NODE(&p->tcfc_head);
 	p->tcfc_index = index ? index : tcf_hash_new_index(hinfo);
 	p->tcfc_tm.install = jiffies;
 	p->tcfc_tm.lastuse = jiffies;
 	if (est) {
-		int err = gen_new_estimator(&p->tcfc_bstats, NULL,
-					    &p->tcfc_rate_est,
-					    &p->tcfc_lock, est);
+		err = gen_new_estimator(&p->tcfc_bstats, p->cpu_bstats,
+					&p->tcfc_rate_est,
+					&p->tcfc_lock, est);
 		if (err) {
-			kfree(p);
-			return err;
+			free_percpu(p->cpu_qstats);
+			goto err2;
 		}
 	}
 
@@ -615,10 +639,10 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *a,
 	if (err < 0)
 		goto errout;
 
-	if (gnet_stats_copy_basic(&d, NULL, &p->tcfc_bstats) < 0 ||
+	if (gnet_stats_copy_basic(&d, p->cpu_bstats, &p->tcfc_bstats) < 0 ||
 	    gnet_stats_copy_rate_est(&d, &p->tcfc_bstats,
 				     &p->tcfc_rate_est) < 0 ||
-	    gnet_stats_copy_queue(&d, NULL,
+	    gnet_stats_copy_queue(&d, p->cpu_qstats,
 				  &p->tcfc_qstats,
 				  p->tcfc_qstats.qlen) < 0)
 		goto errout;
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 1d56903fd4c7..99aa271633e9 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -281,7 +281,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
 	if (!tcf_hash_check(parm->index, act, bind)) {
 		ret = tcf_hash_create(parm->index, est, act,
-				      sizeof(*prog), bind);
+				      sizeof(*prog), bind, false);
 		if (ret < 0)
 			goto destroy_fp;
 
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index 295d14bd6c67..f2b540220ad0 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -108,7 +108,8 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
 	parm = nla_data(tb[TCA_CONNMARK_PARMS]);
 
 	if (!tcf_hash_check(parm->index, a, bind)) {
-		ret = tcf_hash_create(parm->index, est, a, sizeof(*ci), bind);
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*ci),
+				      bind, false);
 		if (ret)
 			return ret;
 
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 4cd5cf1aedf8..b07c535ba8e7 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -62,7 +62,8 @@ static int tcf_csum_init(struct net *n, struct nlattr *nla, struct nlattr *est,
 	parm = nla_data(tb[TCA_CSUM_PARMS]);
 
 	if (!tcf_hash_check(parm->index, a, bind)) {
-		ret = tcf_hash_create(parm->index, est, a, sizeof(*p), bind);
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*p),
+				      bind, false);
 		if (ret)
 			return ret;
 		ret = ACT_P_CREATED;
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 7fffc2272701..a4f8af29ee30 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -85,7 +85,8 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 #endif
 
 	if (!tcf_hash_check(parm->index, a, bind)) {
-		ret = tcf_hash_create(parm->index, est, a, sizeof(*gact), bind);
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*gact),
+				      bind, false);
 		if (ret)
 			return ret;
 		ret = ACT_P_CREATED;
diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index cbc8dd7dd48a..99c9cc1c7af9 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -114,7 +114,7 @@ static int tcf_ipt_init(struct net *net, struct nlattr *nla, struct nlattr *est,
 		index = nla_get_u32(tb[TCA_IPT_INDEX]);
 
 	if (!tcf_hash_check(index, a, bind) ) {
-		ret = tcf_hash_create(index, est, a, sizeof(*ipt), bind);
+		ret = tcf_hash_create(index, est, a, sizeof(*ipt), bind, false);
 		if (ret)
 			return ret;
 		ret = ACT_P_CREATED;
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index a42a3b257226..002cd6c83dc6 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -93,7 +93,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 	if (!tcf_hash_check(parm->index, a, bind)) {
 		if (dev == NULL)
 			return -EINVAL;
-		ret = tcf_hash_create(parm->index, est, a, sizeof(*m), bind);
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*m),
+				      bind, false);
 		if (ret)
 			return ret;
 		ret = ACT_P_CREATED;
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index 270a030d5fd0..5be0b3c1c5b0 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -55,7 +55,8 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est,
 	parm = nla_data(tb[TCA_NAT_PARMS]);
 
 	if (!tcf_hash_check(parm->index, a, bind)) {
-		ret = tcf_hash_create(parm->index, est, a, sizeof(*p), bind);
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*p),
+				      bind, false);
 		if (ret)
 			return ret;
 		ret = ACT_P_CREATED;
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 17e6d6669c7f..ce8676ad892f 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -57,7 +57,8 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 	if (!tcf_hash_check(parm->index, a, bind)) {
 		if (!parm->nkeys)
 			return -EINVAL;
-		ret = tcf_hash_create(parm->index, est, a, sizeof(*p), bind);
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*p),
+				      bind, false);
 		if (ret)
 			return ret;
 		p = to_pedit(a);
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index 6a8d9488613a..d6b708d6afdf 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -103,7 +103,8 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla,
 	defdata = nla_data(tb[TCA_DEF_DATA]);
 
 	if (!tcf_hash_check(parm->index, a, bind)) {
-		ret = tcf_hash_create(parm->index, est, a, sizeof(*d), bind);
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*d),
+				      bind, false);
 		if (ret)
 			return ret;
 
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index fcfeeaf838be..6751b5f8c046 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -99,7 +99,8 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
 
 	if (!tcf_hash_check(parm->index, a, bind)) {
-		ret = tcf_hash_create(parm->index, est, a, sizeof(*d), bind);
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*d),
+				      bind, false);
 		if (ret)
 			return ret;
 
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index d735ecf0b1a7..796785e0bf96 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -116,7 +116,8 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 	action = parm->v_action;
 
 	if (!tcf_hash_check(parm->index, a, bind)) {
-		ret = tcf_hash_create(parm->index, est, a, sizeof(*v), bind);
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*v),
+				      bind, false);
 		if (ret)
 			return ret;
 
-- 
2.4.3.573.g4eafbef

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 net-next 3/7] net_sched: act_gact: make tcfg_pval non zero
  2015-07-06  6:41 [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
  2015-07-06  6:41 ` [PATCH v2 net-next 1/7] net: sched: extend percpu stats helpers Eric Dumazet
  2015-07-06  6:41 ` [PATCH v2 net-next 2/7] net: sched: add percpu stats to actions Eric Dumazet
@ 2015-07-06  6:41 ` Eric Dumazet
  2015-07-06  6:41 ` [PATCH v2 net-next 3/7] net_sched: act: " Eric Dumazet
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2015-07-06  6:41 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexei Starovoitov, Jamal Hadi Salim, John Fastabend,
	Eric Dumazet, Eric Dumazet

First step for gact RCU operation :

Instead of testing if tcfg_pval is zero or not, just make it 1.

No change in behavior, but slightly faster code.

The smp_rmb()/smp_wmb() barriers, while not strictly needed at this
stage are added for upcoming spinlock removal.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 net/sched/act_gact.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index a4f8af29ee30..22a3a61aa090 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -28,14 +28,16 @@
 #ifdef CONFIG_GACT_PROB
 static int gact_net_rand(struct tcf_gact *gact)
 {
-	if (!gact->tcfg_pval || prandom_u32() % gact->tcfg_pval)
+	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+	if (prandom_u32() % gact->tcfg_pval)
 		return gact->tcf_action;
 	return gact->tcfg_paction;
 }
 
 static int gact_determ(struct tcf_gact *gact)
 {
-	if (!gact->tcfg_pval || gact->tcf_bstats.packets % gact->tcfg_pval)
+	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+	if (gact->tcf_bstats.packets % gact->tcfg_pval)
 		return gact->tcf_action;
 	return gact->tcfg_paction;
 }
@@ -105,7 +107,11 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 #ifdef CONFIG_GACT_PROB
 	if (p_parm) {
 		gact->tcfg_paction = p_parm->paction;
-		gact->tcfg_pval    = p_parm->pval;
+		gact->tcfg_pval    = max_t(u16, 1, p_parm->pval);
+		/* Make sure tcfg_pval is written before tcfg_ptype
+		 * coupled with smp_rmb() in gact_net_rand() & gact_determ()
+		 */
+		smp_wmb();
 		gact->tcfg_ptype   = p_parm->ptype;
 	}
 #endif
-- 
2.4.3.573.g4eafbef

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 net-next 3/7] net_sched: act: make tcfg_pval non zero
  2015-07-06  6:41 [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
                   ` (2 preceding siblings ...)
  2015-07-06  6:41 ` [PATCH v2 net-next 3/7] net_sched: act_gact: make tcfg_pval non zero Eric Dumazet
@ 2015-07-06  6:41 ` Eric Dumazet
  2015-07-06  6:41 ` [PATCH v2 net-next 4/7] net_sched: act_gact: use a separate packet counters for gact_determ() Eric Dumazet
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2015-07-06  6:41 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexei Starovoitov, Jamal Hadi Salim, John Fastabend,
	Eric Dumazet, Eric Dumazet

First step for gact RCU operation :

Instead of testing if tcfg_pval is zero or not, just make it 1.

No change in behavior, but slightly faster code.

The smp_rmb()/smp_wmb() barriers, while not strictly needed at this
stage are added for upcoming spinlock removal.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 net/sched/act_gact.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index a4f8af29ee30..22a3a61aa090 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -28,14 +28,16 @@
 #ifdef CONFIG_GACT_PROB
 static int gact_net_rand(struct tcf_gact *gact)
 {
-	if (!gact->tcfg_pval || prandom_u32() % gact->tcfg_pval)
+	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+	if (prandom_u32() % gact->tcfg_pval)
 		return gact->tcf_action;
 	return gact->tcfg_paction;
 }
 
 static int gact_determ(struct tcf_gact *gact)
 {
-	if (!gact->tcfg_pval || gact->tcf_bstats.packets % gact->tcfg_pval)
+	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
+	if (gact->tcf_bstats.packets % gact->tcfg_pval)
 		return gact->tcf_action;
 	return gact->tcfg_paction;
 }
@@ -105,7 +107,11 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 #ifdef CONFIG_GACT_PROB
 	if (p_parm) {
 		gact->tcfg_paction = p_parm->paction;
-		gact->tcfg_pval    = p_parm->pval;
+		gact->tcfg_pval    = max_t(u16, 1, p_parm->pval);
+		/* Make sure tcfg_pval is written before tcfg_ptype
+		 * coupled with smp_rmb() in gact_net_rand() & gact_determ()
+		 */
+		smp_wmb();
 		gact->tcfg_ptype   = p_parm->ptype;
 	}
 #endif
-- 
2.4.3.573.g4eafbef

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 net-next 4/7] net_sched: act_gact: use a separate packet counters for gact_determ()
  2015-07-06  6:41 [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
                   ` (3 preceding siblings ...)
  2015-07-06  6:41 ` [PATCH v2 net-next 3/7] net_sched: act: " Eric Dumazet
@ 2015-07-06  6:41 ` Eric Dumazet
  2015-07-06  6:49 ` [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
  2015-07-06 12:12 ` Jamal Hadi Salim
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2015-07-06  6:41 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexei Starovoitov, Jamal Hadi Salim, John Fastabend,
	Eric Dumazet, Eric Dumazet

Second step for gact RCU operation :

We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.

gact_determ() would not work without a central packet counter,
so lets add it for this mode.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/tc_act/tc_gact.h | 7 ++++---
 net/sched/act_gact.c         | 4 +++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
index 9fc9b578908a..592a6bc02b0b 100644
--- a/include/net/tc_act/tc_gact.h
+++ b/include/net/tc_act/tc_gact.h
@@ -6,9 +6,10 @@
 struct tcf_gact {
 	struct tcf_common	common;
 #ifdef CONFIG_GACT_PROB
-        u16			tcfg_ptype;
-        u16			tcfg_pval;
-        int			tcfg_paction;
+	u16			tcfg_ptype;
+	u16			tcfg_pval;
+	int			tcfg_paction;
+	atomic_t		packets;
 #endif
 };
 #define to_gact(a) \
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 22a3a61aa090..2f9bec584b3f 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -36,8 +36,10 @@ static int gact_net_rand(struct tcf_gact *gact)
 
 static int gact_determ(struct tcf_gact *gact)
 {
+	u32 pack = atomic_inc_return(&gact->packets);
+
 	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
-	if (gact->tcf_bstats.packets % gact->tcfg_pval)
+	if (pack % gact->tcfg_pval)
 		return gact->tcf_action;
 	return gact->tcfg_paction;
 }
-- 
2.4.3.573.g4eafbef

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 net-next 0/7] net_sched: act: lockless operation
  2015-07-06  6:41 [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
                   ` (4 preceding siblings ...)
  2015-07-06  6:41 ` [PATCH v2 net-next 4/7] net_sched: act_gact: use a separate packet counters for gact_determ() Eric Dumazet
@ 2015-07-06  6:49 ` Eric Dumazet
  2015-07-06 12:12 ` Jamal Hadi Salim
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2015-07-06  6:49 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexei Starovoitov, Jamal Hadi Salim, John Fastabend,
	Eric Dumazet, Eric Dumazet

On Mon, Jul 6, 2015 at 8:41 AM, Eric Dumazet <edumazet@google.com> wrote:
> As mentioned by Alexei last week in Budapest, it is a bit weird
> to take a spinlock in order to drop a packet in a tc filter...
>

Arg, please ignore v2, I messed my git send-email command

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 net-next 0/7] net_sched: act: lockless operation
  2015-07-06  6:41 [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
                   ` (5 preceding siblings ...)
  2015-07-06  6:49 ` [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
@ 2015-07-06 12:12 ` Jamal Hadi Salim
  2015-07-06 12:17   ` Eric Dumazet
  6 siblings, 1 reply; 9+ messages in thread
From: Jamal Hadi Salim @ 2015-07-06 12:12 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: netdev, Alexei Starovoitov, John Fastabend, Eric Dumazet

On 07/06/15 02:41, Eric Dumazet wrote:
> As mentioned by Alexei last week in Budapest, it is a bit weird
> to take a spinlock in order to drop a packet in a tc filter...
>
> Lets add percpu infra for tc actions and use it for gact & mirred.
>
> Before changes, my host with 8 RX queues was handling 5 Mpps with gact,
> and more than 11 Mpps after.
>
> Mirred change is not yet visible if ifb+qdisc is used, as ifb is
> not yet multi queue enabled, but is a step forward.
>

on mirred:
The main thing that is read-write is the stats. lastuse timestamp
as well, but that one doesnt have to be accurate (used mainly
to help decide when to flush out policies). so the lock around
all that code is not necessary.

BTW: I was wondering based on what you said earlier on false sharing
if tcf_common can be re-arranged to avoid that?

For mirred, you may have to add for overlimit stat in the common
structure (we should really increment drop stats when we return
TC_ACT_SHOT).

Having said that: we want to maintain the fact that an action
instance (based on index) can be used across different policies.
i dont see any issue with the way you are proceeding for that
to continue to work. i.e i can say something like:
- create an action to redirect or mirror to port blah index 10
- create filter one on eth1 with 1 of the actions being mirred index 10
- create filter two on eth0 with 1 of the actions being mirred index 10

On IFB:
we use tasklets originally to avoid re-ordering. I think tasklets would
still be useful to keep, with each tied to a queue?

cheers,
jamal

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 net-next 0/7] net_sched: act: lockless operation
  2015-07-06 12:12 ` Jamal Hadi Salim
@ 2015-07-06 12:17   ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2015-07-06 12:17 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David S. Miller, netdev, Alexei Starovoitov, John Fastabend,
	Eric Dumazet

On Mon, Jul 6, 2015 at 2:12 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> on mirred:
> The main thing that is read-write is the stats. lastuse timestamp
> as well, but that one doesnt have to be accurate (used mainly
> to help decide when to flush out policies). so the lock around
> all that code is not necessary.
>
> BTW: I was wondering based on what you said earlier on false sharing
> if tcf_common can be re-arranged to avoid that?

Not really : You would still have a stall on lastuse ultimately, even
if it sits in a cache line of its own.

I mentioned that it shared a cache line to explain with "perf" was
showing the stall on reading another field.


>
> For mirred, you may have to add for overlimit stat in the common
> structure (we should really increment drop stats when we return
> TC_ACT_SHOT).
>
> Having said that: we want to maintain the fact that an action
> instance (based on index) can be used across different policies.
> i dont see any issue with the way you are proceeding for that
> to continue to work. i.e i can say something like:
> - create an action to redirect or mirror to port blah index 10
> - create filter one on eth1 with 1 of the actions being mirred index 10
> - create filter two on eth0 with 1 of the actions being mirred index 10

I do not see any issues either.

>
> On IFB:
> we use tasklets originally to avoid re-ordering. I think tasklets would
> still be useful to keep, with each tied to a queue?

Sure. We do not care about reorders if packets were delivered to
different RX queues.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-07-06 12:17 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-06  6:41 [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
2015-07-06  6:41 ` [PATCH v2 net-next 1/7] net: sched: extend percpu stats helpers Eric Dumazet
2015-07-06  6:41 ` [PATCH v2 net-next 2/7] net: sched: add percpu stats to actions Eric Dumazet
2015-07-06  6:41 ` [PATCH v2 net-next 3/7] net_sched: act_gact: make tcfg_pval non zero Eric Dumazet
2015-07-06  6:41 ` [PATCH v2 net-next 3/7] net_sched: act: " Eric Dumazet
2015-07-06  6:41 ` [PATCH v2 net-next 4/7] net_sched: act_gact: use a separate packet counters for gact_determ() Eric Dumazet
2015-07-06  6:49 ` [PATCH v2 net-next 0/7] net_sched: act: lockless operation Eric Dumazet
2015-07-06 12:12 ` Jamal Hadi Salim
2015-07-06 12:17   ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).