* [RFC Patch net-next 0/4] net_sched: make ingress filters lockless
@ 2014-01-09 18:19 Cong Wang
2014-01-09 18:19 ` [RFC Patch net-next 1/4] net_sched: switch filter list to list_head Cong Wang
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Cong Wang @ 2014-01-09 18:19 UTC (permalink / raw)
To: netdev
Cc: Cong Wang, John Fastabend, Eric Dumazet, David S. Miller,
Jamal Hadi Salim
This patchset tries to switch the filter list to struct list_head, so that
on the read side the list can be traversed under the RCU read lock. Same
for actions.
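For illustration, here is a minimal sketch of the pattern this enables (not
code from the patches; classify_sketch is a made-up name): readers walk the
chain under rcu_read_lock(), while writers stay serialized elsewhere and
publish with the _rcu list helpers.

static int classify_sketch(struct sk_buff *skb, struct list_head *chain,
			   struct tcf_result *res)
{
	struct tcf_proto *tp;
	int err = -1;

	rcu_read_lock();	/* read side: no qdisc spinlock needed */
	list_for_each_entry_rcu(tp, chain, head) {
		err = tp->classify(skb, tp, res);
		if (err >= 0)
			break;
	}
	rcu_read_unlock();
	return err;
}

/* write side: still serialized, insert/remove with list_add_rcu() and
 * list_del_rcu(), and free a tcf_proto only after synchronize_rcu().
 */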
I hope classification, whether on egress or ingress, is already protected
by the RCU read lock, but I don't pretend to fully understand qdisc locking.
Also, I am not sure I am using the RCU APIs correctly at all; at least
I don't see any warning with CONFIG_PROVE_RCU=y.
Without this patch, the spin_lock easily shows up at the top of
my perf top output with 4 netperf sessions (4 being the number of CPUs)
and 1000 u32 filters on ingress.
Question 1:
Am I still missing something on qdisc locking?
Question 2:
Is it okay to call qdisc_bstats_update() without any lock?
Please comment.
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cong Wang (4):
net_sched: switch filter list to list_head
net_sched: avoid holding qdisc lock for filters
net_sched: use RCU for tc actions traverse
net_sched: make ingress qdisc lockless
include/net/pkt_cls.h | 25 ++++--------------------
include/net/pkt_sched.h | 4 ++--
include/net/sch_generic.h | 14 +++++++++-----
net/core/dev.c | 2 --
net/sched/act_api.c | 12 +++++++-----
net/sched/cls_api.c | 48 ++++++++++++++++++++++++++++-------------------
net/sched/sch_api.c | 35 +++++++++++++++++++++-------------
net/sched/sch_atm.c | 14 +++++++-------
net/sched/sch_cbq.c | 9 +++++----
net/sched/sch_choke.c | 11 ++++++-----
net/sched/sch_drr.c | 7 ++++---
net/sched/sch_dsmark.c | 7 ++++---
net/sched/sch_fq_codel.c | 9 +++++----
net/sched/sch_hfsc.c | 15 +++++++++------
net/sched/sch_htb.c | 20 ++++++++++++--------
net/sched/sch_ingress.c | 14 +++++++++++---
net/sched/sch_multiq.c | 7 ++++---
net/sched/sch_prio.c | 9 +++++----
net/sched/sch_qfq.c | 7 ++++---
net/sched/sch_sfb.c | 9 +++++----
net/sched/sch_sfq.c | 11 ++++++-----
21 files changed, 160 insertions(+), 129 deletions(-)
--
1.8.3.1
* [RFC Patch net-next 1/4] net_sched: switch filter list to list_head
2014-01-09 18:19 [RFC Patch net-next 0/4] net_sched: make ingress filters lockless Cong Wang
@ 2014-01-09 18:19 ` Cong Wang
2014-01-10 0:17 ` Eric Dumazet
2014-01-10 3:48 ` John Fastabend
2014-01-09 18:19 ` [RFC Patch net-next 2/4] net_sched: avoid holding qdisc lock for filters Cong Wang
` (2 subsequent siblings)
3 siblings, 2 replies; 19+ messages in thread
From: Cong Wang @ 2014-01-09 18:19 UTC (permalink / raw)
To: netdev
Cc: Cong Wang, John Fastabend, Eric Dumazet, David S. Miller,
Jamal Hadi Salim
So that it could be potentially traversed with RCU.
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
include/net/pkt_sched.h | 4 ++--
include/net/sch_generic.h | 9 ++++++---
net/sched/cls_api.c | 36 ++++++++++++++++++++++++------------
net/sched/sch_api.c | 35 ++++++++++++++++++++++-------------
net/sched/sch_atm.c | 14 +++++++-------
net/sched/sch_cbq.c | 9 +++++----
net/sched/sch_choke.c | 11 ++++++-----
net/sched/sch_drr.c | 7 ++++---
net/sched/sch_dsmark.c | 7 ++++---
net/sched/sch_fq_codel.c | 9 +++++----
net/sched/sch_hfsc.c | 15 +++++++++------
net/sched/sch_htb.c | 20 ++++++++++++--------
net/sched/sch_ingress.c | 14 +++++++++++---
net/sched/sch_multiq.c | 7 ++++---
net/sched/sch_prio.c | 9 +++++----
net/sched/sch_qfq.c | 7 ++++---
net/sched/sch_sfb.c | 9 +++++----
net/sched/sch_sfq.c | 11 ++++++-----
18 files changed, 141 insertions(+), 92 deletions(-)
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 891d80d..27a1efa 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -109,9 +109,9 @@ static inline void qdisc_run(struct Qdisc *q)
__qdisc_run(q);
}
-int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
+int tc_classify_compat(struct sk_buff *skb, const struct list_head *head,
struct tcf_result *res);
-int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+int tc_classify(struct sk_buff *skb, const struct list_head *head,
struct tcf_result *res);
/* Calculate maximal size of packet seen by hard_start_xmit
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 013d96d..97123cc 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -6,6 +6,7 @@
#include <linux/rcupdate.h>
#include <linux/pkt_sched.h>
#include <linux/pkt_cls.h>
+#include <linux/list.h>
#include <net/gen_stats.h>
#include <net/rtnetlink.h>
@@ -143,7 +144,7 @@ struct Qdisc_class_ops {
void (*walk)(struct Qdisc *, struct qdisc_walker * arg);
/* Filter manipulation */
- struct tcf_proto ** (*tcf_chain)(struct Qdisc *, unsigned long);
+ struct list_head * (*tcf_chain)(struct Qdisc *, unsigned long);
unsigned long (*bind_tcf)(struct Qdisc *, unsigned long,
u32 classid);
void (*unbind_tcf)(struct Qdisc *, unsigned long);
@@ -184,6 +185,8 @@ struct tcf_result {
u32 classid;
};
+struct tcf_proto;
+
struct tcf_proto_ops {
struct list_head head;
char kind[IFNAMSIZ];
@@ -212,7 +215,7 @@ struct tcf_proto_ops {
struct tcf_proto {
/* Fast access part */
- struct tcf_proto *next;
+ struct list_head head;
void *root;
int (*classify)(struct sk_buff *,
const struct tcf_proto *,
@@ -376,7 +379,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
void __qdisc_calculate_pkt_len(struct sk_buff *skb,
const struct qdisc_size_table *stab);
void tcf_destroy(struct tcf_proto *tp);
-void tcf_destroy_chain(struct tcf_proto **fl);
+void tcf_destroy_chain(struct list_head *fl);
/* Reset all TX qdiscs greater then index of a device. */
static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int i)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d8c42b1..bcc9987 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -125,8 +125,9 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
u32 parent;
struct net_device *dev;
struct Qdisc *q;
- struct tcf_proto **back, **chain;
- struct tcf_proto *tp;
+ struct tcf_proto *back;
+ struct list_head *chain;
+ struct tcf_proto *tp, *res = NULL;
const struct tcf_proto_ops *tp_ops;
const struct Qdisc_class_ops *cops;
unsigned long cl;
@@ -196,21 +197,27 @@ replay:
goto errout;
/* Check the chain for existence of proto-tcf with this priority */
- for (back = chain; (tp = *back) != NULL; back = &tp->next) {
+ rcu_read_lock();
+ list_for_each_entry_rcu(tp, chain, head) {
+ back = list_next_entry(tp, head);
if (tp->prio >= prio) {
if (tp->prio == prio) {
if (!nprio ||
- (tp->protocol != protocol && protocol))
+ (tp->protocol != protocol && protocol)) {
+ rcu_read_unlock();
goto errout;
+ }
+ res = tp;
} else
- tp = NULL;
+ res = NULL;
break;
}
}
+ rcu_read_unlock();
root_lock = qdisc_root_sleeping_lock(q);
- if (tp == NULL) {
+ if ((tp = res) == NULL) {
/* Proto-tcf does not exist, create new one */
if (tca[TCA_KIND] == NULL || !protocol)
@@ -228,6 +235,7 @@ replay:
tp = kzalloc(sizeof(*tp), GFP_KERNEL);
if (tp == NULL)
goto errout;
+ INIT_LIST_HEAD(&tp->head);
err = -ENOENT;
tp_ops = tcf_proto_lookup_ops(tca[TCA_KIND]);
if (tp_ops == NULL) {
@@ -258,7 +266,7 @@ replay:
}
tp->ops = tp_ops;
tp->protocol = protocol;
- tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(*back));
+ tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(back));
tp->q = q;
tp->classify = tp_ops->classify;
tp->classid = parent;
@@ -280,7 +288,7 @@ replay:
if (fh == 0) {
if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) {
spin_lock_bh(root_lock);
- *back = tp->next;
+ list_del_rcu(&tp->head);
spin_unlock_bh(root_lock);
tfilter_notify(net, skb, n, tp, fh, RTM_DELTFILTER);
@@ -321,8 +329,7 @@ replay:
if (err == 0) {
if (tp_created) {
spin_lock_bh(root_lock);
- tp->next = *back;
- *back = tp;
+ list_add_rcu(&tp->head, chain);
spin_unlock_bh(root_lock);
}
tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER);
@@ -417,7 +424,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
int s_t;
struct net_device *dev;
struct Qdisc *q;
- struct tcf_proto *tp, **chain;
+ struct tcf_proto *tp;
+ struct list_head *chain;
struct tcmsg *tcm = nlmsg_data(cb->nlh);
unsigned long cl = 0;
const struct Qdisc_class_ops *cops;
@@ -451,7 +459,9 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
s_t = cb->args[0];
- for (tp = *chain, t = 0; tp; tp = tp->next, t++) {
+ t = 0;
+ rcu_read_lock();
+ list_for_each_entry_rcu(tp, chain, head) {
if (t < s_t)
continue;
if (TC_H_MAJ(tcm->tcm_info) &&
@@ -482,7 +492,9 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
cb->args[1] = arg.w.count + 1;
if (arg.w.stop)
break;
+ t++;
}
+ rcu_read_unlock();
cb->args[0] = t;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 1313145..df86686 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
#include <linux/hrtimer.h>
#include <linux/lockdep.h>
#include <linux/slab.h>
+#include <linux/rculist.h>
#include <net/net_namespace.h>
#include <net/sock.h>
@@ -1772,13 +1773,15 @@ done:
* to this qdisc, (optionally) tests for protocol and asks
* specific classifiers.
*/
-int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
+int tc_classify_compat(struct sk_buff *skb, const struct list_head *head,
struct tcf_result *res)
{
+ struct tcf_proto *tp;
__be16 protocol = skb->protocol;
- int err;
+ int err = -1;
- for (; tp; tp = tp->next) {
+ WARN_ON_ONCE(!rcu_read_lock_held());
+ list_for_each_entry_rcu(tp, head, head) {
if (tp->protocol != protocol &&
tp->protocol != htons(ETH_P_ALL))
continue;
@@ -1789,23 +1792,25 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
if (err != TC_ACT_RECLASSIFY && skb->tc_verd)
skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
#endif
- return err;
+ break;
}
}
- return -1;
+
+ return err;
}
EXPORT_SYMBOL(tc_classify_compat);
-int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+int tc_classify(struct sk_buff *skb, const struct list_head *head,
struct tcf_result *res)
{
int err = 0;
#ifdef CONFIG_NET_CLS_ACT
- const struct tcf_proto *otp = tp;
+ const struct tcf_proto *tp;
+ const struct tcf_proto *otp = list_first_entry(head, struct tcf_proto, head);
reclassify:
#endif
- err = tc_classify_compat(skb, tp, res);
+ err = tc_classify_compat(skb, head, res);
#ifdef CONFIG_NET_CLS_ACT
if (err == TC_ACT_RECLASSIFY) {
u32 verd = G_TC_VERD(skb->tc_verd);
@@ -1830,16 +1835,20 @@ void tcf_destroy(struct tcf_proto *tp)
{
tp->ops->destroy(tp);
module_put(tp->ops->owner);
+ synchronize_rcu();
kfree(tp);
}
-void tcf_destroy_chain(struct tcf_proto **fl)
+void tcf_destroy_chain(struct list_head *fl)
{
- struct tcf_proto *tp;
+ struct tcf_proto *tp, *n;
+ LIST_HEAD(list);
+ list_splice_init_rcu(fl, &list, synchronize_rcu);
- while ((tp = *fl) != NULL) {
- *fl = tp->next;
- tcf_destroy(tp);
+ list_for_each_entry_safe(tp, n, &list, head) {
+ tp->ops->destroy(tp);
+ module_put(tp->ops->owner);
+ kfree(tp);
}
}
EXPORT_SYMBOL(tcf_destroy_chain);
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 1f9c314..20a07c2 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -41,7 +41,7 @@
struct atm_flow_data {
struct Qdisc *q; /* FIFO, TBF, etc. */
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
struct atm_vcc *vcc; /* VCC; NULL if VCC is closed */
void (*old_pop)(struct atm_vcc *vcc,
struct sk_buff *skb); /* chaining */
@@ -273,7 +273,7 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
error = -ENOBUFS;
goto err_out;
}
- flow->filter_list = NULL;
+ INIT_LIST_HEAD(&flow->filter_list);
flow->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid);
if (!flow->q)
flow->q = &noop_qdisc;
@@ -311,7 +311,7 @@ static int atm_tc_delete(struct Qdisc *sch, unsigned long arg)
pr_debug("atm_tc_delete(sch %p,[qdisc %p],flow %p)\n", sch, p, flow);
if (list_empty(&flow->list))
return -EINVAL;
- if (flow->filter_list || flow == &p->link)
+ if (!list_empty(&flow->filter_list) || flow == &p->link)
return -EBUSY;
/*
* Reference count must be 2: one for "keepalive" (set at class
@@ -345,7 +345,7 @@ static void atm_tc_walk(struct Qdisc *sch, struct qdisc_walker *walker)
}
}
-static struct tcf_proto **atm_tc_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct list_head *atm_tc_find_tcf(struct Qdisc *sch, unsigned long cl)
{
struct atm_qdisc_data *p = qdisc_priv(sch);
struct atm_flow_data *flow = (struct atm_flow_data *)cl;
@@ -370,9 +370,9 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
if (TC_H_MAJ(skb->priority) != sch->handle ||
!(flow = (struct atm_flow_data *)atm_tc_get(sch, skb->priority))) {
list_for_each_entry(flow, &p->flows, list) {
- if (flow->filter_list) {
+ if (!list_empty(&flow->filter_list)) {
result = tc_classify_compat(skb,
- flow->filter_list,
+ &flow->filter_list,
&res);
if (result < 0)
continue;
@@ -544,7 +544,7 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt)
if (!p->link.q)
p->link.q = &noop_qdisc;
pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q);
- p->link.filter_list = NULL;
+ INIT_LIST_HEAD(&p->link.filter_list);
p->link.vcc = NULL;
p->link.sock = NULL;
p->link.classid = sch->handle;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 2f80d01..a5914ff 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -133,7 +133,7 @@ struct cbq_class {
struct gnet_stats_rate_est64 rate_est;
struct tc_cbq_xstats xstats;
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
int refcnt;
int filters;
@@ -239,8 +239,8 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
/*
* Step 2+n. Apply classifier.
*/
- if (!head->filter_list ||
- (result = tc_classify_compat(skb, head->filter_list, &res)) < 0)
+ if (list_empty(&head->filter_list) ||
+ (result = tc_classify_compat(skb, &head->filter_list, &res)) < 0)
goto fallback;
cl = (void *)res.class;
@@ -1881,6 +1881,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
}
}
+ INIT_LIST_HEAD(&cl->filter_list);
cl->R_tab = rtab;
rtab = NULL;
cl->refcnt = 1;
@@ -1976,7 +1977,7 @@ static int cbq_delete(struct Qdisc *sch, unsigned long arg)
return 0;
}
-static struct tcf_proto **cbq_find_tcf(struct Qdisc *sch, unsigned long arg)
+static struct list_head *cbq_find_tcf(struct Qdisc *sch, unsigned long arg)
{
struct cbq_sched_data *q = qdisc_priv(sch);
struct cbq_class *cl = (struct cbq_class *)arg;
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index ddd73cb..9b36830 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -58,7 +58,7 @@ struct choke_sched_data {
/* Variables */
struct red_vars vars;
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
struct {
u32 prob_drop; /* Early probability drops */
u32 prob_mark; /* Early probability marks */
@@ -202,7 +202,7 @@ static bool choke_classify(struct sk_buff *skb,
struct tcf_result res;
int result;
- result = tc_classify(skb, q->filter_list, &res);
+ result = tc_classify(skb, &q->filter_list, &res);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
@@ -256,7 +256,7 @@ static bool choke_match_random(const struct choke_sched_data *q,
return false;
oskb = choke_peek_random(q, pidx);
- if (q->filter_list)
+ if (!list_empty(&q->filter_list))
return choke_get_classid(nskb) == choke_get_classid(oskb);
return choke_match_flow(oskb, nskb);
@@ -268,7 +268,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
const struct red_parms *p = &q->parms;
int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- if (q->filter_list) {
+ if (!list_empty(&q->filter_list)) {
/* If using external classifiers, get result and record it. */
if (!choke_classify(skb, sch, &ret))
goto other_drop; /* Packet was eaten by filter */
@@ -476,6 +476,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
q->flags = ctl->flags;
q->limit = ctl->limit;
+ INIT_LIST_HEAD(&q->filter_list);
red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
ctl->Plog, ctl->Scell_log,
@@ -566,7 +567,7 @@ static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
return 0;
}
-static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct list_head *choke_find_tcf(struct Qdisc *sch, unsigned long cl)
{
struct choke_sched_data *q = qdisc_priv(sch);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 8302717..62f45ac 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -35,7 +35,7 @@ struct drr_class {
struct drr_sched {
struct list_head active;
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
struct Qdisc_class_hash clhash;
};
@@ -184,7 +184,7 @@ static void drr_put_class(struct Qdisc *sch, unsigned long arg)
drr_destroy_class(sch, cl);
}
-static struct tcf_proto **drr_tcf_chain(struct Qdisc *sch, unsigned long cl)
+static struct list_head *drr_tcf_chain(struct Qdisc *sch, unsigned long cl)
{
struct drr_sched *q = qdisc_priv(sch);
@@ -328,7 +328,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
}
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- result = tc_classify(skb, q->filter_list, &res);
+ result = tc_classify(skb, &q->filter_list, &res);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
@@ -443,6 +443,7 @@ static int drr_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
if (err < 0)
return err;
INIT_LIST_HEAD(&q->active);
+ INIT_LIST_HEAD(&q->filter_list);
return 0;
}
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 49d6ef3..4489ffe 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -37,7 +37,7 @@
struct dsmark_qdisc_data {
struct Qdisc *q;
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
u8 *mask; /* "owns" the array */
u8 *value;
u16 indices;
@@ -186,7 +186,7 @@ ignore:
}
}
-static inline struct tcf_proto **dsmark_find_tcf(struct Qdisc *sch,
+static inline struct list_head *dsmark_find_tcf(struct Qdisc *sch,
unsigned long cl)
{
struct dsmark_qdisc_data *p = qdisc_priv(sch);
@@ -229,7 +229,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
skb->tc_index = TC_H_MIN(skb->priority);
else {
struct tcf_result res;
- int result = tc_classify(skb, p->filter_list, &res);
+ int result = tc_classify(skb, &p->filter_list, &res);
pr_debug("result %d class 0x%04x\n", result, res.classid);
@@ -380,6 +380,7 @@ static int dsmark_init(struct Qdisc *sch, struct nlattr *opt)
p->indices = indices;
p->default_index = default_index;
p->set_tc_index = nla_get_flag(tb[TCA_DSMARK_SET_TC_INDEX]);
+ INIT_LIST_HEAD(&p->filter_list);
p->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, sch->handle);
if (p->q == NULL)
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 5578628..3c19917 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -52,7 +52,7 @@ struct fq_codel_flow {
}; /* please try to keep this structure <= 64 bytes */
struct fq_codel_sched_data {
- struct tcf_proto *filter_list; /* optional external classifier */
+ struct list_head filter_list; /* optional external classifier */
struct fq_codel_flow *flows; /* Flows table [flows_cnt] */
u32 *backlogs; /* backlog table [flows_cnt] */
u32 flows_cnt; /* number of flows */
@@ -92,11 +92,11 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
TC_H_MIN(skb->priority) <= q->flows_cnt)
return TC_H_MIN(skb->priority);
- if (!q->filter_list)
+ if (list_empty(&q->filter_list))
return fq_codel_hash(q, skb) + 1;
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- result = tc_classify(skb, q->filter_list, &res);
+ result = tc_classify(skb, &q->filter_list, &res);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
@@ -393,6 +393,7 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
q->perturbation = net_random();
INIT_LIST_HEAD(&q->new_flows);
INIT_LIST_HEAD(&q->old_flows);
+ INIT_LIST_HEAD(&q->filter_list);
codel_params_init(&q->cparams);
codel_stats_init(&q->cstats);
q->cparams.ecn = true;
@@ -501,7 +502,7 @@ static void fq_codel_put(struct Qdisc *q, unsigned long cl)
{
}
-static struct tcf_proto **fq_codel_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct list_head *fq_codel_find_tcf(struct Qdisc *sch, unsigned long cl)
{
struct fq_codel_sched_data *q = qdisc_priv(sch);
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index c407561..8bb4ade 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -116,7 +116,7 @@ struct hfsc_class {
struct gnet_stats_queue qstats;
struct gnet_stats_rate_est64 rate_est;
unsigned int level; /* class level in hierarchy */
- struct tcf_proto *filter_list; /* filter list */
+ struct list_head filter_list; /* filter list */
unsigned int filter_cnt; /* filter count */
struct hfsc_sched *sched; /* scheduler data */
@@ -1083,6 +1083,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
cl->refcnt = 1;
cl->sched = q;
cl->cl_parent = parent;
+ INIT_LIST_HEAD(&cl->filter_list);
cl->qdisc = qdisc_create_dflt(sch->dev_queue,
&pfifo_qdisc_ops, classid);
if (cl->qdisc == NULL)
@@ -1151,7 +1152,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
struct hfsc_sched *q = qdisc_priv(sch);
struct hfsc_class *head, *cl;
struct tcf_result res;
- struct tcf_proto *tcf;
+ struct list_head *list;
int result;
if (TC_H_MAJ(skb->priority ^ sch->handle) == 0 &&
@@ -1161,8 +1162,10 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
head = &q->root;
- tcf = q->root.filter_list;
- while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
+ list = &q->root.filter_list;
+ while (!list_empty(list)) {
+ if ((result = tc_classify(skb, list, &res)) < 0)
+ break;
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
case TC_ACT_QUEUED:
@@ -1185,7 +1188,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
return cl; /* hit leaf class */
/* apply inner filter chain */
- tcf = cl->filter_list;
+ list = &cl->filter_list;
head = cl;
}
@@ -1285,7 +1288,7 @@ hfsc_unbind_tcf(struct Qdisc *sch, unsigned long arg)
cl->filter_cnt--;
}
-static struct tcf_proto **
+static struct list_head *
hfsc_tcf_chain(struct Qdisc *sch, unsigned long arg)
{
struct hfsc_sched *q = qdisc_priv(sch);
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 0db5a6e..c28e405 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -103,7 +103,7 @@ struct htb_class {
u32 prio; /* these two are used only by leaves... */
int quantum; /* but stored for parent-to-leaf return */
- struct tcf_proto *filter_list; /* class attached filters */
+ struct list_head filter_list; /* class attached filters */
int filter_cnt;
int refcnt; /* usage count of this class */
@@ -153,7 +153,7 @@ struct htb_sched {
int rate2quantum; /* quant = rate / rate2quantum */
/* filters for qdisc itself */
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
#define HTB_WARN_TOOMANYEVENTS 0x1
unsigned int warned; /* only one warning */
@@ -209,7 +209,7 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
struct htb_sched *q = qdisc_priv(sch);
struct htb_class *cl;
struct tcf_result res;
- struct tcf_proto *tcf;
+ struct list_head *head;
int result;
/* allow to select class by setting skb->priority to valid classid;
@@ -223,8 +223,10 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
return cl;
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- tcf = q->filter_list;
- while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
+ head = &q->filter_list;
+ while (!list_empty(head)) {
+ if ((result = tc_classify(skb, head, &res)) < 0)
+ break;
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
case TC_ACT_QUEUED:
@@ -246,7 +248,7 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
return cl; /* we hit leaf; return it */
/* we have got inner class; apply inner filter chain */
- tcf = cl->filter_list;
+ head = &cl->filter_list;
}
/* classification failed; try to use default class */
cl = htb_find(TC_H_MAKE(TC_H_MAJ(sch->handle), q->defcls), sch);
@@ -1040,6 +1042,7 @@ static int htb_init(struct Qdisc *sch, struct nlattr *opt)
qdisc_watchdog_init(&q->watchdog, sch);
INIT_WORK(&q->work, htb_work_func);
skb_queue_head_init(&q->direct_queue);
+ INIT_LIST_HEAD(&q->filter_list);
if (tb[TCA_HTB_DIRECT_QLEN])
q->direct_qlen = nla_get_u32(tb[TCA_HTB_DIRECT_QLEN]);
@@ -1413,6 +1416,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
cl->refcnt = 1;
cl->children = 0;
INIT_LIST_HEAD(&cl->un.leaf.drop_list);
+ INIT_LIST_HEAD(&cl->filter_list);
RB_CLEAR_NODE(&cl->pq_node);
for (prio = 0; prio < TC_HTB_NUMPRIO; prio++)
@@ -1518,11 +1522,11 @@ failure:
return err;
}
-static struct tcf_proto **htb_find_tcf(struct Qdisc *sch, unsigned long arg)
+static struct list_head *htb_find_tcf(struct Qdisc *sch, unsigned long arg)
{
struct htb_sched *q = qdisc_priv(sch);
struct htb_class *cl = (struct htb_class *)arg;
- struct tcf_proto **fl = cl ? &cl->filter_list : &q->filter_list;
+ struct list_head *fl = cl ? &cl->filter_list : &q->filter_list;
return fl;
}
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index bce1665..b182e0c 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -17,7 +17,7 @@
struct ingress_qdisc_data {
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
};
/* ------------------------- Class/flow operations ------------------------- */
@@ -46,7 +46,7 @@ static void ingress_walk(struct Qdisc *sch, struct qdisc_walker *walker)
{
}
-static struct tcf_proto **ingress_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct list_head *ingress_find_tcf(struct Qdisc *sch, unsigned long cl)
{
struct ingress_qdisc_data *p = qdisc_priv(sch);
@@ -61,7 +61,7 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
struct tcf_result res;
int result;
- result = tc_classify(skb, p->filter_list, &res);
+ result = tc_classify(skb, &p->filter_list, &res);
qdisc_bstats_update(sch, skb);
switch (result) {
@@ -108,6 +108,13 @@ nla_put_failure:
return -1;
}
+static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
+{
+ struct ingress_qdisc_data *q = qdisc_priv(sch);
+ INIT_LIST_HEAD(&q->filter_list);
+ return 0;
+}
+
static const struct Qdisc_class_ops ingress_class_ops = {
.leaf = ingress_leaf,
.get = ingress_get,
@@ -119,6 +126,7 @@ static const struct Qdisc_class_ops ingress_class_ops = {
};
static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
+ .init = ingress_init,
.cl_ops = &ingress_class_ops,
.id = "ingress",
.priv_size = sizeof(struct ingress_qdisc_data),
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index afb050a..e4be093 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -31,7 +31,7 @@ struct multiq_sched_data {
u16 bands;
u16 max_bands;
u16 curband;
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
struct Qdisc **queues;
};
@@ -45,7 +45,7 @@ multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
int err;
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- err = tc_classify(skb, q->filter_list, &res);
+ err = tc_classify(skb, &q->filter_list, &res);
#ifdef CONFIG_NET_CLS_ACT
switch (err) {
case TC_ACT_STOLEN:
@@ -258,6 +258,7 @@ static int multiq_init(struct Qdisc *sch, struct nlattr *opt)
if (opt == NULL)
return -EINVAL;
+ INIT_LIST_HEAD(&q->filter_list);
q->max_bands = qdisc_dev(sch)->num_tx_queues;
q->queues = kcalloc(q->max_bands, sizeof(struct Qdisc *), GFP_KERNEL);
@@ -388,7 +389,7 @@ static void multiq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
}
}
-static struct tcf_proto **multiq_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct list_head *multiq_find_tcf(struct Qdisc *sch, unsigned long cl)
{
struct multiq_sched_data *q = qdisc_priv(sch);
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 79359b6..f4ee09f 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -24,7 +24,7 @@
struct prio_sched_data {
int bands;
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
u8 prio2band[TC_PRIO_MAX+1];
struct Qdisc *queues[TCQ_PRIO_BANDS];
};
@@ -40,7 +40,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
if (TC_H_MAJ(skb->priority) != sch->handle) {
- err = tc_classify(skb, q->filter_list, &res);
+ err = tc_classify(skb, &q->filter_list, &res);
#ifdef CONFIG_NET_CLS_ACT
switch (err) {
case TC_ACT_STOLEN:
@@ -50,7 +50,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
return NULL;
}
#endif
- if (!q->filter_list || err < 0) {
+ if (list_empty(&q->filter_list) || err < 0) {
if (TC_H_MAJ(band))
band = 0;
return q->queues[q->prio2band[band & TC_PRIO_MAX]];
@@ -235,6 +235,7 @@ static int prio_init(struct Qdisc *sch, struct nlattr *opt)
if ((err = prio_tune(sch, opt)) != 0)
return err;
}
+ INIT_LIST_HEAD(&q->filter_list);
return 0;
}
@@ -351,7 +352,7 @@ static void prio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
}
}
-static struct tcf_proto **prio_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct list_head *prio_find_tcf(struct Qdisc *sch, unsigned long cl)
{
struct prio_sched_data *q = qdisc_priv(sch);
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 8056fb4..c43f85a 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -181,7 +181,7 @@ struct qfq_group {
};
struct qfq_sched {
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
struct Qdisc_class_hash clhash;
u64 oldV, V; /* Precise virtual times. */
@@ -576,7 +576,7 @@ static void qfq_put_class(struct Qdisc *sch, unsigned long arg)
qfq_destroy_class(sch, cl);
}
-static struct tcf_proto **qfq_tcf_chain(struct Qdisc *sch, unsigned long cl)
+static struct list_head *qfq_tcf_chain(struct Qdisc *sch, unsigned long cl)
{
struct qfq_sched *q = qdisc_priv(sch);
@@ -714,7 +714,7 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
}
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- result = tc_classify(skb, q->filter_list, &res);
+ result = tc_classify(skb, &q->filter_list, &res);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
@@ -1498,6 +1498,7 @@ static int qfq_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
}
INIT_HLIST_HEAD(&q->nonfull_aggs);
+ INIT_LIST_HEAD(&q->filter_list);
return 0;
}
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 30ea467..20fcc6d 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -55,7 +55,7 @@ struct sfb_bins {
struct sfb_sched_data {
struct Qdisc *qdisc;
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
unsigned long rehash_interval;
unsigned long warmup_time; /* double buffering warmup time in jiffies */
u32 max;
@@ -259,7 +259,7 @@ static bool sfb_classify(struct sk_buff *skb, struct sfb_sched_data *q,
struct tcf_result res;
int result;
- result = tc_classify(skb, q->filter_list, &res);
+ result = tc_classify(skb, &q->filter_list, &res);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
@@ -306,7 +306,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
}
}
- if (q->filter_list) {
+ if (!list_empty(&q->filter_list)) {
/* If using external classifiers, get result and record it. */
if (!sfb_classify(skb, q, &ret, &salt))
goto other_drop;
@@ -533,6 +533,7 @@ static int sfb_change(struct Qdisc *sch, struct nlattr *opt)
q->tokens_avail = ctl->penalty_burst;
q->token_time = jiffies;
+ INIT_LIST_HEAD(&q->filter_list);
q->slot = 0;
q->double_buffering = false;
sfb_zero_all_buckets(q);
@@ -660,7 +661,7 @@ static void sfb_walk(struct Qdisc *sch, struct qdisc_walker *walker)
}
}
-static struct tcf_proto **sfb_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct list_head *sfb_find_tcf(struct Qdisc *sch, unsigned long cl)
{
struct sfb_sched_data *q = qdisc_priv(sch);
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 76f01e0..2a5b2a4 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -125,7 +125,7 @@ struct sfq_sched_data {
u8 cur_depth; /* depth of longest slot */
u8 flags;
unsigned short scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */
- struct tcf_proto *filter_list;
+ struct list_head filter_list;
sfq_index *ht; /* Hash table ('divisor' slots) */
struct sfq_slot *slots; /* Flows table ('maxflows' entries) */
@@ -194,13 +194,13 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
TC_H_MIN(skb->priority) <= q->divisor)
return TC_H_MIN(skb->priority);
- if (!q->filter_list) {
+ if (list_empty(&q->filter_list)) {
skb_flow_dissect(skb, &sfq_skb_cb(skb)->keys);
return sfq_hash(q, skb) + 1;
}
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
- result = tc_classify(skb, q->filter_list, &res);
+ result = tc_classify(skb, &q->filter_list, &res);
if (result >= 0) {
#ifdef CONFIG_NET_CLS_ACT
switch (result) {
@@ -630,7 +630,7 @@ static void sfq_perturbation(unsigned long arg)
spin_lock(root_lock);
q->perturbation = net_random();
- if (!q->filter_list && q->tail)
+ if (list_empty(&q->filter_list) && q->tail)
sfq_rehash(sch);
spin_unlock(root_lock);
@@ -760,6 +760,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum);
q->perturb_period = 0;
q->perturbation = net_random();
+ INIT_LIST_HEAD(&q->filter_list);
if (opt) {
int err = sfq_change(sch, opt);
@@ -846,7 +847,7 @@ static void sfq_put(struct Qdisc *q, unsigned long cl)
{
}
-static struct tcf_proto **sfq_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct list_head *sfq_find_tcf(struct Qdisc *sch, unsigned long cl)
{
struct sfq_sched_data *q = qdisc_priv(sch);
--
1.8.3.1
* [RFC Patch net-next 2/4] net_sched: avoid holding qdisc lock for filters
2014-01-09 18:19 [RFC Patch net-next 0/4] net_sched: make ingress filters lockless Cong Wang
2014-01-09 18:19 ` [RFC Patch net-next 1/4] net_sched: switch filter list to list_head Cong Wang
@ 2014-01-09 18:19 ` Cong Wang
2014-01-10 3:13 ` John Fastabend
2014-01-09 18:19 ` [RFC Patch net-next 3/4] net_sched: use RCU for tc actions traverse Cong Wang
2014-01-09 18:19 ` [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless Cong Wang
3 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2014-01-09 18:19 UTC (permalink / raw)
To: netdev
Cc: Cong Wang, John Fastabend, Eric Dumazet, David S. Miller,
Jamal Hadi Salim
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
include/net/pkt_cls.h | 25 ++++---------------------
include/net/sch_generic.h | 5 +++--
net/sched/cls_api.c | 12 +++++-------
3 files changed, 12 insertions(+), 30 deletions(-)
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 50ea079..9afb15b 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -18,26 +18,9 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops);
int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
static inline unsigned long
-__cls_set_class(unsigned long *clp, unsigned long cl)
+cls_set_class(unsigned long *clp, unsigned long cl)
{
- unsigned long old_cl;
-
- old_cl = *clp;
- *clp = cl;
- return old_cl;
-}
-
-static inline unsigned long
-cls_set_class(struct tcf_proto *tp, unsigned long *clp,
- unsigned long cl)
-{
- unsigned long old_cl;
-
- tcf_tree_lock(tp);
- old_cl = __cls_set_class(clp, cl);
- tcf_tree_unlock(tp);
-
- return old_cl;
+ return xchg(clp, cl);
}
static inline void
@@ -46,7 +29,7 @@ tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base)
unsigned long cl;
cl = tp->q->ops->cl_ops->bind_tcf(tp->q, base, r->classid);
- cl = cls_set_class(tp, &r->class, cl);
+ cl = cls_set_class(&r->class, cl);
if (cl)
tp->q->ops->cl_ops->unbind_tcf(tp->q, cl);
}
@@ -56,7 +39,7 @@ tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
{
unsigned long cl;
- if ((cl = __cls_set_class(&r->class, 0)) != 0)
+ if ((cl = cls_set_class(&r->class, 0)) != 0)
tp->q->ops->cl_ops->unbind_tcf(tp->q, cl);
}
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 97123cc..000ce54 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -215,6 +215,7 @@ struct tcf_proto_ops {
struct tcf_proto {
/* Fast access part */
+ spinlock_t lock;
struct list_head head;
void *root;
int (*classify)(struct sk_buff *,
@@ -312,8 +313,8 @@ static inline void sch_tree_unlock(const struct Qdisc *q)
spin_unlock_bh(qdisc_root_sleeping_lock(q));
}
-#define tcf_tree_lock(tp) sch_tree_lock((tp)->q)
-#define tcf_tree_unlock(tp) sch_tree_unlock((tp)->q)
+#define tcf_tree_lock(tp) spin_lock_bh(&(tp)->lock);
+#define tcf_tree_unlock(tp) spin_unlock_bh(&(tp)->lock);
extern struct Qdisc noop_qdisc;
extern struct Qdisc_ops noop_qdisc_ops;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bcc9987..9ba2804 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -117,7 +117,6 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
{
struct net *net = sock_net(skb->sk);
struct nlattr *tca[TCA_MAX + 1];
- spinlock_t *root_lock;
struct tcmsg *t;
u32 protocol;
u32 prio;
@@ -215,8 +214,6 @@ replay:
}
rcu_read_unlock();
- root_lock = qdisc_root_sleeping_lock(q);
-
if ((tp = res) == NULL) {
/* Proto-tcf does not exist, create new one */
@@ -236,6 +233,7 @@ replay:
if (tp == NULL)
goto errout;
INIT_LIST_HEAD(&tp->head);
+ spin_lock_init(&tp->lock);
err = -ENOENT;
tp_ops = tcf_proto_lookup_ops(tca[TCA_KIND]);
if (tp_ops == NULL) {
@@ -287,9 +285,9 @@ replay:
if (fh == 0) {
if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) {
- spin_lock_bh(root_lock);
+ tcf_tree_lock(tp);
list_del_rcu(&tp->head);
- spin_unlock_bh(root_lock);
+ tcf_tree_unlock(tp);
tfilter_notify(net, skb, n, tp, fh, RTM_DELTFILTER);
tcf_destroy(tp);
@@ -328,9 +326,9 @@ replay:
err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh);
if (err == 0) {
if (tp_created) {
- spin_lock_bh(root_lock);
+ tcf_tree_lock(tp);
list_add_rcu(&tp->head, chain);
- spin_unlock_bh(root_lock);
+ tcf_tree_unlock(tp);
}
tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER);
} else {
--
1.8.3.1
* [RFC Patch net-next 3/4] net_sched: use RCU for tc actions traverse
2014-01-09 18:19 [RFC Patch net-next 0/4] net_sched: make ingress filters lockless Cong Wang
2014-01-09 18:19 ` [RFC Patch net-next 1/4] net_sched: switch filter list to list_head Cong Wang
2014-01-09 18:19 ` [RFC Patch net-next 2/4] net_sched: avoid holding qdisc lock for filters Cong Wang
@ 2014-01-09 18:19 ` Cong Wang
2014-01-10 4:03 ` John Fastabend
2014-01-09 18:19 ` [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless Cong Wang
3 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2014-01-09 18:19 UTC (permalink / raw)
To: netdev
Cc: Cong Wang, John Fastabend, Eric Dumazet, David S. Miller,
Jamal Hadi Salim
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
net/sched/act_api.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f63e146..e3c655e 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -351,7 +351,7 @@ int tcf_action_exec(struct sk_buff *skb, const struct list_head *actions,
ret = TC_ACT_OK;
goto exec_done;
}
- list_for_each_entry(a, actions, list) {
+ list_for_each_entry_rcu(a, actions, list) {
repeat:
ret = a->ops->act(skb, a, res);
if (TC_MUNGED & skb->tc_verd) {
@@ -372,11 +372,13 @@ EXPORT_SYMBOL(tcf_action_exec);
void tcf_action_destroy(struct list_head *actions, int bind)
{
struct tc_action *a, *tmp;
+ LIST_HEAD(list);
+ list_splice_init_rcu(actions, &list, synchronize_rcu);
- list_for_each_entry_safe(a, tmp, actions, list) {
+ list_for_each_entry_safe(a, tmp, &list, list) {
if (a->ops->cleanup(a, bind) == ACT_P_DELETED)
module_put(a->ops->owner);
- list_del(&a->list);
+ list_del_rcu(&a->list);
kfree(a);
}
}
@@ -420,7 +422,7 @@ tcf_action_dump(struct sk_buff *skb, struct list_head *actions, int bind, int re
int err = -EINVAL;
struct nlattr *nest;
- list_for_each_entry(a, actions, list) {
+ list_for_each_entry_rcu(a, actions, list) {
nest = nla_nest_start(skb, a->order);
if (nest == NULL)
goto nla_put_failure;
@@ -542,7 +544,7 @@ int tcf_action_init(struct net *net, struct nlattr *nla,
goto err;
}
act->order = i;
- list_add_tail(&act->list, actions);
+ list_add_tail_rcu(&act->list, actions);
}
return 0;
--
1.8.3.1
* [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
2014-01-09 18:19 [RFC Patch net-next 0/4] net_sched: make ingress filters lockless Cong Wang
` (2 preceding siblings ...)
2014-01-09 18:19 ` [RFC Patch net-next 3/4] net_sched: use RCU for tc actions traverse Cong Wang
@ 2014-01-09 18:19 ` Cong Wang
2014-01-10 0:21 ` Eric Dumazet
3 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2014-01-09 18:19 UTC (permalink / raw)
To: netdev
Cc: Cong Wang, John Fastabend, Eric Dumazet, David S. Miller,
Jamal Hadi Salim
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
net/core/dev.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index ce01847..e357d05 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3376,10 +3376,8 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
q = rxq->qdisc;
if (q != &noop_qdisc) {
- spin_lock(qdisc_lock(q));
if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
result = qdisc_enqueue_root(skb, q);
- spin_unlock(qdisc_lock(q));
}
return result;
--
1.8.3.1
* Re: [RFC Patch net-next 1/4] net_sched: switch filter list to list_head
2014-01-09 18:19 ` [RFC Patch net-next 1/4] net_sched: switch filter list to list_head Cong Wang
@ 2014-01-10 0:17 ` Eric Dumazet
2014-01-10 0:20 ` Cong Wang
2014-01-10 3:48 ` John Fastabend
1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-01-10 0:17 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, John Fastabend, David S. Miller, Jamal Hadi Salim
On Thu, 2014-01-09 at 10:19 -0800, Cong Wang wrote:
> So that it could be potentially traversed with RCU.
This never made sense to me.
RCU doesn't need list_head at all.
list_head is convenient (and not only for RCU), but it is not a requirement.
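For instance, purely as a hypothetical sketch that keeps the existing
tcf_proto->next pointer instead of converting to list_head (with the
pointers annotated __rcu; 'chain' and 'back' stand for the same kind of
double pointers tc_ctl_tfilter() already uses):

	/* reader */
	rcu_read_lock();
	for (tp = rcu_dereference(*chain); tp; tp = rcu_dereference(tp->next)) {
		if (tp->classify(skb, tp, res) >= 0)
			break;
	}
	rcu_read_unlock();

	/* writer, still serialized elsewhere: unlink, wait, then free */
	rcu_assign_pointer(*back, tp->next);
	synchronize_rcu();
	kfree(tp);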
* Re: [RFC Patch net-next 1/4] net_sched: switch filter list to list_head
2014-01-10 0:17 ` Eric Dumazet
@ 2014-01-10 0:20 ` Cong Wang
0 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2014-01-10 0:20 UTC (permalink / raw)
To: Eric Dumazet
Cc: Linux Kernel Network Developers, John Fastabend, David S. Miller,
Jamal Hadi Salim
On Thu, Jan 9, 2014 at 4:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-01-09 at 10:19 -0800, Cong Wang wrote:
>> So that it could be potentially traversed with RCU.
>
> This never made sense to me.
>
> RCU doesn't need list_head at all.
>
> list_head is convenient (not only for RCU), but not a need.
>
I know, but playing with an RCU list without using list_head is hard to
get right, at least for me. Maybe I should rephrase it.
* Re: [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
2014-01-09 18:19 ` [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless Cong Wang
@ 2014-01-10 0:21 ` Eric Dumazet
2014-01-10 0:30 ` Cong Wang
0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-01-10 0:21 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, John Fastabend, David S. Miller, Jamal Hadi Salim
On Thu, 2014-01-09 at 10:19 -0800, Cong Wang wrote:
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> net/core/dev.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ce01847..e357d05 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3376,10 +3376,8 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
>
> q = rxq->qdisc;
> if (q != &noop_qdisc) {
> - spin_lock(qdisc_lock(q));
> if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
> result = qdisc_enqueue_root(skb, q);
> - spin_unlock(qdisc_lock(q));
> }
>
> return result;
Really, you'll have to explain in the changelog why you think this is
safe, because I really do not see how this can be valid.
I think I already said it was not safe at all...
You could try a multiqueue NIC for some interesting effects.
* Re: [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
2014-01-10 0:21 ` Eric Dumazet
@ 2014-01-10 0:30 ` Cong Wang
2014-01-10 0:49 ` Stephen Hemminger
2014-01-10 1:11 ` Eric Dumazet
0 siblings, 2 replies; 19+ messages in thread
From: Cong Wang @ 2014-01-10 0:30 UTC (permalink / raw)
To: Eric Dumazet
Cc: Linux Kernel Network Developers, John Fastabend, David S. Miller,
Jamal Hadi Salim
On Thu, Jan 9, 2014 at 4:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> Really, you'll have to explain in the changelog why you think this is
> safe, because I really do not see how this can be valid.
>
> I think I already said it was not safe at all...
>
> You could try a multiqueue NIC for some interesting effects.
>
There is only one ingress queue, that is dev->ingress_queue, right?
And since on ingress, the only possible qdisc is sch_ingress,
looking at ingress_enqueue(), I don't see anything so dangerous.
As I said in the cover letter, I may still miss something in the qdisc
layer, but doesn't look like related with multiqueue. Mind to be more
specific?
* Re: [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
2014-01-10 0:30 ` Cong Wang
@ 2014-01-10 0:49 ` Stephen Hemminger
2014-01-10 1:06 ` John Fastabend
2014-01-10 1:09 ` Cong Wang
2014-01-10 1:11 ` Eric Dumazet
1 sibling, 2 replies; 19+ messages in thread
From: Stephen Hemminger @ 2014-01-10 0:49 UTC (permalink / raw)
To: Cong Wang
Cc: Eric Dumazet, Linux Kernel Network Developers, John Fastabend,
David S. Miller, Jamal Hadi Salim
On Thu, 9 Jan 2014 16:30:12 -0800
Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Jan 9, 2014 at 4:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >
> > Really, you'll have to explain in the changelog why you think this is
> > safe, because I really do not see how this can be valid.
> >
> > I think I already said it was not safe at all...
> >
> > You could try a multiqueue NIC for some interesting effects.
> >
>
> There is only one ingress queue, that is dev->ingress_queue, right?
>
> And since on ingress, the only possible qdisc is sch_ingress,
> looking at ingress_enqueue(), I don't see anything so dangerous.
>
> As I said in the cover letter, I may still miss something in the qdisc
> layer, but doesn't look like related with multiqueue. Mind to be more
> specific?
I think what Eric is saying is that on a multi-queue NIC, multiple
queues can be receiving packets and then sending them on to the ingress
queue discipline. Up until your patch that code itself was protected
by qdisc_lock and did not have to worry about any SMP issues. Now, any
qdisc attached on ingress could run in parallel. This would break all
the code in those queue disciplines. Think of the simplest case
of policing.
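As an illustration only: a made-up, simplified token-bucket policer (not the
real act_police code; sketch_policer and sketch_police_conform are invented
names). The read-modify-write on the bucket is only correct if something
serializes the callers, which on ingress today is the qdisc lock:

struct sketch_policer {
	u64	t_last;		/* time of last update (ns)       */
	s64	toks;		/* tokens currently in the bucket */
	s64	burst;		/* bucket depth                   */
};

static bool sketch_police_conform(struct sketch_policer *p, u64 now,
				  unsigned int len)
{
	/* without serialization, two CPUs can both pass the check below
	 * and both spend the same tokens, so the configured rate is no
	 * longer enforced
	 */
	p->toks = min_t(s64, p->burst, p->toks + (s64)(now - p->t_last));
	p->t_last = now;
	if (p->toks < (s64)len)
		return false;		/* exceed */
	p->toks -= len;
	return true;			/* conform */
}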
* Re: [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
2014-01-10 0:49 ` Stephen Hemminger
@ 2014-01-10 1:06 ` John Fastabend
2014-01-12 12:30 ` Jamal Hadi Salim
2014-01-10 1:09 ` Cong Wang
1 sibling, 1 reply; 19+ messages in thread
From: John Fastabend @ 2014-01-10 1:06 UTC (permalink / raw)
To: Cong Wang
Cc: Stephen Hemminger, Eric Dumazet, Linux Kernel Network Developers,
John Fastabend, David S. Miller, Jamal Hadi Salim
On 1/9/2014 4:49 PM, Stephen Hemminger wrote:
> On Thu, 9 Jan 2014 16:30:12 -0800
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
>> On Thu, Jan 9, 2014 at 4:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>>
>>> Really, you'll have to explain in the changelog why you think this is
>>> safe, because I really do not see how this can be valid.
>>>
>>> I think I already said it was not safe at all...
>>>
>>> You could try a multiqueue NIC for some interesting effects.
>>>
>>
>> There is only one ingress queue, that is dev->ingress_queue, right?
>>
>> And since on ingress, the only possible qdisc is sch_ingress,
>> looking at ingress_enqueue(), I don't see anything so dangerous.
>>
>> As I said in the cover letter, I may still miss something in the qdisc
>> layer, but doesn't look like related with multiqueue. Mind to be more
>> specific?
>
> I think what Eric is saying is that on a multi-queue NIC, multiple
> queues can be receiving packets and then sending them on to the ingress
> queue discipline. Up until your patch that code itself was protected
> by qdisc_lock and did not have to worry about any SMP issues. Now, any
> qdisc attached on ingress could run in parallel. This would break all
> the code in those queue disciplines. Think of the simplest case
> of policing.
> --
Just to re-iterate, you need to go through each and every qdisc,
classifier and action and verify it is safe to run in parallel. Take
a look at how the skb lists are managed in the qdiscs (roughly sketched
below). If we want to do this we need to make these changes in some
coherent way, because it touches lots of pieces.
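(Roughly the shape of the common fifo-style enqueue helper, as a sketch
rather than any one qdisc verbatim; enqueue_tail_sketch is an invented name:)

static int enqueue_tail_sketch(struct sk_buff *skb, struct Qdisc *sch)
{
	/* plain doubly linked list surgery plus a plain counter; both
	 * assume the qdisc lock keeps other CPUs out of here
	 */
	__skb_queue_tail(&sch->q, skb);
	sch->qstats.backlog += qdisc_pkt_len(skb);
	return NET_XMIT_SUCCESS;
}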
Also your stats are going to get hosed; none of the bstats/qstats
handling supports this (see the sketch below). I'll send out the
classifier set later tonight if you want. I got stalled going through
the actions.
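(Simplified from the bstats helpers, again just a sketch with an invented
name: plain additions on counters shared by every CPU enqueuing to the same
qdisc, so concurrent updates can be lost.)

static void bstats_update_sketch(struct gnet_stats_basic_packed *b,
				 const struct sk_buff *skb)
{
	b->bytes   += qdisc_pkt_len(skb);	/* load/add/store */
	b->packets += 1;			/* not atomic     */
}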
Finally any global state in those qdiscs is going to drive performance
down so many of them would likely need to be redesigned.
.John
* Re: [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
2014-01-10 0:49 ` Stephen Hemminger
2014-01-10 1:06 ` John Fastabend
@ 2014-01-10 1:09 ` Cong Wang
1 sibling, 0 replies; 19+ messages in thread
From: Cong Wang @ 2014-01-10 1:09 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Eric Dumazet, Linux Kernel Network Developers, John Fastabend,
David S. Miller, Jamal Hadi Salim
On Thu, Jan 9, 2014 at 4:49 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> I think what Eric is saying is that on a multi-queue NIC, multiple
> queues can be receiving packets and then sending them on to the ingress
> queue discipline. Up until your patch that code itself was protected
> by qdisc_lock and did not have to worry about any SMP issues. Now, any
> qdisc attached on ingress could run in parallel. This would break all
> the code in those queue disciplines. Think of the simplest case
> of policing.
I noticed that we have multiqueue for rx, which is dev->_rx[], but those
queues still share the same dev->ingress_queue. Packets on different CPUs
from the same device still need to run through the same ingress qdisc.
Except for ifb, an ingress qdisc can only do filtering (with some actions),
therefore I still don't see the problem.
The ifb device switches to its "egress" path to do policing, which is safe
since it does not use ->ingress_queue any more.
* Re: [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
2014-01-10 0:30 ` Cong Wang
2014-01-10 0:49 ` Stephen Hemminger
@ 2014-01-10 1:11 ` Eric Dumazet
2014-01-10 1:21 ` Cong Wang
1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-01-10 1:11 UTC (permalink / raw)
To: Cong Wang
Cc: Linux Kernel Network Developers, John Fastabend, David S. Miller,
Jamal Hadi Salim
On Thu, 2014-01-09 at 16:30 -0800, Cong Wang wrote:
> On Thu, Jan 9, 2014 at 4:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >
> > Really, you'll have to explain in the changelog why you think this is
> > safe, because I really do not see how this can be valid.
> >
> > I think I already said it was not safe at all...
> >
> > You could try a multiqueue NIC for some interesting effects.
> >
>
> There is only one ingress queue, that is dev->ingress_queue, right ?
Yes. And you can have multiple cpus trying to use it at the same time.
> And since on ingress, the only possible qdisc is sch_ingress,
> looking at ingress_enqueue(), I don't see anything so dangerous.
>
> As I said in the cover letter, I may still miss something in the qdisc
> layer, but doesn't look like related with multiqueue. Mind to be more
> specific?
Well, there is one qdisc, and if your NIC is multiqueue, with for
example 32 queues, you can have 32 CPUs happily using this qdisc at once.
That's why you need the spinlock.
I am very afraid of seeing you change this path without understanding
how it is used.
* Re: [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
2014-01-10 1:11 ` Eric Dumazet
@ 2014-01-10 1:21 ` Cong Wang
0 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2014-01-10 1:21 UTC (permalink / raw)
To: Eric Dumazet
Cc: Linux Kernel Network Developers, John Fastabend, David S. Miller,
Jamal Hadi Salim
On Thu, Jan 9, 2014 at 5:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Well, there is one qdisc, and if your NIC is multiqueue, with for
> example 32 queues, you can have 32 CPUs happily using this qdisc at once.
I did see this, but still don't see the problem.
>
> That's why you need the spinlock.
>
It looks like you are saying we queue the packets somewhere
in qdisc_enqueue_root() and therefore need a spinlock, but looking
at the code:
static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
{
qdisc_calculate_pkt_len(skb, sch);
return sch->enqueue(skb, sch);
}
static inline int qdisc_enqueue_root(struct sk_buff *skb, struct Qdisc *sch)
{
qdisc_skb_cb(skb)->pkt_len = skb->len;
return qdisc_enqueue(skb, sch) & NET_XMIT_MASK;
}
so it is almost equivalent to calling ->enqueue directly, which for ingress
is ingress_enqueue(). Except for updating some stats, the only thing
it does is call tc_classify().
So where is the problem at qdisc layer? I must miss something too obvious...
* Re: [RFC Patch net-next 2/4] net_sched: avoid holding qdisc lock for filters
2014-01-09 18:19 ` [RFC Patch net-next 2/4] net_sched: avoid holding qdisc lock for filters Cong Wang
@ 2014-01-10 3:13 ` John Fastabend
0 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2014-01-10 3:13 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, Eric Dumazet, David S. Miller, Jamal Hadi Salim
On 01/09/2014 10:19 AM, Cong Wang wrote:
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> include/net/pkt_cls.h | 25 ++++---------------------
> include/net/sch_generic.h | 5 +++--
> net/sched/cls_api.c | 12 +++++-------
> 3 files changed, 12 insertions(+), 30 deletions(-)
>
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index 50ea079..9afb15b 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -18,26 +18,9 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops);
> int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
>
> static inline unsigned long
> -__cls_set_class(unsigned long *clp, unsigned long cl)
> +cls_set_class(unsigned long *clp, unsigned long cl)
> {
The commit message should mention this conversion to xchg().
> - unsigned long old_cl;
> -
> - old_cl = *clp;
> - *clp = cl;
> - return old_cl;
> -}
> -
> -static inline unsigned long
> -cls_set_class(struct tcf_proto *tp, unsigned long *clp,
> - unsigned long cl)
> -{
> - unsigned long old_cl;
> -
> - tcf_tree_lock(tp);
> - old_cl = __cls_set_class(clp, cl);
> - tcf_tree_unlock(tp);
> -
> - return old_cl;
> + return xchg(clp, cl);
> }
>
> static inline void
> @@ -46,7 +29,7 @@ tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base)
> unsigned long cl;
>
> cl = tp->q->ops->cl_ops->bind_tcf(tp->q, base, r->classid);
> - cl = cls_set_class(tp, &r->class, cl);
> + cl = cls_set_class(&r->class, cl);
> if (cl)
> tp->q->ops->cl_ops->unbind_tcf(tp->q, cl);
> }
> @@ -56,7 +39,7 @@ tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
> {
> unsigned long cl;
>
> - if ((cl = __cls_set_class(&r->class, 0)) != 0)
> + if ((cl = cls_set_class(&r->class, 0)) != 0)
> tp->q->ops->cl_ops->unbind_tcf(tp->q, cl);
> }
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 97123cc..000ce54 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -215,6 +215,7 @@ struct tcf_proto_ops {
>
> struct tcf_proto {
> /* Fast access part */
> + spinlock_t lock;
[...]
The lock can just be dropped; all these operations are initiated via
rtnetlink and hence run with rtnl held.
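A minimal sketch of what I mean (hypothetical, not part of the posted patch;
ASSERT_RTNL() just documents and enforces the rtnl dependency that already
serializes these operations):

static inline unsigned long
cls_set_class(unsigned long *clp, unsigned long cl)
{
	ASSERT_RTNL();	/* all callers come in via rtnetlink */
	return xchg(clp, cl);
}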
.John
--
John Fastabend Intel Corporation
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC Patch net-next 1/4] net_sched: switch filter list to list_head
2014-01-09 18:19 ` [RFC Patch net-next 1/4] net_sched: switch filter list to list_head Cong Wang
2014-01-10 0:17 ` Eric Dumazet
@ 2014-01-10 3:48 ` John Fastabend
2014-01-10 4:10 ` Cong Wang
1 sibling, 1 reply; 19+ messages in thread
From: John Fastabend @ 2014-01-10 3:48 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, Eric Dumazet, David S. Miller, Jamal Hadi Salim
On 01/09/2014 10:19 AM, Cong Wang wrote:
> So that it could be potentially traversed with RCU.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> include/net/pkt_sched.h | 4 ++--
> include/net/sch_generic.h | 9 ++++++---
> net/sched/cls_api.c | 36 ++++++++++++++++++++++++------------
> net/sched/sch_api.c | 35 ++++++++++++++++++++++-------------
> net/sched/sch_atm.c | 14 +++++++-------
> net/sched/sch_cbq.c | 9 +++++----
> net/sched/sch_choke.c | 11 ++++++-----
> net/sched/sch_drr.c | 7 ++++---
> net/sched/sch_dsmark.c | 7 ++++---
> net/sched/sch_fq_codel.c | 9 +++++----
> net/sched/sch_hfsc.c | 15 +++++++++------
> net/sched/sch_htb.c | 20 ++++++++++++--------
> net/sched/sch_ingress.c | 14 +++++++++++---
> net/sched/sch_multiq.c | 7 ++++---
> net/sched/sch_prio.c | 9 +++++----
> net/sched/sch_qfq.c | 7 ++++---
> net/sched/sch_sfb.c | 9 +++++----
> net/sched/sch_sfq.c | 11 ++++++-----
> 18 files changed, 141 insertions(+), 92 deletions(-)
>
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 891d80d..27a1efa 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -109,9 +109,9 @@ static inline void qdisc_run(struct Qdisc *q)
> __qdisc_run(q);
> }
>
> -int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
> +int tc_classify_compat(struct sk_buff *skb, const struct list_head *head,
> struct tcf_result *res);
> -int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
> +int tc_classify(struct sk_buff *skb, const struct list_head *head,
> struct tcf_result *res);
>
> /* Calculate maximal size of packet seen by hard_start_xmit
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 013d96d..97123cc 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -6,6 +6,7 @@
> #include <linux/rcupdate.h>
> #include <linux/pkt_sched.h>
> #include <linux/pkt_cls.h>
> +#include <linux/list.h>
> #include <net/gen_stats.h>
> #include <net/rtnetlink.h>
>
> @@ -143,7 +144,7 @@ struct Qdisc_class_ops {
> void (*walk)(struct Qdisc *, struct qdisc_walker * arg);
>
> /* Filter manipulation */
> - struct tcf_proto ** (*tcf_chain)(struct Qdisc *, unsigned long);
> + struct list_head * (*tcf_chain)(struct Qdisc *, unsigned long);
I'm not sure, I thought it was just fine the way it was. That pattern
exists in a few other places inside the networking stack and we don't
go around converting them to lists. But that is just my opinion.
> unsigned long (*bind_tcf)(struct Qdisc *, unsigned long,
> u32 classid);
> void (*unbind_tcf)(struct Qdisc *, unsigned long);
> @@ -184,6 +185,8 @@ struct tcf_result {
> u32 classid;
> };
>
> +struct tcf_proto;
> +
> struct tcf_proto_ops {
> struct list_head head;
> char kind[IFNAMSIZ];
> @@ -212,7 +215,7 @@ struct tcf_proto_ops {
>
> struct tcf_proto {
> /* Fast access part */
> - struct tcf_proto *next;
> + struct list_head head;
> void *root;
> int (*classify)(struct sk_buff *,
> const struct tcf_proto *,
> @@ -376,7 +379,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
> void __qdisc_calculate_pkt_len(struct sk_buff *skb,
> const struct qdisc_size_table *stab);
> void tcf_destroy(struct tcf_proto *tp);
> -void tcf_destroy_chain(struct tcf_proto **fl);
> +void tcf_destroy_chain(struct list_head *fl);
>
> /* Reset all TX qdiscs greater then index of a device. */
> static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int i)
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index d8c42b1..bcc9987 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -125,8 +125,9 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
> u32 parent;
> struct net_device *dev;
> struct Qdisc *q;
> - struct tcf_proto **back, **chain;
> - struct tcf_proto *tp;
> + struct tcf_proto *back;
> + struct list_head *chain;
> + struct tcf_proto *tp, *res = NULL;
> const struct tcf_proto_ops *tp_ops;
> const struct Qdisc_class_ops *cops;
> unsigned long cl;
> @@ -196,21 +197,27 @@ replay:
> goto errout;
>
> /* Check the chain for existence of proto-tcf with this priority */
> - for (back = chain; (tp = *back) != NULL; back = &tp->next) {
> + rcu_read_lock();
Why rcu_read_lock()? This is under the rtnl lock.
> + list_for_each_entry_rcu(tp, chain, head) {
No need for the _rcu variant; you are under the writer lock.
> + back = list_next_entry(tp, head);
> if (tp->prio >= prio) {
> if (tp->prio == prio) {
> if (!nprio ||
> - (tp->protocol != protocol && protocol))
> + (tp->protocol != protocol && protocol)) {
> + rcu_read_unlock();
> goto errout;
> + }
> + res = tp;
> } else
> - tp = NULL;
> + res = NULL;
> break;
> }
> }
> + rcu_read_unlock();
ditto
>
> root_lock = qdisc_root_sleeping_lock(q);
>
> - if (tp == NULL) {
> + if ((tp = res) == NULL) {
> /* Proto-tcf does not exist, create new one */
>
> if (tca[TCA_KIND] == NULL || !protocol)
> @@ -228,6 +235,7 @@ replay:
> tp = kzalloc(sizeof(*tp), GFP_KERNEL);
> if (tp == NULL)
> goto errout;
> + INIT_LIST_HEAD(&tp->head);
> err = -ENOENT;
> tp_ops = tcf_proto_lookup_ops(tca[TCA_KIND]);
> if (tp_ops == NULL) {
> @@ -258,7 +266,7 @@ replay:
> }
> tp->ops = tp_ops;
> tp->protocol = protocol;
> - tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(*back));
> + tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(back));
> tp->q = q;
> tp->classify = tp_ops->classify;
> tp->classid = parent;
> @@ -280,7 +288,7 @@ replay:
> if (fh == 0) {
> if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) {
> spin_lock_bh(root_lock);
> - *back = tp->next;
> + list_del_rcu(&tp->head);
> spin_unlock_bh(root_lock);
>
> tfilter_notify(net, skb, n, tp, fh, RTM_DELTFILTER);
> @@ -321,8 +329,7 @@ replay:
> if (err == 0) {
> if (tp_created) {
> spin_lock_bh(root_lock);
> - tp->next = *back;
> - *back = tp;
> + list_add_rcu(&tp->head, chain);
> spin_unlock_bh(root_lock);
> }
> tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER);
> @@ -417,7 +424,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
> int s_t;
> struct net_device *dev;
> struct Qdisc *q;
> - struct tcf_proto *tp, **chain;
> + struct tcf_proto *tp;
> + struct list_head *chain;
> struct tcmsg *tcm = nlmsg_data(cb->nlh);
> unsigned long cl = 0;
> const struct Qdisc_class_ops *cops;
> @@ -451,7 +459,9 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
>
> s_t = cb->args[0];
>
> - for (tp = *chain, t = 0; tp; tp = tp->next, t++) {
> + t = 0;
> + rcu_read_lock();
Same here: this is under rtnl, right?
> + list_for_each_entry_rcu(tp, chain, head) {
... so the _rcu can be dropped.
> if (t < s_t)
> continue;
> if (TC_H_MAJ(tcm->tcm_info) &&
> @@ -482,7 +492,9 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
> cb->args[1] = arg.w.count + 1;
> if (arg.w.stop)
> break;
> + t++;
> }
> + rcu_read_unlock();
>
> cb->args[0] = t;
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index 1313145..df86686 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -29,6 +29,7 @@
> #include <linux/hrtimer.h>
> #include <linux/lockdep.h>
> #include <linux/slab.h>
> +#include <linux/rculist.h>
>
> #include <net/net_namespace.h>
> #include <net/sock.h>
> @@ -1772,13 +1773,15 @@ done:
> * to this qdisc, (optionally) tests for protocol and asks
> * specific classifiers.
> */
> -int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
> +int tc_classify_compat(struct sk_buff *skb, const struct list_head *head,
Notice you don't check for a NULL head value here, so it had better
never be NULL.
> struct tcf_result *res)
> {
> + struct tcf_proto *tp;
> __be16 protocol = skb->protocol;
> - int err;
> + int err = -1;
>
> - for (; tp; tp = tp->next) {
> + WARN_ON_ONCE(!rcu_read_lock_held());
It's just RFC code, but there is no need for the WARN_ON.
> + list_for_each_entry_rcu(tp, head, head) {
> if (tp->protocol != protocol &&
> tp->protocol != htons(ETH_P_ALL))
> continue;
> @@ -1789,23 +1792,25 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
> if (err != TC_ACT_RECLASSIFY && skb->tc_verd)
> skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
> #endif
> - return err;
> + break;
> }
> }
> - return -1;
> +
> + return err;
> }
> EXPORT_SYMBOL(tc_classify_compat);
>
> -int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
> +int tc_classify(struct sk_buff *skb, const struct list_head *head,
> struct tcf_result *res)
> {
> int err = 0;
> #ifdef CONFIG_NET_CLS_ACT
> - const struct tcf_proto *otp = tp;
> + const struct tcf_proto *tp;
> + const struct tcf_proto *otp = list_first_entry(head, struct tcf_proto, head);
> reclassify:
> #endif
>
> - err = tc_classify_compat(skb, tp, res);
> + err = tc_classify_compat(skb, head, res);
> #ifdef CONFIG_NET_CLS_ACT
> if (err == TC_ACT_RECLASSIFY) {
> u32 verd = G_TC_VERD(skb->tc_verd);
> @@ -1830,16 +1835,20 @@ void tcf_destroy(struct tcf_proto *tp)
> {
> tp->ops->destroy(tp);
> module_put(tp->ops->owner);
> + synchronize_rcu();
Maybe use call_rcu(); we don't want to synchronize during a destroy-chain
operation.
> kfree(tp);
> }
>
> -void tcf_destroy_chain(struct tcf_proto **fl)
> +void tcf_destroy_chain(struct list_head *fl)
> {
> - struct tcf_proto *tp;
> + struct tcf_proto *tp, *n;
> + LIST_HEAD(list);
> + list_splice_init_rcu(fl, &list, synchronize_rcu);
>
> - while ((tp = *fl) != NULL) {
> - *fl = tp->next;
> - tcf_destroy(tp);
> + list_for_each_entry_safe(tp, n, &list, head) {
> + tp->ops->destroy(tp);
> + module_put(tp->ops->owner);
> + kfree(tp);
Fix the sync above and then, as far as I can tell, there is no reason
to change this code. Or at least that kfree() should be kfree_rcu()
(see the sketch after this hunk).
> }
> }
> EXPORT_SYMBOL(tcf_destroy_chain);
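A minimal sketch of the deferred-free variant meant above (assuming a
struct rcu_head named 'rcu' is added to struct tcf_proto; not from the
posted series):

void tcf_destroy(struct tcf_proto *tp)
{
	tp->ops->destroy(tp);
	module_put(tp->ops->owner);
	kfree_rcu(tp, rcu);	/* defer only the free; no synchronize_rcu() here */
}

tcf_destroy_chain() can then keep walking the list and calling tcf_destroy()
as before, without blocking for a grace period per filter.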
> diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
> index 1f9c314..20a07c2 100644
> --- a/net/sched/sch_atm.c
> +++ b/net/sched/sch_atm.c
> @@ -41,7 +41,7 @@
>
> struct atm_flow_data {
> struct Qdisc *q; /* FIFO, TBF, etc. */
> - struct tcf_proto *filter_list;
> + struct list_head filter_list;
> struct atm_vcc *vcc; /* VCC; NULL if VCC is closed */
> void (*old_pop)(struct atm_vcc *vcc,
> struct sk_buff *skb); /* chaining */
> @@ -273,7 +273,7 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
> error = -ENOBUFS;
> goto err_out;
> }
> - flow->filter_list = NULL;
> + INIT_LIST_HEAD(&flow->filter_list);
> flow->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid);
> if (!flow->q)
> flow->q = &noop_qdisc;
> @@ -311,7 +311,7 @@ static int atm_tc_delete(struct Qdisc *sch, unsigned long arg)
> pr_debug("atm_tc_delete(sch %p,[qdisc %p],flow %p)\n", sch, p, flow);
> if (list_empty(&flow->list))
> return -EINVAL;
> - if (flow->filter_list || flow == &p->link)
> + if (!list_empty(&flow->filter_list) || flow == &p->link)
> return -EBUSY;
> /*
> * Reference count must be 2: one for "keepalive" (set at class
> @@ -345,7 +345,7 @@ static void atm_tc_walk(struct Qdisc *sch, struct qdisc_walker *walker)
> }
> }
>
> -static struct tcf_proto **atm_tc_find_tcf(struct Qdisc *sch, unsigned long cl)
> +static struct list_head *atm_tc_find_tcf(struct Qdisc *sch, unsigned long cl)
> {
> struct atm_qdisc_data *p = qdisc_priv(sch);
> struct atm_flow_data *flow = (struct atm_flow_data *)cl;
> @@ -370,9 +370,9 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> if (TC_H_MAJ(skb->priority) != sch->handle ||
> !(flow = (struct atm_flow_data *)atm_tc_get(sch, skb->priority))) {
> list_for_each_entry(flow, &p->flows, list) {
> - if (flow->filter_list) {
> + if (!list_empty(&flow->filter_list)) {
> result = tc_classify_compat(skb,
> - flow->filter_list,
> + &flow->filter_list,
> &res);
> if (result < 0)
> continue;
> @@ -544,7 +544,7 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt)
> if (!p->link.q)
> p->link.q = &noop_qdisc;
> pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q);
> - p->link.filter_list = NULL;
> + INIT_LIST_HEAD(&p->link.filter_list);
> p->link.vcc = NULL;
> p->link.sock = NULL;
> p->link.classid = sch->handle;
> diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
> index 2f80d01..a5914ff 100644
> --- a/net/sched/sch_cbq.c
> +++ b/net/sched/sch_cbq.c
> @@ -133,7 +133,7 @@ struct cbq_class {
> struct gnet_stats_rate_est64 rate_est;
> struct tc_cbq_xstats xstats;
>
> - struct tcf_proto *filter_list;
> + struct list_head filter_list;
>
> int refcnt;
> int filters;
> @@ -239,8 +239,8 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
> /*
> * Step 2+n. Apply classifier.
> */
> - if (!head->filter_list ||
> - (result = tc_classify_compat(skb, head->filter_list, &res)) < 0)
> + if (list_empty(&head->filter_list) ||
> + (result = tc_classify_compat(skb, &head->filter_list, &res)) < 0)
There is a race here: the list may be non-empty when you check it but
emptied by the time you call tc_classify_compat().
Also notice you accessed the list without an _rcu accessor. If you grep
for list_empty_rcu(), IIRC it is not even defined, precisely to discourage
this sort of error.
> goto fallback;
>
> cl = (void *)res.class;
> @@ -1881,6 +1881,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
> }
> }
>
> + INIT_LIST_HEAD(&cl->filter_list);
> cl->R_tab = rtab;
> rtab = NULL;
> cl->refcnt = 1;
> @@ -1976,7 +1977,7 @@ static int cbq_delete(struct Qdisc *sch, unsigned long arg)
> return 0;
> }
>
> -static struct tcf_proto **cbq_find_tcf(struct Qdisc *sch, unsigned long arg)
> +static struct list_head *cbq_find_tcf(struct Qdisc *sch, unsigned long arg)
> {
> struct cbq_sched_data *q = qdisc_priv(sch);
> struct cbq_class *cl = (struct cbq_class *)arg;
> diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
> index ddd73cb..9b36830 100644
> --- a/net/sched/sch_choke.c
> +++ b/net/sched/sch_choke.c
> @@ -58,7 +58,7 @@ struct choke_sched_data {
>
> /* Variables */
> struct red_vars vars;
> - struct tcf_proto *filter_list;
> + struct list_head filter_list;
> struct {
> u32 prob_drop; /* Early probability drops */
> u32 prob_mark; /* Early probability marks */
> @@ -202,7 +202,7 @@ static bool choke_classify(struct sk_buff *skb,
> struct tcf_result res;
> int result;
>
> - result = tc_classify(skb, q->filter_list, &res);
> + result = tc_classify(skb, &q->filter_list, &res);
Hmm, q->filter_list is passed to tc_classify() without an rcu_dereference().
> if (result >= 0) {
> #ifdef CONFIG_NET_CLS_ACT
> switch (result) {
> @@ -256,7 +256,7 @@ static bool choke_match_random(const struct choke_sched_data *q,
> return false;
>
> oskb = choke_peek_random(q, pidx);
> - if (q->filter_list)
> + if (!list_empty(&q->filter_list))
> return choke_get_classid(nskb) == choke_get_classid(oskb);
Same class of list_empty() bug?
>
> return choke_match_flow(oskb, nskb);
> @@ -268,7 +268,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> const struct red_parms *p = &q->parms;
> int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
>
> - if (q->filter_list) {
> + if (!list_empty(&q->filter_list)) {
> /* If using external classifiers, get result and record it. */
> if (!choke_classify(skb, sch, &ret))
> goto other_drop; /* Packet was eaten by filter */
> @@ -476,6 +476,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
>
> q->flags = ctl->flags;
> q->limit = ctl->limit;
> + INIT_LIST_HEAD(&q->filter_list);
>
> red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
> ctl->Plog, ctl->Scell_log,
> @@ -566,7 +567,7 @@ static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
> return 0;
> }
>
> -static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
> +static struct list_head *choke_find_tcf(struct Qdisc *sch, unsigned long cl)
> {
> struct choke_sched_data *q = qdisc_priv(sch);
>
> diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
> index 8302717..62f45ac 100644
> --- a/net/sched/sch_drr.c
> +++ b/net/sched/sch_drr.c
> @@ -35,7 +35,7 @@ struct drr_class {
>
> struct drr_sched {
> struct list_head active;
> - struct tcf_proto *filter_list;
> + struct list_head filter_list;
Also, all of this needs to be __rcu annotated.
> struct Qdisc_class_hash clhash;
> };
>
> @@ -184,7 +184,7 @@ static void drr_put_class(struct Qdisc *sch, unsigned long arg)
> drr_destroy_class(sch, cl);
> }
>
> -static struct tcf_proto **drr_tcf_chain(struct Qdisc *sch, unsigned long cl)
> +static struct list_head *drr_tcf_chain(struct Qdisc *sch, unsigned long cl)
> {
> struct drr_sched *q = qdisc_priv(sch);
>
> @@ -328,7 +328,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
> }
>
> *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
> - result = tc_classify(skb, q->filter_list, &res);
> + result = tc_classify(skb, &q->filter_list, &res);
Ditto: no rcu_dereference().
> if (result >= 0) {
> #ifdef CONFIG_NET_CLS_ACT
> switch (result) {
> @@ -443,6 +443,7 @@ static int drr_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
> if (err < 0)
> return err;
> INIT_LIST_HEAD(&q->active);
> + INIT_LIST_HEAD(&q->filter_list);
> return 0;
> }
>
> diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
> index 49d6ef3..4489ffe 100644
> --- a/net/sched/sch_dsmark.c
> +++ b/net/sched/sch_dsmark.c
> @@ -37,7 +37,7 @@
>
> struct dsmark_qdisc_data {
> struct Qdisc *q;
> - struct tcf_proto *filter_list;
> + struct list_head filter_list;
Annotate it and let smatch catch a lot of bugs for you. Same comment
for all the other instances above that I didn't comment on.
> u8 *mask; /* "owns" the array */
> u8 *value;
> u16 indices;
> @@ -186,7 +186,7 @@ ignore:
> }
> }
>
> -static inline struct tcf_proto **dsmark_find_tcf(struct Qdisc *sch,
> +static inline struct list_head *dsmark_find_tcf(struct Qdisc *sch,
> unsigned long cl)
> {
> struct dsmark_qdisc_data *p = qdisc_priv(sch);
> @@ -229,7 +229,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> skb->tc_index = TC_H_MIN(skb->priority);
> else {
> struct tcf_result res;
> - int result = tc_classify(skb, p->filter_list, &res);
> + int result = tc_classify(skb, &p->filter_list, &res);
rcu_dereference() again.
[...]
OK now I'm just repeating myself. You see the trend.
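To make that trend concrete, a rough sketch of the kind of annotation I mean
(a hypothetical example using the original tcf_proto pointer form, since an
embedded list_head gives sparse nothing to check; struct and function names
here are made up):

struct example_sched_data {
	struct tcf_proto __rcu *filter_list;
};

static int example_classify(struct sk_buff *skb, struct example_sched_data *q,
			    struct tcf_result *res)
{
	/* sparse warns if this read skips rcu_dereference(), or if a
	 * writer skips rcu_assign_pointer() */
	struct tcf_proto *fl = rcu_dereference_bh(q->filter_list);

	return tc_classify(skb, fl, res);
}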
Additionally, I don't see how any of the cls_* change routines work in
your series. For example, look at basic_change(). Even with the rtnl lock
you need to do a read, copy, update (the RCU namesake); you can't just modify
things in place like you have done.
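A rough sketch of the read-copy-update shape I mean (hypothetical
basic_change()-style code, not from your series; it assumes the filter node
has a struct list_head 'link' and a struct rcu_head 'rcu'):

static int basic_change_sketch(struct basic_filter *old)
{
	struct basic_filter *new;

	new = kzalloc(sizeof(*new), GFP_KERNEL);	/* read + copy */
	if (!new)
		return -ENOBUFS;
	*new = *old;
	/* apply the new netlink attributes to 'new' here, never to 'old' */

	list_replace_rcu(&old->link, &new->link);	/* publish the update */
	kfree_rcu(old, rcu);	/* old node is freed after readers drain */
	return 0;
}

Readers walking the chain with list_for_each_entry_rcu() then always see
either the old node or the new one, never a half-updated filter.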
I'll send a fixed up series out in a few minutes it should illustrate
my point.
.John
--
John Fastabend Intel Corporation
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC Patch net-next 3/4] net_sched: use RCU for tc actions traverse
2014-01-09 18:19 ` [RFC Patch net-next 3/4] net_sched: use RCU for tc actions traverse Cong Wang
@ 2014-01-10 4:03 ` John Fastabend
0 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2014-01-10 4:03 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, Eric Dumazet, David S. Miller, Jamal Hadi Salim
On 01/09/2014 10:19 AM, Cong Wang wrote:
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> net/sched/act_api.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index f63e146..e3c655e 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -351,7 +351,7 @@ int tcf_action_exec(struct sk_buff *skb, const struct list_head *actions,
> ret = TC_ACT_OK;
> goto exec_done;
> }
> - list_for_each_entry(a, actions, list) {
> + list_for_each_entry_rcu(a, actions, list) {
> repeat:
> ret = a->ops->act(skb, a, res);
> if (TC_MUNGED & skb->tc_verd) {
> @@ -372,11 +372,13 @@ EXPORT_SYMBOL(tcf_action_exec);
> void tcf_action_destroy(struct list_head *actions, int bind)
> {
> struct tc_action *a, *tmp;
> + LIST_HEAD(list);
> + list_splice_init_rcu(actions, &list, synchronize_rcu);
>
> - list_for_each_entry_safe(a, tmp, actions, list) {
> + list_for_each_entry_safe(a, tmp, &list, list) {
> if (a->ops->cleanup(a, bind) == ACT_P_DELETED)
> module_put(a->ops->owner);
> - list_del(&a->list);
> + list_del_rcu(&a->list);
> kfree(a);
No, this won't work; you need kfree_rcu(), call_rcu(), or an RCU sync
(see the sketch below, after the diff).
> }
> }
> @@ -420,7 +422,7 @@ tcf_action_dump(struct sk_buff *skb, struct list_head *actions, int bind, int re
> int err = -EINVAL;
> struct nlattr *nest;
>
> - list_for_each_entry(a, actions, list) {
> + list_for_each_entry_rcu(a, actions, list) {
> nest = nla_nest_start(skb, a->order);
> if (nest == NULL)
> goto nla_put_failure;
> @@ -542,7 +544,7 @@ int tcf_action_init(struct net *net, struct nlattr *nla,
> goto err;
> }
> act->order = i;
> - list_add_tail(&act->list, actions);
> + list_add_tail_rcu(&act->list, actions);
> }
> return 0;
>
>
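A minimal sketch of the kfree_rcu() variant mentioned above (assuming a
struct rcu_head named 'rcu' is added to struct tc_action; not from the
posted patch):

void tcf_action_destroy(struct list_head *actions, int bind)
{
	struct tc_action *a, *tmp;

	list_for_each_entry_safe(a, tmp, actions, list) {
		if (a->ops->cleanup(a, bind) == ACT_P_DELETED)
			module_put(a->ops->owner);
		list_del_rcu(&a->list);
		kfree_rcu(a, rcu);	/* free only after concurrent readers finish */
	}
}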
--
John Fastabend Intel Corporation
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC Patch net-next 1/4] net_sched: switch filter list to list_head
2014-01-10 3:48 ` John Fastabend
@ 2014-01-10 4:10 ` Cong Wang
0 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2014-01-10 4:10 UTC (permalink / raw)
To: John Fastabend
Cc: Linux Kernel Network Developers, Eric Dumazet, David S. Miller,
Jamal Hadi Salim
On Thu, Jan 9, 2014 at 7:48 PM, John Fastabend <john.fastabend@gmail.com> wrote:
<...>
>
> OK now I'm just repeating myself. You see the trend.
>
> Additionally I don't see how any of the cls_*c change routines work in
> your series. For example look at basic_change. Even with the rtnl lock
> you need to do a read, copy, update (RCU namesake) you can't just modify
> it in place like you have done.
>
> I'll send a fixed up series out in a few minutes it should illustrate
> my point.
>
Thanks for the detailed review. Since it is an RFC, I myself don't
expect it to be complete; see my two open questions in the
cover letter.
It is already too late for this merge window, so I am targeting this for
the _next_ net-next.
Therefore, I prefer to get early reviews rather than detailed ones
at this stage.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless
2014-01-10 1:06 ` John Fastabend
@ 2014-01-12 12:30 ` Jamal Hadi Salim
0 siblings, 0 replies; 19+ messages in thread
From: Jamal Hadi Salim @ 2014-01-12 12:30 UTC (permalink / raw)
To: John Fastabend, Cong Wang
Cc: Stephen Hemminger, Eric Dumazet, Linux Kernel Network Developers,
John Fastabend, David S. Miller
On 01/09/14 20:06, John Fastabend wrote:
> Just to reiterate: you need to go through each and every qdisc,
> classifier, and action and verify it is safe to run in parallel. Take
> a look at how the skb lists are managed in the qdiscs. If we want
> to do this we need to make these changes in some coherent way
> because it touches lots of pieces.
>
Indeed. Everything assumes the global qdisc lock is protecting it.
Actually, actions are probably in the best shape at the moment because
the lock is very fine-grained (just the action instance)
and it protects both control and data paths.
But filters have state littered everywhere. Egress qdiscs,
as you mention, have queues at multiple levels, etc.
> Also, your stats are going to get hosed; none of the bstats/qstats
> code supports this.
Stats are probably the easiest to "fix".
Didn't Eric (or somebody else) convert netdev-level stats to use seqcounts?
Would that idea not be applicable here?
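Something along these lines (a rough sketch of the seqcount/per-cpu idea;
the struct and function names here are hypothetical, this is just the usual
u64_stats_sync pattern):

struct qdisc_cpu_bstats {
	u64			bytes;
	u64			packets;
	struct u64_stats_sync	syncp;
};

static void bstats_cpu_update(struct qdisc_cpu_bstats *b,
			      const struct sk_buff *skb)
{
	u64_stats_update_begin(&b->syncp);
	b->bytes   += qdisc_pkt_len(skb);
	b->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
	u64_stats_update_end(&b->syncp);
}

Readers sum the per-cpu copies under u64_stats_fetch_begin()/retry(), so no
lock is needed on the fast path.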
> I'll send out the classifier set later tonight
> if you want. I got stalled going through the actions.
>
The thing to note is:
actions can be shared across filters, netdevices and CPUs.
By default they are not shared across filters and netdevices; that is
a config option. You still have to worry about sharing across CPUs,
which will happen because a flow can be spread across CPUs.
You could probably get rid of the lock if you can show
that you can make the data and control paths mutually exclusive
(rtnl will protect the control path).
> Finally, any global state in those qdiscs is going to drive performance
> down, so many of them would likely need to be redesigned.
>
I feel like per-CPU qdiscs are the best surgery at the moment.
cheers,
jamal
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, newest: 2014-01-12 12:30 UTC
Thread overview: 19+ messages
2014-01-09 18:19 [RFC Patch net-next 0/4] net_sched: make ingress filters lockless Cong Wang
2014-01-09 18:19 ` [RFC Patch net-next 1/4] net_sched: switch filter list to list_head Cong Wang
2014-01-10 0:17 ` Eric Dumazet
2014-01-10 0:20 ` Cong Wang
2014-01-10 3:48 ` John Fastabend
2014-01-10 4:10 ` Cong Wang
2014-01-09 18:19 ` [RFC Patch net-next 2/4] net_sched: avoid holding qdisc lock for filters Cong Wang
2014-01-10 3:13 ` John Fastabend
2014-01-09 18:19 ` [RFC Patch net-next 3/4] net_sched: use RCU for tc actions traverse Cong Wang
2014-01-10 4:03 ` John Fastabend
2014-01-09 18:19 ` [RFC Patch net-next 4/4] net_sched: make ingress qdisc lockless Cong Wang
2014-01-10 0:21 ` Eric Dumazet
2014-01-10 0:30 ` Cong Wang
2014-01-10 0:49 ` Stephen Hemminger
2014-01-10 1:06 ` John Fastabend
2014-01-12 12:30 ` Jamal Hadi Salim
2014-01-10 1:09 ` Cong Wang
2014-01-10 1:11 ` Eric Dumazet
2014-01-10 1:21 ` Cong Wang