Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [PATCH v2 1/6] fsl/fman: enable FMan Keygen
From: Madalin-cristian Bucur @ 2017-08-23  4:36 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20170822.143510.1048652578169181274.davem@davemloft.net>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of David Miller
> Sent: Wednesday, August 23, 2017 12:35 AM
> Subject: Re: [PATCH v2 1/6] fsl/fman: enable FMan Keygen
> 
> From: Madalin Bucur <madalin.bucur@nxp.com>
> Date: Tue, 22 Aug 2017 20:31:01 +0300
> 
> >  /**
> > + * fman_get_keygen
> > + *
> > + * @fman:	A Pointer to FMan device
> > + *
> > + * Get the handle to KeyGen module part of FM driver
> > + *
> > + * Return: Handle to KeyGen
> > + */
> > +struct fman_keygen *fman_get_keygen(struct fman *fman)
> > +{
> > +	return fman->keygen;
> > +}
> > +EXPORT_SYMBOL(fman_get_keygen);
> 
> Please don't do this.
> 
> Just directly derefence the pointer in the source code to
> get the keygen.
> 
> Thank you.

Hi,

The struct fman is only visible in the fman file, the fman port module uses struct
fman as an opaque pointer, thus this export.

Madalin

^ permalink raw reply

* [Patch net-next] net_sched: remove tc class reference counting
From: Cong Wang @ 2017-08-23  4:38 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Jamal Hadi Salim

For TC classes, their ->get() and ->put() are always paired, and the
reference counting is completely useless, because:

1) For class modification and dumping paths, we already hold RTNL lock,
   so all of these ->get(),->change(),->put() are atomic.

2) For filter bindiing/unbinding, we use other reference counter than
   this one, and they should have RTNL lock too.

3) For ->qlen_notify(), it is special because it is called on ->enqueue()
   path, but we already hold qdisc tree lock there, and we hold this
   tree lock when graft or delete the class too, so it should not be gone
   or changed until we release the tree lock.

Therefore, this patch removes ->get() and ->put(), but:

1) Adds a new ->find() to find the pointer to a class by classid, no
   refcnt.

2) Move the original class destroy upon the last refcnt into ->delete(),
   right after releasing tree lock. This is fine because the class is
   already removed from hash when holding the lock.

For those who also use ->put() as ->unbind(), just rename them to reflect
this change.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/sch_generic.h |  3 +--
 net/sched/cls_api.c       | 17 ++++++-----------
 net/sched/sch_api.c       | 21 ++++++++-------------
 net/sched/sch_atm.c       | 30 ++++++++++++++++--------------
 net/sched/sch_cbq.c       | 41 ++++-------------------------------------
 net/sched/sch_drr.c       | 30 +++++-------------------------
 net/sched/sch_dsmark.c    | 17 ++++++++---------
 net/sched/sch_fq_codel.c  |  9 ++++-----
 net/sched/sch_hfsc.c      | 32 +++++---------------------------
 net/sched/sch_htb.c       | 33 +++++++--------------------------
 net/sched/sch_ingress.c   | 20 +++++++++-----------
 net/sched/sch_mq.c        |  9 ++-------
 net/sched/sch_mqprio.c    |  9 ++-------
 net/sched/sch_multiq.c    | 11 +++++------
 net/sched/sch_netem.c     |  9 ++-------
 net/sched/sch_prio.c      | 11 +++++------
 net/sched/sch_qfq.c       | 30 +++++-------------------------
 net/sched/sch_red.c       |  9 ++-------
 net/sched/sch_sfb.c       |  9 ++++-----
 net/sched/sch_sfq.c       |  9 ++++-----
 net/sched/sch_tbf.c       |  9 ++-------
 21 files changed, 106 insertions(+), 262 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 1688f0f6c7ba..d817585aa5df 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -147,8 +147,7 @@ struct Qdisc_class_ops {
 	void			(*qlen_notify)(struct Qdisc *, unsigned long);
 
 	/* Class manipulation routines */
-	unsigned long		(*get)(struct Qdisc *, u32 classid);
-	void			(*put)(struct Qdisc *, unsigned long);
+	unsigned long		(*find)(struct Qdisc *, u32 classid);
 	int			(*change)(struct Qdisc *, u32, u32,
 					struct nlattr **, unsigned long *);
 	int			(*delete)(struct Qdisc *, unsigned long);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index eef6b077f30e..d470a4e2de58 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -586,7 +586,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	/* Do we search for filter, attached to class? */
 	if (TC_H_MIN(parent)) {
-		cl = cops->get(q, parent);
+		cl = cops->find(q, parent);
 		if (cl == 0)
 			return -ENOENT;
 	}
@@ -716,8 +716,6 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
-	if (cl)
-		cops->put(q, cl);
 	if (err == -EAGAIN)
 		/* Replay the request. */
 		goto replay;
@@ -822,17 +820,17 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 		goto out;
 	cops = q->ops->cl_ops;
 	if (!cops)
-		goto errout;
+		goto out;
 	if (!cops->tcf_block)
-		goto errout;
+		goto out;
 	if (TC_H_MIN(tcm->tcm_parent)) {
-		cl = cops->get(q, tcm->tcm_parent);
+		cl = cops->find(q, tcm->tcm_parent);
 		if (cl == 0)
-			goto errout;
+			goto out;
 	}
 	block = cops->tcf_block(q, cl);
 	if (!block)
-		goto errout;
+		goto out;
 
 	index_start = cb->args[0];
 	index = 0;
@@ -847,9 +845,6 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 
 	cb->args[0] = index;
 
-errout:
-	if (cl)
-		cops->put(q, cl);
 out:
 	return skb->len;
 }
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 0fea0c50b763..6e7f70fb85de 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -160,7 +160,7 @@ int register_qdisc(struct Qdisc_ops *qops)
 	if (qops->cl_ops) {
 		const struct Qdisc_class_ops *cops = qops->cl_ops;
 
-		if (!(cops->get && cops->put && cops->walk && cops->leaf))
+		if (!(cops->find && cops->walk && cops->leaf))
 			goto out_einval;
 
 		if (cops->tcf_block && !(cops->bind_tcf && cops->unbind_tcf))
@@ -327,12 +327,11 @@ static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid)
 
 	if (cops == NULL)
 		return NULL;
-	cl = cops->get(p, classid);
+	cl = cops->find(p, classid);
 
 	if (cl == 0)
 		return NULL;
 	leaf = cops->leaf(p, cl);
-	cops->put(p, cl);
 	return leaf;
 }
 
@@ -777,9 +776,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
 		}
 		cops = sch->ops->cl_ops;
 		if (notify && cops->qlen_notify) {
-			cl = cops->get(sch, parentid);
+			cl = cops->find(sch, parentid);
 			cops->qlen_notify(sch, cl);
-			cops->put(sch, cl);
 		}
 		sch->q.qlen -= n;
 		sch->qstats.backlog -= len;
@@ -871,11 +869,11 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 
 		err = -EOPNOTSUPP;
 		if (cops && cops->graft) {
-			unsigned long cl = cops->get(parent, classid);
-			if (cl) {
+			unsigned long cl = cops->find(parent, classid);
+
+			if (cl)
 				err = cops->graft(parent, cl, new, &old);
-				cops->put(parent, cl);
-			} else
+			else
 				err = -ENOENT;
 		}
 		if (!err)
@@ -1664,7 +1662,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n,
 		clid = TC_H_MAKE(qid, clid);
 
 	if (clid)
-		cl = cops->get(q, clid);
+		cl = cops->find(q, clid);
 
 	if (cl == 0) {
 		err = -ENOENT;
@@ -1703,9 +1701,6 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n,
 		tclass_notify(net, skb, n, q, new_cl, RTM_NEWTCLASS);
 
 out:
-	if (cl)
-		cops->put(q, cl);
-
 	return err;
 }
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 2732950766a9..c5fcdf1a58a0 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -108,23 +108,29 @@ static struct Qdisc *atm_tc_leaf(struct Qdisc *sch, unsigned long cl)
 	return flow ? flow->q : NULL;
 }
 
-static unsigned long atm_tc_get(struct Qdisc *sch, u32 classid)
+static unsigned long atm_tc_find(struct Qdisc *sch, u32 classid)
 {
 	struct atm_qdisc_data *p __maybe_unused = qdisc_priv(sch);
 	struct atm_flow_data *flow;
 
-	pr_debug("atm_tc_get(sch %p,[qdisc %p],classid %x)\n", sch, p, classid);
+	pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", __func__, sch, p, classid);
 	flow = lookup_flow(sch, classid);
-	if (flow)
-		flow->ref++;
-	pr_debug("atm_tc_get: flow %p\n", flow);
+	pr_debug("%s: flow %p\n", __func__, flow);
 	return (unsigned long)flow;
 }
 
 static unsigned long atm_tc_bind_filter(struct Qdisc *sch,
 					unsigned long parent, u32 classid)
 {
-	return atm_tc_get(sch, classid);
+	struct atm_qdisc_data *p __maybe_unused = qdisc_priv(sch);
+	struct atm_flow_data *flow;
+
+	pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", __func__, sch, p, classid);
+	flow = lookup_flow(sch, classid);
+	if (flow)
+		flow->ref++;
+	pr_debug("%s: flow %p\n", __func__, flow);
+	return (unsigned long)flow;
 }
 
 /*
@@ -234,7 +240,7 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
 		excess = NULL;
 	else {
 		excess = (struct atm_flow_data *)
-			atm_tc_get(sch, nla_get_u32(tb[TCA_ATM_EXCESS]));
+			atm_tc_find(sch, nla_get_u32(tb[TCA_ATM_EXCESS]));
 		if (!excess)
 			return -ENOENT;
 	}
@@ -262,10 +268,9 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
 
 		for (i = 1; i < 0x8000; i++) {
 			classid = TC_H_MAKE(sch->handle, 0x8000 | i);
-			cl = atm_tc_get(sch, classid);
+			cl = atm_tc_find(sch, classid);
 			if (!cl)
 				break;
-			atm_tc_put(sch, cl);
 		}
 	}
 	pr_debug("atm_tc_change: new id %x\n", classid);
@@ -305,8 +310,6 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
 	*arg = (unsigned long)flow;
 	return 0;
 err_out:
-	if (excess)
-		atm_tc_put(sch, (unsigned long)excess);
 	sockfd_put(sock);
 	return error;
 }
@@ -377,7 +380,7 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	result = TC_ACT_OK;	/* be nice to gcc */
 	flow = NULL;
 	if (TC_H_MAJ(skb->priority) != sch->handle ||
-	    !(flow = (struct atm_flow_data *)atm_tc_get(sch, skb->priority))) {
+	    !(flow = (struct atm_flow_data *)atm_tc_find(sch, skb->priority))) {
 		struct tcf_proto *fl;
 
 		list_for_each_entry(flow, &p->flows, list) {
@@ -655,8 +658,7 @@ static int atm_tc_dump(struct Qdisc *sch, struct sk_buff *skb)
 static const struct Qdisc_class_ops atm_class_ops = {
 	.graft		= atm_tc_graft,
 	.leaf		= atm_tc_leaf,
-	.get		= atm_tc_get,
-	.put		= atm_tc_put,
+	.find		= atm_tc_find,
 	.change		= atm_tc_change,
 	.delete		= atm_tc_delete,
 	.walk		= atm_tc_walk,
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 1bdb0106f342..3ec8bec109bb 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -129,7 +129,6 @@ struct cbq_class {
 	struct tcf_proto __rcu	*filter_list;
 	struct tcf_block	*block;
 
-	int			refcnt;
 	int			filters;
 
 	struct cbq_class	*defaults[TC_PRIO_MAX + 1];
@@ -1155,7 +1154,6 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt)
 	if (err < 0)
 		goto put_rtab;
 
-	q->link.refcnt = 1;
 	q->link.sibling = &q->link;
 	q->link.common.classid = sch->handle;
 	q->link.qdisc = sch;
@@ -1388,16 +1386,11 @@ static void cbq_qlen_notify(struct Qdisc *sch, unsigned long arg)
 	cbq_deactivate_class(cl);
 }
 
-static unsigned long cbq_get(struct Qdisc *sch, u32 classid)
+static unsigned long cbq_find(struct Qdisc *sch, u32 classid)
 {
 	struct cbq_sched_data *q = qdisc_priv(sch);
-	struct cbq_class *cl = cbq_class_lookup(q, classid);
 
-	if (cl) {
-		cl->refcnt++;
-		return (unsigned long)cl;
-	}
-	return 0;
+	return (unsigned long)cbq_class_lookup(q, classid);
 }
 
 static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
@@ -1443,25 +1436,6 @@ static void cbq_destroy(struct Qdisc *sch)
 	qdisc_class_hash_destroy(&q->clhash);
 }
 
-static void cbq_put(struct Qdisc *sch, unsigned long arg)
-{
-	struct cbq_class *cl = (struct cbq_class *)arg;
-
-	if (--cl->refcnt == 0) {
-#ifdef CONFIG_NET_CLS_ACT
-		spinlock_t *root_lock = qdisc_root_sleeping_lock(sch);
-		struct cbq_sched_data *q = qdisc_priv(sch);
-
-		spin_lock_bh(root_lock);
-		if (q->rx_class == cl)
-			q->rx_class = NULL;
-		spin_unlock_bh(root_lock);
-#endif
-
-		cbq_destroy_class(sch, cl);
-	}
-}
-
 static int
 cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **tca,
 		 unsigned long *arg)
@@ -1608,7 +1582,6 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 
 	cl->R_tab = rtab;
 	rtab = NULL;
-	cl->refcnt = 1;
 	cl->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid);
 	if (!cl->q)
 		cl->q = &noop_qdisc;
@@ -1689,12 +1662,7 @@ static int cbq_delete(struct Qdisc *sch, unsigned long arg)
 	cbq_rmprio(q, cl);
 	sch_tree_unlock(sch);
 
-	BUG_ON(--cl->refcnt == 0);
-	/*
-	 * This shouldn't happen: we "hold" one cops->get() when called
-	 * from tc_ctl_tclass; the destroy method is done from cops->put().
-	 */
-
+	cbq_destroy_class(sch, cl);
 	return 0;
 }
 
@@ -1760,8 +1728,7 @@ static const struct Qdisc_class_ops cbq_class_ops = {
 	.graft		=	cbq_graft,
 	.leaf		=	cbq_leaf,
 	.qlen_notify	=	cbq_qlen_notify,
-	.get		=	cbq_get,
-	.put		=	cbq_put,
+	.find		=	cbq_find,
 	.change		=	cbq_change_class,
 	.delete		=	cbq_delete,
 	.walk		=	cbq_walk,
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 1d2f6235dfcf..2d0e8d4bdc29 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -20,7 +20,6 @@
 
 struct drr_class {
 	struct Qdisc_class_common	common;
-	unsigned int			refcnt;
 	unsigned int			filter_cnt;
 
 	struct gnet_stats_basic_packed		bstats;
@@ -111,7 +110,6 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	if (cl == NULL)
 		return -ENOBUFS;
 
-	cl->refcnt	   = 1;
 	cl->common.classid = classid;
 	cl->quantum	   = quantum;
 	cl->qdisc	   = qdisc_create_dflt(sch->dev_queue,
@@ -163,32 +161,15 @@ static int drr_delete_class(struct Qdisc *sch, unsigned long arg)
 	drr_purge_queue(cl);
 	qdisc_class_hash_remove(&q->clhash, &cl->common);
 
-	BUG_ON(--cl->refcnt == 0);
-	/*
-	 * This shouldn't happen: we "hold" one cops->get() when called
-	 * from tc_ctl_tclass; the destroy method is done from cops->put().
-	 */
-
 	sch_tree_unlock(sch);
-	return 0;
-}
-
-static unsigned long drr_get_class(struct Qdisc *sch, u32 classid)
-{
-	struct drr_class *cl = drr_find_class(sch, classid);
 
-	if (cl != NULL)
-		cl->refcnt++;
-
-	return (unsigned long)cl;
+	drr_destroy_class(sch, cl);
+	return 0;
 }
 
-static void drr_put_class(struct Qdisc *sch, unsigned long arg)
+static unsigned long drr_search_class(struct Qdisc *sch, u32 classid)
 {
-	struct drr_class *cl = (struct drr_class *)arg;
-
-	if (--cl->refcnt == 0)
-		drr_destroy_class(sch, cl);
+	return (unsigned long)drr_find_class(sch, classid);
 }
 
 static struct tcf_block *drr_tcf_block(struct Qdisc *sch, unsigned long cl)
@@ -478,8 +459,7 @@ static void drr_destroy_qdisc(struct Qdisc *sch)
 static const struct Qdisc_class_ops drr_class_ops = {
 	.change		= drr_change_class,
 	.delete		= drr_delete_class,
-	.get		= drr_get_class,
-	.put		= drr_put_class,
+	.find		= drr_search_class,
 	.tcf_block	= drr_tcf_block,
 	.bind_tcf	= drr_bind_tcf,
 	.unbind_tcf	= drr_unbind_tcf,
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 6d94fcc3592a..2836c80c7aa5 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -85,21 +85,21 @@ static struct Qdisc *dsmark_leaf(struct Qdisc *sch, unsigned long arg)
 	return p->q;
 }
 
-static unsigned long dsmark_get(struct Qdisc *sch, u32 classid)
+static unsigned long dsmark_find(struct Qdisc *sch, u32 classid)
 {
-	pr_debug("%s(sch %p,[qdisc %p],classid %x)\n",
-		 __func__, sch, qdisc_priv(sch), classid);
-
 	return TC_H_MIN(classid) + 1;
 }
 
 static unsigned long dsmark_bind_filter(struct Qdisc *sch,
 					unsigned long parent, u32 classid)
 {
-	return dsmark_get(sch, classid);
+	pr_debug("%s(sch %p,[qdisc %p],classid %x)\n",
+		 __func__, sch, qdisc_priv(sch), classid);
+
+	return dsmark_find(sch, classid);
 }
 
-static void dsmark_put(struct Qdisc *sch, unsigned long cl)
+static void dsmark_unbind_filter(struct Qdisc *sch, unsigned long cl)
 {
 }
 
@@ -469,14 +469,13 @@ static int dsmark_dump(struct Qdisc *sch, struct sk_buff *skb)
 static const struct Qdisc_class_ops dsmark_class_ops = {
 	.graft		=	dsmark_graft,
 	.leaf		=	dsmark_leaf,
-	.get		=	dsmark_get,
-	.put		=	dsmark_put,
+	.find		=	dsmark_find,
 	.change		=	dsmark_change,
 	.delete		=	dsmark_delete,
 	.walk		=	dsmark_walk,
 	.tcf_block	=	dsmark_tcf_block,
 	.bind_tcf	=	dsmark_bind_filter,
-	.unbind_tcf	=	dsmark_put,
+	.unbind_tcf	=	dsmark_unbind_filter,
 	.dump		=	dsmark_dump_class,
 };
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 337f2d6d81e4..7699b50688cd 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -579,7 +579,7 @@ static struct Qdisc *fq_codel_leaf(struct Qdisc *sch, unsigned long arg)
 	return NULL;
 }
 
-static unsigned long fq_codel_get(struct Qdisc *sch, u32 classid)
+static unsigned long fq_codel_find(struct Qdisc *sch, u32 classid)
 {
 	return 0;
 }
@@ -592,7 +592,7 @@ static unsigned long fq_codel_bind(struct Qdisc *sch, unsigned long parent,
 	return 0;
 }
 
-static void fq_codel_put(struct Qdisc *q, unsigned long cl)
+static void fq_codel_unbind(struct Qdisc *q, unsigned long cl)
 {
 }
 
@@ -683,11 +683,10 @@ static void fq_codel_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 
 static const struct Qdisc_class_ops fq_codel_class_ops = {
 	.leaf		=	fq_codel_leaf,
-	.get		=	fq_codel_get,
-	.put		=	fq_codel_put,
+	.find		=	fq_codel_find,
 	.tcf_block	=	fq_codel_tcf_block,
 	.bind_tcf	=	fq_codel_bind,
-	.unbind_tcf	=	fq_codel_put,
+	.unbind_tcf	=	fq_codel_unbind,
 	.dump		=	fq_codel_dump_class,
 	.dump_stats	=	fq_codel_dump_class_stats,
 	.walk		=	fq_codel_walk,
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 15f09cb9f1ff..7c7820d0fdc7 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -110,7 +110,6 @@ enum hfsc_class_flags {
 
 struct hfsc_class {
 	struct Qdisc_class_common cl_common;
-	unsigned int	refcnt;		/* usage count */
 
 	struct gnet_stats_basic_packed bstats;
 	struct gnet_stats_queue qstats;
@@ -1045,7 +1044,6 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		hfsc_change_usc(cl, usc, 0);
 
 	cl->cl_common.classid = classid;
-	cl->refcnt    = 1;
 	cl->sched     = q;
 	cl->cl_parent = parent;
 	cl->qdisc = qdisc_create_dflt(sch->dev_queue,
@@ -1101,13 +1099,9 @@ hfsc_delete_class(struct Qdisc *sch, unsigned long arg)
 	hfsc_purge_queue(sch, cl);
 	qdisc_class_hash_remove(&q->clhash, &cl->cl_common);
 
-	BUG_ON(--cl->refcnt == 0);
-	/*
-	 * This shouldn't happen: we "hold" one cops->get() when called
-	 * from tc_ctl_tclass; the destroy method is done from cops->put().
-	 */
-
 	sch_tree_unlock(sch);
+
+	hfsc_destroy_class(sch, cl);
 	return 0;
 }
 
@@ -1208,23 +1202,9 @@ hfsc_qlen_notify(struct Qdisc *sch, unsigned long arg)
 }
 
 static unsigned long
-hfsc_get_class(struct Qdisc *sch, u32 classid)
-{
-	struct hfsc_class *cl = hfsc_find_class(classid, sch);
-
-	if (cl != NULL)
-		cl->refcnt++;
-
-	return (unsigned long)cl;
-}
-
-static void
-hfsc_put_class(struct Qdisc *sch, unsigned long arg)
+hfsc_search_class(struct Qdisc *sch, u32 classid)
 {
-	struct hfsc_class *cl = (struct hfsc_class *)arg;
-
-	if (--cl->refcnt == 0)
-		hfsc_destroy_class(sch, cl);
+	return (unsigned long)hfsc_find_class(classid, sch);
 }
 
 static unsigned long
@@ -1413,7 +1393,6 @@ hfsc_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
 		goto err_tcf;
 
 	q->root.cl_common.classid = sch->handle;
-	q->root.refcnt  = 1;
 	q->root.sched   = q;
 	q->root.qdisc = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops,
 					  sch->handle);
@@ -1661,8 +1640,7 @@ static const struct Qdisc_class_ops hfsc_class_ops = {
 	.graft		= hfsc_graft_class,
 	.leaf		= hfsc_class_leaf,
 	.qlen_notify	= hfsc_qlen_notify,
-	.get		= hfsc_get_class,
-	.put		= hfsc_put_class,
+	.find		= hfsc_search_class,
 	.bind_tcf	= hfsc_bind_tcf,
 	.unbind_tcf	= hfsc_unbind_tcf,
 	.tcf_block	= hfsc_tcf_block,
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index dcf3c85e1f4f..f955b59d3c7c 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -107,7 +107,6 @@ struct htb_class {
 	struct tcf_proto __rcu	*filter_list;	/* class attached filters */
 	struct tcf_block	*block;
 	int			filter_cnt;
-	int			refcnt;		/* usage count of this class */
 
 	int			level;		/* our level (see above) */
 	unsigned int		children;
@@ -193,6 +192,10 @@ static inline struct htb_class *htb_find(u32 handle, struct Qdisc *sch)
 	return container_of(clc, struct htb_class, common);
 }
 
+static unsigned long htb_search(struct Qdisc *sch, u32 handle)
+{
+	return (unsigned long)htb_find(handle, sch);
+}
 /**
  * htb_classify - classify a packet into class
  *
@@ -1189,14 +1192,6 @@ static void htb_qlen_notify(struct Qdisc *sch, unsigned long arg)
 	htb_deactivate(qdisc_priv(sch), cl);
 }
 
-static unsigned long htb_get(struct Qdisc *sch, u32 classid)
-{
-	struct htb_class *cl = htb_find(classid, sch);
-	if (cl)
-		cl->refcnt++;
-	return (unsigned long)cl;
-}
-
 static inline int htb_parent_last_child(struct htb_class *cl)
 {
 	if (!cl->parent)
@@ -1316,22 +1311,10 @@ static int htb_delete(struct Qdisc *sch, unsigned long arg)
 	if (last_child)
 		htb_parent_to_leaf(q, cl, new_q);
 
-	BUG_ON(--cl->refcnt == 0);
-	/*
-	 * This shouldn't happen: we "hold" one cops->get() when called
-	 * from tc_ctl_tclass; the destroy method is done from cops->put().
-	 */
-
 	sch_tree_unlock(sch);
-	return 0;
-}
 
-static void htb_put(struct Qdisc *sch, unsigned long arg)
-{
-	struct htb_class *cl = (struct htb_class *)arg;
-
-	if (--cl->refcnt == 0)
-		htb_destroy_class(sch, cl);
+	htb_destroy_class(sch, cl);
+	return 0;
 }
 
 static int htb_change_class(struct Qdisc *sch, u32 classid,
@@ -1422,7 +1405,6 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			}
 		}
 
-		cl->refcnt = 1;
 		cl->children = 0;
 		INIT_LIST_HEAD(&cl->un.leaf.drop_list);
 		RB_CLEAR_NODE(&cl->pq_node);
@@ -1598,8 +1580,7 @@ static const struct Qdisc_class_ops htb_class_ops = {
 	.graft		=	htb_graft,
 	.leaf		=	htb_leaf,
 	.qlen_notify	=	htb_qlen_notify,
-	.get		=	htb_get,
-	.put		=	htb_put,
+	.find		=	htb_search,
 	.change		=	htb_change_class,
 	.delete		=	htb_delete,
 	.walk		=	htb_walk,
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index a15c543c3569..44de4ee51ce9 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -27,7 +27,7 @@ static struct Qdisc *ingress_leaf(struct Qdisc *sch, unsigned long arg)
 	return NULL;
 }
 
-static unsigned long ingress_get(struct Qdisc *sch, u32 classid)
+static unsigned long ingress_find(struct Qdisc *sch, u32 classid)
 {
 	return TC_H_MIN(classid) + 1;
 }
@@ -35,10 +35,10 @@ static unsigned long ingress_get(struct Qdisc *sch, u32 classid)
 static unsigned long ingress_bind_filter(struct Qdisc *sch,
 					 unsigned long parent, u32 classid)
 {
-	return ingress_get(sch, classid);
+	return ingress_find(sch, classid);
 }
 
-static void ingress_put(struct Qdisc *sch, unsigned long cl)
+static void ingress_unbind_filter(struct Qdisc *sch, unsigned long cl)
 {
 }
 
@@ -94,12 +94,11 @@ static int ingress_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 static const struct Qdisc_class_ops ingress_class_ops = {
 	.leaf		=	ingress_leaf,
-	.get		=	ingress_get,
-	.put		=	ingress_put,
+	.find		=	ingress_find,
 	.walk		=	ingress_walk,
 	.tcf_block	=	ingress_tcf_block,
 	.bind_tcf	=	ingress_bind_filter,
-	.unbind_tcf	=	ingress_put,
+	.unbind_tcf	=	ingress_unbind_filter,
 };
 
 static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
@@ -117,7 +116,7 @@ struct clsact_sched_data {
 	struct tcf_block *egress_block;
 };
 
-static unsigned long clsact_get(struct Qdisc *sch, u32 classid)
+static unsigned long clsact_find(struct Qdisc *sch, u32 classid)
 {
 	switch (TC_H_MIN(classid)) {
 	case TC_H_MIN(TC_H_MIN_INGRESS):
@@ -131,7 +130,7 @@ static unsigned long clsact_get(struct Qdisc *sch, u32 classid)
 static unsigned long clsact_bind_filter(struct Qdisc *sch,
 					unsigned long parent, u32 classid)
 {
-	return clsact_get(sch, classid);
+	return clsact_find(sch, classid);
 }
 
 static struct tcf_block *clsact_tcf_block(struct Qdisc *sch, unsigned long cl)
@@ -183,12 +182,11 @@ static void clsact_destroy(struct Qdisc *sch)
 
 static const struct Qdisc_class_ops clsact_class_ops = {
 	.leaf		=	ingress_leaf,
-	.get		=	clsact_get,
-	.put		=	ingress_put,
+	.find		=	clsact_find,
 	.walk		=	ingress_walk,
 	.tcf_block	=	clsact_tcf_block,
 	.bind_tcf	=	clsact_bind_filter,
-	.unbind_tcf	=	ingress_put,
+	.unbind_tcf	=	ingress_unbind_filter,
 };
 
 static struct Qdisc_ops clsact_qdisc_ops __read_mostly = {
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index cadfdd4f1e52..f3a3e507422b 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -165,7 +165,7 @@ static struct Qdisc *mq_leaf(struct Qdisc *sch, unsigned long cl)
 	return dev_queue->qdisc_sleeping;
 }
 
-static unsigned long mq_get(struct Qdisc *sch, u32 classid)
+static unsigned long mq_find(struct Qdisc *sch, u32 classid)
 {
 	unsigned int ntx = TC_H_MIN(classid);
 
@@ -174,10 +174,6 @@ static unsigned long mq_get(struct Qdisc *sch, u32 classid)
 	return ntx;
 }
 
-static void mq_put(struct Qdisc *sch, unsigned long cl)
-{
-}
-
 static int mq_dump_class(struct Qdisc *sch, unsigned long cl,
 			 struct sk_buff *skb, struct tcmsg *tcm)
 {
@@ -223,8 +219,7 @@ static const struct Qdisc_class_ops mq_class_ops = {
 	.select_queue	= mq_select_queue,
 	.graft		= mq_graft,
 	.leaf		= mq_leaf,
-	.get		= mq_get,
-	.put		= mq_put,
+	.find		= mq_find,
 	.walk		= mq_walk,
 	.dump		= mq_dump_class,
 	.dump_stats	= mq_dump_class_stats,
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 2165a05994b7..6bcdfe6e7b63 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -277,7 +277,7 @@ static struct Qdisc *mqprio_leaf(struct Qdisc *sch, unsigned long cl)
 	return dev_queue->qdisc_sleeping;
 }
 
-static unsigned long mqprio_get(struct Qdisc *sch, u32 classid)
+static unsigned long mqprio_find(struct Qdisc *sch, u32 classid)
 {
 	struct net_device *dev = qdisc_dev(sch);
 	unsigned int ntx = TC_H_MIN(classid);
@@ -287,10 +287,6 @@ static unsigned long mqprio_get(struct Qdisc *sch, u32 classid)
 	return ntx;
 }
 
-static void mqprio_put(struct Qdisc *sch, unsigned long cl)
-{
-}
-
 static int mqprio_dump_class(struct Qdisc *sch, unsigned long cl,
 			 struct sk_buff *skb, struct tcmsg *tcm)
 {
@@ -403,8 +399,7 @@ static void mqprio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 static const struct Qdisc_class_ops mqprio_class_ops = {
 	.graft		= mqprio_graft,
 	.leaf		= mqprio_leaf,
-	.get		= mqprio_get,
-	.put		= mqprio_put,
+	.find		= mqprio_find,
 	.walk		= mqprio_walk,
 	.dump		= mqprio_dump_class,
 	.dump_stats	= mqprio_dump_class_stats,
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index f143b7bbaa0d..a5df979b6248 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -306,7 +306,7 @@ multiq_leaf(struct Qdisc *sch, unsigned long arg)
 	return q->queues[band];
 }
 
-static unsigned long multiq_get(struct Qdisc *sch, u32 classid)
+static unsigned long multiq_find(struct Qdisc *sch, u32 classid)
 {
 	struct multiq_sched_data *q = qdisc_priv(sch);
 	unsigned long band = TC_H_MIN(classid);
@@ -319,11 +319,11 @@ static unsigned long multiq_get(struct Qdisc *sch, u32 classid)
 static unsigned long multiq_bind(struct Qdisc *sch, unsigned long parent,
 				 u32 classid)
 {
-	return multiq_get(sch, classid);
+	return multiq_find(sch, classid);
 }
 
 
-static void multiq_put(struct Qdisc *q, unsigned long cl)
+static void multiq_unbind(struct Qdisc *q, unsigned long cl)
 {
 }
 
@@ -385,12 +385,11 @@ static struct tcf_block *multiq_tcf_block(struct Qdisc *sch, unsigned long cl)
 static const struct Qdisc_class_ops multiq_class_ops = {
 	.graft		=	multiq_graft,
 	.leaf		=	multiq_leaf,
-	.get		=	multiq_get,
-	.put		=	multiq_put,
+	.find		=	multiq_find,
 	.walk		=	multiq_walk,
 	.tcf_block	=	multiq_tcf_block,
 	.bind_tcf	=	multiq_bind,
-	.unbind_tcf	=	multiq_put,
+	.unbind_tcf	=	multiq_unbind,
 	.dump		=	multiq_dump_class,
 	.dump_stats	=	multiq_dump_class_stats,
 };
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 1b3dd6190e93..cf5aad0aabfc 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -1096,15 +1096,11 @@ static struct Qdisc *netem_leaf(struct Qdisc *sch, unsigned long arg)
 	return q->qdisc;
 }
 
-static unsigned long netem_get(struct Qdisc *sch, u32 classid)
+static unsigned long netem_find(struct Qdisc *sch, u32 classid)
 {
 	return 1;
 }
 
-static void netem_put(struct Qdisc *sch, unsigned long arg)
-{
-}
-
 static void netem_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 {
 	if (!walker->stop) {
@@ -1120,8 +1116,7 @@ static void netem_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 static const struct Qdisc_class_ops netem_class_ops = {
 	.graft		=	netem_graft,
 	.leaf		=	netem_leaf,
-	.get		=	netem_get,
-	.put		=	netem_put,
+	.find		=	netem_find,
 	.walk		=	netem_walk,
 	.dump		=	netem_dump_class,
 };
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index e3e364cc9a70..f31b28f788c0 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -260,7 +260,7 @@ prio_leaf(struct Qdisc *sch, unsigned long arg)
 	return q->queues[band];
 }
 
-static unsigned long prio_get(struct Qdisc *sch, u32 classid)
+static unsigned long prio_find(struct Qdisc *sch, u32 classid)
 {
 	struct prio_sched_data *q = qdisc_priv(sch);
 	unsigned long band = TC_H_MIN(classid);
@@ -272,11 +272,11 @@ static unsigned long prio_get(struct Qdisc *sch, u32 classid)
 
 static unsigned long prio_bind(struct Qdisc *sch, unsigned long parent, u32 classid)
 {
-	return prio_get(sch, classid);
+	return prio_find(sch, classid);
 }
 
 
-static void prio_put(struct Qdisc *q, unsigned long cl)
+static void prio_unbind(struct Qdisc *q, unsigned long cl)
 {
 }
 
@@ -338,12 +338,11 @@ static struct tcf_block *prio_tcf_block(struct Qdisc *sch, unsigned long cl)
 static const struct Qdisc_class_ops prio_class_ops = {
 	.graft		=	prio_graft,
 	.leaf		=	prio_leaf,
-	.get		=	prio_get,
-	.put		=	prio_put,
+	.find		=	prio_find,
 	.walk		=	prio_walk,
 	.tcf_block	=	prio_tcf_block,
 	.bind_tcf	=	prio_bind,
-	.unbind_tcf	=	prio_put,
+	.unbind_tcf	=	prio_unbind,
 	.dump		=	prio_dump_class,
 	.dump_stats	=	prio_dump_class_stats,
 };
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 9caa959f91e1..cd661a7f81e6 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -132,7 +132,6 @@ struct qfq_aggregate;
 struct qfq_class {
 	struct Qdisc_class_common common;
 
-	unsigned int refcnt;
 	unsigned int filter_cnt;
 
 	struct gnet_stats_basic_packed bstats;
@@ -477,7 +476,6 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	if (cl == NULL)
 		return -ENOBUFS;
 
-	cl->refcnt = 1;
 	cl->common.classid = classid;
 	cl->deficit = lmax;
 
@@ -555,32 +553,15 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg)
 	qfq_purge_queue(cl);
 	qdisc_class_hash_remove(&q->clhash, &cl->common);
 
-	BUG_ON(--cl->refcnt == 0);
-	/*
-	 * This shouldn't happen: we "hold" one cops->get() when called
-	 * from tc_ctl_tclass; the destroy method is done from cops->put().
-	 */
-
 	sch_tree_unlock(sch);
-	return 0;
-}
-
-static unsigned long qfq_get_class(struct Qdisc *sch, u32 classid)
-{
-	struct qfq_class *cl = qfq_find_class(sch, classid);
-
-	if (cl != NULL)
-		cl->refcnt++;
 
-	return (unsigned long)cl;
+	qfq_destroy_class(sch, cl);
+	return 0;
 }
 
-static void qfq_put_class(struct Qdisc *sch, unsigned long arg)
+static unsigned long qfq_search_class(struct Qdisc *sch, u32 classid)
 {
-	struct qfq_class *cl = (struct qfq_class *)arg;
-
-	if (--cl->refcnt == 0)
-		qfq_destroy_class(sch, cl);
+	return (unsigned long)qfq_find_class(sch, classid);
 }
 
 static struct tcf_block *qfq_tcf_block(struct Qdisc *sch, unsigned long cl)
@@ -1510,8 +1491,7 @@ static void qfq_destroy_qdisc(struct Qdisc *sch)
 static const struct Qdisc_class_ops qfq_class_ops = {
 	.change		= qfq_change_class,
 	.delete		= qfq_delete_class,
-	.get		= qfq_get_class,
-	.put		= qfq_put_class,
+	.find		= qfq_search_class,
 	.tcf_block	= qfq_tcf_block,
 	.bind_tcf	= qfq_bind_tcf,
 	.unbind_tcf	= qfq_unbind_tcf,
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 11292adce412..93b9d70a9b28 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -311,15 +311,11 @@ static struct Qdisc *red_leaf(struct Qdisc *sch, unsigned long arg)
 	return q->qdisc;
 }
 
-static unsigned long red_get(struct Qdisc *sch, u32 classid)
+static unsigned long red_find(struct Qdisc *sch, u32 classid)
 {
 	return 1;
 }
 
-static void red_put(struct Qdisc *sch, unsigned long arg)
-{
-}
-
 static void red_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 {
 	if (!walker->stop) {
@@ -335,8 +331,7 @@ static void red_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 static const struct Qdisc_class_ops red_class_ops = {
 	.graft		=	red_graft,
 	.leaf		=	red_leaf,
-	.get		=	red_get,
-	.put		=	red_put,
+	.find		=	red_find,
 	.walk		=	red_walk,
 	.dump		=	red_dump_class,
 };
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 11fb6ec878d6..cc39e170b4aa 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -632,12 +632,12 @@ static struct Qdisc *sfb_leaf(struct Qdisc *sch, unsigned long arg)
 	return q->qdisc;
 }
 
-static unsigned long sfb_get(struct Qdisc *sch, u32 classid)
+static unsigned long sfb_find(struct Qdisc *sch, u32 classid)
 {
 	return 1;
 }
 
-static void sfb_put(struct Qdisc *sch, unsigned long arg)
+static void sfb_unbind(struct Qdisc *sch, unsigned long arg)
 {
 }
 
@@ -683,14 +683,13 @@ static unsigned long sfb_bind(struct Qdisc *sch, unsigned long parent,
 static const struct Qdisc_class_ops sfb_class_ops = {
 	.graft		=	sfb_graft,
 	.leaf		=	sfb_leaf,
-	.get		=	sfb_get,
-	.put		=	sfb_put,
+	.find		=	sfb_find,
 	.change		=	sfb_change_class,
 	.delete		=	sfb_delete,
 	.walk		=	sfb_walk,
 	.tcf_block	=	sfb_tcf_block,
 	.bind_tcf	=	sfb_bind,
-	.unbind_tcf	=	sfb_put,
+	.unbind_tcf	=	sfb_unbind,
 	.dump		=	sfb_dump_class,
 };
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 82469ef9655e..b46e0121dd1a 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -808,7 +808,7 @@ static struct Qdisc *sfq_leaf(struct Qdisc *sch, unsigned long arg)
 	return NULL;
 }
 
-static unsigned long sfq_get(struct Qdisc *sch, u32 classid)
+static unsigned long sfq_find(struct Qdisc *sch, u32 classid)
 {
 	return 0;
 }
@@ -821,7 +821,7 @@ static unsigned long sfq_bind(struct Qdisc *sch, unsigned long parent,
 	return 0;
 }
 
-static void sfq_put(struct Qdisc *q, unsigned long cl)
+static void sfq_unbind(struct Qdisc *q, unsigned long cl)
 {
 }
 
@@ -885,11 +885,10 @@ static void sfq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 
 static const struct Qdisc_class_ops sfq_class_ops = {
 	.leaf		=	sfq_leaf,
-	.get		=	sfq_get,
-	.put		=	sfq_put,
+	.find		=	sfq_find,
 	.tcf_block	=	sfq_tcf_block,
 	.bind_tcf	=	sfq_bind,
-	.unbind_tcf	=	sfq_put,
+	.unbind_tcf	=	sfq_unbind,
 	.dump		=	sfq_dump_class,
 	.dump_stats	=	sfq_dump_class_stats,
 	.walk		=	sfq_walk,
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index b2e4b6ad241a..d5dba972ab06 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -510,15 +510,11 @@ static struct Qdisc *tbf_leaf(struct Qdisc *sch, unsigned long arg)
 	return q->qdisc;
 }
 
-static unsigned long tbf_get(struct Qdisc *sch, u32 classid)
+static unsigned long tbf_find(struct Qdisc *sch, u32 classid)
 {
 	return 1;
 }
 
-static void tbf_put(struct Qdisc *sch, unsigned long arg)
-{
-}
-
 static void tbf_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 {
 	if (!walker->stop) {
@@ -534,8 +530,7 @@ static void tbf_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 static const struct Qdisc_class_ops tbf_class_ops = {
 	.graft		=	tbf_graft,
 	.leaf		=	tbf_leaf,
-	.get		=	tbf_get,
-	.put		=	tbf_put,
+	.find		=	tbf_find,
 	.walk		=	tbf_walk,
 	.dump		=	tbf_dump_class,
 };
-- 
2.13.0

^ permalink raw reply related

* Re: [PATCH v2 1/6] fsl/fman: enable FMan Keygen
From: David Miller @ 2017-08-23  4:46 UTC (permalink / raw)
  To: madalin.bucur; +Cc: netdev, linuxppc-dev, linux-kernel
In-Reply-To: <AM5PR0402MB2691CF2AC9555829EC89C59CEC850@AM5PR0402MB2691.eurprd04.prod.outlook.com>

From: Madalin-cristian Bucur <madalin.bucur@nxp.com>
Date: Wed, 23 Aug 2017 04:36:56 +0000

> The struct fman is only visible in the fman file, the fman port
> module uses struct fman as an opaque pointer, thus this export.

Don't use that programming model.

Export the datastructure properly to it's users.

This abstraction scheme is so wasteful and costly.

^ permalink raw reply

* [PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
From: Dexuan Cui @ 2017-08-23  4:52 UTC (permalink / raw)
  To: 'Jorgen S. Hansen', 'Stefan Hajnoczi',
	'davem@davemloft.net', 'netdev@vger.kernel.org'
  Cc: 'Michal Kubecek', 'joe@perches.com',
	'olaf@aepfle.de', Stephen Hemminger,
	'jasowang@redhat.com', KY Srinivasan, Haiyang Zhang,
	'Dave Scott', 'linux-kernel@vger.kernel.org',
	'apw@canonical.com', 'Rolf Neugebauer',
	'gregkh@linuxfoundation.org', 'Marcelo Cerri',
	'devel@linuxdriverproject.org',
	'Vitaly Kuznetsov', 'George Zhang',
	'Dan Carpenter'


Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the traditional
socket APIs.

More info about Hyper-V Sockets is available here:

"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

The patch implements the necessary support in Linux guest by introducing a new
vsock transport for AF_VSOCK.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Reilly Grant <grantr@vmware.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
---

Changes in v2:
	Fixed hvs_stream_allow() for cid and the comments
		Thanks Stefan Hajnoczi!

	Added proper locking when using vsock_enqueue_accept()
		Thanks Stefan Hajnoczi and Jorgen Hansen!

	The previous v1 patch is not needed any more:
 	[PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race

	Another previous v1 patch is being discussed in another thread:
	    vsock: only load vmci transport on VMware hypervisor by default


 MAINTAINERS                      |   1 +
 net/vmw_vsock/Kconfig            |  12 +
 net/vmw_vsock/Makefile           |   3 +
 net/vmw_vsock/hyperv_transport.c | 888 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 904 insertions(+)
 create mode 100644 net/vmw_vsock/hyperv_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2db0f8c..dae0573 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6279,6 +6279,7 @@ F:	drivers/net/hyperv/
 F:	drivers/scsi/storvsc_drv.c
 F:	drivers/uio/uio_hv_generic.c
 F:	drivers/video/fbdev/hyperv_fb.c
+F:	net/vmw_vsock/hyperv_transport.c
 F:	include/linux/hyperv.h
 F:	tools/hv/
 F:	Documentation/ABI/stable/sysfs-bus-vmbus
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index a7ae09d..3f52929 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -46,3 +46,15 @@ config VIRTIO_VSOCKETS_COMMON
 	  This option is selected by any driver which needs to access
 	  the virtio_vsock.  The module will be called
 	  vmw_vsock_virtio_transport_common.
+
+config HYPERV_VSOCKETS
+	tristate "Hyper-V transport for Virtual Sockets"
+	depends on VSOCKETS && HYPERV
+	help
+	  This module implements a Hyper-V transport for Virtual Sockets.
+
+	  Enable this transport if your Virtual Machine host supports Virtual
+	  Sockets over Hyper-V VMBus.
+
+	  To compile this driver as a module, choose M here: the module will be
+	  called hv_sock. If unsure, say N.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 09fc2eb..e63d574 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o
+obj-$(CONFIG_HYPERV_VSOCKETS) += hv_sock.o
 
 vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o
 
@@ -11,3 +12,5 @@ vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \
 vmw_vsock_virtio_transport-y += virtio_transport.o
 
 vmw_vsock_virtio_transport_common-y += virtio_transport_common.o
+
+hv_sock-y += hyperv_transport.o
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
new file mode 100644
index 0000000..597fb25
--- /dev/null
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -0,0 +1,888 @@
+/*
+ * Hyper-V transport for vsock
+ *
+ * Hyper-V Sockets supplies a byte-stream based communication mechanism
+ * between the host and the VM. This driver implements the necessary
+ * support in the VM by introducing the new vsock transport.
+ *
+ * Copyright (c) 2017, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/hyperv.h>
+#include <net/sock.h>
+#include <net/af_vsock.h>
+
+/* The host side's design of the feature requires 6 exact 4KB pages for
+ * recv/send rings respectively -- this is suboptimal considering memory
+ * consumption, however unluckily we have to live with it, before the
+ * host comes up with a better design in the future.
+ */
+#define PAGE_SIZE_4K		4096
+#define RINGBUFFER_HVS_RCV_SIZE (PAGE_SIZE_4K * 6)
+#define RINGBUFFER_HVS_SND_SIZE (PAGE_SIZE_4K * 6)
+
+/* The MTU is 16KB per the host side's design */
+#define HVS_MTU_SIZE		(1024 * 16)
+
+struct vmpipe_proto_header {
+	u32 pkt_type;
+	u32 data_size;
+};
+
+/* For recv, we use the VMBus in-place packet iterator APIs to directly copy
+ * data from the ringbuffer into the userspace buffer.
+ */
+struct hvs_recv_buf {
+	/* The header before the payload data */
+	struct vmpipe_proto_header hdr;
+
+	/* The payload */
+	u8 data[HVS_MTU_SIZE];
+};
+
+/* We can send up to HVS_MTU_SIZE bytes of payload to the host, but let's use
+ * a small size, i.e. HVS_SEND_BUF_SIZE, to minimize the dynamically-allocated
+ * buffer, because tests show there is no significant performance difference.
+ *
+ * Note: the buffer can be eliminated in the future when we add new VMBus
+ * ringbuffer APIs that allow us to directly copy data from userspace buffer
+ * to VMBus ringbuffer.
+ */
+#define HVS_SEND_BUF_SIZE (PAGE_SIZE_4K - sizeof(struct vmpipe_proto_header))
+
+struct hvs_send_buf {
+	/* The header before the payload data */
+	struct vmpipe_proto_header hdr;
+
+	/* The payload */
+	u8 data[HVS_SEND_BUF_SIZE];
+};
+
+#define HVS_HEADER_LEN	(sizeof(struct vmpacket_descriptor) + \
+			 sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write(), and
+ * __hv_pkt_iter_next().
+ */
+#define VMBUS_PKT_TRAILER	(sizeof(u64))
+
+#define HVS_PKT_LEN(payload_len)	(HVS_HEADER_LEN + \
+					 ALIGN((payload_len), 8) + \
+					 VMBUS_PKT_TRAILER)
+
+/* Per-socket state (accessed via vsk->trans) */
+struct hvsock {
+	struct vsock_sock *vsk;
+
+	uuid_le	vm_srv_id;
+	uuid_le	host_srv_id;
+
+	struct vmbus_channel *chan;
+	struct vmpacket_descriptor *recv_desc;
+
+	/* The length of the payload not delivered to userland yet */
+	u32 recv_data_len;
+	/* The offset of the payload */
+	u32 recv_data_off;
+
+	/* Have we sent the zero-length packet (FIN)? */
+	unsigned long fin_sent;
+};
+
+/* In the VM, we support Hyper-V Sockets with AF_VSOCK, and the endpoint is
+ * <cid, port> (see struct sockaddr_vm). Note: cid is not really used here:
+ * when we write apps to connect to the host, we can only use VMADDR_CID_ANY
+ * or VMADDR_CID_HOST (both are equivalent) as the remote cid, and when we
+ * write apps to bind() & listen() in the VM, we can only use VMADDR_CID_ANY
+ * as the local cid.
+ *
+ * On the host, Hyper-V Sockets are supported by Winsock AF_HYPERV:
+ * https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-
+ * guide/make-integration-service, and the endpoint is <VmID, ServiceId> with
+ * the below sockaddr:
+ *
+ * struct SOCKADDR_HV
+ * {
+ *    ADDRESS_FAMILY Family;
+ *    USHORT Reserved;
+ *    GUID VmId;
+ *    GUID ServiceId;
+ * };
+ * Note: VmID is not used by Linux VM and actually it isn't transmitted via
+ * VMBus, because here it's obvious the host and the VM can easily identify
+ * each other. Though the VmID is useful on the host, especially in the case
+ * of Windows container, Linux VM doesn't need it at all.
+ *
+ * To make use of the AF_VSOCK infrastructure in Linux VM, we have to limit
+ * the available GUID space of SOCKADDR_HV so that we can create a mapping
+ * between AF_VSOCK port and SOCKADDR_HV Service GUID. The rule of writing
+ * Hyper-V Sockets apps on the host and in Linux VM is:
+ *
+ ****************************************************************************
+ * The only valid Service GUIDs, from the perspectives of both the host and *
+ * Linux VM, that can be connected by the other end, must conform to this   *
+ * format: <port>-facb-11e6-bd58-64006a7986d3, and the "port" must be in    *
+ * this range [0, 0x7FFFFFFF].                                              *
+ ****************************************************************************
+ *
+ * When we write apps on the host to connect(), the GUID ServiceID is used.
+ * When we write apps in Linux VM to connect(), we only need to specify the
+ * port and the driver will form the GUID and use that to request the host.
+ *
+ * From the perspective of Linux VM:
+ * 1. the local ephemeral port (i.e. the local auto-bound port when we call
+ * connect() without explicit bind()) is generated by __vsock_bind_stream(),
+ * and the range is [1024, 0xFFFFFFFF).
+ * 2. the remote ephemeral port (i.e. the auto-generated remote port for
+ * a connect request initiated by the host's connect()) is generated by
+ * hvs_remote_addr_init() and the range is [0x80000000, 0xFFFFFFFF).
+ */
+
+#define MAX_LISTEN_PORT			((u32)0x7FFFFFFF)
+#define MAX_VM_LISTEN_PORT		MAX_LISTEN_PORT
+#define MAX_HOST_LISTEN_PORT		MAX_LISTEN_PORT
+#define MIN_HOST_EPHEMERAL_PORT		(MAX_HOST_LISTEN_PORT + 1)
+
+/* 00000000-facb-11e6-bd58-64006a7986d3 */
+static const uuid_le srv_id_template =
+	UUID_LE(0x00000000, 0xfacb, 0x11e6, 0xbd, 0x58,
+		0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3);
+
+static inline bool is_valid_srv_id(const uuid_le *id)
+{
+	return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
+}
+
+static inline unsigned int get_port_by_srv_id(const uuid_le *svr_id)
+{
+	return *((unsigned int *)svr_id);
+}
+
+static inline void hvs_addr_init(struct sockaddr_vm *addr,
+				 const uuid_le *svr_id)
+{
+	unsigned int port = get_port_by_srv_id(svr_id);
+
+	vsock_addr_init(addr, VMADDR_CID_ANY, port);
+}
+
+static inline void hvs_remote_addr_init(struct sockaddr_vm *remote,
+					struct sockaddr_vm *local)
+{
+	static u32 host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
+	struct sock *sk;
+
+	vsock_addr_init(remote, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+
+	while (1) {
+		/* Wrap around ? */
+		if (host_ephemeral_port < MIN_HOST_EPHEMERAL_PORT ||
+		    host_ephemeral_port == VMADDR_PORT_ANY)
+			host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
+
+		remote->svm_port = host_ephemeral_port++;
+
+		sk = vsock_find_connected_socket(remote, local);
+		if (!sk) {
+			/* Found an available ephemeral port */
+			return;
+		}
+
+		/* Release refcnt got in vsock_find_connected_socket */
+		sock_put(sk);
+	}
+}
+
+static void hvs_set_channel_pending_send_size(struct vmbus_channel *chan)
+{
+	set_channel_pending_send_size(chan,
+				      HVS_PKT_LEN(HVS_SEND_BUF_SIZE));
+
+	/* See hvs_stream_has_space(): we must make sure the host has seen
+	 * the new pending send size, before we can re-check the writable
+	 * bytes.
+	 */
+	virt_mb();
+}
+
+static void hvs_clear_channel_pending_send_size(struct vmbus_channel *chan)
+{
+	set_channel_pending_send_size(chan, 0);
+
+	/* Ditto */
+	virt_mb();
+}
+
+static bool hvs_channel_readable(struct vmbus_channel *chan)
+{
+	u32 readable = hv_get_bytes_to_read(&chan->inbound);
+
+	/* 0-size payload means FIN */
+	return readable >= HVS_PKT_LEN(0);
+}
+
+static int hvs_channel_readable_payload(struct vmbus_channel *chan)
+{
+	u32 readable = hv_get_bytes_to_read(&chan->inbound);
+
+	if (readable > HVS_PKT_LEN(0)) {
+		/* At least we have 1 byte to read. We don't need to return
+		 * the exact readable bytes: see vsock_stream_recvmsg() ->
+		 * vsock_stream_has_data().
+		 */
+		return 1;
+	}
+
+	if (readable == HVS_PKT_LEN(0)) {
+		/* 0-size payload means FIN */
+		return 0;
+	}
+
+	/* No payload or FIN */
+	return -1;
+}
+
+static inline size_t hvs_channel_writable_bytes(struct vmbus_channel *chan)
+{
+	u32 writeable = hv_get_bytes_to_write(&chan->outbound);
+	size_t ret;
+
+	/* The ringbuffer mustn't be 100% full, and we should reserve a
+	 * zero-length-payload packet for the FIN: see hv_ringbuffer_write()
+	 * and hvs_shutdown().
+	 */
+	if (writeable <= HVS_PKT_LEN(1) + HVS_PKT_LEN(0))
+		return 0;
+
+	ret = writeable - HVS_PKT_LEN(1) - HVS_PKT_LEN(0);
+
+	return round_down(ret, 8);
+}
+
+static int hvs_send_data(struct vmbus_channel *chan,
+			 struct hvs_send_buf *send_buf, size_t to_write)
+{
+	send_buf->hdr.pkt_type = 1;
+	send_buf->hdr.data_size = to_write;
+	return vmbus_sendpacket(chan, &send_buf->hdr,
+				sizeof(send_buf->hdr) + to_write,
+				0, VM_PKT_DATA_INBAND, 0);
+}
+
+static void hvs_channel_cb(void *ctx)
+{
+	struct sock *sk = (struct sock *)ctx;
+	struct vsock_sock *vsk = vsock_sk(sk);
+	struct hvsock *hvs = vsk->trans;
+	struct vmbus_channel *chan = hvs->chan;
+
+	if (hvs_channel_readable(chan))
+		sk->sk_data_ready(sk);
+
+	/* See hvs_stream_has_space(): when we reach here, the writable bytes
+	 * may be already less than HVS_PKT_LEN(HVS_SEND_BUF_SIZE).
+	 */
+	if (hv_get_bytes_to_write(&chan->outbound) > 0)
+		sk->sk_write_space(sk);
+}
+
+static void hvs_close_connection(struct vmbus_channel *chan)
+{
+	struct sock *sk = get_per_channel_state(chan);
+	struct vsock_sock *vsk = vsock_sk(sk);
+
+	sk->sk_state = SS_UNCONNECTED;
+	sock_set_flag(sk, SOCK_DONE);
+	vsk->peer_shutdown |= SEND_SHUTDOWN | RCV_SHUTDOWN;
+
+	sk->sk_state_change(sk);
+}
+
+static void hvs_open_connection(struct vmbus_channel *chan)
+{
+	uuid_le *if_instance, *if_type;
+	unsigned char conn_from_host;
+
+	struct sockaddr_vm addr;
+	struct sock *sk, *new = NULL;
+	struct vsock_sock *vnew;
+	struct hvsock *hvs, *hvs_new;
+	int ret;
+
+	if_type = &chan->offermsg.offer.if_type;
+	if_instance = &chan->offermsg.offer.if_instance;
+	conn_from_host = chan->offermsg.offer.u.pipe.user_def[0];
+
+	/* The host or the VM should only listen on a port in
+	 * [0, MAX_LISTEN_PORT]
+	 */
+	if (!is_valid_srv_id(if_type) ||
+	    get_port_by_srv_id(if_type) > MAX_LISTEN_PORT)
+		return;
+
+	hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
+	sk = vsock_find_bound_socket(&addr);
+	if (!sk)
+		return;
+
+	if ((conn_from_host && sk->sk_state != VSOCK_SS_LISTEN) ||
+	    (!conn_from_host && sk->sk_state != SS_CONNECTING))
+		goto out;
+
+	if (conn_from_host) {
+		if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog)
+			goto out;
+
+		new = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
+				     sk->sk_type, 0);
+		if (!new)
+			goto out;
+
+		new->sk_state = SS_CONNECTING;
+		vnew = vsock_sk(new);
+		hvs_new = vnew->trans;
+		hvs_new->chan = chan;
+	} else {
+		hvs = vsock_sk(sk)->trans;
+		hvs->chan = chan;
+	}
+
+	set_channel_read_mode(chan, HV_CALL_DIRECT);
+	ret = vmbus_open(chan, RINGBUFFER_HVS_SND_SIZE,
+			 RINGBUFFER_HVS_RCV_SIZE, NULL, 0,
+			 hvs_channel_cb, conn_from_host ? new : sk);
+	if (ret != 0) {
+		if (conn_from_host) {
+			hvs_new->chan = NULL;
+			sock_put(new);
+		} else {
+			hvs->chan = NULL;
+		}
+		goto out;
+	}
+
+	set_per_channel_state(chan, conn_from_host ? new : sk);
+	vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
+
+	if (conn_from_host) {
+		new->sk_state = SS_CONNECTED;
+		sk->sk_ack_backlog++;
+
+		hvs_addr_init(&vnew->local_addr, if_type);
+		hvs_remote_addr_init(&vnew->remote_addr, &vnew->local_addr);
+
+		hvs_new->vm_srv_id = *if_type;
+		hvs_new->host_srv_id = *if_instance;
+
+		vsock_insert_connected(vnew);
+
+		lock_sock(sk);
+		vsock_enqueue_accept(sk, new);
+		release_sock(sk);
+	} else {
+		sk->sk_state = SS_CONNECTED;
+		sk->sk_socket->state = SS_CONNECTED;
+
+		vsock_insert_connected(vsock_sk(sk));
+	}
+
+	sk->sk_state_change(sk);
+
+out:
+	/* Release refcnt obtained when we called vsock_find_bound_socket() */
+	sock_put(sk);
+}
+
+static u32 hvs_get_local_cid(void)
+{
+	return VMADDR_CID_ANY;
+}
+
+static int hvs_sock_init(struct vsock_sock *vsk, struct vsock_sock *psk)
+{
+	struct hvsock *hvs;
+
+	hvs = kzalloc(sizeof(*hvs), GFP_KERNEL);
+	if (!hvs)
+		return -ENOMEM;
+
+	vsk->trans = hvs;
+	hvs->vsk = vsk;
+
+	return 0;
+}
+
+static int hvs_connect(struct vsock_sock *vsk)
+{
+	struct hvsock *h = vsk->trans;
+
+	h->vm_srv_id = srv_id_template;
+	h->host_srv_id = srv_id_template;
+
+	*((u32 *)&h->vm_srv_id) = vsk->local_addr.svm_port;
+	*((u32 *)&h->host_srv_id) = vsk->remote_addr.svm_port;
+
+	return vmbus_send_tl_connect_request(&h->vm_srv_id, &h->host_srv_id);
+}
+
+static int hvs_shutdown(struct vsock_sock *vsk, int mode)
+{
+	struct vmpipe_proto_header hdr;
+	struct hvs_send_buf *send_buf;
+	struct hvsock *hvs;
+
+	if (!(mode & SEND_SHUTDOWN))
+		return 0;
+
+	hvs = vsk->trans;
+
+	if (test_and_set_bit(0, &hvs->fin_sent))
+		return 0;
+
+	send_buf = (struct hvs_send_buf *)&hdr;
+
+	/* It can't fail: see hvs_channel_writable_bytes(). */
+	(void)hvs_send_data(hvs->chan, send_buf, 0);
+
+	return 0;
+}
+
+static void hvs_release(struct vsock_sock *vsk)
+{
+	struct hvsock *hvs = vsk->trans;
+	struct vmbus_channel *chan = hvs->chan;
+
+	if (chan)
+		hvs_shutdown(vsk, RCV_SHUTDOWN | SEND_SHUTDOWN);
+
+	vsock_remove_sock(vsk);
+}
+
+static void hvs_destruct(struct vsock_sock *vsk)
+{
+	struct hvsock *hvs = vsk->trans;
+	struct vmbus_channel *chan = hvs->chan;
+
+	if (chan)
+		vmbus_hvsock_device_unregister(chan);
+
+	kfree(hvs);
+}
+
+static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
+{
+	return -EOPNOTSUPP;
+}
+
+static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
+			     size_t len, int flags)
+{
+	return -EOPNOTSUPP;
+}
+
+static int hvs_dgram_enqueue(struct vsock_sock *vsk,
+			     struct sockaddr_vm *remote, struct msghdr *msg,
+			     size_t dgram_len)
+{
+	return -EOPNOTSUPP;
+}
+
+static bool hvs_dgram_allow(u32 cid, u32 port)
+{
+	return false;
+}
+
+static int hvs_update_recv_data(struct hvsock *hvs)
+{
+	struct hvs_recv_buf *recv_buf;
+	u32 payload_len;
+
+	recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
+	payload_len = recv_buf->hdr.data_size;
+
+	if (payload_len > HVS_MTU_SIZE)
+		return -EIO;
+
+	if (payload_len == 0)
+		hvs->vsk->peer_shutdown |= SEND_SHUTDOWN;
+
+	hvs->recv_data_len = payload_len;
+	hvs->recv_data_off = 0;
+
+	return 0;
+}
+
+static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
+				  size_t len, int flags)
+{
+	struct hvsock *hvs = vsk->trans;
+	bool need_refill = !hvs->recv_desc;
+	struct hvs_recv_buf *recv_buf;
+	u32 to_read;
+	int ret;
+
+	if (flags & MSG_PEEK)
+		return -EOPNOTSUPP;
+
+	if (need_refill) {
+		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
+		ret = hvs_update_recv_data(hvs);
+		if (ret)
+			return ret;
+	}
+
+	recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
+	to_read = min_t(u32, len, hvs->recv_data_len);
+	ret = memcpy_to_msg(msg, recv_buf->data + hvs->recv_data_off, to_read);
+	if (ret != 0)
+		return ret;
+
+	hvs->recv_data_len -= to_read;
+	if (hvs->recv_data_len == 0) {
+		hvs->recv_desc = hv_pkt_iter_next(hvs->chan, hvs->recv_desc);
+		if (hvs->recv_desc) {
+			ret = hvs_update_recv_data(hvs);
+			if (ret)
+				return ret;
+		}
+	} else {
+		hvs->recv_data_off += to_read;
+	}
+
+	return to_read;
+}
+
+static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
+				  size_t len)
+{
+	struct hvsock *hvs = vsk->trans;
+	struct vmbus_channel *chan = hvs->chan;
+	struct hvs_send_buf *send_buf;
+	ssize_t to_write, max_writable, ret;
+
+	BUILD_BUG_ON(sizeof(*send_buf) != PAGE_SIZE_4K);
+
+	send_buf = kmalloc(sizeof(*send_buf), GFP_KERNEL);
+	if (!send_buf)
+		return -ENOMEM;
+
+	max_writable = hvs_channel_writable_bytes(chan);
+	to_write = min_t(ssize_t, len, max_writable);
+	to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
+
+	ret = memcpy_from_msg(send_buf->data, msg, to_write);
+	if (ret < 0)
+		goto out;
+
+	ret = hvs_send_data(hvs->chan, send_buf, to_write);
+	if (ret < 0)
+		goto out;
+
+	ret = to_write;
+out:
+	kfree(send_buf);
+	return ret;
+}
+
+static s64 hvs_stream_has_data(struct vsock_sock *vsk)
+{
+	struct hvsock *hvs = vsk->trans;
+	s64 ret;
+
+	if (hvs->recv_data_len > 0)
+		return 1;
+
+	switch (hvs_channel_readable_payload(hvs->chan)) {
+	case 1:
+		ret = 1;
+		break;
+	case 0:
+		vsk->peer_shutdown |= SEND_SHUTDOWN;
+		ret = 0;
+		break;
+	default: /* -1 */
+		ret = 0;
+		break;
+	}
+
+	return ret;
+}
+
+static s64 hvs_stream_has_space(struct vsock_sock *vsk)
+{
+	struct hvsock *hvs = vsk->trans;
+	struct vmbus_channel *chan = hvs->chan;
+	s64 ret;
+
+	ret = hvs_channel_writable_bytes(chan);
+	if (ret > 0)  {
+		hvs_clear_channel_pending_send_size(chan);
+	} else {
+		/* See hvs_channel_cb() */
+		hvs_set_channel_pending_send_size(chan);
+
+		/* Re-check the writable bytes to avoid race */
+		ret = hvs_channel_writable_bytes(chan);
+		if (ret > 0)
+			hvs_clear_channel_pending_send_size(chan);
+	}
+
+	return ret;
+}
+
+static u64 hvs_stream_rcvhiwat(struct vsock_sock *vsk)
+{
+	return HVS_MTU_SIZE + 1;
+}
+
+static bool hvs_stream_is_active(struct vsock_sock *vsk)
+{
+	struct hvsock *hvs = vsk->trans;
+
+	return hvs->chan != NULL;
+}
+
+static bool hvs_stream_allow(u32 cid, u32 port)
+{
+	/* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
+	 * reserved as ephemeral ports, which are used as the host's ports
+	 * when the host initiates connections.
+	 *
+	 * Perform this check in the guest so an immediate error is produced
+	 * instead of a timeout.
+	 */
+	if (port > MAX_HOST_LISTEN_PORT)
+		return false;
+
+	if (cid == VMADDR_CID_HOST)
+		return true;
+
+	return false;
+}
+
+static
+int hvs_notify_poll_in(struct vsock_sock *vsk, size_t target, bool *readable)
+{
+	struct hvsock *hvs = vsk->trans;
+
+	*readable = hvs_channel_readable(hvs->chan);
+	return 0;
+}
+
+static
+int hvs_notify_poll_out(struct vsock_sock *vsk, size_t target, bool *writable)
+{
+	*writable = hvs_stream_has_space(vsk) > 0;
+
+	return 0;
+}
+
+static
+int hvs_notify_recv_init(struct vsock_sock *vsk, size_t target,
+			 struct vsock_transport_recv_notify_data *d)
+{
+	return 0;
+}
+
+static
+int hvs_notify_recv_pre_block(struct vsock_sock *vsk, size_t target,
+			      struct vsock_transport_recv_notify_data *d)
+{
+	return 0;
+}
+
+static
+int hvs_notify_recv_pre_dequeue(struct vsock_sock *vsk, size_t target,
+				struct vsock_transport_recv_notify_data *d)
+{
+	return 0;
+}
+
+static
+int hvs_notify_recv_post_dequeue(struct vsock_sock *vsk, size_t target,
+				 ssize_t copied, bool data_read,
+				 struct vsock_transport_recv_notify_data *d)
+{
+	return 0;
+}
+
+static
+int hvs_notify_send_init(struct vsock_sock *vsk,
+			 struct vsock_transport_send_notify_data *d)
+{
+	return 0;
+}
+
+static
+int hvs_notify_send_pre_block(struct vsock_sock *vsk,
+			      struct vsock_transport_send_notify_data *d)
+{
+	return 0;
+}
+
+static
+int hvs_notify_send_pre_enqueue(struct vsock_sock *vsk,
+				struct vsock_transport_send_notify_data *d)
+{
+	return 0;
+}
+
+static
+int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
+				 struct vsock_transport_send_notify_data *d)
+{
+	return 0;
+}
+
+static void hvs_set_buffer_size(struct vsock_sock *vsk, u64 val)
+{
+	/* Ignored. */
+}
+
+static void hvs_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
+{
+	/* Ignored. */
+}
+
+static void hvs_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
+{
+	/* Ignored. */
+}
+
+static u64 hvs_get_buffer_size(struct vsock_sock *vsk)
+{
+	return -ENOPROTOOPT;
+}
+
+static u64 hvs_get_min_buffer_size(struct vsock_sock *vsk)
+{
+	return -ENOPROTOOPT;
+}
+
+static u64 hvs_get_max_buffer_size(struct vsock_sock *vsk)
+{
+	return -ENOPROTOOPT;
+}
+
+static struct vsock_transport hvs_transport = {
+	.get_local_cid            = hvs_get_local_cid,
+
+	.init                     = hvs_sock_init,
+	.destruct                 = hvs_destruct,
+	.release                  = hvs_release,
+	.connect                  = hvs_connect,
+	.shutdown                 = hvs_shutdown,
+
+	.dgram_bind               = hvs_dgram_bind,
+	.dgram_dequeue            = hvs_dgram_dequeue,
+	.dgram_enqueue            = hvs_dgram_enqueue,
+	.dgram_allow              = hvs_dgram_allow,
+
+	.stream_dequeue           = hvs_stream_dequeue,
+	.stream_enqueue           = hvs_stream_enqueue,
+	.stream_has_data          = hvs_stream_has_data,
+	.stream_has_space         = hvs_stream_has_space,
+	.stream_rcvhiwat          = hvs_stream_rcvhiwat,
+	.stream_is_active         = hvs_stream_is_active,
+	.stream_allow             = hvs_stream_allow,
+
+	.notify_poll_in           = hvs_notify_poll_in,
+	.notify_poll_out          = hvs_notify_poll_out,
+	.notify_recv_init         = hvs_notify_recv_init,
+	.notify_recv_pre_block    = hvs_notify_recv_pre_block,
+	.notify_recv_pre_dequeue  = hvs_notify_recv_pre_dequeue,
+	.notify_recv_post_dequeue = hvs_notify_recv_post_dequeue,
+	.notify_send_init         = hvs_notify_send_init,
+	.notify_send_pre_block    = hvs_notify_send_pre_block,
+	.notify_send_pre_enqueue  = hvs_notify_send_pre_enqueue,
+	.notify_send_post_enqueue = hvs_notify_send_post_enqueue,
+
+	.set_buffer_size          = hvs_set_buffer_size,
+	.set_min_buffer_size      = hvs_set_min_buffer_size,
+	.set_max_buffer_size      = hvs_set_max_buffer_size,
+	.get_buffer_size          = hvs_get_buffer_size,
+	.get_min_buffer_size      = hvs_get_min_buffer_size,
+	.get_max_buffer_size      = hvs_get_max_buffer_size,
+};
+
+static int hvs_probe(struct hv_device *hdev,
+		     const struct hv_vmbus_device_id *dev_id)
+{
+	struct vmbus_channel *chan = hdev->channel;
+
+	hvs_open_connection(chan);
+
+	/* Always return success to suppress the unnecessary error message
+	 * in vmbus_probe(): on error the host will rescind the device in
+	 * 30 seconds and we can do cleanup at that time in
+	 * vmbus_onoffer_rescind().
+	 */
+	return 0;
+}
+
+static int hvs_remove(struct hv_device *hdev)
+{
+	struct vmbus_channel *chan = hdev->channel;
+
+	vmbus_close(chan);
+
+	return 0;
+}
+
+/* This isn't really used. See vmbus_match() and vmbus_probe() */
+static const struct hv_vmbus_device_id id_table[] = {
+	{},
+};
+
+static struct hv_driver hvs_drv = {
+	.name		= "hv_sock",
+	.hvsock		= true,
+	.id_table	= id_table,
+	.probe		= hvs_probe,
+	.remove		= hvs_remove,
+};
+
+static int __init hvs_init(void)
+{
+	int ret;
+
+	if (vmbus_proto_version < VERSION_WIN10)
+		return -ENODEV;
+
+	ret = vmbus_driver_register(&hvs_drv);
+	if (ret != 0)
+		return ret;
+
+	ret = vsock_core_init(&hvs_transport);
+	if (ret) {
+		vmbus_driver_unregister(&hvs_drv);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __exit hvs_exit(void)
+{
+	vsock_core_exit();
+	vmbus_driver_unregister(&hvs_drv);
+}
+
+module_init(hvs_init);
+module_exit(hvs_exit);
+
+MODULE_DESCRIPTION("Hyper-V Sockets");
+MODULE_VERSION("1.0.0");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_NETPROTO(PF_VSOCK);
-- 
2.7.4

^ permalink raw reply related

* Re: Something hitting my total number of connections to the server
From: Akshat Kakkar @ 2017-08-23  5:08 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Eric Dumazet, David Laight, netdev
In-Reply-To: <CADVnQy=57_rc8A93zAcqr9hz1WtWniSoZdtx+-aTiZ6AyjZmQg@mail.gmail.com>

On Tue, Aug 22, 2017 at 5:58 PM, Neal Cardwell <ncardwell@google.com> wrote:
> On Tue, Aug 22, 2017 at 1:42 AM, Akshat Kakkar <akshat.1984@gmail.com> wrote:
>> There are multiple hosts/clients. All are mainly windows based.
>>
>> Timestamp is not used as my clients mainly are windows based and in
>> that it tcp timestamp is by defauly disabled.
> ...
>> net.ipv4.tcp_tw_reuse=1
>> net.ipv4.tcp_tw_recycle=1
>
> I suspect the problem is there. The net.ipv4.tcp_tw_recycle setting
> should be 0. Running with the value 1 is known to cause buggy behavior
> related to TCP timestamps, and that feature has been removed in kernel
> v4.12.
>
> Can you please re-run your tests with net.ipv4.tcp_tw_recycle=0 or a
> newer kernel?
>
> neal

Thanks for your reply.

I understand that.

But my point is, though tcp timestamp is enabled on the server, but as
client is not using it ... so how come this _bug_ (if any) is
triggered in first place.

^ permalink raw reply

* RE: [PATCH v2 1/6] fsl/fman: enable FMan Keygen
From: Madalin-cristian Bucur @ 2017-08-23  5:16 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org, Florinel Iordache
In-Reply-To: <20170822.214645.953115103494976385.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, August 23, 2017 7:47 AM
> To: Madalin-cristian Bucur <madalin.bucur@nxp.com>
> Cc: netdev@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2 1/6] fsl/fman: enable FMan Keygen
> 
> From: Madalin-cristian Bucur <madalin.bucur@nxp.com>
> Date: Wed, 23 Aug 2017 04:36:56 +0000
> 
> > The struct fman is only visible in the fman file, the fman port
> > module uses struct fman as an opaque pointer, thus this export.
> 
> Don't use that programming model.
> 
> Export the datastructure properly to it's users.
> 
> This abstraction scheme is so wasteful and costly.

Normally does not come with this cost, it's this case where one of the
sub-modules needs to call into another that gets things complicated.
I'll move struct fman to the header file.

Thanks,
Madalin

^ permalink raw reply

* Re: linux-next: manual merge of the net-next tree with the net tree
From: Ido Schimmel @ 2017-08-23  5:41 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, Networking, Linux-Next Mailing List,
	Linux Kernel Mailing List, Wei Wang, Ido Schimmel, iri Pirko
In-Reply-To: <20170823113105.0bd692ea@canb.auug.org.au>

On Wed, Aug 23, 2017 at 11:31:05AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>   net/ipv6/ip6_fib.c
> 
> between commit:
> 
>   c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
> 
> from the net tree and commit:
> 
>   a460aa83963b ("ipv6: fib: Add helpers to hold / drop a reference on rt6_info")
> 
> from the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Looks good to me.

Thanks!

^ permalink raw reply

* [PATCH] dt-binding: net/phy: fix interrupts description
From: Baruch Siach @ 2017-08-23  6:11 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, David S . Miller
  Cc: devicetree, netdev, Baruch Siach

Commit b053dc5a722ea (powerpc: Refactor device tree binding) split the
Ethernet PHY binding documentation out of the big booting-without-of.txt
file, leaving a dangling reference to "section 2" in the 'interrupts'
property description. Drop that reference, and make the description look
more like the rest.

While at it, make the example interrupt-parent phandle look more like a
real world phandle, and use an IRQ_TYPE_ macro for the 'interrupts'
type.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 Documentation/devicetree/bindings/net/phy.txt | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
index b55857696fc3..86ba72af6163 100644
--- a/Documentation/devicetree/bindings/net/phy.txt
+++ b/Documentation/devicetree/bindings/net/phy.txt
@@ -2,11 +2,7 @@ PHY nodes

 Required properties:

- - interrupts : <a b> where a is the interrupt number and b is a
-   field that represents an encoding of the sense and level
-   information for the interrupt.  This should be encoded based on
-   the information in section 2) depending on the type of interrupt
-   controller you have.
+ - interrupts : interrupt specifier for the sole interrupt.
  - interrupt-parent : the phandle for the interrupt controller that
    services interrupts for this device.
  - reg : The ID number for the phy, usually a small integer
@@ -56,7 +52,7 @@ Example:

 ethernet-phy@0 {
 	compatible = "ethernet-phy-id0141.0e90", "ethernet-phy-ieee802.3-c22";
-	interrupt-parent = <40000>;
-	interrupts = <35 1>;
+	interrupt-parent = <&PIC>;
+	interrupts = <35 IRQ_TYPE_EDGE_RISING>;
 	reg = <0>;
 };
-- 
2.14.1

^ permalink raw reply related

* [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

This series tackles the bug I've already tried to fix in commit 
6d48ceb27af1 ("nfp: allocate a private workqueue for driver work").
I created a separate workqueue to avoid possible deadlock, and
the lockdep error disappeared, coincidentally.  The way workqueues
are operating, separate workqueue doesn't necessarily mean separate
thread of execution.  Luckily we can safely forego the lock.

Second fix changes the order in which vNIC netdevs and representors
are created/destroyed.  The fix is kept small and should be sufficient
for net because of how flower uses representors, a more thorough fix 
will be targeted at net-next.

Third fix avoids leaking mapped frame buffers if FW sent a frame with
unknown portid.

Jakub Kicinski (3):
  nfp: don't hold PF lock while enabling SR-IOV
  nfp: make sure representors are destroyed before their lower netdev
  nfp: avoid buffer leak when representor is missing

 drivers/net/ethernet/netronome/nfp/nfp_main.c      | 16 +++++++---------
 .../net/ethernet/netronome/nfp/nfp_net_common.c    | 10 +++++-----
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  | 22 ++++++++++++++--------
 3 files changed, 26 insertions(+), 22 deletions(-)

-- 
2.11.0

^ permalink raw reply

* [PATCH net 1/3] nfp: don't hold PF lock while enabling SR-IOV
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20170823062244.26580-1-jakub.kicinski@netronome.com>

Enabling SR-IOV VFs will cause the PCI subsystem to schedule a
work and flush its workqueue.  Since the nfp driver schedules its
own work we can't enable VFs while holding driver load.  Commit
6d48ceb27af1 ("nfp: allocate a private workqueue for driver work")
tried to avoid this deadlock by creating a separate workqueue.
Unfortunately, due to the architecture of workqueue subsystem this
does not guarantee a separate thread of execution.  Luckily
we can simply take pci_enable_sriov() from under the driver lock.

Take pci_disable_sriov() from under the lock too for symmetry.

Fixes: 6d48ceb27af1 ("nfp: allocate a private workqueue for driver work")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index d67969d3e484..3f199db2002e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -98,21 +98,20 @@ static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs)
 	struct nfp_pf *pf = pci_get_drvdata(pdev);
 	int err;
 
-	mutex_lock(&pf->lock);
-
 	if (num_vfs > pf->limit_vfs) {
 		nfp_info(pf->cpp, "Firmware limits number of VFs to %u\n",
 			 pf->limit_vfs);
-		err = -EINVAL;
-		goto err_unlock;
+		return -EINVAL;
 	}
 
 	err = pci_enable_sriov(pdev, num_vfs);
 	if (err) {
 		dev_warn(&pdev->dev, "Failed to enable PCI SR-IOV: %d\n", err);
-		goto err_unlock;
+		return err;
 	}
 
+	mutex_lock(&pf->lock);
+
 	err = nfp_app_sriov_enable(pf->app, num_vfs);
 	if (err) {
 		dev_warn(&pdev->dev,
@@ -129,9 +128,8 @@ static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs)
 	return num_vfs;
 
 err_sriov_disable:
-	pci_disable_sriov(pdev);
-err_unlock:
 	mutex_unlock(&pf->lock);
+	pci_disable_sriov(pdev);
 	return err;
 #endif
 	return 0;
@@ -158,10 +156,10 @@ static int nfp_pcie_sriov_disable(struct pci_dev *pdev)
 
 	pf->num_vfs = 0;
 
+	mutex_unlock(&pf->lock);
+
 	pci_disable_sriov(pdev);
 	dev_dbg(&pdev->dev, "Removed VFs.\n");
-
-	mutex_unlock(&pf->lock);
 #endif
 	return 0;
 }
-- 
2.11.0

^ permalink raw reply related

* [PATCH net 2/3] nfp: make sure representors are destroyed before their lower netdev
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20170823062244.26580-1-jakub.kicinski@netronome.com>

App start/stop callbacks can perform application initialization.
Unfortunately, flower app started using them for creating and
destroying representors.  This can lead to a situation where
lower vNIC netdev is destroyed while representors still try
to pass traffic.  This will most likely lead to a NULL-dereference
on the lower netdev TX path.

Move the start/stop callbacks, so that representors are created/
destroyed when vNICs are fully initialized.

Fixes: 5de73ee46704 ("nfp: general representor implementation")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 5797dbf2b507..1aca4e57bf41 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -456,10 +456,6 @@ static int nfp_net_pf_app_start(struct nfp_pf *pf)
 {
 	int err;
 
-	err = nfp_net_pf_app_start_ctrl(pf);
-	if (err)
-		return err;
-
 	err = nfp_app_start(pf->app, pf->ctrl_vnic);
 	if (err)
 		goto err_ctrl_stop;
@@ -484,7 +480,6 @@ static void nfp_net_pf_app_stop(struct nfp_pf *pf)
 	if (pf->num_vfs)
 		nfp_app_sriov_disable(pf->app);
 	nfp_app_stop(pf->app);
-	nfp_net_pf_app_stop_ctrl(pf);
 }
 
 static void nfp_net_pci_unmap_mem(struct nfp_pf *pf)
@@ -559,7 +554,7 @@ static int nfp_net_pci_map_mem(struct nfp_pf *pf)
 
 static void nfp_net_pci_remove_finish(struct nfp_pf *pf)
 {
-	nfp_net_pf_app_stop(pf);
+	nfp_net_pf_app_stop_ctrl(pf);
 	/* stop app first, to avoid double free of ctrl vNIC's ddir */
 	nfp_net_debugfs_dir_clean(&pf->ddir);
 
@@ -690,6 +685,7 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
 {
 	struct nfp_net_fw_version fw_ver;
 	u8 __iomem *ctrl_bar, *qc_bar;
+	struct nfp_net *nn;
 	int stride;
 	int err;
 
@@ -766,7 +762,7 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
 	if (err)
 		goto err_free_vnics;
 
-	err = nfp_net_pf_app_start(pf);
+	err = nfp_net_pf_app_start_ctrl(pf);
 	if (err)
 		goto err_free_irqs;
 
@@ -774,12 +770,20 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
 	if (err)
 		goto err_stop_app;
 
+	err = nfp_net_pf_app_start(pf);
+	if (err)
+		goto err_clean_vnics;
+
 	mutex_unlock(&pf->lock);
 
 	return 0;
 
+err_clean_vnics:
+	list_for_each_entry(nn, &pf->vnics, vnic_list)
+		if (nfp_net_is_data_vnic(nn))
+			nfp_net_pf_clean_vnic(pf, nn);
 err_stop_app:
-	nfp_net_pf_app_stop(pf);
+	nfp_net_pf_app_stop_ctrl(pf);
 err_free_irqs:
 	nfp_net_pf_free_irqs(pf);
 err_free_vnics:
@@ -803,6 +807,8 @@ void nfp_net_pci_remove(struct nfp_pf *pf)
 	if (list_empty(&pf->vnics))
 		goto out;
 
+	nfp_net_pf_app_stop(pf);
+
 	list_for_each_entry(nn, &pf->vnics, vnic_list)
 		if (nfp_net_is_data_vnic(nn))
 			nfp_net_pf_clean_vnic(pf, nn);
-- 
2.11.0

^ permalink raw reply related

* [PATCH net 3/3] nfp: avoid buffer leak when representor is missing
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski
In-Reply-To: <20170823062244.26580-1-jakub.kicinski@netronome.com>

When driver receives a muxed frame, but it can't find the representor
netdev it is destined to it will try to "drop" that frame, i.e. reuse
the buffer.  The issue is that the replacement buffer has already been
allocated at this point, and reusing the buffer from received frame
will leak it.  Change the code to put the new buffer on the ring
earlier and not reuse the old buffer (make the buffer parameter
to nfp_net_rx_drop() a NULL).

Fixes: 91bf82ca9eed ("nfp: add support for tx/rx with metadata portid")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9f77ce038a4a..1ff0c577819e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1751,6 +1751,10 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			continue;
 		}
 
+		nfp_net_dma_unmap_rx(dp, rxbuf->dma_addr);
+
+		nfp_net_rx_give_one(dp, rx_ring, new_frag, new_dma_addr);
+
 		if (likely(!meta.portid)) {
 			netdev = dp->netdev;
 		} else {
@@ -1759,16 +1763,12 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			nn = netdev_priv(dp->netdev);
 			netdev = nfp_app_repr_get(nn->app, meta.portid);
 			if (unlikely(!netdev)) {
-				nfp_net_rx_drop(dp, r_vec, rx_ring, rxbuf, skb);
+				nfp_net_rx_drop(dp, r_vec, rx_ring, NULL, skb);
 				continue;
 			}
 			nfp_repr_inc_rx_stats(netdev, pkt_len);
 		}
 
-		nfp_net_dma_unmap_rx(dp, rxbuf->dma_addr);
-
-		nfp_net_rx_give_one(dp, rx_ring, new_frag, new_dma_addr);
-
 		skb_reserve(skb, pkt_off);
 		skb_put(skb, pkt_len);
 
-- 
2.11.0

^ permalink raw reply related

* Re: XDP redirect measurements, gotchas and tracepoints
From: Michael Chan @ 2017-08-23  6:59 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Duyck, Alexander H, john.fastabend@gmail.com, brouer@redhat.com,
	pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, andy@greyhouse.net,
	borkmann@iogearbox.net
In-Reply-To: <CAKgT0Uf=ZfVeK-wtvNxSyGEVZ3UseUOHiP3ZOg-SrzmqsR=LtQ@mail.gmail.com>

On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>
>> Right, but it's conceivable to add an API to "return" the buffer to
>> the input device, right?
>
> You could, it is just added complexity. "just free the buffer" in
> ixgbe usually just amounts to one atomic operation to decrement the
> total page count since page recycling is already implemented in the
> driver. You still would have to unmap the buffer regardless of if you
> were recycling it or not so all you would save is 1.000015259 atomic
> operations per packet. The fraction is because once every 64K uses we
> have to bulk update the count on the page.
>

If the buffer is returned to the input device, the input device can
keep the DMA mapping.  All it needs to do is to dma_sync it back to
the input device when the buffer is returned.

^ permalink raw reply

* [PATCH RESEND v12 7/8] wireless: ipw2200: Replace PCI pool old API
From: Romain Perier @ 2017-08-23  7:16 UTC (permalink / raw)
  To: Stanislav Yakovlev, Kalle Valo
  Cc: linux-wireless, netdev, linux-kernel, Romain Perier

The PCI pool API is deprecated. This commit replaces the PCI pool old
API by the appropriate function with the DMA pool API.

Signed-off-by: Romain Perier <romain.perier@collabora.com>
Reviewed-by: Peter Senna Tschudin <peter.senna@collabora.com>
Acked-by: Stanislav Yakovlev <stas.yakovlev@gmail.com>
---

Note: As requested by Kalle Valo, resend and add linux-wireless to Cc:

 drivers/net/wireless/intel/ipw2x00/ipw2200.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2200.c b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
index c311b1a994c1..c57c567add4d 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
@@ -3209,7 +3209,7 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 	struct fw_chunk *chunk;
 	int total_nr = 0;
 	int i;
-	struct pci_pool *pool;
+	struct dma_pool *pool;
 	void **virts;
 	dma_addr_t *phys;
 
@@ -3226,9 +3226,10 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 		kfree(virts);
 		return -ENOMEM;
 	}
-	pool = pci_pool_create("ipw2200", priv->pci_dev, CB_MAX_LENGTH, 0, 0);
+	pool = dma_pool_create("ipw2200", &priv->pci_dev->dev, CB_MAX_LENGTH, 0,
+			       0);
 	if (!pool) {
-		IPW_ERROR("pci_pool_create failed\n");
+		IPW_ERROR("dma_pool_create failed\n");
 		kfree(phys);
 		kfree(virts);
 		return -ENOMEM;
@@ -3253,7 +3254,7 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 
 		nr = (chunk_len + CB_MAX_LENGTH - 1) / CB_MAX_LENGTH;
 		for (i = 0; i < nr; i++) {
-			virts[total_nr] = pci_pool_alloc(pool, GFP_KERNEL,
+			virts[total_nr] = dma_pool_alloc(pool, GFP_KERNEL,
 							 &phys[total_nr]);
 			if (!virts[total_nr]) {
 				ret = -ENOMEM;
@@ -3297,9 +3298,9 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 	}
  out:
 	for (i = 0; i < total_nr; i++)
-		pci_pool_free(pool, virts[i], phys[i]);
+		dma_pool_free(pool, virts[i], phys[i]);
 
-	pci_pool_destroy(pool);
+	dma_pool_destroy(pool);
 	kfree(phys);
 	kfree(virts);
 
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH net-next v4] openvswitch: enable NSH support
From: Jiri Benc @ 2017-08-23  7:26 UTC (permalink / raw)
  To: Yang, Yi
  Cc: netdev@vger.kernel.org, dev@openvswitch.org, blp@ovn.org,
	e@erig.me, jan.scheurich@ericsson.com
In-Reply-To: <20170822093825.GA94865@cran64.bj.intel.com>

On Tue, 22 Aug 2017 17:38:26 +0800, Yang, Yi wrote:
> I checked GSO support code, we have two cases to handle, one case is to
> output NSH packet to VxLAN-gpe port, the other case is to output NSH packet
> to Ethernet port (tap, veth, vhost, physical eth, etc.), I don't think
> it is a good way to do GSO support in OVS module, because we can't know
> the final output port at this point, if the final output port is
> VxLAN-gpe port, it will still face GSO issue, but I didn't see vxlan
> module handles this, maybe udp tunnel did this. I think a NSH packet can
> be split into two packets, one has NSH header, another one doesn't,
> so I'm not sure how vxlan module can avoid this situation.

As I said before, by either implementing GSO support for NSH or by
segmenting in software before pushing the NSH header. In the first
case, the packet will be software segmented before being pushed to the
hardware, in the second case, it will be software segmented before
pushing the NSH header.

> For the second case, we can add a nsh GSO module in net/nsh/nsh_gso.c
> (as MPLS did in net/mpls/mpls_gso.c), that is the best way to handle
> this, what do you think about it?

Yes, that's one of the two options that we've been talking about. It's
the better and the more efficient one.

 Jiri

^ permalink raw reply

* Re: [PATCH net-next v2 10/10] arm64: dts: marvell: mcbin: enable more networking ports
From: Baruch Siach @ 2017-08-23  7:28 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, jason, andrew, gregory.clement, sebastian.hesselbarth,
	thomas.petazzoni, netdev, linux, nadavh, stefanc, mw,
	linux-arm-kernel
In-Reply-To: <20170822170830.32413-11-antoine.tenart@free-electrons.com>

Hi Antoine,

On Tue, Aug 22, 2017 at 07:08:30PM +0200, Antoine Tenart wrote:
> This patch enables the two GE/SFP ports. They are configured in 10GKR
> mode by default. To do this the cpm_xdmio is enabled as well, and two
> phy descriptions are added.
> 
> Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
> Tested-by: Marcin Wojtas <mw@semihalf.com>
> ---
>  arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 30 +++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> index abd39d1c1739..6cb4b000e1ac 100644
> --- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> +++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> @@ -127,6 +127,30 @@
>  	};
>  };
>  
> +&cpm_xmdio {
> +	status = "okay";
> +
> +	phy0: ethernet-phy@0 {
> +		compatible = "ethernet-phy-ieee802.3-c45";
> +		reg = <0>;
> +	};
> +
> +	phy1: ethernet-phy@1 {

Should be named 'ethernet-phy@8' to match the 'reg' property.

> +		compatible = "ethernet-phy-ieee802.3-c45";
> +		reg = <8>;
> +	};
> +};

baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply

* Stable apply request [was: Bluetooth: bnep: fix possible might sleep error in bnep_session]
From: Jiri Slaby @ 2017-08-23  7:29 UTC (permalink / raw)
  To: Marcel Holtmann, Jeffy Chen, stable
  Cc: LKML, open list:BLUETOOTH DRIVERS, rvaswani, Brian Norris,
	dmitry.torokhov, Douglas Anderson, acho, Johan Hedberg,
	Network Development, David S. Miller, Gustavo F. Padovan
In-Reply-To: <0815411E-62D7-42DC-A0BF-78FD150C0949@holtmann.org>

On 06/27/2017, 07:32 PM, Marcel Holtmann wrote:
>> It looks like bnep_session has same pattern as the issue reported in
>> old rfcomm:
>>
>> 	while (1) {
>> 		set_current_state(TASK_INTERRUPTIBLE);
>> 		if (condition)
>> 			break;
>> 		// may call might_sleep here
>> 		schedule();
>> 	}
>> 	__set_current_state(TASK_RUNNING);
>>
>> Which fixed at:
>> 	dfb2fae Bluetooth: Fix nested sleeps
>>
>> So let's fix it at the same way, also follow the suggestion of:
>> https://lwn.net/Articles/628628/

...

> all 3 patches have been applied to bluetooth-next tree.

Hi,

given users are hitting it in at least 4.4 and 4.12, can we have all
three in all stables where this applies?

5da8e47d849d Bluetooth: hidp: fix possible might sleep error in
hidp_session_thread
f06d977309d0 Bluetooth: cmtp: fix possible might sleep error in cmtp_session
25717382c1dd Bluetooth: bnep: fix possible might sleep error in bnep_session

I am not sure: to stable directly or via net stable?

thanks,
-- 
js
suse labs

^ permalink raw reply

* Re: [PATCH V8 net-next 00/22] Huawei HiNIC Ethernet Driver
From: Arnd Bergmann @ 2017-08-23  7:37 UTC (permalink / raw)
  To: Aviad Krawczyk
  Cc: David Miller, Linux Kernel Mailing List, Networking, bc.y,
	victor.gissin, zhaochen6, tony.qu
In-Reply-To: <cover.1503330613.git.aviad.krawczyk@huawei.com>

On Mon, Aug 21, 2017 at 5:55 PM, Aviad Krawczyk
<aviad.krawczyk@huawei.com> wrote:
> The patch-set contains the support of the HiNIC Ethernet driver for
> hinic family of PCIE Network Interface Cards.
>
> The Huawei's PCIE HiNIC card is a new Ethernet card and hence there was
> a need of a new driver.
>
> The current driver is meant to be used for the Physical Function and there
> would soon be a support for Virtual Function and more features once the
> basic PF driver has been accepted.

Sorry I didn't comment before it got merged, but once it appeared in
linux-next I saw it and wondered why this is grouped under huawei
unlike the network drivers for the parts integrated into the hip0x
SoCs from hisilicon. Both appear to be made by Huawei's HiSilicon
subsidiary.

I did not check whether the two devices are related at all or could
share some of the source code as well, but it might be good to
move this one into the existing directory to spare confusion later.

         Arnd

^ permalink raw reply

* Re: [PATCH net-next 5/5] xdp: get tracepoints xdp_exception and xdp_redirect in sync
From: Jesper Dangaard Brouer @ 2017-08-23  7:41 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, John Fastabend, brouer
In-Reply-To: <599CA26D.7060003@iogearbox.net>

On Tue, 22 Aug 2017 23:30:21 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 08/22/2017 10:47 PM, Jesper Dangaard Brouer wrote:
> > Remove the net_device string name from the xdp_exception tracepoint,
> > like the xdp_redirect tracepoint.
> >
> > Align the TP_STRUCT to have common entries between these two
> > tracepoint.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >   include/trace/events/xdp.h |   24 ++++++++++++------------
> >   1 file changed, 12 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
> > index 7511bed80558..6495b0d9d5c7 100644
> > --- a/include/trace/events/xdp.h
> > +++ b/include/trace/events/xdp.h
> > @@ -31,22 +31,22 @@ TRACE_EVENT(xdp_exception,
> >   	TP_ARGS(dev, xdp, act),
> >
> >   	TP_STRUCT__entry(
> > -		__string(name, dev->name)
> >   		__array(u8, prog_tag, 8)
> >   		__field(u32, act)
> > +		__field(int, ifindex)
> >   	),
> >
> >   	TP_fast_assign(
> >   		BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(xdp->tag));
> >   		memcpy(__entry->prog_tag, xdp->tag, sizeof(xdp->tag));
> > -		__assign_str(name, dev->name);
> > -		__entry->act = act;
> > +		__entry->act		= act;
> > +		__entry->ifindex	= dev->ifindex;
> >   	),
> >  
> [...]
> >   TRACE_EVENT(xdp_redirect,
> > @@ -57,26 +57,26 @@ TRACE_EVENT(xdp_redirect,
> >   	TP_ARGS(from_index, to_index, xdp, act, err),
> >
> >   	TP_STRUCT__entry(
> > -		__field(int, from_index)
> > -		__field(int, to_index)
> >   		__array(u8, prog_tag, 8)
> >   		__field(u32, act)
> > +		__field(int, from_index)
> > +		__field(int, to_index)
> >   		__field(int, err)
> >   	),
> >
> >   	TP_fast_assign(
> >   		BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(xdp->tag));
> >   		memcpy(__entry->prog_tag, xdp->tag, sizeof(xdp->tag));
> > +		__entry->act		= act;
> >   		__entry->from_index	= from_index;
> >   		__entry->to_index	= to_index;
> > -		__entry->act = act;  
> 
> If you already get them in sync, could you also make it consistent
> that for both tracepoints in TP_ARGS() we use either ifindex
> directly or device pointer and extract it from TP_fast_assign().
> Right now, it's mixed.

I did this because, in the (bpf_redirect)_map case, I was hoping that
"from_index" could be come the index from the map.  Looking closer at
the devmap design, I cannot see how we can ever know what (the bpf_prog
thinks) is the corresponding map-ingress/from_index.

Based on that, I think we can/should use the device pointer as arg (as
you suggested). And then rename "from_index" to "ifindex", where
ifindex is the XDP ifindex the xdp_buff arrived on.

I guess we can keep the "to_index".  We also need to extend the
tracepoint args with a "map", to let the trace user tell whether the
"to_index" is an ifindex or a map-index.

I'll code this up and submit at V2.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net-next v7 02/10] bpf: Add eBPF program subtype and is_valid_subtype() verifier
From: Mickaël Salaün @ 2017-08-23  7:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski,
	Arnaldo Carvalho de Melo, Casey Schaufler, Daniel Borkmann,
	David Drysdale, David S . Miller, Eric W . Biederman,
	James Morris, Jann Horn, Jonathan Corbet, Matthew Garrett,
	Michael Kerrisk, Kees Cook, Paul Moore, Sargun Dhillon,
	Serge E . Hallyn, Shuah Khan, Tejun Heo, Thomas Graf
In-Reply-To: <20170823024452.zvizovwfd7xjucsx@ast-mbp>


[-- Attachment #1.1: Type: text/plain, Size: 6655 bytes --]


On 23/08/2017 04:44, Alexei Starovoitov wrote:
> On Mon, Aug 21, 2017 at 02:09:25AM +0200, Mickaël Salaün wrote:
>> The goal of the program subtype is to be able to have different static
>> fine-grained verifications for a unique program type.
>>
>> The struct bpf_verifier_ops gets a new optional function:
>> is_valid_subtype(). This new verifier is called at the beginning of the
>> eBPF program verification to check if the (optional) program subtype is
>> valid.
>>
>> For now, only Landlock eBPF programs are using a program subtype (see
>> next commit) but this could be used by other program types in the future.
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: David S. Miller <davem@davemloft.net>
>> Link: https://lkml.kernel.org/r/20160827205559.GA43880@ast-mbp.thefacebook.com
>> ---
>>
>> Changes since v6:
>> * rename Landlock version to ABI to better reflect its purpose
>> * fix unsigned integer checks
>> * fix pointer cast
>> * constify pointers
>> * rebase
>>
>> Changes since v5:
>> * use a prog_subtype pointer and make it future-proof
>> * add subtype test
>> * constify bpf_load_program()'s subtype argument
>> * cleanup subtype initialization
>> * rebase
>>
>> Changes since v4:
>> * replace the "status" field with "version" (more generic)
>> * replace the "access" field with "ability" (less confusing)
>>
>> Changes since v3:
>> * remove the "origin" field
>> * add an "option" field
>> * cleanup comments
>> ---
>>  include/linux/bpf.h                         |  7 ++-
>>  include/linux/filter.h                      |  2 +
>>  include/uapi/linux/bpf.h                    | 11 +++++
>>  kernel/bpf/syscall.c                        | 22 ++++++++-
>>  kernel/bpf/verifier.c                       | 17 +++++--
>>  kernel/trace/bpf_trace.c                    | 15 ++++--
>>  net/core/filter.c                           | 71 ++++++++++++++++++-----------
>>  samples/bpf/bpf_load.c                      |  3 +-
>>  samples/bpf/cookie_uid_helper_example.c     |  2 +-
>>  samples/bpf/fds_example.c                   |  2 +-
>>  samples/bpf/sock_example.c                  |  3 +-
>>  samples/bpf/test_cgrp2_attach.c             |  2 +-
>>  samples/bpf/test_cgrp2_attach2.c            |  2 +-
>>  samples/bpf/test_cgrp2_sock.c               |  2 +-
>>  tools/include/uapi/linux/bpf.h              | 11 +++++
>>  tools/lib/bpf/bpf.c                         | 10 +++-
>>  tools/lib/bpf/bpf.h                         |  5 +-
>>  tools/lib/bpf/libbpf.c                      |  4 +-
>>  tools/perf/tests/bpf.c                      |  2 +-
>>  tools/testing/selftests/bpf/test_align.c    |  2 +-
>>  tools/testing/selftests/bpf/test_tag.c      |  2 +-
>>  tools/testing/selftests/bpf/test_verifier.c | 17 ++++++-
>>  22 files changed, 158 insertions(+), 56 deletions(-)
> 
> ...
> 
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 7015116331af..0c3fadbb5a58 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -464,6 +464,8 @@ struct bpf_prog {
>>  	u32			len;		/* Number of filter blocks */
>>  	u32			jited_len;	/* Size of jited insns in bytes */
>>  	u8			tag[BPF_TAG_SIZE];
>> +	u8			has_subtype;
>> +	union bpf_prog_subtype	subtype;	/* Fine-grained verifications */
> 
> these burn a hole in very performance sensitive structure.
> Also there are bits rigth above. use them instead of u8 has_subtype?
> or can these two fields be part of bpf_prog_aux ?

OK, I'll create one bit variable and a bpf_prog_subtype field in the
bpf_prog_aux struct then.


> 
>>  	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
>>  	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
>>  	unsigned int		(*bpf_func)(const void *ctx,
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 843818dff96d..8541ab85e432 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -177,6 +177,15 @@ enum bpf_attach_type {
>>  /* Specify numa node during map creation */
>>  #define BPF_F_NUMA_NODE		(1U << 2)
>>  
>> +union bpf_prog_subtype {
>> +	struct {
>> +		__u32		abi; /* minimal ABI version, cf. user doc */
> 
> the concept of abi (version) sounds a bit weird to me.
> Why bother with it at all?
> Once the first set of patches lands the kernel as whole will have landlock feature
> with a set of helpers, actions, event types.
> Some future patches will extend the landlock feature step by step.
> This abi concept assumes that anyone who adds new helper would need
> to keep incrementing this 'abi'. What value does it give to user or to kernel?
> The users will already know that landlock is present in kernel 4.14 or whatever
> and the kernel 4.18 has more landlock features. Why bother with extra abi number?

That's right for helpers and context fields, but we can't check the use
of one field's content. The status field is intended to be a bitfield
extendable in the future. For example, one use case is to set a flag to
inform the eBPF program that it was already called with the same context
and can skip most of its check (if not related to maps). Same goes for
the FS action bitfield, one may want to add more of them. Another
example may be the check for abilities. We may want to relax/remove the
capability require to set one of them. With an ABI version, the user can
easily check if the current kernel support that.

> 
>> +		__u32		event; /* enum landlock_subtype_event */
>> +		__aligned_u64	ability; /* LANDLOCK_SUBTYPE_ABILITY_* */
>> +		__aligned_u64	option; /* LANDLOCK_SUBTYPE_OPTION_* */
>> +	} landlock_rule;
>> +} __attribute__((aligned(8)));
>> +
>>  union bpf_attr {
>>  	struct { /* anonymous struct used by BPF_MAP_CREATE command */
>>  		__u32	map_type;	/* one of enum bpf_map_type */
>> @@ -212,6 +221,8 @@ union bpf_attr {
>>  		__aligned_u64	log_buf;	/* user supplied buffer */
>>  		__u32		kern_version;	/* checked when prog_type=kprobe */
>>  		__u32		prog_flags;
>> +		__aligned_u64	prog_subtype;	/* bpf_prog_subtype address */
>> +		__u32		prog_subtype_size;
>>  	};
> 
> more general question: what is the status of security/ bits?
> I'm assuming they still need to be reviewed and explicitly acked by James, right?

Right, the review process is ongoing. :)
BTW, I'll be at Linux Security Summit (co-located with Plumbers) next
month. We'll be able to clarify some points there too.

Regards,
 Mickaël


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH net-next 2/3] net: mvpp2: unify the txq size define use
From: Antoine Tenart @ 2017-08-23  7:46 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, mw, stefanc, netdev
In-Reply-To: <20170823074656.14595-1-antoine.tenart@free-electrons.com>

The txq size is defined by MVPP2_AGGR_TXQ_SIZE, which is sometime not
used directly but through variables. As it is a fixed value use the
define everywhere in the driver.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 4a2ed5c66bb4..0f5a2a4f9ddf 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5413,15 +5413,14 @@ static unsigned int mvpp2_tx_done(struct mvpp2_port *port, u32 cause,
 
 /* Allocate and initialize descriptors for aggr TXQ */
 static int mvpp2_aggr_txq_init(struct platform_device *pdev,
-			       struct mvpp2_tx_queue *aggr_txq,
-			       int desc_num, int cpu,
+			       struct mvpp2_tx_queue *aggr_txq, int cpu,
 			       struct mvpp2 *priv)
 {
 	u32 txq_dma;
 
 	/* Allocate memory for TX descriptors */
 	aggr_txq->descs = dma_alloc_coherent(&pdev->dev,
-				desc_num * MVPP2_DESC_ALIGNED_SIZE,
+				MVPP2_AGGR_TXQ_SIZE * MVPP2_DESC_ALIGNED_SIZE,
 				&aggr_txq->descs_dma, GFP_KERNEL);
 	if (!aggr_txq->descs)
 		return -ENOMEM;
@@ -5442,7 +5441,8 @@ static int mvpp2_aggr_txq_init(struct platform_device *pdev,
 			MVPP22_AGGR_TXQ_DESC_ADDR_OFFS;
 
 	mvpp2_write(priv, MVPP2_AGGR_TXQ_DESC_ADDR_REG(cpu), txq_dma);
-	mvpp2_write(priv, MVPP2_AGGR_TXQ_DESC_SIZE_REG(cpu), desc_num);
+	mvpp2_write(priv, MVPP2_AGGR_TXQ_DESC_SIZE_REG(cpu),
+		    MVPP2_AGGR_TXQ_SIZE);
 
 	return 0;
 }
@@ -7691,8 +7691,7 @@ static int mvpp2_init(struct platform_device *pdev, struct mvpp2 *priv)
 	for_each_present_cpu(i) {
 		priv->aggr_txqs[i].id = i;
 		priv->aggr_txqs[i].size = MVPP2_AGGR_TXQ_SIZE;
-		err = mvpp2_aggr_txq_init(pdev, &priv->aggr_txqs[i],
-					  MVPP2_AGGR_TXQ_SIZE, i, priv);
+		err = mvpp2_aggr_txq_init(pdev, &priv->aggr_txqs[i], i, priv);
 		if (err < 0)
 			return err;
 	}
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 1/3] net: define the TSO header size in net/tso.h
From: Antoine Tenart @ 2017-08-23  7:46 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, mw, stefanc, netdev
In-Reply-To: <20170823074656.14595-1-antoine.tenart@free-electrons.com>

The TSO header size was defined in many drivers. Factorize the code and
define its size in net/tso.h.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h | 1 -
 drivers/net/ethernet/freescale/fec_main.c          | 1 -
 drivers/net/ethernet/marvell/mv643xx_eth.c         | 2 --
 drivers/net/ethernet/marvell/mvneta.c              | 3 ---
 include/net/tso.h                                  | 2 ++
 5 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
index 57858522c33c..67d1a3230773 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
@@ -277,7 +277,6 @@ struct snd_queue {
 	u16		xdp_free_cnt;
 	bool		is_xdp;
 
-#define	TSO_HEADER_SIZE	128
 	/* For TSO segment's header */
 	char		*tso_hdrs;
 	dma_addr_t	tso_hdrs_phys;
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index df09b254553d..56f56d6ada9c 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -226,7 +226,6 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 
 #define COPYBREAK_DEFAULT	256
 
-#define TSO_HEADER_SIZE		128
 /* Max number of allowed TCP segments for software TSO */
 #define FEC_MAX_TSO_SEGS	100
 #define FEC_MAX_SKB_DESCS	(FEC_MAX_TSO_SEGS * 2 + MAX_SKB_FRAGS)
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index 9c94ea9b2b80..fb2d533ae4ef 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -183,8 +183,6 @@ static char mv643xx_eth_driver_version[] = "1.4";
 #define DEFAULT_TX_QUEUE_SIZE	512
 #define SKB_DMA_REALIGN		((PAGE_SIZE - NET_SKB_PAD) % SMP_CACHE_BYTES)
 
-#define TSO_HEADER_SIZE		128
-
 /* Max number of allowed TCP segments for software TSO */
 #define MV643XX_MAX_TSO_SEGS 100
 #define MV643XX_MAX_SKB_DESCS (MV643XX_MAX_TSO_SEGS * 2 + MAX_SKB_FRAGS)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0aab74c2a209..35ff1ecfcff0 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -281,9 +281,6 @@
  */
 #define MVNETA_RSS_LU_TABLE_SIZE	1
 
-/* TSO header size */
-#define TSO_HEADER_SIZE 128
-
 /* Max number of Rx descriptors */
 #define MVNETA_MAX_RXD 128
 
diff --git a/include/net/tso.h b/include/net/tso.h
index b7be852bfe9d..9a56c39e6d0a 100644
--- a/include/net/tso.h
+++ b/include/net/tso.h
@@ -3,6 +3,8 @@
 
 #include <net/ip.h>
 
+#define TSO_HEADER_SIZE		128
+
 struct tso_t {
 	int next_frag_idx;
 	void *data;
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 0/3] net: mvpp2: software TSO support
From: Antoine Tenart @ 2017-08-23  7:46 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, mw, stefanc, netdev

Hi all,

This series adds the s/w TSO support in the PPv2 driver, in addition to
two cosmetic commits. As stated in patch 3/3:

Using iperf and 10G ports, using TSO shows a significant performance
improvement by a factor 2 to reach around 9.5Gbps in TX; as well as a
significant CPU usage drop (from 25% to 15%).

Thanks!
Antoine

Antoine Tenart (3):
  net: define the TSO header size in net/tso.h
  net: mvpp2: unify the txq size define use
  net: mvpp2: software tso support

 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |   1 -
 drivers/net/ethernet/freescale/fec_main.c          |   1 -
 drivers/net/ethernet/marvell/mv643xx_eth.c         |   2 -
 drivers/net/ethernet/marvell/mvneta.c              |   3 -
 drivers/net/ethernet/marvell/mvpp2.c               | 182 ++++++++++++++++++---
 include/net/tso.h                                  |   2 +
 6 files changed, 164 insertions(+), 27 deletions(-)

-- 
2.13.5

^ permalink raw reply

* [PATCH net-next 3/3] net: mvpp2: software tso support
From: Antoine Tenart @ 2017-08-23  7:46 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, mw, stefanc, netdev
In-Reply-To: <20170823074656.14595-1-antoine.tenart@free-electrons.com>

The patch uses the tso API to implement the tso functionality in Marvell
PPv2 driver.

Using iperf and 10G ports, using TSO shows a significant performance
improvement by a factor 2 to reach around 9.5Gbps in TX; as well as a
significant CPU usage drop (from 25% to 15%).

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 171 ++++++++++++++++++++++++++++++++---
 1 file changed, 157 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 0f5a2a4f9ddf..90103bce6c66 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -37,6 +37,7 @@
 #include <uapi/linux/ppp_defs.h>
 #include <net/ip.h>
 #include <net/ipv6.h>
+#include <net/tso.h>
 
 /* RX Fifo Registers */
 #define MVPP2_RX_DATA_FIFO_SIZE_REG(port)	(0x00 + 4 * (port))
@@ -1033,6 +1034,10 @@ struct mvpp2_txq_pcpu {
 
 	/* Index of the TX DMA descriptor to be cleaned up */
 	int txq_get_index;
+
+	/* DMA buffer for TSO headers */
+	char *tso_headers;
+	dma_addr_t tso_headers_dma;
 };
 
 struct mvpp2_tx_queue {
@@ -5623,6 +5628,14 @@ static int mvpp2_txq_init(struct mvpp2_port *port,
 		txq_pcpu->reserved_num = 0;
 		txq_pcpu->txq_put_index = 0;
 		txq_pcpu->txq_get_index = 0;
+
+		txq_pcpu->tso_headers =
+			dma_alloc_coherent(port->dev->dev.parent,
+					   MVPP2_AGGR_TXQ_SIZE * TSO_HEADER_SIZE,
+					   &txq_pcpu->tso_headers_dma,
+					   GFP_KERNEL);
+		if (!txq_pcpu->tso_headers)
+			goto cleanup;
 	}
 
 	return 0;
@@ -5630,6 +5643,11 @@ static int mvpp2_txq_init(struct mvpp2_port *port,
 	for_each_present_cpu(cpu) {
 		txq_pcpu = per_cpu_ptr(txq->pcpu, cpu);
 		kfree(txq_pcpu->buffs);
+
+		dma_free_coherent(port->dev->dev.parent,
+				  MVPP2_AGGR_TXQ_SIZE * MVPP2_DESC_ALIGNED_SIZE,
+				  txq_pcpu->tso_headers,
+				  txq_pcpu->tso_headers_dma);
 	}
 
 	dma_free_coherent(port->dev->dev.parent,
@@ -5649,6 +5667,11 @@ static void mvpp2_txq_deinit(struct mvpp2_port *port,
 	for_each_present_cpu(cpu) {
 		txq_pcpu = per_cpu_ptr(txq->pcpu, cpu);
 		kfree(txq_pcpu->buffs);
+
+		dma_free_coherent(port->dev->dev.parent,
+				  MVPP2_AGGR_TXQ_SIZE * MVPP2_DESC_ALIGNED_SIZE,
+				  txq_pcpu->tso_headers,
+				  txq_pcpu->tso_headers_dma);
 	}
 
 	if (txq->descs)
@@ -6238,6 +6261,123 @@ static int mvpp2_tx_frag_process(struct mvpp2_port *port, struct sk_buff *skb,
 	return -ENOMEM;
 }
 
+static inline void mvpp2_tso_put_hdr(struct sk_buff *skb,
+				     struct net_device *dev,
+				     struct mvpp2_tx_queue *txq,
+				     struct mvpp2_tx_queue *aggr_txq,
+				     struct mvpp2_txq_pcpu *txq_pcpu,
+				     int hdr_sz)
+{
+	struct mvpp2_port *port = netdev_priv(dev);
+	struct mvpp2_tx_desc *tx_desc = mvpp2_txq_next_desc_get(aggr_txq);
+	dma_addr_t addr;
+
+	mvpp2_txdesc_txq_set(port, tx_desc, txq->id);
+	mvpp2_txdesc_size_set(port, tx_desc, hdr_sz);
+
+	addr = txq_pcpu->tso_headers_dma +
+	       txq_pcpu->txq_put_index * TSO_HEADER_SIZE;
+	mvpp2_txdesc_offset_set(port, tx_desc, addr & MVPP2_TX_DESC_ALIGN);
+	mvpp2_txdesc_dma_addr_set(port, tx_desc, addr & ~MVPP2_TX_DESC_ALIGN);
+
+	mvpp2_txdesc_cmd_set(port, tx_desc, mvpp2_skb_tx_csum(port, skb) |
+					    MVPP2_TXD_F_DESC |
+					    MVPP2_TXD_PADDING_DISABLE);
+	mvpp2_txq_inc_put(port, txq_pcpu, NULL, tx_desc);
+}
+
+static inline int mvpp2_tso_put_data(struct sk_buff *skb,
+				     struct net_device *dev, struct tso_t *tso,
+				     struct mvpp2_tx_queue *txq,
+				     struct mvpp2_tx_queue *aggr_txq,
+				     struct mvpp2_txq_pcpu *txq_pcpu,
+				     int sz, bool left, bool last)
+{
+	struct mvpp2_port *port = netdev_priv(dev);
+	struct mvpp2_tx_desc *tx_desc = mvpp2_txq_next_desc_get(aggr_txq);
+	dma_addr_t buf_dma_addr;
+
+	mvpp2_txdesc_txq_set(port, tx_desc, txq->id);
+	mvpp2_txdesc_size_set(port, tx_desc, sz);
+
+	buf_dma_addr = dma_map_single(dev->dev.parent, tso->data, sz,
+				      DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(dev->dev.parent, buf_dma_addr))) {
+		mvpp2_txq_desc_put(txq);
+		return -ENOMEM;
+	}
+
+	mvpp2_txdesc_offset_set(port, tx_desc,
+				buf_dma_addr & MVPP2_TX_DESC_ALIGN);
+	mvpp2_txdesc_dma_addr_set(port, tx_desc,
+				  buf_dma_addr & ~MVPP2_TX_DESC_ALIGN);
+
+	if (!left) {
+		mvpp2_txdesc_cmd_set(port, tx_desc, MVPP2_TXD_L_DESC);
+		if (last) {
+			mvpp2_txq_inc_put(port, txq_pcpu, skb, tx_desc);
+			return 0;
+		}
+	} else {
+		mvpp2_txdesc_cmd_set(port, tx_desc, 0);
+	}
+
+	mvpp2_txq_inc_put(port, txq_pcpu, NULL, tx_desc);
+	return 0;
+}
+
+static int mvpp2_tx_tso(struct sk_buff *skb, struct net_device *dev,
+			struct mvpp2_tx_queue *txq,
+			struct mvpp2_tx_queue *aggr_txq,
+			struct mvpp2_txq_pcpu *txq_pcpu)
+{
+	struct mvpp2_port *port = netdev_priv(dev);
+	struct tso_t tso;
+	int hdr_sz = skb_transport_offset(skb) + tcp_hdrlen(skb);
+	int i, len, descs = 0;
+
+	/* Check number of available descriptors */
+	if (mvpp2_aggr_desc_num_check(port->priv, aggr_txq,
+				      tso_count_descs(skb)) ||
+	    mvpp2_txq_reserved_desc_num_proc(port->priv, txq, txq_pcpu,
+					     tso_count_descs(skb)))
+		return 0;
+
+	tso_start(skb, &tso);
+	len = skb->len - hdr_sz;
+	while (len > 0) {
+		int left = min_t(int, skb_shinfo(skb)->gso_size, len);
+		char *hdr = txq_pcpu->tso_headers +
+			    txq_pcpu->txq_put_index * TSO_HEADER_SIZE;
+
+		len -= left;
+		descs++;
+
+		tso_build_hdr(skb, hdr, &tso, left, len == 0);
+		mvpp2_tso_put_hdr(skb, dev, txq, aggr_txq, txq_pcpu, hdr_sz);
+
+		while (left > 0) {
+			int sz = min_t(int, tso.size, left);
+			left -= sz;
+			descs++;
+
+			if (mvpp2_tso_put_data(skb, dev, &tso, txq, aggr_txq,
+					       txq_pcpu, sz, left, len == 0))
+				goto release;
+			tso_build_data(skb, &tso, sz);
+		}
+	}
+
+	return descs;
+
+release:
+	for (i = descs - 1; i >= 0; i--) {
+		struct mvpp2_tx_desc *tx_desc = txq->descs + i;
+		tx_desc_unmap_put(port, txq, tx_desc);
+	}
+	return 0;
+}
+
 /* Main tx processing */
 static int mvpp2_tx(struct sk_buff *skb, struct net_device *dev)
 {
@@ -6255,6 +6395,10 @@ static int mvpp2_tx(struct sk_buff *skb, struct net_device *dev)
 	txq_pcpu = this_cpu_ptr(txq->pcpu);
 	aggr_txq = &port->priv->aggr_txqs[smp_processor_id()];
 
+	if (skb_is_gso(skb)) {
+		frags = mvpp2_tx_tso(skb, dev, txq, aggr_txq, txq_pcpu);
+		goto out;
+	}
 	frags = skb_shinfo(skb)->nr_frags + 1;
 
 	/* Check number of available descriptors */
@@ -6304,22 +6448,21 @@ static int mvpp2_tx(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
-	txq_pcpu->reserved_num -= frags;
-	txq_pcpu->count += frags;
-	aggr_txq->count += frags;
-
-	/* Enable transmit */
-	wmb();
-	mvpp2_aggr_txq_pend_desc_add(port, frags);
-
-	if (txq_pcpu->size - txq_pcpu->count < MAX_SKB_FRAGS + 1) {
-		struct netdev_queue *nq = netdev_get_tx_queue(dev, txq_id);
-
-		netif_tx_stop_queue(nq);
-	}
 out:
 	if (frags > 0) {
 		struct mvpp2_pcpu_stats *stats = this_cpu_ptr(port->stats);
+		struct netdev_queue *nq = netdev_get_tx_queue(dev, txq_id);
+
+		txq_pcpu->reserved_num -= frags;
+		txq_pcpu->count += frags;
+		aggr_txq->count += frags;
+
+		/* Enable transmit */
+		wmb();
+		mvpp2_aggr_txq_pend_desc_add(port, frags);
+
+		if (txq_pcpu->size - txq_pcpu->count < MAX_SKB_FRAGS + 1)
+			netif_tx_stop_queue(nq);
 
 		u64_stats_update_begin(&stats->syncp);
 		stats->tx_packets++;
@@ -7496,7 +7639,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 		}
 	}
 
-	features = NETIF_F_SG | NETIF_F_IP_CSUM;
+	features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
 	dev->features = features | NETIF_F_RXCSUM;
 	dev->hw_features |= features | NETIF_F_RXCSUM | NETIF_F_GRO;
 	dev->vlan_features |= features;
-- 
2.13.5

^ permalink raw reply related

* RE: [PATCH v2 net-next 1/3] ipv6: Prevent unexpected sk->sk_prot changes
From: Ilya Lesokhin @ 2017-08-23  7:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, davem@davemloft.net, davejwatson@fb.com,
	Aviad Yehezkel, Boris Pismenny
In-Reply-To: <1502808336.4936.78.camel@edumazet-glaptop3.roam.corp.google.com>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Tuesday, August 15, 2017 5:46 PM
> To: Boris Pismenny <borisp@mellanox.com>
> Cc: Ilya Lesokhin <ilyal@mellanox.com>; netdev@vger.kernel.org;
> davem@davemloft.net; davejwatson@fb.com; Aviad Yehezkel
> <aviadye@mellanox.com>
> Subject: Re: [PATCH v2 net-next 1/3] ipv6: Prevent unexpected sk->sk_prot
> changes
> 
> I am just saying IPV6_ADDRFORM is becoming spaghetti code, and maybe
> this is time to make it modular. 
> 

Hi Eric,
I understand your concern, but I would like to postpone implementing this
feature until the ulp infrastructure stabilizes.
I don't even know how many users want to use IPV6_ADDRFORM on sockets with ulp.
I think that simply returning an error for ulp sockets in the meantime is a reasonable 
compromise.

what do you think about the patch below?

Subject: [PATCH 1/1] ipv6: Disable IPV6_ADDRFORM on sockets with ulp attached

The IPV6_ADDRFORM setsockopt works by overriding sk->sk_prot.
The current KTLS ulp also overrides sk->sk_prot.
Consequently, using IPV6_ADDRFORM when there is a ulp attached is
unsafe and this patch disables it.

Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
---
 net/ipv6/ipv6_sockglue.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 02d795f..d935948 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -185,8 +185,12 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 					retv = -EBUSY;
 					break;
 				}
-			} else if (sk->sk_protocol != IPPROTO_TCP)
+			} else if (sk->sk_protocol == IPPROTO_TCP) {
+				if (inet_csk(sk)->icsk_ulp_ops)
+					break;
+			} else {
 				break;
+			}
 
 			if (sk->sk_state != TCP_ESTABLISHED) {
 				retv = -ENOTCONN;


Thanks,
Ilya

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox