Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next 00/13] Refactor classifier API to work with Qdisc/blocks without rtnl lock
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a third step to remove
rtnl lock dependency from TC rules update path.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
Handlers registered with this flag are called without RTNL taken. End
goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER,
etc.) to be registered with UNLOCKED flag to allow parallel execution.
However, there is no intention to completely remove or split rtnl lock
itself. This patch set addresses specific problems in implementation of
classifiers API that prevent its control path from being executed
concurrently. Additional changes are required to refactor classifiers
API and individual classifiers for parallel execution. This patch set
lays groundwork to eventually register rule update handlers as
rtnl-unlocked by modifying code in cls API that works with Qdiscs and
blocks. Following patch set does the same for chains and classifiers.

The goal of this change is to refactor tcf_block_find() and its
dependencies to allow concurrent execution:
- Extend Qdisc API with rcu to lookup and take reference to Qdisc
  without relying on rtnl lock.
- Extend tcf_block with atomic reference counting and rcu.
- Always take reference to tcf_block while working with it.
- Implement tcf_block_release() to release resources obtained by
  tcf_block_find()
- Create infrastructure to allow registering Qdiscs with class ops that
  do not require the caller to hold rtnl lock.

All three netlink rule update handlers use tcf_block_find() to lookup
Qdisc and block, and this patch set introduces additional means of
synchronization to substitute rtnl lock in cls API.

Some functions in cls and sch APIs have historic names that no longer
clearly describe their intent. In order not make this code even more
confusing when introducing their concurrency-friendly versions, rename
these functions to describe actual implementation.

Vlad Buslov (13):
  net: core: netlink: add helper refcount dec and lock function
  net: sched: rename qdisc_destroy() to qdisc_put()
  net: sched: extend Qdisc with rcu
  net: sched: add helper function to take reference to Qdisc
  net: sched: use Qdisc rcu API instead of relying on rtnl lock
  net: sched: change tcf block reference counter type to refcount_t
  net: sched: implement functions to put and flush all chains
  net: sched: rename tcf_block_get{_ext}() and tcf_block_put{_ext}()
  net: sched: extend tcf_block with rcu
  net: sched: protect block idr with spinlock
  net: sched: implement tcf_block_get() and tcf_block_put()
  net: sched: use reference counting for tcf blocks on rules update
  net: sched: add flags to Qdisc class ops struct

 include/linux/rtnetlink.h |   6 +
 include/net/pkt_cls.h     |  36 +++---
 include/net/pkt_sched.h   |   1 +
 include/net/sch_generic.h |  28 ++++-
 net/core/rtnetlink.c      |   6 +
 net/sched/cls_api.c       | 281 ++++++++++++++++++++++++++++++++--------------
 net/sched/sch_api.c       |  24 +++-
 net/sched/sch_atm.c       |  14 +--
 net/sched/sch_cake.c      |   4 +-
 net/sched/sch_cbq.c       |  15 +--
 net/sched/sch_cbs.c       |   2 +-
 net/sched/sch_drr.c       |   8 +-
 net/sched/sch_dsmark.c    |   6 +-
 net/sched/sch_fifo.c      |   2 +-
 net/sched/sch_fq_codel.c  |   4 +-
 net/sched/sch_generic.c   |  48 ++++++--
 net/sched/sch_hfsc.c      |  13 ++-
 net/sched/sch_htb.c       |  17 +--
 net/sched/sch_ingress.c   |  15 +--
 net/sched/sch_mq.c        |   4 +-
 net/sched/sch_mqprio.c    |   4 +-
 net/sched/sch_multiq.c    |  10 +-
 net/sched/sch_netem.c     |   2 +-
 net/sched/sch_prio.c      |  10 +-
 net/sched/sch_qfq.c       |   8 +-
 net/sched/sch_red.c       |   4 +-
 net/sched/sch_sfb.c       |   8 +-
 net/sched/sch_sfq.c       |   4 +-
 net/sched/sch_tbf.c       |   4 +-
 29 files changed, 394 insertions(+), 194 deletions(-)

-- 
2.7.5

^ permalink raw reply

* [PATCH net-next 04/13] net: sched: add helper function to take reference to Qdisc
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Implement function to take reference to Qdisc that relies on rcu read lock
instead of rtnl mutex. Function only takes reference to Qdisc if reference
counter isn't zero. Intended to be used by unlocked cls API.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 239c73f29471..f878afa58be4 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -115,6 +115,19 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
 	refcount_inc(&qdisc->refcnt);
 }
 
+/* Intended to be used by unlocked users, when concurrent qdisc release is
+ * possible.
+ */
+
+static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN)
+		return qdisc;
+	if (refcount_inc_not_zero(&qdisc->refcnt))
+		return qdisc;
+	return NULL;
+}
+
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
 	if (qdisc->flags & TCQ_F_NOLOCK)
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 03/13] net: sched: extend Qdisc with rcu
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Currently, Qdisc API functions assume that users have rtnl lock taken. To
implement rtnl unlocked classifiers update interface, Qdisc API must be
extended with functions that do not require rtnl lock.

Extend Qdisc structure with rcu. Implement special version of put function
qdisc_put_unlocked() that is called without rtnl lock taken. This function
only takes rtnl lock if Qdisc reference counter reached zero and is
intended to be used as optimization.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/rtnetlink.h |  5 +++++
 include/net/pkt_sched.h   |  1 +
 include/net/sch_generic.h |  2 ++
 net/sched/sch_api.c       | 18 ++++++++++++++++++
 net/sched/sch_generic.c   | 25 ++++++++++++++++++++++++-
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dffbf665c086..d3dff3e41e6c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -84,6 +84,11 @@ static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
 	return rtnl_dereference(dev->ingress_queue);
 }
 
+static inline struct netdev_queue *dev_ingress_queue_rcu(struct net_device *dev)
+{
+	return rcu_dereference(dev->ingress_queue);
+}
+
 struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
 
 #ifdef CONFIG_NET_INGRESS
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 7dc769e5452b..a16fbe9a2a67 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -102,6 +102,7 @@ int qdisc_set_default(const char *id);
 void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
 					struct nlattr *tab,
 					struct netlink_ext_ack *extack);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 18e22a5a6550..239c73f29471 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -90,6 +90,7 @@ struct Qdisc {
 	struct gnet_stats_queue	__percpu *cpu_qstats;
 	int			padded;
 	refcount_t		refcnt;
+	struct rcu_head		rcu;
 
 	/*
 	 * For performance sake on SMP, we put highly modified fields at the end
@@ -555,6 +556,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
+void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
 			       unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 836b32e6e8e8..8854c9b674b8 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -315,6 +315,24 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 	return q;
 }
 
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle)
+{
+	struct Qdisc *q;
+	struct netdev_queue *nq;
+
+	if (!handle)
+		return NULL;
+	q = qdisc_match_from_root(dev->qdisc, handle);
+	if (q)
+		goto out;
+
+	nq = dev_ingress_queue_rcu(dev);
+	if (nq)
+		q = qdisc_match_from_root(nq->qdisc_sleeping, handle);
+out:
+	return q;
+}
+
 static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid)
 {
 	unsigned long cl;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index bb778485ed88..2176fe9db750 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -941,6 +941,13 @@ void qdisc_free(struct Qdisc *qdisc)
 	kfree((char *) qdisc - qdisc->padded);
 }
 
+void qdisc_free_cb(struct rcu_head *head)
+{
+	struct Qdisc *q = container_of(head, struct Qdisc, rcu);
+
+	qdisc_free(q);
+}
+
 static void qdisc_destroy(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops  *ops = qdisc->ops;
@@ -970,7 +977,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
 		kfree_skb_list(skb);
 	}
 
-	qdisc_free(qdisc);
+	call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
 void qdisc_put(struct Qdisc *qdisc)
@@ -983,6 +990,22 @@ void qdisc_put(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_put);
 
+/* Version of qdisc_put() that is called with rtnl mutex unlocked.
+ * Intended to be used as optimization, this function only takes rtnl lock if
+ * qdisc reference counter reached zero.
+ */
+
+void qdisc_put_unlocked(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN ||
+	    !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
+		return;
+
+	qdisc_destroy(qdisc);
+	rtnl_unlock();
+}
+EXPORT_SYMBOL(qdisc_put_unlocked);
+
 /* Attach toplevel qdisc to device queue. */
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc)
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 01/13] net: core: netlink: add helper refcount dec and lock function
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Rtnl lock is encapsulated in netlink and cannot be accessed by other
modules directly. This means that reference counted objects that rely on
rtnl lock cannot use it with refcounter helper function that atomically
releases decrements reference and obtains mutex.

This patch implements simple wrapper function around refcount_dec_and_lock
that obtains rtnl lock if reference counter value reached 0.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/rtnetlink.h | 1 +
 net/core/rtnetlink.c      | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 5225832bd6ff..dffbf665c086 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -34,6 +34,7 @@ extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
 extern int rtnl_is_locked(void);
 extern int rtnl_lock_killable(void);
+extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
 
 extern wait_queue_head_t netdev_unregistering_wq;
 extern struct rw_semaphore pernet_ops_rwsem;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 60c928894a78..4ea0b1413076 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -130,6 +130,12 @@ int rtnl_is_locked(void)
 }
 EXPORT_SYMBOL(rtnl_is_locked);
 
+bool refcount_dec_and_rtnl_lock(refcount_t *r)
+{
+	return refcount_dec_and_mutex_lock(r, &rtnl_mutex);
+}
+EXPORT_SYMBOL(refcount_dec_and_rtnl_lock);
+
 #ifdef CONFIG_PROVE_LOCKING
 bool lockdep_rtnl_is_held(void)
 {
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 06/13] net: sched: change tcf block reference counter type to refcount_t
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

As a preparation for removing rtnl lock dependency from rules update path,
change tcf block reference counter type to refcount_t to allow modification
by concurrent users.

In block put function perform decrement and check reference counter once to
accommodate concurrent modification by unlocked users. After this change
tcf_chain_put at the end of block put function is called with
block->refcnt==0 and will deallocate block after the last chain is
released, so there is no need to manually deallocate block in this case.
However, if block reference counter reached 0 and there are no chains to
release, block must still be deallocated manually.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  2 +-
 net/sched/cls_api.c       | 59 ++++++++++++++++++++++++++++-------------------
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f878afa58be4..825e2bf6c5c3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -345,7 +345,7 @@ struct tcf_chain {
 struct tcf_block {
 	struct list_head chain_list;
 	u32 index; /* block index for shared blocks */
-	unsigned int refcnt;
+	refcount_t refcnt;
 	struct net *net;
 	struct Qdisc *q;
 	struct list_head cb_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cfa4a02a6a1a..c3c7d4e2f84c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -240,7 +240,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
 	if (!chain->index)
 		block->chain0.chain = NULL;
 	kfree(chain);
-	if (list_empty(&block->chain_list) && block->refcnt == 0)
+	if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
 		kfree(block);
 }
 
@@ -510,7 +510,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	INIT_LIST_HEAD(&block->owner_list);
 	INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
-	block->refcnt = 1;
+	refcount_set(&block->refcnt, 1);
 	block->net = net;
 	block->index = block_index;
 
@@ -719,7 +719,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 		/* block_index not 0 means the shared block is requested */
 		block = tcf_block_lookup(net, ei->block_index);
 		if (block)
-			block->refcnt++;
+			refcount_inc(&block->refcnt);
 	}
 
 	if (!block) {
@@ -762,7 +762,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 err_block_insert:
 		kfree(block);
 	} else {
-		block->refcnt--;
+		refcount_dec(&block->refcnt);
 	}
 	return err;
 }
@@ -802,34 +802,45 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 	tcf_chain0_head_change_cb_del(block, ei);
 	tcf_block_owner_del(block, q, ei->binder_type);
 
-	if (block->refcnt == 1) {
-		if (tcf_block_shared(block))
-			tcf_block_remove(block, block->net);
-
-		/* Hold a refcnt for all chains, so that they don't disappear
-		 * while we are iterating.
+	if (refcount_dec_and_test(&block->refcnt)) {
+		/* Flushing/putting all chains will cause the block to be
+		 * deallocated when last chain is freed. However, if chain_list
+		 * is empty, block has to be manually deallocated. After block
+		 * reference counter reached 0, it is no longer possible to
+		 * increment it or add new chains to block.
 		 */
-		list_for_each_entry(chain, &block->chain_list, list)
-			tcf_chain_hold(chain);
+		bool free_block = list_empty(&block->chain_list);
 
-		list_for_each_entry(chain, &block->chain_list, list)
-			tcf_chain_flush(chain);
-	}
+		if (tcf_block_shared(block))
+			tcf_block_remove(block, block->net);
 
-	tcf_block_offload_unbind(block, q, ei);
+		if (!free_block) {
+			/* Hold a refcnt for all chains, so that they don't
+			 * disappear while we are iterating.
+			 */
+			list_for_each_entry(chain, &block->chain_list, list)
+				tcf_chain_hold(chain);
 
-	if (block->refcnt == 1) {
-		/* At this point, all the chains should have refcnt >= 1. */
-		list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-			tcf_chain_put_explicitly_created(chain);
-			tcf_chain_put(chain);
+			list_for_each_entry(chain, &block->chain_list, list)
+				tcf_chain_flush(chain);
 		}
 
-		block->refcnt--;
-		if (list_empty(&block->chain_list))
+		tcf_block_offload_unbind(block, q, ei);
+
+		if (free_block) {
 			kfree(block);
+		} else {
+			/* At this point, all the chains should have
+			 * refcnt >= 1.
+			 */
+			list_for_each_entry_safe(chain, tmp, &block->chain_list,
+						 list) {
+				tcf_chain_put_explicitly_created(chain);
+				tcf_chain_put(chain);
+			}
+		}
 	} else {
-		block->refcnt--;
+		tcf_block_offload_unbind(block, q, ei);
 	}
 }
 EXPORT_SYMBOL(tcf_block_put_ext);
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 05/13] net: sched: use Qdisc rcu API instead of relying on rtnl lock
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

As a preparation from removing rtnl lock dependency from rules update path,
use Qdisc rcu and reference counting capabilities instead of relying on
rtnl lock while working with Qdiscs. Create new tcf_block_release()
function, and use it to free resources taken by tcf_block_find().
Currently, this function only releases Qdisc and it is extended in next
patches in this series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 88 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 73 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1a67af8a6e8c..cfa4a02a6a1a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -527,6 +527,17 @@ static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
 	return idr_find(&tn->idr, block_index);
 }
 
+static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
+{
+	if (!q)
+		return;
+
+	if (rtnl_held)
+		qdisc_put(q);
+	else
+		qdisc_put_unlocked(q);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -537,6 +548,7 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 					struct netlink_ext_ack *extack)
 {
 	struct tcf_block *block;
+	int err = 0;
 
 	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
 		block = tcf_block_lookup(net, block_index);
@@ -548,55 +560,91 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 		const struct Qdisc_class_ops *cops;
 		struct net_device *dev;
 
+		rcu_read_lock();
+
 		/* Find link */
-		dev = __dev_get_by_index(net, ifindex);
-		if (!dev)
+		dev = dev_get_by_index_rcu(net, ifindex);
+		if (!dev) {
+			rcu_read_unlock();
 			return ERR_PTR(-ENODEV);
+		}
 
 		/* Find qdisc */
 		if (!*parent) {
 			*q = dev->qdisc;
 			*parent = (*q)->handle;
 		} else {
-			*q = qdisc_lookup(dev, TC_H_MAJ(*parent));
+			*q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
 			if (!*q) {
 				NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
-				return ERR_PTR(-EINVAL);
+				err = -EINVAL;
+				goto errout_rcu;
 			}
 		}
 
+		*q = qdisc_refcount_inc_nz(*q);
+		if (!*q) {
+			NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+			err = -EINVAL;
+			goto errout_rcu;
+		}
+
 		/* Is it classful? */
 		cops = (*q)->ops->cl_ops;
 		if (!cops) {
 			NL_SET_ERR_MSG(extack, "Qdisc not classful");
-			return ERR_PTR(-EINVAL);
+			err = -EINVAL;
+			goto errout_rcu;
 		}
 
 		if (!cops->tcf_block) {
 			NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-			return ERR_PTR(-EOPNOTSUPP);
+			err = -EOPNOTSUPP;
+			goto errout_rcu;
 		}
 
+		/* At this point we know that qdisc is not noop_qdisc,
+		 * which means that qdisc holds a reference to net_device
+		 * and we hold a reference to qdisc, so it is safe to release
+		 * rcu read lock.
+		 */
+		rcu_read_unlock();
+
 		/* Do we search for filter, attached to class? */
 		if (TC_H_MIN(*parent)) {
 			*cl = cops->find(*q, *parent);
 			if (*cl == 0) {
 				NL_SET_ERR_MSG(extack, "Specified class doesn't exist");
-				return ERR_PTR(-ENOENT);
+				err = -ENOENT;
+				goto errout_qdisc;
 			}
 		}
 
 		/* And the last stroke */
 		block = cops->tcf_block(*q, *cl, extack);
-		if (!block)
-			return ERR_PTR(-EINVAL);
+		if (!block) {
+			err = -EINVAL;
+			goto errout_qdisc;
+		}
 		if (tcf_block_shared(block)) {
 			NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters");
-			return ERR_PTR(-EOPNOTSUPP);
+			err = -EOPNOTSUPP;
+			goto errout_qdisc;
 		}
 	}
 
 	return block;
+
+errout_rcu:
+	rcu_read_unlock();
+errout_qdisc:
+	tcf_qdisc_put(*q, true);
+	return ERR_PTR(err);
+}
+
+static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
+{
+	tcf_qdisc_put(q, true);
 }
 
 struct tcf_block_owner_item {
@@ -1332,6 +1380,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	tcf_block_release(q, block);
 	if (err == -EAGAIN)
 		/* Replay the request. */
 		goto replay;
@@ -1453,6 +1502,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	tcf_block_release(q, block);
 	return err;
 }
 
@@ -1538,6 +1588,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	tcf_block_release(q, block);
 	return err;
 }
 
@@ -1854,7 +1905,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 	chain_index = tca[TCA_CHAIN] ? nla_get_u32(tca[TCA_CHAIN]) : 0;
 	if (chain_index > TC_ACT_EXT_VAL_MASK) {
 		NL_SET_ERR_MSG(extack, "Specified chain index exceeds upper limit");
-		return -EINVAL;
+		err = -EINVAL;
+		goto errout_block;
 	}
 	chain = tcf_chain_lookup(block, chain_index);
 	if (n->nlmsg_type == RTM_NEWCHAIN) {
@@ -1866,23 +1918,27 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 				tcf_chain_hold(chain);
 			} else {
 				NL_SET_ERR_MSG(extack, "Filter chain already exists");
-				return -EEXIST;
+				err = -EEXIST;
+				goto errout_block;
 			}
 		} else {
 			if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 				NL_SET_ERR_MSG(extack, "Need both RTM_NEWCHAIN and NLM_F_CREATE to create a new chain");
-				return -ENOENT;
+				err = -ENOENT;
+				goto errout_block;
 			}
 			chain = tcf_chain_create(block, chain_index);
 			if (!chain) {
 				NL_SET_ERR_MSG(extack, "Failed to create filter chain");
-				return -ENOMEM;
+				err = -ENOMEM;
+				goto errout_block;
 			}
 		}
 	} else {
 		if (!chain || tcf_chain_held_by_acts_only(chain)) {
 			NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
-			return -EINVAL;
+			err = -EINVAL;
+			goto errout_block;
 		}
 		tcf_chain_hold(chain);
 	}
@@ -1924,6 +1980,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 
 errout:
 	tcf_chain_put(chain);
+errout_block:
+	tcf_block_release(q, block);
 	if (err == -EAGAIN)
 		/* Replay the request. */
 		goto replay;
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 07/13] net: sched: implement functions to put and flush all chains
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Extract code that flushes and puts all chains on tcf block to two
standalone function to be shared with functions that locklessly get/put
reference to block.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 55 +++++++++++++++++++++++++++++------------------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c3c7d4e2f84c..58b2d8443f6a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -538,6 +538,31 @@ static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
 		qdisc_put_unlocked(q);
 }
 
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+	struct tcf_chain *chain;
+
+	/* Hold a refcnt for all chains, so that they don't disappear
+	 * while we are iterating.
+	 */
+	list_for_each_entry(chain, &block->chain_list, list)
+		tcf_chain_hold(chain);
+
+	list_for_each_entry(chain, &block->chain_list, list)
+		tcf_chain_flush(chain);
+}
+
+static void tcf_block_put_all_chains(struct tcf_block *block)
+{
+	struct tcf_chain *chain, *tmp;
+
+	/* At this point, all the chains should have refcnt >= 1. */
+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+		tcf_chain_put_explicitly_created(chain);
+		tcf_chain_put(chain);
+	}
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -795,8 +820,6 @@ EXPORT_SYMBOL(tcf_block_get);
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 		       struct tcf_block_ext_info *ei)
 {
-	struct tcf_chain *chain, *tmp;
-
 	if (!block)
 		return;
 	tcf_chain0_head_change_cb_del(block, ei);
@@ -813,32 +836,14 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 
 		if (tcf_block_shared(block))
 			tcf_block_remove(block, block->net);
-
-		if (!free_block) {
-			/* Hold a refcnt for all chains, so that they don't
-			 * disappear while we are iterating.
-			 */
-			list_for_each_entry(chain, &block->chain_list, list)
-				tcf_chain_hold(chain);
-
-			list_for_each_entry(chain, &block->chain_list, list)
-				tcf_chain_flush(chain);
-		}
-
+		if (!free_block)
+			tcf_block_flush_all_chains(block);
 		tcf_block_offload_unbind(block, q, ei);
 
-		if (free_block) {
+		if (free_block)
 			kfree(block);
-		} else {
-			/* At this point, all the chains should have
-			 * refcnt >= 1.
-			 */
-			list_for_each_entry_safe(chain, tmp, &block->chain_list,
-						 list) {
-				tcf_chain_put_explicitly_created(chain);
-				tcf_chain_put(chain);
-			}
-		}
+		else
+			tcf_block_put_all_chains(block);
 	} else {
 		tcf_block_offload_unbind(block, q, ei);
 	}
-- 
2.7.5

^ permalink raw reply related

* Re: [PATCH net-next v2 0/9] rtnetlink: add IFA_TARGET_NETNSID for RTM_GETADDR
From: Christian Brauner @ 2018-09-06 12:33 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-kernel, kuznet, yoshfuji, pombredanne, kstewart,
	gregkh, dsahern, fw, ktkhai, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel
In-Reply-To: <20180905.222750.249661617396939609.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 282 bytes --]

On Wed, Sep 05, 2018 at 10:27:50PM -0700, David Miller wrote:
> From: Christian Brauner <christian@brauner.io>
> Date: Tue,  4 Sep 2018 21:53:46 +0200
> 
> > # v2 introduction:
>  ...
> 
> This looks good to me, series applied.
> 
> Thank you!

Thanks you!

Christian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net 2/3] tls: clear key material from kernel memory when do_tls_setsockopt_conf fails
From: Sabrina Dubroca @ 2018-09-06  7:49 UTC (permalink / raw)
  To: Boris Pismenny; +Cc: netdev, Ilya Lesokhin, Aviad Yehezkel, Dave Watson
In-Reply-To: <567f747b-e817-4a9b-1768-96575caea845@mellanox.com>

2018-09-05, 16:53:54 +0300, Boris Pismenny wrote:
> Hi Sabrina,
> 
> On 9/5/2018 4:21 PM, Sabrina Dubroca wrote:
> > Fixes: 3c4d7559159b ("tls: kernel TLS support")
> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > ---
> >   net/tls/tls_main.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > index 180b6640e531..0d432d025471 100644
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> > @@ -499,7 +499,7 @@ static int do_tls_setsockopt_conf(struct sock *sk, char __user *optval,
> >   	goto out;
> >   err_crypto_info:
> > -	memset(crypto_info, 0, sizeof(*crypto_info));
> > +	memzero_explicit(crypto_info, sizeof(struct tls12_crypto_info_aes_gcm_128));
> 
> Besides the key, there are other (not secret) information in
> tls12_crypto_info_aes_gcm_128. I'd prefer you do not delete it to enable
> users to obtain it (using getsockopt) in case we decide to implement a
> fallback to userspace in the future. Such a fallback must obtain the
> kernel's iv, and record sequence number.

The operation failed. There's no reason for this stale data to remain
in the kernel. And you couldn't recover it with getsockopt, because
you'll hit !TLS_CRYPTO_INFO_READY, since we're already resetting
tls_crypto_info. Resetting tls_crypto_info on failure is necessary,
otherwise your socket will be in a broken state, with TLS not setup
but unable to perform a new attempt.

Userspace already has all this information anyway, since it passed it
to the kernel, so why would it need to recover it from the kernel?

-- 
Sabrina

^ permalink raw reply

* Re: [PATCH net 3/3] tls: zero the crypto information from tls_context before freeing
From: Sabrina Dubroca @ 2018-09-06  7:42 UTC (permalink / raw)
  To: Vakul Garg
  Cc: netdev@vger.kernel.org, Boris Pismenny, Ilya Lesokhin,
	Aviad Yehezkel, Dave Watson
In-Reply-To: <DB7PR04MB425294A2EA0DA1170CC2C33E8B020@DB7PR04MB4252.eurprd04.prod.outlook.com>

2018-09-05, 13:48:48 +0000, Vakul Garg wrote:
> 
> 
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> > Behalf Of Sabrina Dubroca
> > Sent: Wednesday, September 5, 2018 6:52 PM
> > To: netdev@vger.kernel.org
> > Cc: Sabrina Dubroca <sd@queasysnail.net>; Boris Pismenny
> > <borisp@mellanox.com>; Ilya Lesokhin <ilyal@mellanox.com>; Aviad
> > Yehezkel <aviadye@mellanox.com>; Dave Watson <davejwatson@fb.com>
> > Subject: [PATCH net 3/3] tls: zero the crypto information from tls_context
> > before freeing
> > 
> > This contains key material in crypto_send_aes_gcm_128 and
> > crypto_recv_aes_gcm_128.
> > 
> > Fixes: 3c4d7559159b ("tls: kernel TLS support")
> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > ---
> >  include/net/tls.h  |  1 +
> >  net/tls/tls_main.c | 14 ++++++++++++--
> >  2 files changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/net/tls.h b/include/net/tls.h index
> > d5c683e8bb22..2010d23112f9 100644
> > --- a/include/net/tls.h
> > +++ b/include/net/tls.h
> > @@ -180,6 +180,7 @@ struct tls_context {
> >  		struct tls_crypto_info crypto_recv;
> >  		struct tls12_crypto_info_aes_gcm_128
> > crypto_recv_aes_gcm_128;
> >  	};
> > +	char tls_crypto_ctx_end[0];
> 
> This makes the key zeroization dependent upon the position of unions
> in structure.

Yes. I could add char tls_crypto_ctx_start[0].

> Can you zeroise the crypto_send, crypto_recv separately using two
> memzero_explicit calls?

It's not crypto_send, it's crypto_send_aes_gcm_128. I don't know how
likely it is that another crypto algorithm will ever be added, but
then you'd have to switch to zeroing that thing.

I had a different version of the patch that gave a name to those
unions, but then I need to change all the references everywhere and
the patch becomes a bit ugly, especially for net.

struct tls_context {
	union {
		struct tls_crypto_info info;
		struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
	} crypto_send;
	union {
		struct tls_crypto_info info;
		struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
	} crypto_recv;

	[...]
}


Or even:

union tls_crypto_context {
	struct tls_crypto_info info;
	struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
};

struct tls_context {
	union tls_crypto_context crypto_send;
	union tls_crypto_context crypto_recv;

	[...]
}


-- 
Sabrina

^ permalink raw reply

* Re: [PATCH v3 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace
From: Arseny Maslennikov @ 2018-09-06  7:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-rdma, Doug Ledford, Jason Gunthorpe, netdev
In-Reply-To: <20180905164727.3108bce6@shemminger-XPS-13-9360>

[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]

On Wed, Sep 05, 2018 at 04:47:27PM +0100, Stephen Hemminger wrote:
> On Mon,  3 Sep 2018 19:13:16 +0300
> Arseny Maslennikov <ar@cs.msu.ru> wrote:
> 
> > +	if (ndev->dev_id == ndev->dev_port) {
> > +		netdev_info_once(ndev,
> > +			"\"%s\" wants to know my dev_id. "
> > +			"Should it look at dev_port instead?\n",
> > +			current->comm);
> > +		netdev_info_once(ndev,
> > +			"See Documentation/ABI/testing/sysfs-class-net for more info.\n");
> > +	}
> 
> Single line message is sufficient.
> Also don't break strings in messages.
> 

OK, will fix in v4.


(Sorry if the following is too off-topic here)
Multi-line messages in separate printk calls can be racy, I get that.
But I'd like to hear some reasoning behind the style decision to not
break a long string into many string literals. (I'll most certainly not
be alone in this, Documentation/process/ does not mention reasons, only
the requirements themselves)

The only drawback I currently see is that breaking a long message into
multiple string literals makes it impossible to git grep the kernel tree
for the whole message text.
However, splitting a long line this way allows us to nicely wrap the
code at 80 columns, which is a readability boon.

Are there any other reasons to avoid that? Except maybe matters of taste. :)

> > +	}
> > +
> > +	ret = sprintf(buf, "%#x\n", ndev->dev_id);
> > +
> > +	return ret;
> 
> Why not?
> 	return sprintf...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v3 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace
From: Arseny Maslennikov @ 2018-09-06  7:04 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma, Doug Ledford, Jason Gunthorpe, netdev
In-Reply-To: <20180905135035.GT2977@mtr-leonro.mtl.com>

[-- Attachment #1: Type: text/plain, Size: 3311 bytes --]

On Wed, Sep 05, 2018 at 04:50:35PM +0300, Leon Romanovsky wrote:
> On Mon, Sep 03, 2018 at 07:13:16PM +0300, Arseny Maslennikov wrote:
> > Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_main.c | 38 +++++++++++++++++++++++
> >  1 file changed, 38 insertions(+)
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index 30f840f874b3..7386e5bde3d3 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -2386,6 +2386,42 @@ int ipoib_add_pkey_attr(struct net_device *dev)
> >  	return device_create_file(&dev->dev, &dev_attr_pkey);
> >  }
> >
> > +/*
> > + * We erroneously exposed the iface's port number in the dev_id
> > + * sysfs field long after dev_port was introduced for that purpose[1],
> > + * and we need to stop everyone from relying on that.
> > + * Let's overload the shower routine for the dev_id file here
> > + * to gently bring the issue up.
> > + *
> > + * [1] https://www.spinics.net/lists/netdev/msg272123.html
> > + */
> > +static ssize_t dev_id_show(struct device *dev,
> > +			   struct device_attribute *attr, char *buf)
> > +{
> > +	struct net_device *ndev = to_net_dev(dev);
> > +	ssize_t ret = -EINVAL;
> > +
> > +	if (ndev->dev_id == ndev->dev_port) {
> > +		netdev_info_once(ndev,
> > +			"\"%s\" wants to know my dev_id. "
> > +			"Should it look at dev_port instead?\n",
> > +			current->comm);
> > +		netdev_info_once(ndev,
> > +			"See Documentation/ABI/testing/sysfs-class-net for more info.\n");
> > +	}
> > +
> > +	ret = sprintf(buf, "%#x\n", ndev->dev_id);
> > +
> > +	return ret;
> > +}
> > +static DEVICE_ATTR_RO(dev_id);
> > +
> 
> I don't see this field among exposed by IPoIB, why should we expose it now?
> 

To deviate from standard netdev behaviour, which only prints the
field out. Doug wanted this to also print a deprecation message, and
netdev (obviously) does not do that. See below.

> > +int ipoib_intercept_dev_id_attr(struct net_device *dev)
> > +{
> > +	device_remove_file(&dev->dev, &dev_attr_dev_id);
> > +	return device_create_file(&dev->dev, &dev_attr_dev_id);
> 
> Why isn't enough to rely on netdev code?
> 

Netdev code relies on macros around a *static* function 'netdev_show',
which is defined in net/core/net-sysfs.c; it is not listed in any header
files, and the macros aren't as well. This all leads me to believe it
was not really meant to be used from outside net/core/net-sysfs.

The only way we could use any netdev code here is to set up our own
handler (again), printk() a message, then call netdev_show — but we have
no access to it.

Of course, it also may be that I'm terribly missing a clue.

> > +}
> > +
> >  static struct net_device *ipoib_add_port(const char *format,
> >  					 struct ib_device *hca, u8 port)
> >  {
> > @@ -2427,6 +2463,8 @@ static struct net_device *ipoib_add_port(const char *format,
> >  	 */
> >  	ndev->priv_destructor = ipoib_intf_free;
> >
> > +	if (ipoib_intercept_dev_id_attr(ndev))
> > +		goto sysfs_failed;
> >  	if (ipoib_cm_add_mode_attr(ndev))
> >  		goto sysfs_failed;
> >  	if (ipoib_add_pkey_attr(ndev))
> > --
> > 2.19.0.rc1
> >



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* RE: smsc95xx: Invalid max MTU
From: RaghuramChary.Jallipalli @ 2018-09-06  6:26 UTC (permalink / raw)
  To: stefan.wahren, UNGLinuxDriver, Woojung.Huh; +Cc: davem, netdev
In-Reply-To: <1102700056.29198.1536169142144@email.1und1.de>

> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index
> 06b4d29..420a0e4 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1318,6 +1318,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct
> usb_interface *intf)
>  	dev->net->ethtool_ops = &smsc95xx_ethtool_ops;
>  	dev->net->flags |= IFF_MULTICAST;
>  	dev->net->hard_header_len += SMSC95XX_TX_OVERHEAD_CSUM;
> +	dev->net->max_mtu = ETH_DATA_LEN;
>  	dev->hard_mtu = dev->net->mtu + dev->net->hard_header_len;
> 
>  	pdata->dev = dev;

Thanks for the patch.
Because device max_mtu is checked before changing the MTU value, I think your patch looks good to me.
Maybe you also want to add min_mtu too?

Thanks,
-Raghu

^ permalink raw reply

* Re: [PATCH] ath9k: add reset for airtime station debugfs
From: Toke Høiland-Jørgensen @ 2018-09-06 10:40 UTC (permalink / raw)
  To: Louie Lu
  Cc: kvalo-sgV2jX0FEOL9JmXXK+q4OQ, ath9k-devel-A+ZNKFmMK5xy9aJCnZT0Uw,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CADMDV=ws-iM6+L+CLyrGK=srV56jbYgBbOVWg2AxEX0gR6t8WQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Louie Lu <git-sgRGTR89XAc@public.gmane.org> writes:

> Toke Høiland-Jørgensen <toke-LJ9M9ZcSy1A@public.gmane.org> 於 2018年9月6日 週四 下午5:27寫道:
>
>> Louie Lu <git-sgRGTR89XAc@public.gmane.org> writes:
>>
>> > Let user can reset station airtime status by debugfs, it will
>> > reset all airtime deficit to ATH_AIRTIME_QUANTUM and reset rx/tx
>> > airtime accumulate to 0.
>>
>> No objections to the patch, but I'm curious which issues you were
>> debugging that led you to needing it? :)
>>
> I'm testing to get the packet queue time + airtime in
> ath_tx_process_buffer,

Right; I've been thinking that it would be useful to make the CoDel
enqueue time available to drivers. And minstrel, for that matter
(lowering the number of retries for packets that has queued for a long
time, for instance). Good to hear that others are looking into something
similar :)

> it would be useful if I can reset the station airtime accumulated
> value, so I can observe in each test round (e.g. 5 ping) airtime
> accumulated
>
> Also to reset the deficit to make sure it run like fresh one.

Yup, makes sense.

-Toke

^ permalink raw reply

* Re: [PATCH bpf-next 0/4] i40e AF_XDP zero-copy buffer leak fixes
From: Alexei Starovoitov @ 2018-09-06  5:55 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Jakub Kicinski, ast, Daniel Borkmann, Netdev, Jeff Kirsher,
	intel-wired-lan, Björn Töpel, Karlsson, Magnus,
	Magnus Karlsson
In-Reply-To: <CAJ+HfNjQCP7c7gCVJi=LRxK2LvQrNvn6cPfRACW6TPzM8339SA@mail.gmail.com>

On Wed, Sep 05, 2018 at 09:15:14PM +0200, Björn Töpel wrote:
> Den ons 5 sep. 2018 kl 19:14 skrev Jakub Kicinski
> <jakub.kicinski@netronome.com>:
> >
> > On Tue,  4 Sep 2018 20:11:01 +0200, Björn Töpel wrote:
> > > From: Björn Töpel <bjorn.topel@intel.com>
> > >
> > > This series addresses an AF_XDP zero-copy issue that buffers passed
> > > from userspace to the kernel was leaked when the hardware descriptor
> > > ring was torn down.
> > >
> > > The patches fixes the i40e AF_XDP zero-copy implementation.
> > >
> > > Thanks to Jakub Kicinski for pointing this out!
> > >
> > > Some background for folks that don't know the details: A zero-copy
> > > capable driver picks buffers off the fill ring and places them on the
> > > hardware Rx ring to be completed at a later point when DMA is
> > > complete. Similar on the Tx side; The driver picks buffers off the Tx
> > > ring and places them on the Tx hardware ring.
> > >
> > > In the typical flow, the Rx buffer will be placed onto an Rx ring
> > > (completed to the user), and the Tx buffer will be placed on the
> > > completion ring to notify the user that the transfer is done.
> > >
> > > However, if the driver needs to tear down the hardware rings for some
> > > reason (interface goes down, reconfiguration and such), the userspace
> > > buffers cannot be leaked. They have to be reused or completed back to
> > > userspace.
> > >
> > > The implementation does the following:
> > >
> > > * Outstanding Tx descriptors will be passed to the completion
> > >   ring. The Tx code has back-pressure mechanism in place, so that
> > >   enough empty space in the completion ring is guaranteed.
> > >
> > > * Outstanding Rx descriptors are temporarily stored on a stash/reuse
> > >   queue. The reuse queue is based on Jakub's RFC. When/if the HW rings
> > >   comes up again, entries from the stash are used to re-populate the
> > >   ring.
> > >
> > > * When AF_XDP ZC is enabled, disallow changing the number of hardware
> > >   descriptors via ethtool. Otherwise, the size of the stash/reuse
> > >   queue can grow unbounded.
> > >
> > > Going forward, introducing a "zero-copy allocator" analogous to Jesper
> > > Brouer's page pool would be a more robust and reuseable solution.
> > >
> > > Jakub: I've made a minor checkpatch-fix to your RFC, prior adding it
> > > into this series.
> >
> > Thanks for the fix! :)
> >
> > Out of curiosity, did checking the reuse queue have a noticeable impact
> > in your test (i.e. always using the _rq() helpers)?  You seem to be
> > adding an indirect call, would that not be way worse on a retpoline
> > kernel?
> 
> Do you mean the indirection in __i40e_alloc_rx_buffers_zc (patch #3)?
> The indirect call is elided by the __always_inline -- without that
> retpoline took 2.5Mpps worth of Rx. :-(
> 
> I'm only using the _rq helpers in the configuration/slow path, so the
> fast-path is unchanged.

Applied to bpf-next. Thanks.

^ permalink raw reply

* Re: [PATCH net 0/3] net/iucv: fixes 2018-09-05
From: David Miller @ 2018-09-06  5:32 UTC (permalink / raw)
  To: jwi; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180905145512.30165-1-jwi@linux.ibm.com>

From: Julian Wiedmann <jwi@linux.ibm.com>
Date: Wed,  5 Sep 2018 16:55:09 +0200

> please apply three straight-forward fixes for iucv. One that prevents
> leaking the skb on malformed inbound packets, one to fix the error
> handling on transmit error, and one to get rid of a compile warning.

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH bpf-next] bpf/verifier: properly clear union members after a ctx read
From: Alexei Starovoitov @ 2018-09-06  5:32 UTC (permalink / raw)
  To: Edward Cree; +Cc: daniel, ast, netdev
In-Reply-To: <719de66a-0cd4-262a-d2d9-578df3a3fdca@solarflare.com>

On Wed, Sep 05, 2018 at 02:47:22PM +0100, Edward Cree wrote:
> On 05/09/18 03:23, Alexei Starovoitov wrote:
> > So would you agree it's fair to add
> > Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
> > ?
> Sure.  Though I don't think it needs backporting, as it's a conservative
>  bug (i.e. it merely prevents pruning, but that's safe security-wise).

agree. No backport necessary.

> > How about patch like the following:
> > ------------
> > From 422fd975ed78645ab67d2eb50ff6e1ff6fb3de32 Mon Sep 17 00:00:00 2001
> > From: Alexei Starovoitov <ast@kernel.org>
> > Date: Tue, 4 Sep 2018 19:13:44 -0700
> > Subject: [PATCH] bpf/verifier: fix verifier instability
> >
> > Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
> > Debugged-by: Edward Cree <ecree@solarflare.com> 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Edward Cree <ecree@solarflare.com>

Thanks. I copy-pasted your commit log and pushed to bpf-next.

Much appreciate the time and effort spent on root causing these issues.

^ permalink raw reply

* [PATCH] cfg80211: remove redundant check of !scan_plan
From: Colin King @ 2018-09-06  9:58 UTC (permalink / raw)
  To: Johannes Berg, David S . Miller, linux-wireless, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The check for !scan_plan is redunant as this has been checked
in the proceeding statement and the code returns -ENOBUFS if
it is true. Remove the redundant check.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 net/wireless/nl80211.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index d5f9b5235cdd..b4908bcb0d77 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -10690,8 +10690,7 @@ static int nl80211_send_wowlan_nd(struct sk_buff *msg,
 		if (!scan_plan)
 			return -ENOBUFS;
 
-		if (!scan_plan ||
-		    nla_put_u32(msg, NL80211_SCHED_SCAN_PLAN_INTERVAL,
+		if (nla_put_u32(msg, NL80211_SCHED_SCAN_PLAN_INTERVAL,
 				req->scan_plans[i].interval) ||
 		    (req->scan_plans[i].iterations &&
 		     nla_put_u32(msg, NL80211_SCHED_SCAN_PLAN_ITERATIONS,
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH RFC] net/mlx5_en: switch to Toeplitz RSS hash by default
From: Saeed Mahameed @ 2018-09-06  5:24 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Tariq Toukan, Linux Netdev List, Saeed Mahameed, Gal Pressman,
	Or Gerlitz, David S. Miller
In-Reply-To: <a081d829-35cf-5dd7-2772-db27e818e717@yandex-team.ru>

On Sun, Sep 2, 2018 at 2:55 AM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
> On 02.09.2018 12:29, Tariq Toukan wrote:
>>
>>
>>
>> On 31/08/2018 2:29 PM, Konstantin Khlebnikov wrote:
>>>
>>> XOR (MLX5_RX_HASH_FN_INVERTED_XOR8) gives only 8 bits.
>>> It seems not enough for RFS. All other drivers use toeplitz.
>>>
>>> Driver mlx4_en uses Toeplitz by default and warns if hash XOR is used
>>> together with NETIF_F_RXHASH (enabled by default too): "Enabling both
>>> XOR Hash function and RX Hashing can limit RPS functionality".
>>>
>>> XOR is default in mlx5_en since commit 2be6967cdbc9
>>> ("net/mlx5e: Support ETH_RSS_HASH_XOR").
>>>
>>> Hash function could be set via ethtool. But it would be nice to have
>>> single standard for drivers or proper description why this one is
>>> special.
>>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>> ---
>>
>>
>> Hi Konstantin,
>>
>> Thanks for the patch.
>>
>> I understand the motivation.
>>
>> This change affects the default out-of-the-box behavior and requires a
>> full performance cycle. We'll run performance regression tomorrow, results
>> should be ready by EOW.
>>  > I'll update.
>
>
> Ok, thank you.
>
> The only mention I've found in your documentation
> http://www.mellanox.com/related-docs/prod_software/Mellanox_EN_for_Linux_User_Manual_v4_4.pdf
>
> is
> ---
> 1.1.10 RSS Support
> 1.1.10.1 RSS Hash Function
> The device has the ability to use XOR as the RSS distribution function,
> instead of the default
> Toplitz function.
> The XOR function can be better distributed among driver's receive queues in
> small number of
> streams, where it distributes each TCP/UDP stream to a different queue.
> ---
>
> So Toeplitz is supposed to be default hash function for all versions of
> drivers and hardware.
>
> Also XOR8 seems vulnerable for ddos - hash is predictable, no random\secret
> vector, only 8 bits.
> So, it's easy to route all flows into one point. As we got it by accident.
>
> Moreover, in kernel 4.4.y hash switch via ethtool is broken and does not
> work =)
>

is it broken in mlx5 only or for the whole kernel ?

If it is mlx5 then this might be the reason:
commit 2d75b2bc8a8c0ce5567a6ecef52e194d117efe3f
net/mlx5e: Add ethtool RSS configuration options

was submitted to kernel 4.3

and an important fix for hash function change was submitted to 4.5:

commit bdfc028de1b3cd59490d5413a5c87b0fa50040c2
Author: Tariq Toukan <tariqt@mellanox.com>
Date:   Mon Feb 29 21:17:12 2016 +0200

    net/mlx5e: Fix ethtool RX hash func configuration change

    We should modify TIRs explicitly to apply the new RSS configuration.
    The light ndo close/open calls do not "refresh" them.

    Fixes: 2d75b2bc8a8c ('net/mlx5e: Add ethtool RSS configuration options')


>>
>> Regards,
>> Tariq

^ permalink raw reply

* Re: [PATCH net] net/sched: fix memory leak in act_tunnel_key_init()
From: David Miller @ 2018-09-06  5:19 UTC (permalink / raw)
  To: dcaratti; +Cc: xiyou.wangcong, simon.horman, amir, jhs, netdev
In-Reply-To: <d0a72d1371e790505a8141592c6af904c9b24031.1536079973.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Tue,  4 Sep 2018 19:00:19 +0200

> If users try to install act_tunnel_key 'set' rules with duplicate values
> of 'index', the tunnel metadata are allocated, but never released. Then,
> kmemleak complains as follows:
 ...
> This problem theoretically happens also in case users attempt to setup a
> geneve rule having wrong configuration data, or when the kernel fails to
> allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
> metadata also in the above conditions.
> 
> Addresses-Coverity-ID: 1373974 ("Resource leak")
> Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
> Fixes: 0ed5269f9e41f ("net/sched: add tunnel option support to act_tunnel_key")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net-next] nfp: separate VXLAN and GRE feature handling
From: David Miller @ 2018-09-06  5:18 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20180904152833.20351-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue,  4 Sep 2018 08:28:33 -0700

> VXLAN and GRE FW features have to currently be both advertised
> for the driver to enable them.  Separate the handling.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/3] nfp: improve the new rtsym helpers
From: David Miller @ 2018-09-06  5:18 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20180904143733.16362-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue,  4 Sep 2018 07:37:30 -0700

> This set fixes a bug in ABS rtsym handling I added in net-next,
> it expands the error checking and reporting on the rtsym accesses.

Series applied.

^ permalink raw reply

* Re: [PATCH v2 net-next] failover: Add missing check to validate 'slave_dev' in net_failover_slave_unregister
From: David Miller @ 2018-09-06  5:16 UTC (permalink / raw)
  To: yuehaibing
  Cc: sridhar.samudrala, stephen, dan.carpenter, alexander.h.duyck,
	jeffrey.t.kirsher, liran.alon, joao.m.martins, netdev,
	kernel-janitors
In-Reply-To: <1536029786-21710-1-git-send-email-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Tue, 4 Sep 2018 02:56:26 +0000

> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/net/net_failover.c: In function 'net_failover_slave_unregister':
> drivers/net/net_failover.c:598:35: warning:
>  variable 'primary_dev' set but not used [-Wunused-but-set-variable]
> 
> There should check the validity of 'slave_dev'.
> 
> Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
> v2: use WARN_ON_ONCE as Liran Alon suggested

Applied.

^ permalink raw reply

* Re: [Patch net] tipc: orphan sock in tipc_release()
From: David Miller @ 2018-09-06  5:15 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, tipc-discussion, jon.maloy, ying.xue
In-Reply-To: <20180904021241.11426-2-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon,  3 Sep 2018 19:12:41 -0700

> Before we unlock the sock in tipc_release(), we have to
> detach sk->sk_socket from sk, otherwise a parallel
> tipc_sk_fill_sock_diag() could stil read it after we
> free this socket.
> 
> Fixes: c30b70deb5f4 ("tipc: implement socket diagnostics for AF_TIPC")
> Reported-and-tested-by: syzbot+48804b87c16588ad491d@syzkaller.appspotmail.com
> Cc: Jon Maloy <jon.maloy@ericsson.com>
> Cc: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] mac80211: fix TX status reporting for ieee80211s
From: Johannes Berg @ 2018-09-06  9:47 UTC (permalink / raw)
  To: Yuan-Chi Pang; +Cc: David S. Miller, linux-wireless, netdev, linux-kernel
In-Reply-To: <1536227104.68915.5.camel@gmail.com>

On Thu, 2018-09-06 at 17:45 +0800, Yuan-Chi Pang wrote:
> 
> I received the comment of my previous submission that I should fix
> indentation. But if I use the same indentation as before, I got "line
> over 80 characters" warning by checkpatch. What should I do?
> 
> > diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
> > index daf9db3..4286673 100644
> > --- a/net/mac80211/mesh_hwmp.c
> > +++ b/net/mac80211/mesh_hwmp.c
> > @@ -295,15 +295,11 @@ int mesh_path_error_tx(struct
> > ieee80211_sub_if_data *sdata,
> >  }
> >  
> >  void ieee80211s_update_metric(struct ieee80211_local *local,
> > -		struct sta_info *sta, struct sk_buff *skb)
> > +		struct sta_info *sta, struct ieee80211_tx_status
> > *st)

Here? Seems easy - add a linebreak between sta and status?

johannes

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox