Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] ath9k: add reset for airtime station debugfs
From: Toke Høiland-Jørgensen @ 2018-09-06 10:40 UTC (permalink / raw)
  To: Louie Lu
  Cc: kvalo-sgV2jX0FEOL9JmXXK+q4OQ, ath9k-devel-A+ZNKFmMK5xy9aJCnZT0Uw,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CADMDV=ws-iM6+L+CLyrGK=srV56jbYgBbOVWg2AxEX0gR6t8WQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Louie Lu <git-sgRGTR89XAc@public.gmane.org> writes:

> Toke Høiland-Jørgensen <toke-LJ9M9ZcSy1A@public.gmane.org> 於 2018年9月6日 週四 下午5:27寫道:
>
>> Louie Lu <git-sgRGTR89XAc@public.gmane.org> writes:
>>
>> > Let user can reset station airtime status by debugfs, it will
>> > reset all airtime deficit to ATH_AIRTIME_QUANTUM and reset rx/tx
>> > airtime accumulate to 0.
>>
>> No objections to the patch, but I'm curious which issues you were
>> debugging that led you to needing it? :)
>>
> I'm testing to get the packet queue time + airtime in
> ath_tx_process_buffer,

Right; I've been thinking that it would be useful to make the CoDel
enqueue time available to drivers. And minstrel, for that matter
(lowering the number of retries for packets that has queued for a long
time, for instance). Good to hear that others are looking into something
similar :)

> it would be useful if I can reset the station airtime accumulated
> value, so I can observe in each test round (e.g. 5 ping) airtime
> accumulated
>
> Also to reset the deficit to make sure it run like fresh one.

Yup, makes sense.

-Toke

^ permalink raw reply

* RE: smsc95xx: Invalid max MTU
From: RaghuramChary.Jallipalli @ 2018-09-06  6:26 UTC (permalink / raw)
  To: stefan.wahren, UNGLinuxDriver, Woojung.Huh; +Cc: davem, netdev
In-Reply-To: <1102700056.29198.1536169142144@email.1und1.de>

> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index
> 06b4d29..420a0e4 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1318,6 +1318,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct
> usb_interface *intf)
>  	dev->net->ethtool_ops = &smsc95xx_ethtool_ops;
>  	dev->net->flags |= IFF_MULTICAST;
>  	dev->net->hard_header_len += SMSC95XX_TX_OVERHEAD_CSUM;
> +	dev->net->max_mtu = ETH_DATA_LEN;
>  	dev->hard_mtu = dev->net->mtu + dev->net->hard_header_len;
> 
>  	pdata->dev = dev;

Thanks for the patch.
Because device max_mtu is checked before changing the MTU value, I think your patch looks good to me.
Maybe you also want to add min_mtu too?

Thanks,
-Raghu

^ permalink raw reply

* Re: [PATCH v3 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace
From: Arseny Maslennikov @ 2018-09-06  7:04 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma, Doug Ledford, Jason Gunthorpe, netdev
In-Reply-To: <20180905135035.GT2977@mtr-leonro.mtl.com>

[-- Attachment #1: Type: text/plain, Size: 3311 bytes --]

On Wed, Sep 05, 2018 at 04:50:35PM +0300, Leon Romanovsky wrote:
> On Mon, Sep 03, 2018 at 07:13:16PM +0300, Arseny Maslennikov wrote:
> > Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_main.c | 38 +++++++++++++++++++++++
> >  1 file changed, 38 insertions(+)
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index 30f840f874b3..7386e5bde3d3 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -2386,6 +2386,42 @@ int ipoib_add_pkey_attr(struct net_device *dev)
> >  	return device_create_file(&dev->dev, &dev_attr_pkey);
> >  }
> >
> > +/*
> > + * We erroneously exposed the iface's port number in the dev_id
> > + * sysfs field long after dev_port was introduced for that purpose[1],
> > + * and we need to stop everyone from relying on that.
> > + * Let's overload the shower routine for the dev_id file here
> > + * to gently bring the issue up.
> > + *
> > + * [1] https://www.spinics.net/lists/netdev/msg272123.html
> > + */
> > +static ssize_t dev_id_show(struct device *dev,
> > +			   struct device_attribute *attr, char *buf)
> > +{
> > +	struct net_device *ndev = to_net_dev(dev);
> > +	ssize_t ret = -EINVAL;
> > +
> > +	if (ndev->dev_id == ndev->dev_port) {
> > +		netdev_info_once(ndev,
> > +			"\"%s\" wants to know my dev_id. "
> > +			"Should it look at dev_port instead?\n",
> > +			current->comm);
> > +		netdev_info_once(ndev,
> > +			"See Documentation/ABI/testing/sysfs-class-net for more info.\n");
> > +	}
> > +
> > +	ret = sprintf(buf, "%#x\n", ndev->dev_id);
> > +
> > +	return ret;
> > +}
> > +static DEVICE_ATTR_RO(dev_id);
> > +
> 
> I don't see this field among exposed by IPoIB, why should we expose it now?
> 

To deviate from standard netdev behaviour, which only prints the
field out. Doug wanted this to also print a deprecation message, and
netdev (obviously) does not do that. See below.

> > +int ipoib_intercept_dev_id_attr(struct net_device *dev)
> > +{
> > +	device_remove_file(&dev->dev, &dev_attr_dev_id);
> > +	return device_create_file(&dev->dev, &dev_attr_dev_id);
> 
> Why isn't enough to rely on netdev code?
> 

Netdev code relies on macros around a *static* function 'netdev_show',
which is defined in net/core/net-sysfs.c; it is not listed in any header
files, and the macros aren't as well. This all leads me to believe it
was not really meant to be used from outside net/core/net-sysfs.

The only way we could use any netdev code here is to set up our own
handler (again), printk() a message, then call netdev_show — but we have
no access to it.

Of course, it also may be that I'm terribly missing a clue.

> > +}
> > +
> >  static struct net_device *ipoib_add_port(const char *format,
> >  					 struct ib_device *hca, u8 port)
> >  {
> > @@ -2427,6 +2463,8 @@ static struct net_device *ipoib_add_port(const char *format,
> >  	 */
> >  	ndev->priv_destructor = ipoib_intf_free;
> >
> > +	if (ipoib_intercept_dev_id_attr(ndev))
> > +		goto sysfs_failed;
> >  	if (ipoib_cm_add_mode_attr(ndev))
> >  		goto sysfs_failed;
> >  	if (ipoib_add_pkey_attr(ndev))
> > --
> > 2.19.0.rc1
> >



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v3 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace
From: Arseny Maslennikov @ 2018-09-06  7:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-rdma, Doug Ledford, Jason Gunthorpe, netdev
In-Reply-To: <20180905164727.3108bce6@shemminger-XPS-13-9360>

[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]

On Wed, Sep 05, 2018 at 04:47:27PM +0100, Stephen Hemminger wrote:
> On Mon,  3 Sep 2018 19:13:16 +0300
> Arseny Maslennikov <ar@cs.msu.ru> wrote:
> 
> > +	if (ndev->dev_id == ndev->dev_port) {
> > +		netdev_info_once(ndev,
> > +			"\"%s\" wants to know my dev_id. "
> > +			"Should it look at dev_port instead?\n",
> > +			current->comm);
> > +		netdev_info_once(ndev,
> > +			"See Documentation/ABI/testing/sysfs-class-net for more info.\n");
> > +	}
> 
> Single line message is sufficient.
> Also don't break strings in messages.
> 

OK, will fix in v4.


(Sorry if the following is too off-topic here)
Multi-line messages in separate printk calls can be racy, I get that.
But I'd like to hear some reasoning behind the style decision to not
break a long string into many string literals. (I'll most certainly not
be alone in this, Documentation/process/ does not mention reasons, only
the requirements themselves)

The only drawback I currently see is that breaking a long message into
multiple string literals makes it impossible to git grep the kernel tree
for the whole message text.
However, splitting a long line this way allows us to nicely wrap the
code at 80 columns, which is a readability boon.

Are there any other reasons to avoid that? Except maybe matters of taste. :)

> > +	}
> > +
> > +	ret = sprintf(buf, "%#x\n", ndev->dev_id);
> > +
> > +	return ret;
> 
> Why not?
> 	return sprintf...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net 3/3] tls: zero the crypto information from tls_context before freeing
From: Sabrina Dubroca @ 2018-09-06  7:42 UTC (permalink / raw)
  To: Vakul Garg
  Cc: netdev@vger.kernel.org, Boris Pismenny, Ilya Lesokhin,
	Aviad Yehezkel, Dave Watson
In-Reply-To: <DB7PR04MB425294A2EA0DA1170CC2C33E8B020@DB7PR04MB4252.eurprd04.prod.outlook.com>

2018-09-05, 13:48:48 +0000, Vakul Garg wrote:
> 
> 
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> > Behalf Of Sabrina Dubroca
> > Sent: Wednesday, September 5, 2018 6:52 PM
> > To: netdev@vger.kernel.org
> > Cc: Sabrina Dubroca <sd@queasysnail.net>; Boris Pismenny
> > <borisp@mellanox.com>; Ilya Lesokhin <ilyal@mellanox.com>; Aviad
> > Yehezkel <aviadye@mellanox.com>; Dave Watson <davejwatson@fb.com>
> > Subject: [PATCH net 3/3] tls: zero the crypto information from tls_context
> > before freeing
> > 
> > This contains key material in crypto_send_aes_gcm_128 and
> > crypto_recv_aes_gcm_128.
> > 
> > Fixes: 3c4d7559159b ("tls: kernel TLS support")
> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > ---
> >  include/net/tls.h  |  1 +
> >  net/tls/tls_main.c | 14 ++++++++++++--
> >  2 files changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/net/tls.h b/include/net/tls.h index
> > d5c683e8bb22..2010d23112f9 100644
> > --- a/include/net/tls.h
> > +++ b/include/net/tls.h
> > @@ -180,6 +180,7 @@ struct tls_context {
> >  		struct tls_crypto_info crypto_recv;
> >  		struct tls12_crypto_info_aes_gcm_128
> > crypto_recv_aes_gcm_128;
> >  	};
> > +	char tls_crypto_ctx_end[0];
> 
> This makes the key zeroization dependent upon the position of unions
> in structure.

Yes. I could add char tls_crypto_ctx_start[0].

> Can you zeroise the crypto_send, crypto_recv separately using two
> memzero_explicit calls?

It's not crypto_send, it's crypto_send_aes_gcm_128. I don't know how
likely it is that another crypto algorithm will ever be added, but
then you'd have to switch to zeroing that thing.

I had a different version of the patch that gave a name to those
unions, but then I need to change all the references everywhere and
the patch becomes a bit ugly, especially for net.

struct tls_context {
	union {
		struct tls_crypto_info info;
		struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
	} crypto_send;
	union {
		struct tls_crypto_info info;
		struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
	} crypto_recv;

	[...]
}


Or even:

union tls_crypto_context {
	struct tls_crypto_info info;
	struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
};

struct tls_context {
	union tls_crypto_context crypto_send;
	union tls_crypto_context crypto_recv;

	[...]
}


-- 
Sabrina

^ permalink raw reply

* Re: [PATCH net 2/3] tls: clear key material from kernel memory when do_tls_setsockopt_conf fails
From: Sabrina Dubroca @ 2018-09-06  7:49 UTC (permalink / raw)
  To: Boris Pismenny; +Cc: netdev, Ilya Lesokhin, Aviad Yehezkel, Dave Watson
In-Reply-To: <567f747b-e817-4a9b-1768-96575caea845@mellanox.com>

2018-09-05, 16:53:54 +0300, Boris Pismenny wrote:
> Hi Sabrina,
> 
> On 9/5/2018 4:21 PM, Sabrina Dubroca wrote:
> > Fixes: 3c4d7559159b ("tls: kernel TLS support")
> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > ---
> >   net/tls/tls_main.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > index 180b6640e531..0d432d025471 100644
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> > @@ -499,7 +499,7 @@ static int do_tls_setsockopt_conf(struct sock *sk, char __user *optval,
> >   	goto out;
> >   err_crypto_info:
> > -	memset(crypto_info, 0, sizeof(*crypto_info));
> > +	memzero_explicit(crypto_info, sizeof(struct tls12_crypto_info_aes_gcm_128));
> 
> Besides the key, there are other (not secret) information in
> tls12_crypto_info_aes_gcm_128. I'd prefer you do not delete it to enable
> users to obtain it (using getsockopt) in case we decide to implement a
> fallback to userspace in the future. Such a fallback must obtain the
> kernel's iv, and record sequence number.

The operation failed. There's no reason for this stale data to remain
in the kernel. And you couldn't recover it with getsockopt, because
you'll hit !TLS_CRYPTO_INFO_READY, since we're already resetting
tls_crypto_info. Resetting tls_crypto_info on failure is necessary,
otherwise your socket will be in a broken state, with TLS not setup
but unable to perform a new attempt.

Userspace already has all this information anyway, since it passed it
to the kernel, so why would it need to recover it from the kernel?

-- 
Sabrina

^ permalink raw reply

* Re: [PATCH net-next v2 0/9] rtnetlink: add IFA_TARGET_NETNSID for RTM_GETADDR
From: Christian Brauner @ 2018-09-06 12:33 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-kernel, kuznet, yoshfuji, pombredanne, kstewart,
	gregkh, dsahern, fw, ktkhai, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel
In-Reply-To: <20180905.222750.249661617396939609.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 282 bytes --]

On Wed, Sep 05, 2018 at 10:27:50PM -0700, David Miller wrote:
> From: Christian Brauner <christian@brauner.io>
> Date: Tue,  4 Sep 2018 21:53:46 +0200
> 
> > # v2 introduction:
>  ...
> 
> This looks good to me, series applied.
> 
> Thank you!

Thanks you!

Christian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH net-next 07/13] net: sched: implement functions to put and flush all chains
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Extract code that flushes and puts all chains on tcf block to two
standalone function to be shared with functions that locklessly get/put
reference to block.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 55 +++++++++++++++++++++++++++++------------------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c3c7d4e2f84c..58b2d8443f6a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -538,6 +538,31 @@ static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
 		qdisc_put_unlocked(q);
 }
 
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+	struct tcf_chain *chain;
+
+	/* Hold a refcnt for all chains, so that they don't disappear
+	 * while we are iterating.
+	 */
+	list_for_each_entry(chain, &block->chain_list, list)
+		tcf_chain_hold(chain);
+
+	list_for_each_entry(chain, &block->chain_list, list)
+		tcf_chain_flush(chain);
+}
+
+static void tcf_block_put_all_chains(struct tcf_block *block)
+{
+	struct tcf_chain *chain, *tmp;
+
+	/* At this point, all the chains should have refcnt >= 1. */
+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+		tcf_chain_put_explicitly_created(chain);
+		tcf_chain_put(chain);
+	}
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -795,8 +820,6 @@ EXPORT_SYMBOL(tcf_block_get);
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 		       struct tcf_block_ext_info *ei)
 {
-	struct tcf_chain *chain, *tmp;
-
 	if (!block)
 		return;
 	tcf_chain0_head_change_cb_del(block, ei);
@@ -813,32 +836,14 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 
 		if (tcf_block_shared(block))
 			tcf_block_remove(block, block->net);
-
-		if (!free_block) {
-			/* Hold a refcnt for all chains, so that they don't
-			 * disappear while we are iterating.
-			 */
-			list_for_each_entry(chain, &block->chain_list, list)
-				tcf_chain_hold(chain);
-
-			list_for_each_entry(chain, &block->chain_list, list)
-				tcf_chain_flush(chain);
-		}
-
+		if (!free_block)
+			tcf_block_flush_all_chains(block);
 		tcf_block_offload_unbind(block, q, ei);
 
-		if (free_block) {
+		if (free_block)
 			kfree(block);
-		} else {
-			/* At this point, all the chains should have
-			 * refcnt >= 1.
-			 */
-			list_for_each_entry_safe(chain, tmp, &block->chain_list,
-						 list) {
-				tcf_chain_put_explicitly_created(chain);
-				tcf_chain_put(chain);
-			}
-		}
+		else
+			tcf_block_put_all_chains(block);
 	} else {
 		tcf_block_offload_unbind(block, q, ei);
 	}
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 05/13] net: sched: use Qdisc rcu API instead of relying on rtnl lock
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

As a preparation from removing rtnl lock dependency from rules update path,
use Qdisc rcu and reference counting capabilities instead of relying on
rtnl lock while working with Qdiscs. Create new tcf_block_release()
function, and use it to free resources taken by tcf_block_find().
Currently, this function only releases Qdisc and it is extended in next
patches in this series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 88 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 73 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1a67af8a6e8c..cfa4a02a6a1a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -527,6 +527,17 @@ static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
 	return idr_find(&tn->idr, block_index);
 }
 
+static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
+{
+	if (!q)
+		return;
+
+	if (rtnl_held)
+		qdisc_put(q);
+	else
+		qdisc_put_unlocked(q);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -537,6 +548,7 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 					struct netlink_ext_ack *extack)
 {
 	struct tcf_block *block;
+	int err = 0;
 
 	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
 		block = tcf_block_lookup(net, block_index);
@@ -548,55 +560,91 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 		const struct Qdisc_class_ops *cops;
 		struct net_device *dev;
 
+		rcu_read_lock();
+
 		/* Find link */
-		dev = __dev_get_by_index(net, ifindex);
-		if (!dev)
+		dev = dev_get_by_index_rcu(net, ifindex);
+		if (!dev) {
+			rcu_read_unlock();
 			return ERR_PTR(-ENODEV);
+		}
 
 		/* Find qdisc */
 		if (!*parent) {
 			*q = dev->qdisc;
 			*parent = (*q)->handle;
 		} else {
-			*q = qdisc_lookup(dev, TC_H_MAJ(*parent));
+			*q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
 			if (!*q) {
 				NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
-				return ERR_PTR(-EINVAL);
+				err = -EINVAL;
+				goto errout_rcu;
 			}
 		}
 
+		*q = qdisc_refcount_inc_nz(*q);
+		if (!*q) {
+			NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+			err = -EINVAL;
+			goto errout_rcu;
+		}
+
 		/* Is it classful? */
 		cops = (*q)->ops->cl_ops;
 		if (!cops) {
 			NL_SET_ERR_MSG(extack, "Qdisc not classful");
-			return ERR_PTR(-EINVAL);
+			err = -EINVAL;
+			goto errout_rcu;
 		}
 
 		if (!cops->tcf_block) {
 			NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-			return ERR_PTR(-EOPNOTSUPP);
+			err = -EOPNOTSUPP;
+			goto errout_rcu;
 		}
 
+		/* At this point we know that qdisc is not noop_qdisc,
+		 * which means that qdisc holds a reference to net_device
+		 * and we hold a reference to qdisc, so it is safe to release
+		 * rcu read lock.
+		 */
+		rcu_read_unlock();
+
 		/* Do we search for filter, attached to class? */
 		if (TC_H_MIN(*parent)) {
 			*cl = cops->find(*q, *parent);
 			if (*cl == 0) {
 				NL_SET_ERR_MSG(extack, "Specified class doesn't exist");
-				return ERR_PTR(-ENOENT);
+				err = -ENOENT;
+				goto errout_qdisc;
 			}
 		}
 
 		/* And the last stroke */
 		block = cops->tcf_block(*q, *cl, extack);
-		if (!block)
-			return ERR_PTR(-EINVAL);
+		if (!block) {
+			err = -EINVAL;
+			goto errout_qdisc;
+		}
 		if (tcf_block_shared(block)) {
 			NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters");
-			return ERR_PTR(-EOPNOTSUPP);
+			err = -EOPNOTSUPP;
+			goto errout_qdisc;
 		}
 	}
 
 	return block;
+
+errout_rcu:
+	rcu_read_unlock();
+errout_qdisc:
+	tcf_qdisc_put(*q, true);
+	return ERR_PTR(err);
+}
+
+static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
+{
+	tcf_qdisc_put(q, true);
 }
 
 struct tcf_block_owner_item {
@@ -1332,6 +1380,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	tcf_block_release(q, block);
 	if (err == -EAGAIN)
 		/* Replay the request. */
 		goto replay;
@@ -1453,6 +1502,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	tcf_block_release(q, block);
 	return err;
 }
 
@@ -1538,6 +1588,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	tcf_block_release(q, block);
 	return err;
 }
 
@@ -1854,7 +1905,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 	chain_index = tca[TCA_CHAIN] ? nla_get_u32(tca[TCA_CHAIN]) : 0;
 	if (chain_index > TC_ACT_EXT_VAL_MASK) {
 		NL_SET_ERR_MSG(extack, "Specified chain index exceeds upper limit");
-		return -EINVAL;
+		err = -EINVAL;
+		goto errout_block;
 	}
 	chain = tcf_chain_lookup(block, chain_index);
 	if (n->nlmsg_type == RTM_NEWCHAIN) {
@@ -1866,23 +1918,27 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 				tcf_chain_hold(chain);
 			} else {
 				NL_SET_ERR_MSG(extack, "Filter chain already exists");
-				return -EEXIST;
+				err = -EEXIST;
+				goto errout_block;
 			}
 		} else {
 			if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 				NL_SET_ERR_MSG(extack, "Need both RTM_NEWCHAIN and NLM_F_CREATE to create a new chain");
-				return -ENOENT;
+				err = -ENOENT;
+				goto errout_block;
 			}
 			chain = tcf_chain_create(block, chain_index);
 			if (!chain) {
 				NL_SET_ERR_MSG(extack, "Failed to create filter chain");
-				return -ENOMEM;
+				err = -ENOMEM;
+				goto errout_block;
 			}
 		}
 	} else {
 		if (!chain || tcf_chain_held_by_acts_only(chain)) {
 			NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
-			return -EINVAL;
+			err = -EINVAL;
+			goto errout_block;
 		}
 		tcf_chain_hold(chain);
 	}
@@ -1924,6 +1980,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 
 errout:
 	tcf_chain_put(chain);
+errout_block:
+	tcf_block_release(q, block);
 	if (err == -EAGAIN)
 		/* Replay the request. */
 		goto replay;
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 06/13] net: sched: change tcf block reference counter type to refcount_t
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

As a preparation for removing rtnl lock dependency from rules update path,
change tcf block reference counter type to refcount_t to allow modification
by concurrent users.

In block put function perform decrement and check reference counter once to
accommodate concurrent modification by unlocked users. After this change
tcf_chain_put at the end of block put function is called with
block->refcnt==0 and will deallocate block after the last chain is
released, so there is no need to manually deallocate block in this case.
However, if block reference counter reached 0 and there are no chains to
release, block must still be deallocated manually.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  2 +-
 net/sched/cls_api.c       | 59 ++++++++++++++++++++++++++++-------------------
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f878afa58be4..825e2bf6c5c3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -345,7 +345,7 @@ struct tcf_chain {
 struct tcf_block {
 	struct list_head chain_list;
 	u32 index; /* block index for shared blocks */
-	unsigned int refcnt;
+	refcount_t refcnt;
 	struct net *net;
 	struct Qdisc *q;
 	struct list_head cb_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cfa4a02a6a1a..c3c7d4e2f84c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -240,7 +240,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
 	if (!chain->index)
 		block->chain0.chain = NULL;
 	kfree(chain);
-	if (list_empty(&block->chain_list) && block->refcnt == 0)
+	if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
 		kfree(block);
 }
 
@@ -510,7 +510,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	INIT_LIST_HEAD(&block->owner_list);
 	INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
-	block->refcnt = 1;
+	refcount_set(&block->refcnt, 1);
 	block->net = net;
 	block->index = block_index;
 
@@ -719,7 +719,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 		/* block_index not 0 means the shared block is requested */
 		block = tcf_block_lookup(net, ei->block_index);
 		if (block)
-			block->refcnt++;
+			refcount_inc(&block->refcnt);
 	}
 
 	if (!block) {
@@ -762,7 +762,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 err_block_insert:
 		kfree(block);
 	} else {
-		block->refcnt--;
+		refcount_dec(&block->refcnt);
 	}
 	return err;
 }
@@ -802,34 +802,45 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 	tcf_chain0_head_change_cb_del(block, ei);
 	tcf_block_owner_del(block, q, ei->binder_type);
 
-	if (block->refcnt == 1) {
-		if (tcf_block_shared(block))
-			tcf_block_remove(block, block->net);
-
-		/* Hold a refcnt for all chains, so that they don't disappear
-		 * while we are iterating.
+	if (refcount_dec_and_test(&block->refcnt)) {
+		/* Flushing/putting all chains will cause the block to be
+		 * deallocated when last chain is freed. However, if chain_list
+		 * is empty, block has to be manually deallocated. After block
+		 * reference counter reached 0, it is no longer possible to
+		 * increment it or add new chains to block.
 		 */
-		list_for_each_entry(chain, &block->chain_list, list)
-			tcf_chain_hold(chain);
+		bool free_block = list_empty(&block->chain_list);
 
-		list_for_each_entry(chain, &block->chain_list, list)
-			tcf_chain_flush(chain);
-	}
+		if (tcf_block_shared(block))
+			tcf_block_remove(block, block->net);
 
-	tcf_block_offload_unbind(block, q, ei);
+		if (!free_block) {
+			/* Hold a refcnt for all chains, so that they don't
+			 * disappear while we are iterating.
+			 */
+			list_for_each_entry(chain, &block->chain_list, list)
+				tcf_chain_hold(chain);
 
-	if (block->refcnt == 1) {
-		/* At this point, all the chains should have refcnt >= 1. */
-		list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-			tcf_chain_put_explicitly_created(chain);
-			tcf_chain_put(chain);
+			list_for_each_entry(chain, &block->chain_list, list)
+				tcf_chain_flush(chain);
 		}
 
-		block->refcnt--;
-		if (list_empty(&block->chain_list))
+		tcf_block_offload_unbind(block, q, ei);
+
+		if (free_block) {
 			kfree(block);
+		} else {
+			/* At this point, all the chains should have
+			 * refcnt >= 1.
+			 */
+			list_for_each_entry_safe(chain, tmp, &block->chain_list,
+						 list) {
+				tcf_chain_put_explicitly_created(chain);
+				tcf_chain_put(chain);
+			}
+		}
 	} else {
-		block->refcnt--;
+		tcf_block_offload_unbind(block, q, ei);
 	}
 }
 EXPORT_SYMBOL(tcf_block_put_ext);
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 01/13] net: core: netlink: add helper refcount dec and lock function
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Rtnl lock is encapsulated in netlink and cannot be accessed by other
modules directly. This means that reference counted objects that rely on
rtnl lock cannot use it with refcounter helper function that atomically
releases decrements reference and obtains mutex.

This patch implements simple wrapper function around refcount_dec_and_lock
that obtains rtnl lock if reference counter value reached 0.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/rtnetlink.h | 1 +
 net/core/rtnetlink.c      | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 5225832bd6ff..dffbf665c086 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -34,6 +34,7 @@ extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
 extern int rtnl_is_locked(void);
 extern int rtnl_lock_killable(void);
+extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
 
 extern wait_queue_head_t netdev_unregistering_wq;
 extern struct rw_semaphore pernet_ops_rwsem;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 60c928894a78..4ea0b1413076 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -130,6 +130,12 @@ int rtnl_is_locked(void)
 }
 EXPORT_SYMBOL(rtnl_is_locked);
 
+bool refcount_dec_and_rtnl_lock(refcount_t *r)
+{
+	return refcount_dec_and_mutex_lock(r, &rtnl_mutex);
+}
+EXPORT_SYMBOL(refcount_dec_and_rtnl_lock);
+
 #ifdef CONFIG_PROVE_LOCKING
 bool lockdep_rtnl_is_held(void)
 {
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 03/13] net: sched: extend Qdisc with rcu
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Currently, Qdisc API functions assume that users have rtnl lock taken. To
implement rtnl unlocked classifiers update interface, Qdisc API must be
extended with functions that do not require rtnl lock.

Extend Qdisc structure with rcu. Implement special version of put function
qdisc_put_unlocked() that is called without rtnl lock taken. This function
only takes rtnl lock if Qdisc reference counter reached zero and is
intended to be used as optimization.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/rtnetlink.h |  5 +++++
 include/net/pkt_sched.h   |  1 +
 include/net/sch_generic.h |  2 ++
 net/sched/sch_api.c       | 18 ++++++++++++++++++
 net/sched/sch_generic.c   | 25 ++++++++++++++++++++++++-
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dffbf665c086..d3dff3e41e6c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -84,6 +84,11 @@ static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
 	return rtnl_dereference(dev->ingress_queue);
 }
 
+static inline struct netdev_queue *dev_ingress_queue_rcu(struct net_device *dev)
+{
+	return rcu_dereference(dev->ingress_queue);
+}
+
 struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
 
 #ifdef CONFIG_NET_INGRESS
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 7dc769e5452b..a16fbe9a2a67 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -102,6 +102,7 @@ int qdisc_set_default(const char *id);
 void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
 					struct nlattr *tab,
 					struct netlink_ext_ack *extack);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 18e22a5a6550..239c73f29471 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -90,6 +90,7 @@ struct Qdisc {
 	struct gnet_stats_queue	__percpu *cpu_qstats;
 	int			padded;
 	refcount_t		refcnt;
+	struct rcu_head		rcu;
 
 	/*
 	 * For performance sake on SMP, we put highly modified fields at the end
@@ -555,6 +556,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
+void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
 			       unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 836b32e6e8e8..8854c9b674b8 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -315,6 +315,24 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 	return q;
 }
 
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle)
+{
+	struct Qdisc *q;
+	struct netdev_queue *nq;
+
+	if (!handle)
+		return NULL;
+	q = qdisc_match_from_root(dev->qdisc, handle);
+	if (q)
+		goto out;
+
+	nq = dev_ingress_queue_rcu(dev);
+	if (nq)
+		q = qdisc_match_from_root(nq->qdisc_sleeping, handle);
+out:
+	return q;
+}
+
 static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid)
 {
 	unsigned long cl;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index bb778485ed88..2176fe9db750 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -941,6 +941,13 @@ void qdisc_free(struct Qdisc *qdisc)
 	kfree((char *) qdisc - qdisc->padded);
 }
 
+void qdisc_free_cb(struct rcu_head *head)
+{
+	struct Qdisc *q = container_of(head, struct Qdisc, rcu);
+
+	qdisc_free(q);
+}
+
 static void qdisc_destroy(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops  *ops = qdisc->ops;
@@ -970,7 +977,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
 		kfree_skb_list(skb);
 	}
 
-	qdisc_free(qdisc);
+	call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
 void qdisc_put(struct Qdisc *qdisc)
@@ -983,6 +990,22 @@ void qdisc_put(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_put);
 
+/* Version of qdisc_put() that is called with rtnl mutex unlocked.
+ * Intended to be used as optimization, this function only takes rtnl lock if
+ * qdisc reference counter reached zero.
+ */
+
+void qdisc_put_unlocked(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN ||
+	    !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
+		return;
+
+	qdisc_destroy(qdisc);
+	rtnl_unlock();
+}
+EXPORT_SYMBOL(qdisc_put_unlocked);
+
 /* Attach toplevel qdisc to device queue. */
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc)
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 04/13] net: sched: add helper function to take reference to Qdisc
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Implement function to take reference to Qdisc that relies on rcu read lock
instead of rtnl mutex. Function only takes reference to Qdisc if reference
counter isn't zero. Intended to be used by unlocked cls API.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 239c73f29471..f878afa58be4 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -115,6 +115,19 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
 	refcount_inc(&qdisc->refcnt);
 }
 
+/* Intended to be used by unlocked users, when concurrent qdisc release is
+ * possible.
+ */
+
+static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN)
+		return qdisc;
+	if (refcount_inc_not_zero(&qdisc->refcnt))
+		return qdisc;
+	return NULL;
+}
+
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
 	if (qdisc->flags & TCQ_F_NOLOCK)
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 00/13] Refactor classifier API to work with Qdisc/blocks without rtnl lock
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a third step to remove
rtnl lock dependency from TC rules update path.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
Handlers registered with this flag are called without RTNL taken. End
goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER,
etc.) to be registered with UNLOCKED flag to allow parallel execution.
However, there is no intention to completely remove or split rtnl lock
itself. This patch set addresses specific problems in implementation of
classifiers API that prevent its control path from being executed
concurrently. Additional changes are required to refactor classifiers
API and individual classifiers for parallel execution. This patch set
lays groundwork to eventually register rule update handlers as
rtnl-unlocked by modifying code in cls API that works with Qdiscs and
blocks. Following patch set does the same for chains and classifiers.

The goal of this change is to refactor tcf_block_find() and its
dependencies to allow concurrent execution:
- Extend Qdisc API with rcu to lookup and take reference to Qdisc
  without relying on rtnl lock.
- Extend tcf_block with atomic reference counting and rcu.
- Always take reference to tcf_block while working with it.
- Implement tcf_block_release() to release resources obtained by
  tcf_block_find()
- Create infrastructure to allow registering Qdiscs with class ops that
  do not require the caller to hold rtnl lock.

All three netlink rule update handlers use tcf_block_find() to lookup
Qdisc and block, and this patch set introduces additional means of
synchronization to substitute rtnl lock in cls API.

Some functions in cls and sch APIs have historic names that no longer
clearly describe their intent. In order not make this code even more
confusing when introducing their concurrency-friendly versions, rename
these functions to describe actual implementation.

Vlad Buslov (13):
  net: core: netlink: add helper refcount dec and lock function
  net: sched: rename qdisc_destroy() to qdisc_put()
  net: sched: extend Qdisc with rcu
  net: sched: add helper function to take reference to Qdisc
  net: sched: use Qdisc rcu API instead of relying on rtnl lock
  net: sched: change tcf block reference counter type to refcount_t
  net: sched: implement functions to put and flush all chains
  net: sched: rename tcf_block_get{_ext}() and tcf_block_put{_ext}()
  net: sched: extend tcf_block with rcu
  net: sched: protect block idr with spinlock
  net: sched: implement tcf_block_get() and tcf_block_put()
  net: sched: use reference counting for tcf blocks on rules update
  net: sched: add flags to Qdisc class ops struct

 include/linux/rtnetlink.h |   6 +
 include/net/pkt_cls.h     |  36 +++---
 include/net/pkt_sched.h   |   1 +
 include/net/sch_generic.h |  28 ++++-
 net/core/rtnetlink.c      |   6 +
 net/sched/cls_api.c       | 281 ++++++++++++++++++++++++++++++++--------------
 net/sched/sch_api.c       |  24 +++-
 net/sched/sch_atm.c       |  14 +--
 net/sched/sch_cake.c      |   4 +-
 net/sched/sch_cbq.c       |  15 +--
 net/sched/sch_cbs.c       |   2 +-
 net/sched/sch_drr.c       |   8 +-
 net/sched/sch_dsmark.c    |   6 +-
 net/sched/sch_fifo.c      |   2 +-
 net/sched/sch_fq_codel.c  |   4 +-
 net/sched/sch_generic.c   |  48 ++++++--
 net/sched/sch_hfsc.c      |  13 ++-
 net/sched/sch_htb.c       |  17 +--
 net/sched/sch_ingress.c   |  15 +--
 net/sched/sch_mq.c        |   4 +-
 net/sched/sch_mqprio.c    |   4 +-
 net/sched/sch_multiq.c    |  10 +-
 net/sched/sch_netem.c     |   2 +-
 net/sched/sch_prio.c      |  10 +-
 net/sched/sch_qfq.c       |   8 +-
 net/sched/sch_red.c       |   4 +-
 net/sched/sch_sfb.c       |   8 +-
 net/sched/sch_sfq.c       |   4 +-
 net/sched/sch_tbf.c       |   4 +-
 29 files changed, 394 insertions(+), 194 deletions(-)

-- 
2.7.5

^ permalink raw reply

* [PATCH net-next 08/13] net: sched: rename tcf_block_get{_ext}() and tcf_block_put{_ext}()
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Functions tcf_block_get{_ext}() and tcf_block_put{_ext}() actually
attach/detach block to specific Qdisc besides just taking/putting
reference. Rename them according to their purpose.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h    | 36 ++++++++++++++++++------------------
 net/sched/cls_api.c      | 31 +++++++++++++++----------------
 net/sched/sch_atm.c      | 12 ++++++------
 net/sched/sch_cake.c     |  4 ++--
 net/sched/sch_cbq.c      | 13 +++++++------
 net/sched/sch_drr.c      |  4 ++--
 net/sched/sch_dsmark.c   |  4 ++--
 net/sched/sch_fq_codel.c |  4 ++--
 net/sched/sch_hfsc.c     | 11 ++++++-----
 net/sched/sch_htb.c      | 13 +++++++------
 net/sched/sch_ingress.c  | 15 ++++++++-------
 net/sched/sch_multiq.c   |  4 ++--
 net/sched/sch_prio.c     |  4 ++--
 net/sched/sch_qfq.c      |  4 ++--
 net/sched/sch_sfb.c      |  4 ++--
 net/sched/sch_sfq.c      |  4 ++--
 16 files changed, 85 insertions(+), 82 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 75a3f3fdb359..9c11f8d83c1c 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -44,15 +44,15 @@ struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
 				       u32 chain_index);
 void tcf_chain_put_by_act(struct tcf_chain *chain);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
-int tcf_block_get(struct tcf_block **p_block,
-		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
-		  struct netlink_ext_ack *extack);
-int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
-		      struct tcf_block_ext_info *ei,
-		      struct netlink_ext_ack *extack);
-void tcf_block_put(struct tcf_block *block);
-void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
-		       struct tcf_block_ext_info *ei);
+int tcf_block_attach(struct tcf_block **p_block,
+		     struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+		     struct netlink_ext_ack *extack);
+int tcf_block_attach_ext(struct tcf_block **p_block, struct Qdisc *q,
+			 struct tcf_block_ext_info *ei,
+			 struct netlink_ext_ack *extack);
+void tcf_block_detach(struct tcf_block *block);
+void tcf_block_detach_ext(struct tcf_block *block, struct Qdisc *q,
+			  struct tcf_block_ext_info *ei);
 
 static inline bool tcf_block_shared(struct tcf_block *block)
 {
@@ -92,28 +92,28 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 
 #else
 static inline
-int tcf_block_get(struct tcf_block **p_block,
-		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
-		  struct netlink_ext_ack *extack)
+int tcf_block_attach(struct tcf_block **p_block,
+		     struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+		     struct netlink_ext_ack *extack)
 {
 	return 0;
 }
 
 static inline
-int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
-		      struct tcf_block_ext_info *ei,
-		      struct netlink_ext_ack *extack)
+int tcf_block_attach_ext(struct tcf_block **p_block, struct Qdisc *q,
+			 struct tcf_block_ext_info *ei,
+			 struct netlink_ext_ack *extack)
 {
 	return 0;
 }
 
-static inline void tcf_block_put(struct tcf_block *block)
+static inline void tcf_block_detach(struct tcf_block *block)
 {
 }
 
 static inline
-void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
-		       struct tcf_block_ext_info *ei)
+void tcf_block_detach_ext(struct tcf_block *block, struct Qdisc *q,
+			  struct tcf_block_ext_info *ei)
 {
 }
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 58b2d8443f6a..f11da74dd339 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -731,9 +731,9 @@ static void tcf_block_owner_del(struct tcf_block *block,
 	WARN_ON(1);
 }
 
-int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
-		      struct tcf_block_ext_info *ei,
-		      struct netlink_ext_ack *extack)
+int tcf_block_attach_ext(struct tcf_block **p_block, struct Qdisc *q,
+			 struct tcf_block_ext_info *ei,
+			 struct netlink_ext_ack *extack)
 {
 	struct net *net = qdisc_net(q);
 	struct tcf_block *block = NULL;
@@ -791,7 +791,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 	}
 	return err;
 }
-EXPORT_SYMBOL(tcf_block_get_ext);
+EXPORT_SYMBOL(tcf_block_attach_ext);
 
 static void tcf_chain_head_change_dflt(struct tcf_proto *tp_head, void *priv)
 {
@@ -800,9 +800,9 @@ static void tcf_chain_head_change_dflt(struct tcf_proto *tp_head, void *priv)
 	rcu_assign_pointer(*p_filter_chain, tp_head);
 }
 
-int tcf_block_get(struct tcf_block **p_block,
-		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
-		  struct netlink_ext_ack *extack)
+int tcf_block_attach(struct tcf_block **p_block,
+		     struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+		     struct netlink_ext_ack *extack)
 {
 	struct tcf_block_ext_info ei = {
 		.chain_head_change = tcf_chain_head_change_dflt,
@@ -810,15 +810,15 @@ int tcf_block_get(struct tcf_block **p_block,
 	};
 
 	WARN_ON(!p_filter_chain);
-	return tcf_block_get_ext(p_block, q, &ei, extack);
+	return tcf_block_attach_ext(p_block, q, &ei, extack);
 }
-EXPORT_SYMBOL(tcf_block_get);
+EXPORT_SYMBOL(tcf_block_attach);
 
 /* XXX: Standalone actions are not allowed to jump to any chain, and bound
  * actions should be all removed after flushing.
  */
-void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
-		       struct tcf_block_ext_info *ei)
+void tcf_block_detach_ext(struct tcf_block *block, struct Qdisc *q,
+			  struct tcf_block_ext_info *ei)
 {
 	if (!block)
 		return;
@@ -848,18 +848,17 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 		tcf_block_offload_unbind(block, q, ei);
 	}
 }
-EXPORT_SYMBOL(tcf_block_put_ext);
+EXPORT_SYMBOL(tcf_block_detach_ext);
 
-void tcf_block_put(struct tcf_block *block)
+void tcf_block_detach(struct tcf_block *block)
 {
 	struct tcf_block_ext_info ei = {0, };
 
 	if (!block)
 		return;
-	tcf_block_put_ext(block, block->q, &ei);
+	tcf_block_detach_ext(block, block->q, &ei);
 }
-
-EXPORT_SYMBOL(tcf_block_put);
+EXPORT_SYMBOL(tcf_block_detach);
 
 struct tcf_block_cb {
 	struct list_head list;
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index d714d3747bcb..f7a3728d17fe 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -151,7 +151,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
 	list_del_init(&flow->list);
 	pr_debug("atm_tc_put: qdisc %p\n", flow->q);
 	qdisc_put(flow->q);
-	tcf_block_put(flow->block);
+	tcf_block_detach(flow->block);
 	if (flow->sock) {
 		pr_debug("atm_tc_put: f_count %ld\n",
 			file_count(flow->sock->file));
@@ -283,8 +283,8 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
 		goto err_out;
 	}
 
-	error = tcf_block_get(&flow->block, &flow->filter_list, sch,
-			      extack);
+	error = tcf_block_attach(&flow->block, &flow->filter_list, sch,
+				 extack);
 	if (error) {
 		kfree(flow);
 		goto err_out;
@@ -552,8 +552,8 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt,
 		p->link.q = &noop_qdisc;
 	pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q);
 
-	err = tcf_block_get(&p->link.block, &p->link.filter_list, sch,
-			    extack);
+	err = tcf_block_attach(&p->link.block, &p->link.filter_list, sch,
+			       extack);
 	if (err)
 		return err;
 
@@ -583,7 +583,7 @@ static void atm_tc_destroy(struct Qdisc *sch)
 
 	pr_debug("atm_tc_destroy(sch %p,[qdisc %p])\n", sch, p);
 	list_for_each_entry(flow, &p->flows, list) {
-		tcf_block_put(flow->block);
+		tcf_block_detach(flow->block);
 		flow->block = NULL;
 	}
 
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index c07c30b916d5..87b840ecb8b8 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -2603,7 +2603,7 @@ static void cake_destroy(struct Qdisc *sch)
 	struct cake_sched_data *q = qdisc_priv(sch);
 
 	qdisc_watchdog_cancel(&q->watchdog);
-	tcf_block_put(q->block);
+	tcf_block_detach(q->block);
 	kvfree(q->tins);
 }
 
@@ -2636,7 +2636,7 @@ static int cake_init(struct Qdisc *sch, struct nlattr *opt,
 			return err;
 	}
 
-	err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
+	err = tcf_block_attach(&q->block, &q->filter_list, sch, extack);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 4dc05409e3fb..f78643748653 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1164,7 +1164,8 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt,
 	if (!q->link.R_tab)
 		return -EINVAL;
 
-	err = tcf_block_get(&q->link.block, &q->link.filter_list, sch, extack);
+	err = tcf_block_attach(&q->link.block, &q->link.filter_list, sch,
+			       extack);
 	if (err)
 		goto put_rtab;
 
@@ -1205,7 +1206,7 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt,
 	return 0;
 
 put_block:
-	tcf_block_put(q->link.block);
+	tcf_block_detach(q->link.block);
 
 put_rtab:
 	qdisc_put_rtab(q->link.R_tab);
@@ -1417,7 +1418,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
 
 	WARN_ON(cl->filters);
 
-	tcf_block_put(cl->block);
+	tcf_block_detach(cl->block);
 	qdisc_put(cl->q);
 	qdisc_put_rtab(cl->R_tab);
 	gen_kill_estimator(&cl->rate_est);
@@ -1442,7 +1443,7 @@ static void cbq_destroy(struct Qdisc *sch)
 	 */
 	for (h = 0; h < q->clhash.hashsize; h++) {
 		hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) {
-			tcf_block_put(cl->block);
+			tcf_block_detach(cl->block);
 			cl->block = NULL;
 		}
 	}
@@ -1597,7 +1598,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 	if (cl == NULL)
 		goto failure;
 
-	err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack);
+	err = tcf_block_attach(&cl->block, &cl->filter_list, sch, extack);
 	if (err) {
 		kfree(cl);
 		return err;
@@ -1610,7 +1611,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 					tca[TCA_RATE]);
 		if (err) {
 			NL_SET_ERR_MSG(extack, "Couldn't create new estimator");
-			tcf_block_put(cl->block);
+			tcf_block_detach(cl->block);
 			kfree(cl);
 			goto failure;
 		}
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index cdebaed0f8cf..d9b98806e3ec 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -427,7 +427,7 @@ static int drr_init_qdisc(struct Qdisc *sch, struct nlattr *opt,
 	struct drr_sched *q = qdisc_priv(sch);
 	int err;
 
-	err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
+	err = tcf_block_attach(&q->block, &q->filter_list, sch, extack);
 	if (err)
 		return err;
 	err = qdisc_class_hash_init(&q->clhash);
@@ -461,7 +461,7 @@ static void drr_destroy_qdisc(struct Qdisc *sch)
 	struct hlist_node *next;
 	unsigned int i;
 
-	tcf_block_put(q->block);
+	tcf_block_detach(q->block);
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry_safe(cl, next, &q->clhash.hash[i],
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index f6f480784bc6..88cbba60a0ac 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -348,7 +348,7 @@ static int dsmark_init(struct Qdisc *sch, struct nlattr *opt,
 	if (!opt)
 		goto errout;
 
-	err = tcf_block_get(&p->block, &p->filter_list, sch, extack);
+	err = tcf_block_attach(&p->block, &p->filter_list, sch, extack);
 	if (err)
 		return err;
 
@@ -411,7 +411,7 @@ static void dsmark_destroy(struct Qdisc *sch)
 
 	pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p);
 
-	tcf_block_put(p->block);
+	tcf_block_detach(p->block);
 	qdisc_put(p->q);
 	if (p->mv != p->embedded)
 		kfree(p->mv);
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 6c0a9d5dbf94..e9a14ae424ee 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -454,7 +454,7 @@ static void fq_codel_destroy(struct Qdisc *sch)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 
-	tcf_block_put(q->block);
+	tcf_block_detach(q->block);
 	kvfree(q->backlogs);
 	kvfree(q->flows);
 }
@@ -484,7 +484,7 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt,
 			goto init_failure;
 	}
 
-	err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
+	err = tcf_block_attach(&q->block, &q->filter_list, sch, extack);
 	if (err)
 		goto init_failure;
 
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index b18ec1f6de60..ed1bcae948ca 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1034,7 +1034,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	if (cl == NULL)
 		return -ENOBUFS;
 
-	err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack);
+	err = tcf_block_attach(&cl->block, &cl->filter_list, sch, extack);
 	if (err) {
 		kfree(cl);
 		return err;
@@ -1046,7 +1046,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 					qdisc_root_sleeping_running(sch),
 					tca[TCA_RATE]);
 		if (err) {
-			tcf_block_put(cl->block);
+			tcf_block_detach(cl->block);
 			kfree(cl);
 			return err;
 		}
@@ -1091,7 +1091,7 @@ hfsc_destroy_class(struct Qdisc *sch, struct hfsc_class *cl)
 {
 	struct hfsc_sched *q = qdisc_priv(sch);
 
-	tcf_block_put(cl->block);
+	tcf_block_detach(cl->block);
 	qdisc_put(cl->qdisc);
 	gen_kill_estimator(&cl->rate_est);
 	if (cl != &q->root)
@@ -1409,7 +1409,8 @@ hfsc_init_qdisc(struct Qdisc *sch, struct nlattr *opt,
 		return err;
 	q->eligible = RB_ROOT;
 
-	err = tcf_block_get(&q->root.block, &q->root.filter_list, sch, extack);
+	err = tcf_block_attach(&q->root.block, &q->root.filter_list, sch,
+			       extack);
 	if (err)
 		return err;
 
@@ -1506,7 +1507,7 @@ hfsc_destroy_qdisc(struct Qdisc *sch)
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry(cl, &q->clhash.hash[i], cl_common.hnode) {
-			tcf_block_put(cl->block);
+			tcf_block_detach(cl->block);
 			cl->block = NULL;
 		}
 	}
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 862a33b9e2e0..39341efbb7c8 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1023,7 +1023,7 @@ static int htb_init(struct Qdisc *sch, struct nlattr *opt,
 	if (!opt)
 		return -EINVAL;
 
-	err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
+	err = tcf_block_attach(&q->block, &q->filter_list, sch, extack);
 	if (err)
 		return err;
 
@@ -1227,7 +1227,7 @@ static void htb_destroy_class(struct Qdisc *sch, struct htb_class *cl)
 		qdisc_put(cl->un.leaf.q);
 	}
 	gen_kill_estimator(&cl->rate_est);
-	tcf_block_put(cl->block);
+	tcf_block_detach(cl->block);
 	kfree(cl);
 }
 
@@ -1245,11 +1245,11 @@ static void htb_destroy(struct Qdisc *sch)
 	 * because filter need its target class alive to be able to call
 	 * unbind_filter on it (without Oops).
 	 */
-	tcf_block_put(q->block);
+	tcf_block_detach(q->block);
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry(cl, &q->clhash.hash[i], common.hnode) {
-			tcf_block_put(cl->block);
+			tcf_block_detach(cl->block);
 			cl->block = NULL;
 		}
 	}
@@ -1387,7 +1387,8 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 		if (!cl)
 			goto failure;
 
-		err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack);
+		err = tcf_block_attach(&cl->block, &cl->filter_list, sch,
+				       extack);
 		if (err) {
 			kfree(cl);
 			goto failure;
@@ -1399,7 +1400,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 						qdisc_root_sleeping_running(sch),
 						tca[TCA_RATE] ? : &est.nla);
 			if (err) {
-				tcf_block_put(cl->block);
+				tcf_block_detach(cl->block);
 				kfree(cl);
 				goto failure;
 			}
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index ce3f55259d0d..e3a91c9e938e 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -91,14 +91,14 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
 	q->block_info.chain_head_change = clsact_chain_head_change;
 	q->block_info.chain_head_change_priv = &q->miniqp;
 
-	return tcf_block_get_ext(&q->block, sch, &q->block_info, extack);
+	return tcf_block_attach_ext(&q->block, sch, &q->block_info, extack);
 }
 
 static void ingress_destroy(struct Qdisc *sch)
 {
 	struct ingress_sched_data *q = qdisc_priv(sch);
 
-	tcf_block_put_ext(q->block, sch, &q->block_info);
+	tcf_block_detach_ext(q->block, sch, &q->block_info);
 	net_dec_ingress_queue();
 }
 
@@ -224,8 +224,8 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 	q->ingress_block_info.chain_head_change = clsact_chain_head_change;
 	q->ingress_block_info.chain_head_change_priv = &q->miniqp_ingress;
 
-	err = tcf_block_get_ext(&q->ingress_block, sch, &q->ingress_block_info,
-				extack);
+	err = tcf_block_attach_ext(&q->ingress_block, sch,
+				   &q->ingress_block_info, extack);
 	if (err)
 		return err;
 
@@ -235,15 +235,16 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 	q->egress_block_info.chain_head_change = clsact_chain_head_change;
 	q->egress_block_info.chain_head_change_priv = &q->miniqp_egress;
 
-	return tcf_block_get_ext(&q->egress_block, sch, &q->egress_block_info, extack);
+	return tcf_block_attach_ext(&q->egress_block, sch,
+				    &q->egress_block_info, extack);
 }
 
 static void clsact_destroy(struct Qdisc *sch)
 {
 	struct clsact_sched_data *q = qdisc_priv(sch);
 
-	tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
-	tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
+	tcf_block_detach_ext(q->egress_block, sch, &q->egress_block_info);
+	tcf_block_detach_ext(q->ingress_block, sch, &q->ingress_block_info);
 
 	net_dec_ingress_queue();
 	net_dec_egress_queue();
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 7410ce4d0321..296d5a2b053a 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -173,7 +173,7 @@ multiq_destroy(struct Qdisc *sch)
 	int band;
 	struct multiq_sched_data *q = qdisc_priv(sch);
 
-	tcf_block_put(q->block);
+	tcf_block_detach(q->block);
 	for (band = 0; band < q->bands; band++)
 		qdisc_put(q->queues[band]);
 
@@ -248,7 +248,7 @@ static int multiq_init(struct Qdisc *sch, struct nlattr *opt,
 	if (!opt)
 		return -EINVAL;
 
-	err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
+	err = tcf_block_attach(&q->block, &q->filter_list, sch, extack);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index f8af98621179..16c5815a86c1 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -172,7 +172,7 @@ prio_destroy(struct Qdisc *sch)
 	int prio;
 	struct prio_sched_data *q = qdisc_priv(sch);
 
-	tcf_block_put(q->block);
+	tcf_block_detach(q->block);
 	prio_offload(sch, NULL);
 	for (prio = 0; prio < q->bands; prio++)
 		qdisc_put(q->queues[prio]);
@@ -242,7 +242,7 @@ static int prio_init(struct Qdisc *sch, struct nlattr *opt,
 	if (!opt)
 		return -EINVAL;
 
-	err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
+	err = tcf_block_attach(&q->block, &q->filter_list, sch, extack);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index dc37c4ead439..ec136a0c06a4 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1424,7 +1424,7 @@ static int qfq_init_qdisc(struct Qdisc *sch, struct nlattr *opt,
 	int i, j, err;
 	u32 max_cl_shift, maxbudg_shift, max_classes;
 
-	err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
+	err = tcf_block_attach(&q->block, &q->filter_list, sch, extack);
 	if (err)
 		return err;
 
@@ -1482,7 +1482,7 @@ static void qfq_destroy_qdisc(struct Qdisc *sch)
 	struct hlist_node *next;
 	unsigned int i;
 
-	tcf_block_put(q->block);
+	tcf_block_detach(q->block);
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry_safe(cl, next, &q->clhash.hash[i],
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index bab506b01a32..ac3edf6b0bbd 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -468,7 +468,7 @@ static void sfb_destroy(struct Qdisc *sch)
 {
 	struct sfb_sched_data *q = qdisc_priv(sch);
 
-	tcf_block_put(q->block);
+	tcf_block_detach(q->block);
 	qdisc_put(q->qdisc);
 }
 
@@ -556,7 +556,7 @@ static int sfb_init(struct Qdisc *sch, struct nlattr *opt,
 	struct sfb_sched_data *q = qdisc_priv(sch);
 	int err;
 
-	err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
+	err = tcf_block_attach(&q->block, &q->filter_list, sch, extack);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 2f2678197760..72c7b8d5c3f7 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -713,7 +713,7 @@ static void sfq_destroy(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 
-	tcf_block_put(q->block);
+	tcf_block_detach(q->block);
 	q->perturb_period = 0;
 	del_timer_sync(&q->perturb_timer);
 	sfq_free(q->ht);
@@ -731,7 +731,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt,
 	q->sch = sch;
 	timer_setup(&q->perturb_timer, sfq_perturbation, TIMER_DEFERRABLE);
 
-	err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
+	err = tcf_block_attach(&q->block, &q->filter_list, sch, extack);
 	if (err)
 		return err;
 
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 11/13] net: sched: implement tcf_block_get() and tcf_block_put()
From: Vlad Buslov @ 2018-09-06  7:59 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Implement get/put function for blocks that only take/release the reference
and perform deallocation. These functions are intended to be used by
unlocked rules update path to always hold reference to block while working
with it. They use on new fine-grained locking mechanisms introduced in
previous patches in this set, instead of relying on global protection
provided by rtnl lock.

Extract code that is common with tcf_block_detach_ext() into common
function __tcf_block_put().

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 70 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f06aa9313a58..5d9f91331d26 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -537,6 +537,19 @@ static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
 	return idr_find(&tn->idr, block_index);
 }
 
+static struct tcf_block *tcf_block_get(struct net *net, u32 block_index)
+{
+	struct tcf_block *block;
+
+	rcu_read_lock();
+	block = tcf_block_lookup(net, block_index);
+	if (block && !refcount_inc_not_zero(&block->refcnt))
+		block = NULL;
+	rcu_read_unlock();
+
+	return block;
+}
+
 static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
 {
 	if (!q)
@@ -573,6 +586,40 @@ static void tcf_block_put_all_chains(struct tcf_block *block)
 	}
 }
 
+static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
+			    struct tcf_block_ext_info *ei)
+{
+	if (refcount_dec_and_test(&block->refcnt)) {
+		/* Flushing/putting all chains will cause the block to be
+		 * deallocated when last chain is freed. However, if chain_list
+		 * is empty, block has to be manually deallocated. After block
+		 * reference counter reached 0, it is no longer possible to
+		 * increment it or add new chains to block.
+		 */
+		bool free_block = list_empty(&block->chain_list);
+
+		if (tcf_block_shared(block))
+			tcf_block_remove(block, block->net);
+		if (!free_block)
+			tcf_block_flush_all_chains(block);
+
+		if (q)
+			tcf_block_offload_unbind(block, q, ei);
+
+		if (free_block)
+			kfree_rcu(block, rcu);
+		else
+			tcf_block_put_all_chains(block);
+	} else if (q) {
+		tcf_block_offload_unbind(block, q, ei);
+	}
+}
+
+static void tcf_block_put(struct tcf_block *block)
+{
+	__tcf_block_put(block, NULL, NULL);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -835,28 +882,7 @@ void tcf_block_detach_ext(struct tcf_block *block, struct Qdisc *q,
 	tcf_chain0_head_change_cb_del(block, ei);
 	tcf_block_owner_del(block, q, ei->binder_type);
 
-	if (refcount_dec_and_test(&block->refcnt)) {
-		/* Flushing/putting all chains will cause the block to be
-		 * deallocated when last chain is freed. However, if chain_list
-		 * is empty, block has to be manually deallocated. After block
-		 * reference counter reached 0, it is no longer possible to
-		 * increment it or add new chains to block.
-		 */
-		bool free_block = list_empty(&block->chain_list);
-
-		if (tcf_block_shared(block))
-			tcf_block_remove(block, block->net);
-		if (!free_block)
-			tcf_block_flush_all_chains(block);
-		tcf_block_offload_unbind(block, q, ei);
-
-		if (free_block)
-			kfree_rcu(block, rcu);
-		else
-			tcf_block_put_all_chains(block);
-	} else {
-		tcf_block_offload_unbind(block, q, ei);
-	}
+	__tcf_block_put(block, q, ei);
 }
 EXPORT_SYMBOL(tcf_block_detach_ext);
 
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 12/13] net: sched: use reference counting for tcf blocks on rules update
From: Vlad Buslov @ 2018-09-06  7:59 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

In order to remove dependency on rtnl lock on rules update path, always
take reference to block while using it on rules update path. Change
tcf_block_get() error handling to properly release block with reference
counting, instead of just destroying it, in order to accommodate potential
concurrent users.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 5d9f91331d26..8808818a1d24 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -633,7 +633,7 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 	int err = 0;
 
 	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-		block = tcf_block_lookup(net, block_index);
+		block = tcf_block_get(net, block_index);
 		if (!block) {
 			NL_SET_ERR_MSG(extack, "Block of given index was not found");
 			return ERR_PTR(-EINVAL);
@@ -713,6 +713,14 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 			err = -EOPNOTSUPP;
 			goto errout_qdisc;
 		}
+
+		/* Always take reference to block in order to support execution
+		 * of rules update path of cls API without rtnl lock. Caller
+		 * must release block when it is finished using it. 'if' block
+		 * of this conditional obtain reference to block by calling
+		 * tcf_block_get().
+		 */
+		refcount_inc(&block->refcnt);
 	}
 
 	return block;
@@ -726,6 +734,8 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 
 static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
 {
+	if (!IS_ERR_OR_NULL(block))
+		tcf_block_put(block);
 	tcf_qdisc_put(q, true);
 }
 
@@ -794,21 +804,16 @@ int tcf_block_attach_ext(struct tcf_block **p_block, struct Qdisc *q,
 {
 	struct net *net = qdisc_net(q);
 	struct tcf_block *block = NULL;
-	bool created = false;
 	int err;
 
-	if (ei->block_index) {
+	if (ei->block_index)
 		/* block_index not 0 means the shared block is requested */
-		block = tcf_block_lookup(net, ei->block_index);
-		if (block)
-			refcount_inc(&block->refcnt);
-	}
+		block = tcf_block_get(net, ei->block_index);
 
 	if (!block) {
 		block = tcf_block_create(net, q, ei->block_index, extack);
 		if (IS_ERR(block))
 			return PTR_ERR(block);
-		created = true;
 		if (tcf_block_shared(block)) {
 			err = tcf_block_insert(block, net, extack);
 			if (err)
@@ -838,14 +843,8 @@ int tcf_block_attach_ext(struct tcf_block **p_block, struct Qdisc *q,
 err_chain0_head_change_cb_add:
 	tcf_block_owner_del(block, q, ei->binder_type);
 err_block_owner_add:
-	if (created) {
-		if (tcf_block_shared(block))
-			tcf_block_remove(block, net);
 err_block_insert:
-		kfree_rcu(block, rcu);
-	} else {
-		refcount_dec(&block->refcnt);
-	}
+	tcf_block_put(block);
 	return err;
 }
 EXPORT_SYMBOL(tcf_block_attach_ext);
@@ -1738,7 +1737,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 		return err;
 
 	if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-		block = tcf_block_lookup(net, tcm->tcm_block_index);
+		block = tcf_block_get(net, tcm->tcm_block_index);
 		if (!block)
 			goto out;
 		/* If we work with block index, q is NULL and parent value
@@ -1797,6 +1796,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 		}
 	}
 
+	if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+		tcf_block_put(block);
 	cb->args[0] = index;
 
 out:
@@ -2061,7 +2062,7 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 		return err;
 
 	if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-		block = tcf_block_lookup(net, tcm->tcm_block_index);
+		block = tcf_block_get(net, tcm->tcm_block_index);
 		if (!block)
 			goto out;
 		/* If we work with block index, q is NULL and parent value
@@ -2128,6 +2129,8 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 		index++;
 	}
 
+	if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+		tcf_block_put(block);
 	cb->args[0] = index;
 
 out:
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 09/13] net: sched: extend tcf_block with rcu
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Extend tcf_block with rcu to allow safe deallocation when it is accessed
concurrently.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 1 +
 net/sched/cls_api.c       | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 825e2bf6c5c3..2b87b47c49f6 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -357,6 +357,7 @@ struct tcf_block {
 		struct tcf_chain *chain;
 		struct list_head filter_chain_list;
 	} chain0;
+	struct rcu_head rcu;
 };
 
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f11da74dd339..502b2da8a885 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -241,7 +241,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
 		block->chain0.chain = NULL;
 	kfree(chain);
 	if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
-		kfree(block);
+		kfree_rcu(block, rcu);
 }
 
 static void tcf_chain_hold(struct tcf_chain *chain)
@@ -785,7 +785,7 @@ int tcf_block_attach_ext(struct tcf_block **p_block, struct Qdisc *q,
 		if (tcf_block_shared(block))
 			tcf_block_remove(block, net);
 err_block_insert:
-		kfree(block);
+		kfree_rcu(block, rcu);
 	} else {
 		refcount_dec(&block->refcnt);
 	}
@@ -841,7 +841,7 @@ void tcf_block_detach_ext(struct tcf_block *block, struct Qdisc *q,
 		tcf_block_offload_unbind(block, q, ei);
 
 		if (free_block)
-			kfree(block);
+			kfree_rcu(block, rcu);
 		else
 			tcf_block_put_all_chains(block);
 	} else {
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 10/13] net: sched: protect block idr with spinlock
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Protect block idr access with spinlock, instead of relying on rtnl lock.
Take tn->idr_lock spinlock during block insertion and removal.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 502b2da8a885..f06aa9313a58 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -473,6 +473,7 @@ tcf_chain0_head_change_cb_del(struct tcf_block *block,
 }
 
 struct tcf_net {
+	spinlock_t idr_lock; /* Protects idr */
 	struct idr idr;
 };
 
@@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, struct net *net,
 			    struct netlink_ext_ack *extack)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);
+	int err;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&tn->idr_lock);
+	err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
+			    GFP_NOWAIT);
+	spin_unlock(&tn->idr_lock);
+	idr_preload_end();
 
-	return idr_alloc_u32(&tn->idr, block, &block->index, block->index,
-			     GFP_KERNEL);
+	return err;
 }
 
 static void tcf_block_remove(struct tcf_block *block, struct net *net)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+	spin_lock(&tn->idr_lock);
 	idr_remove(&tn->idr, block->index);
+	spin_unlock(&tn->idr_lock);
 }
 
 static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
@@ -2284,6 +2294,7 @@ static __net_init int tcf_net_init(struct net *net)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+	spin_lock_init(&tn->idr_lock);
 	idr_init(&tn->idr);
 	return 0;
 }
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 13/13] net: sched: add flags to Qdisc class ops struct
From: Vlad Buslov @ 2018-09-06  7:59 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Extend Qdisc_class_ops with flags. Create enum to hold possible class ops
flag values. Add first class ops flags value QDISC_CLASS_OPS_DOIT_UNLOCKED
to indicate that class ops functions can be called without taking rtnl
lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 2b87b47c49f6..bc4082961726 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -174,6 +174,7 @@ static inline int qdisc_avail_bulklimit(const struct netdev_queue *txq)
 }
 
 struct Qdisc_class_ops {
+	unsigned int		flags;
 	/* Child qdisc manipulation */
 	struct netdev_queue *	(*select_queue)(struct Qdisc *, struct tcmsg *);
 	int			(*graft)(struct Qdisc *, unsigned long cl,
@@ -205,6 +206,13 @@ struct Qdisc_class_ops {
 					struct gnet_dump *);
 };
 
+/* Qdisc_class_ops flag values */
+
+/* Implements API that doesn't require rtnl lock */
+enum qdisc_class_ops_flags {
+	QDISC_CLASS_OPS_DOIT_UNLOCKED = 1,
+};
+
 struct Qdisc_ops {
 	struct Qdisc_ops	*next;
 	const struct Qdisc_class_ops	*cl_ops;
-- 
2.7.5

^ permalink raw reply related

* [PATCH net-next 02/13] net: sched: rename qdisc_destroy() to qdisc_put()
From: Vlad Buslov @ 2018-09-06  7:58 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc, Vlad Buslov
In-Reply-To: <1536220742-25650-1-git-send-email-vladbu@mellanox.com>

Current implementation of qdisc_destroy() decrements Qdisc reference
counter and only actually destroy Qdisc if reference counter value reached
zero. Rename qdisc_destroy() to qdisc_put() in order for it to better
describe the way in which this function currently implemented and used.

Extract code that deallocates Qdisc into new private qdisc_destroy()
function. It is intended to be shared between regular qdisc_put() and its
unlocked version that is introduced in next patch in this series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  2 +-
 net/sched/sch_api.c       |  6 +++---
 net/sched/sch_atm.c       |  2 +-
 net/sched/sch_cbq.c       |  2 +-
 net/sched/sch_cbs.c       |  2 +-
 net/sched/sch_drr.c       |  4 ++--
 net/sched/sch_dsmark.c    |  2 +-
 net/sched/sch_fifo.c      |  2 +-
 net/sched/sch_generic.c   | 23 ++++++++++++++---------
 net/sched/sch_hfsc.c      |  2 +-
 net/sched/sch_htb.c       |  4 ++--
 net/sched/sch_mq.c        |  4 ++--
 net/sched/sch_mqprio.c    |  4 ++--
 net/sched/sch_multiq.c    |  6 +++---
 net/sched/sch_netem.c     |  2 +-
 net/sched/sch_prio.c      |  6 +++---
 net/sched/sch_qfq.c       |  4 ++--
 net/sched/sch_red.c       |  4 ++--
 net/sched/sch_sfb.c       |  4 ++--
 net/sched/sch_tbf.c       |  4 ++--
 20 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a6d00093f35e..18e22a5a6550 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -554,7 +554,7 @@ void dev_deactivate_many(struct list_head *head);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
-void qdisc_destroy(struct Qdisc *qdisc);
+void qdisc_put(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
 			       unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 98541c6399db..836b32e6e8e8 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -921,7 +921,7 @@ static void notify_and_destroy(struct net *net, struct sk_buff *skb,
 		qdisc_notify(net, skb, n, clid, old, new);
 
 	if (old)
-		qdisc_destroy(old);
+		qdisc_put(old);
 }
 
 /* Graft qdisc "new" to class "classid" of qdisc "parent" or
@@ -974,7 +974,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 				qdisc_refcount_inc(new);
 
 			if (!ingress)
-				qdisc_destroy(old);
+				qdisc_put(old);
 		}
 
 skip:
@@ -1568,7 +1568,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 	err = qdisc_graft(dev, p, skb, n, clid, q, NULL, extack);
 	if (err) {
 		if (q)
-			qdisc_destroy(q);
+			qdisc_put(q);
 		return err;
 	}
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index cd49afca9617..d714d3747bcb 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -150,7 +150,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
 	pr_debug("atm_tc_put: destroying\n");
 	list_del_init(&flow->list);
 	pr_debug("atm_tc_put: qdisc %p\n", flow->q);
-	qdisc_destroy(flow->q);
+	qdisc_put(flow->q);
 	tcf_block_put(flow->block);
 	if (flow->sock) {
 		pr_debug("atm_tc_put: f_count %ld\n",
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f42025d53cfe..4dc05409e3fb 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1418,7 +1418,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
 	WARN_ON(cl->filters);
 
 	tcf_block_put(cl->block);
-	qdisc_destroy(cl->q);
+	qdisc_put(cl->q);
 	qdisc_put_rtab(cl->R_tab);
 	gen_kill_estimator(&cl->rate_est);
 	if (cl != &q->link)
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index e26a24017faa..e689e11b6d0f 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -379,7 +379,7 @@ static void cbs_destroy(struct Qdisc *sch)
 	cbs_disable_offload(dev, q);
 
 	if (q->qdisc)
-		qdisc_destroy(q->qdisc);
+		qdisc_put(q->qdisc);
 }
 
 static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index e0b0cf8a9939..cdebaed0f8cf 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -134,7 +134,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 					    tca[TCA_RATE]);
 		if (err) {
 			NL_SET_ERR_MSG(extack, "Failed to replace estimator");
-			qdisc_destroy(cl->qdisc);
+			qdisc_put(cl->qdisc);
 			kfree(cl);
 			return err;
 		}
@@ -153,7 +153,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 static void drr_destroy_class(struct Qdisc *sch, struct drr_class *cl)
 {
 	gen_kill_estimator(&cl->rate_est);
-	qdisc_destroy(cl->qdisc);
+	qdisc_put(cl->qdisc);
 	kfree(cl);
 }
 
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 049714c57075..f6f480784bc6 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -412,7 +412,7 @@ static void dsmark_destroy(struct Qdisc *sch)
 	pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p);
 
 	tcf_block_put(p->block);
-	qdisc_destroy(p->q);
+	qdisc_put(p->q);
 	if (p->mv != p->embedded)
 		kfree(p->mv);
 }
diff --git a/net/sched/sch_fifo.c b/net/sched/sch_fifo.c
index 24893d3b5d22..3809c9bf8896 100644
--- a/net/sched/sch_fifo.c
+++ b/net/sched/sch_fifo.c
@@ -177,7 +177,7 @@ struct Qdisc *fifo_create_dflt(struct Qdisc *sch, struct Qdisc_ops *ops,
 	if (q) {
 		err = fifo_set_limit(q, limit);
 		if (err < 0) {
-			qdisc_destroy(q);
+			qdisc_put(q);
 			q = NULL;
 		}
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 69078c82963e..bb778485ed88 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -901,7 +901,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 	if (!ops->init || ops->init(sch, NULL, extack) == 0)
 		return sch;
 
-	qdisc_destroy(sch);
+	qdisc_put(sch);
 	return NULL;
 }
 EXPORT_SYMBOL(qdisc_create_dflt);
@@ -941,15 +941,11 @@ void qdisc_free(struct Qdisc *qdisc)
 	kfree((char *) qdisc - qdisc->padded);
 }
 
-void qdisc_destroy(struct Qdisc *qdisc)
+static void qdisc_destroy(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops  *ops = qdisc->ops;
 	struct sk_buff *skb, *tmp;
 
-	if (qdisc->flags & TCQ_F_BUILTIN ||
-	    !refcount_dec_and_test(&qdisc->refcnt))
-		return;
-
 #ifdef CONFIG_NET_SCHED
 	qdisc_hash_del(qdisc);
 
@@ -976,7 +972,16 @@ void qdisc_destroy(struct Qdisc *qdisc)
 
 	qdisc_free(qdisc);
 }
-EXPORT_SYMBOL(qdisc_destroy);
+
+void qdisc_put(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN ||
+	    !refcount_dec_and_test(&qdisc->refcnt))
+		return;
+
+	qdisc_destroy(qdisc);
+}
+EXPORT_SYMBOL(qdisc_put);
 
 /* Attach toplevel qdisc to device queue. */
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
@@ -1270,7 +1275,7 @@ static void shutdown_scheduler_queue(struct net_device *dev,
 		rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
 		dev_queue->qdisc_sleeping = qdisc_default;
 
-		qdisc_destroy(qdisc);
+		qdisc_put(qdisc);
 	}
 }
 
@@ -1279,7 +1284,7 @@ void dev_shutdown(struct net_device *dev)
 	netdev_for_each_tx_queue(dev, shutdown_scheduler_queue, &noop_qdisc);
 	if (dev_ingress_queue(dev))
 		shutdown_scheduler_queue(dev, dev_ingress_queue(dev), &noop_qdisc);
-	qdisc_destroy(dev->qdisc);
+	qdisc_put(dev->qdisc);
 	dev->qdisc = &noop_qdisc;
 
 	WARN_ON(timer_pending(&dev->watchdog_timer));
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 3278a76f6861..b18ec1f6de60 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1092,7 +1092,7 @@ hfsc_destroy_class(struct Qdisc *sch, struct hfsc_class *cl)
 	struct hfsc_sched *q = qdisc_priv(sch);
 
 	tcf_block_put(cl->block);
-	qdisc_destroy(cl->qdisc);
+	qdisc_put(cl->qdisc);
 	gen_kill_estimator(&cl->rate_est);
 	if (cl != &q->root)
 		kfree(cl);
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 43c4bfe625a9..862a33b9e2e0 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1224,7 +1224,7 @@ static void htb_destroy_class(struct Qdisc *sch, struct htb_class *cl)
 {
 	if (!cl->level) {
 		WARN_ON(!cl->un.leaf.q);
-		qdisc_destroy(cl->un.leaf.q);
+		qdisc_put(cl->un.leaf.q);
 	}
 	gen_kill_estimator(&cl->rate_est);
 	tcf_block_put(cl->block);
@@ -1425,7 +1425,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			/* turn parent into inner node */
 			qdisc_reset(parent->un.leaf.q);
 			qdisc_tree_reduce_backlog(parent->un.leaf.q, qlen, backlog);
-			qdisc_destroy(parent->un.leaf.q);
+			qdisc_put(parent->un.leaf.q);
 			if (parent->prio_activity)
 				htb_deactivate(q, parent);
 
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index d6b8ae4ed7a3..f20f3a0f8424 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -65,7 +65,7 @@ static void mq_destroy(struct Qdisc *sch)
 	if (!priv->qdiscs)
 		return;
 	for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
-		qdisc_destroy(priv->qdiscs[ntx]);
+		qdisc_put(priv->qdiscs[ntx]);
 	kfree(priv->qdiscs);
 }
 
@@ -119,7 +119,7 @@ static void mq_attach(struct Qdisc *sch)
 		qdisc = priv->qdiscs[ntx];
 		old = dev_graft_qdisc(qdisc->dev_queue, qdisc);
 		if (old)
-			qdisc_destroy(old);
+			qdisc_put(old);
 #ifdef CONFIG_NET_SCHED
 		if (ntx < dev->real_num_tx_queues)
 			qdisc_hash_add(qdisc, false);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 0e9d761cdd80..d364e63c396d 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -40,7 +40,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 		for (ntx = 0;
 		     ntx < dev->num_tx_queues && priv->qdiscs[ntx];
 		     ntx++)
-			qdisc_destroy(priv->qdiscs[ntx]);
+			qdisc_put(priv->qdiscs[ntx]);
 		kfree(priv->qdiscs);
 	}
 
@@ -300,7 +300,7 @@ static void mqprio_attach(struct Qdisc *sch)
 		qdisc = priv->qdiscs[ntx];
 		old = dev_graft_qdisc(qdisc->dev_queue, qdisc);
 		if (old)
-			qdisc_destroy(old);
+			qdisc_put(old);
 		if (ntx < dev->real_num_tx_queues)
 			qdisc_hash_add(qdisc, false);
 	}
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 1da7ea8de0ad..7410ce4d0321 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -175,7 +175,7 @@ multiq_destroy(struct Qdisc *sch)
 
 	tcf_block_put(q->block);
 	for (band = 0; band < q->bands; band++)
-		qdisc_destroy(q->queues[band]);
+		qdisc_put(q->queues[band]);
 
 	kfree(q->queues);
 }
@@ -204,7 +204,7 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt,
 			q->queues[i] = &noop_qdisc;
 			qdisc_tree_reduce_backlog(child, child->q.qlen,
 						  child->qstats.backlog);
-			qdisc_destroy(child);
+			qdisc_put(child);
 		}
 	}
 
@@ -228,7 +228,7 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt,
 					qdisc_tree_reduce_backlog(old,
 								  old->q.qlen,
 								  old->qstats.backlog);
-					qdisc_destroy(old);
+					qdisc_put(old);
 				}
 				sch_tree_unlock(sch);
 			}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index ad18a2052416..553c9f602336 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -1032,7 +1032,7 @@ static void netem_destroy(struct Qdisc *sch)
 
 	qdisc_watchdog_cancel(&q->watchdog);
 	if (q->qdisc)
-		qdisc_destroy(q->qdisc);
+		qdisc_put(q->qdisc);
 	dist_free(q->delay_dist);
 	dist_free(q->slot_dist);
 }
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 222e53d3d27a..f8af98621179 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -175,7 +175,7 @@ prio_destroy(struct Qdisc *sch)
 	tcf_block_put(q->block);
 	prio_offload(sch, NULL);
 	for (prio = 0; prio < q->bands; prio++)
-		qdisc_destroy(q->queues[prio]);
+		qdisc_put(q->queues[prio]);
 }
 
 static int prio_tune(struct Qdisc *sch, struct nlattr *opt,
@@ -205,7 +205,7 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt,
 					      extack);
 		if (!queues[i]) {
 			while (i > oldbands)
-				qdisc_destroy(queues[--i]);
+				qdisc_put(queues[--i]);
 			return -ENOMEM;
 		}
 	}
@@ -220,7 +220,7 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt,
 
 		qdisc_tree_reduce_backlog(child, child->q.qlen,
 					  child->qstats.backlog);
-		qdisc_destroy(child);
+		qdisc_put(child);
 	}
 
 	for (i = oldbands; i < q->bands; i++) {
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index bb1a9c11fc54..dc37c4ead439 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -526,7 +526,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	return 0;
 
 destroy_class:
-	qdisc_destroy(cl->qdisc);
+	qdisc_put(cl->qdisc);
 	kfree(cl);
 	return err;
 }
@@ -537,7 +537,7 @@ static void qfq_destroy_class(struct Qdisc *sch, struct qfq_class *cl)
 
 	qfq_rm_from_agg(q, cl);
 	gen_kill_estimator(&cl->rate_est);
-	qdisc_destroy(cl->qdisc);
+	qdisc_put(cl->qdisc);
 	kfree(cl);
 }
 
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 56c181c3feeb..3ce6c0a2c493 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -181,7 +181,7 @@ static void red_destroy(struct Qdisc *sch)
 
 	del_timer_sync(&q->adapt_timer);
 	red_offload(sch, false);
-	qdisc_destroy(q->qdisc);
+	qdisc_put(q->qdisc);
 }
 
 static const struct nla_policy red_policy[TCA_RED_MAX + 1] = {
@@ -233,7 +233,7 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt,
 	if (child) {
 		qdisc_tree_reduce_backlog(q->qdisc, q->qdisc->q.qlen,
 					  q->qdisc->qstats.backlog);
-		qdisc_destroy(q->qdisc);
+		qdisc_put(q->qdisc);
 		q->qdisc = child;
 	}
 
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 7cbdad8419b7..bab506b01a32 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -469,7 +469,7 @@ static void sfb_destroy(struct Qdisc *sch)
 	struct sfb_sched_data *q = qdisc_priv(sch);
 
 	tcf_block_put(q->block);
-	qdisc_destroy(q->qdisc);
+	qdisc_put(q->qdisc);
 }
 
 static const struct nla_policy sfb_policy[TCA_SFB_MAX + 1] = {
@@ -523,7 +523,7 @@ static int sfb_change(struct Qdisc *sch, struct nlattr *opt,
 
 	qdisc_tree_reduce_backlog(q->qdisc, q->qdisc->q.qlen,
 				  q->qdisc->qstats.backlog);
-	qdisc_destroy(q->qdisc);
+	qdisc_put(q->qdisc);
 	q->qdisc = child;
 
 	q->rehash_interval = msecs_to_jiffies(ctl->rehash_interval);
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 6f74a426f159..dd29de1418b7 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -392,7 +392,7 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt,
 	if (child) {
 		qdisc_tree_reduce_backlog(q->qdisc, q->qdisc->q.qlen,
 					  q->qdisc->qstats.backlog);
-		qdisc_destroy(q->qdisc);
+		qdisc_put(q->qdisc);
 		q->qdisc = child;
 	}
 	q->limit = qopt->limit;
@@ -438,7 +438,7 @@ static void tbf_destroy(struct Qdisc *sch)
 	struct tbf_sched_data *q = qdisc_priv(sch);
 
 	qdisc_watchdog_cancel(&q->watchdog);
-	qdisc_destroy(q->qdisc);
+	qdisc_put(q->qdisc);
 }
 
 static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb)
-- 
2.7.5

^ permalink raw reply related

* KASAN: use-after-free Read in bpf_tcp_close (2)
From: syzbot @ 2018-09-06  8:22 UTC (permalink / raw)
  To: ast, daniel, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    11f026b4e306 libbpf: Remove the duplicate checking of func..
git tree:       bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=153c55ca400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4c7e83258d6e0156
dashboard link: https://syzkaller.appspot.com/bug?extid=579adfa56843da894bc5
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+579adfa56843da894bc5@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in atomic_read  
include/asm-generic/atomic-instrumented.h:21 [inline]
BUG: KASAN: use-after-free in virt_spin_lock  
arch/x86/include/asm/qspinlock.h:65 [inline]
BUG: KASAN: use-after-free in native_queued_spin_lock_slowpath+0x189/0x1220  
kernel/locking/qspinlock.c:305
Read of size 4 at addr ffff8801bfd651a0 by task syz-executor1/9753

CPU: 1 PID: 9753 Comm: syz-executor1 Not tainted 4.19.0-rc2+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
  kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
  atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
  virt_spin_lock arch/x86/include/asm/qspinlock.h:65 [inline]
  native_queued_spin_lock_slowpath+0x189/0x1220  
kernel/locking/qspinlock.c:305
  pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:679 [inline]
  queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:32 [inline]
  queued_spin_lock include/asm-generic/qspinlock.h:88 [inline]
  do_raw_spin_lock+0x1a7/0x200 kernel/locking/spinlock_debug.c:113
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
  _raw_spin_lock_bh+0x39/0x40 kernel/locking/spinlock.c:168
  bpf_tcp_close+0x68e/0x10d0 kernel/bpf/sockmap.c:349
  inet_release+0x104/0x1f0 net/ipv4/af_inet.c:428
  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:457
  __sock_release+0xd7/0x250 net/socket.c:579
  sock_close+0x19/0x20 net/socket.c:1139
  __fput+0x38a/0xa40 fs/file_table.c:278
  ____fput+0x15/0x20 fs/file_table.c:309
  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:193 [inline]
  exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x410c51
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 19 00 00 c3 48  
83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007ffeacb37d60 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000410c51
RDX: 0000000000000000 RSI: 0000000000731f70 RDI: 0000000000000007
RBP: 0000000000000000 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 00007ffeacb37c90 R11: 0000000000000293 R12: 0000000000000010
R13: 0000000000022aa9 R14: 000000000000004d R15: badc0ffeebadface

Allocated by task 9754:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  kmem_cache_alloc_trace+0x152/0x730 mm/slab.c:3620
  kmalloc include/linux/slab.h:513 [inline]
  kzalloc include/linux/slab.h:707 [inline]
  sock_map_alloc+0x209/0x430 kernel/bpf/sockmap.c:1653
  find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
  map_create+0x3bd/0x1100 kernel/bpf/syscall.c:509
  __do_sys_bpf kernel/bpf/syscall.c:2356 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:2333 [inline]
  __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2333
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 13:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xd9/0x210 mm/slab.c:3813
  sock_map_remove_complete kernel/bpf/sockmap.c:1561 [inline]
  sock_map_free+0x428/0x570 kernel/bpf/sockmap.c:1756
  bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:290
  process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
  worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
  kthread+0x35a/0x420 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

The buggy address belongs to the object at ffff8801bfd65080
  which belongs to the cache kmalloc-512 of size 512
The buggy address is located 288 bytes inside of
  512-byte region [ffff8801bfd65080, ffff8801bfd65280)
The buggy address belongs to the page:
page:ffffea0006ff5940 count:1 mapcount:0 mapping:ffff8801dac00940  
index:0xffff8801bfd65300
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea000702a488 ffffea00075c4148 ffff8801dac00940
raw: ffff8801bfd65300 ffff8801bfd65080 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801bfd65080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801bfd65100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801bfd65180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                ^
  ffff8801bfd65200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801bfd65280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* [PATCH net-next] failover: Fix error return code in net_failover_create
From: YueHaibing @ 2018-09-06 13:04 UTC (permalink / raw)
  To: davem, alexander.h.duyck, jeffrey.t.kirsher, liran.alon,
	sridhar.samudrala, stephen, dan.carpenter
  Cc: linux-kernel, netdev, YueHaibing

if failover_register failed, 'err' code should be set correctly

Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/net_failover.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index 192ae1c..cbcb9f2 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -761,8 +761,10 @@ struct failover *net_failover_create(struct net_device *standby_dev)
 	netif_carrier_off(failover_dev);
 
 	failover = failover_register(failover_dev, &net_failover_ops);
-	if (IS_ERR(failover))
+	if (IS_ERR(failover)) {
+		err = PTR_ERR(failover);
 		goto err_failover_register;
+	}
 
 	return failover;
 
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH net-next 03/13] net: sched: extend Qdisc with rcu
From: Eric Dumazet @ 2018-09-06  8:30 UTC (permalink / raw)
  To: Vlad Buslov, netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc
In-Reply-To: <1536220742-25650-4-git-send-email-vladbu@mellanox.com>



On 09/06/2018 12:58 AM, Vlad Buslov wrote:

...

> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 18e22a5a6550..239c73f29471 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -90,6 +90,7 @@ struct Qdisc {
>  	struct gnet_stats_queue	__percpu *cpu_qstats;
>  	int			padded;
>  	refcount_t		refcnt;
> +	struct rcu_head		rcu;
>  
>  	/*
>  	 * For performance sake on SMP, we put highly modified fields at the end

Probably better to move this at the end of struct Qdisc,
not risking unexpected performance regressions in fast path.

^ permalink raw reply

* Re: [PATCH net-next 03/13] net: sched: extend Qdisc with rcu
From: Kirill Tkhai @ 2018-09-06  8:39 UTC (permalink / raw)
  To: Eric Dumazet, Vlad Buslov, netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, paulmck,
	nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
	lucien.xin, jakub.kicinski, christian.brauner, jbenc
In-Reply-To: <a5b99bb3-9076-ba0f-5b8b-efe8529e93fa@gmail.com>

On 06.09.2018 11:30, Eric Dumazet wrote:
> 
> 
> On 09/06/2018 12:58 AM, Vlad Buslov wrote:
> 
> ...
> 
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 18e22a5a6550..239c73f29471 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -90,6 +90,7 @@ struct Qdisc {
>>  	struct gnet_stats_queue	__percpu *cpu_qstats;
>>  	int			padded;
>>  	refcount_t		refcnt;
>> +	struct rcu_head		rcu;
>>  
>>  	/*
>>  	 * For performance sake on SMP, we put highly modified fields at the end
> 
> Probably better to move this at the end of struct Qdisc,
> not risking unexpected performance regressions in fast path.

Do you mean regressions on UP? On SMP it looks like this field
fits in the unused gap created by:

	struct sk_buff_head     gso_skb ____cacheline_aligned_in_smp;

Kirill

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox