netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Aleksandrov <nikolay@nvidia.com>
To: Vladimir Oltean <olteanv@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>
Cc: <netdev@vger.kernel.org>, Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Eric Dumazet <edumazet@google.com>,
	George McCollister <george.mccollister@gmail.com>,
	Oleksij Rempel <o.rempel@pengutronix.de>,
	Jay Vosburgh <j.vosburgh@gmail.com>,
	Veaceslav Falico <vfalico@gmail.com>,
	Andy Gospodarek <andy@greyhouse.net>,
	Arnd Bergmann <arnd@arndb.de>, Taehee Yoo <ap420073@gmail.com>,
	Jiri Pirko <jiri@mellanox.com>, Florian Westphal <fw@strlen.de>
Subject: Re: [PATCH v4 net-next 16/18] net: bonding: ensure .ndo_get_stats64 can sleep
Date: Fri, 8 Jan 2021 11:58:12 +0200	[thread overview]
Message-ID: <2a790830-9957-2985-2f5e-e87e6693a723@nvidia.com> (raw)
In-Reply-To: <20210108002005.3429956-17-olteanv@gmail.com>

On 08/01/2021 02:20, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> There is an effort to convert .ndo_get_stats64 to sleepable context, and
> for that to work, we need to prevent callers of dev_get_stats from using
> atomic locking.
> 
> The bonding driver retrieves its statistics recursively from its lower
> interfaces, with additional care to only count packets sent/received
> while those lowers were actually enslaved to the bond - see commit
> 5f0c5f73e5ef ("bonding: make global bonding stats more reliable").
> 
> Since commit 87163ef9cda7 ("bonding: remove last users of bond->lock and
> bond->lock itself"), the bonding driver uses the following protection
> for its array of slaves: RCU for readers and rtnl_mutex for updaters.
> This is not great because there is another movement [ somehow
> simultaneous with the one to make .ndo_get_stats64 sleepable ] to reduce
> driver usage of rtnl_mutex. This makes sense, because the rtnl_mutex has
> become a very contended resource.
> 
> The aforementioned commit removed an interesting comment:
> 
> 	/* [...] we can't hold bond->lock [...] because we'll
> 	 * deadlock. The only solution is to rely on the fact
> 	 * that we're under rtnl_lock here, and the slaves
> 	 * list won't change. This doesn't solve the problem
> 	 * of setting the slave's MTU while it is
> 	 * transmitting, but the assumption is that the base
> 	 * driver can handle that.
> 	 *
> 	 * TODO: figure out a way to safely iterate the slaves
> 	 * list, but without holding a lock around the actual
> 	 * call to the base driver.
> 	 */
> 
> The above summarizes pretty well the challenges we have with nested
> bonding interfaces (bond over bond over bond over...), which need to be
> addressed by a better locking scheme that also not relies on the bloated
> rtnl_mutex.
> 
> Instead of using something as broad as the rtnl_mutex to ensure
> serialization of updates to the slave array, we can reintroduce a
> private mutex in the bonding driver, called slaves_lock.
> This mutex circles the only updater, bond_update_slave_arr, and ensures
> that whatever other readers want to see a consistent slave array, they
> don't need to hold the rtnl_mutex for that.
> 
> Now _of_course_ I did not convert the entire driver to use
> bond_for_each_slave protected by the bond->slaves_lock, and
> rtnl_dereference to bond_dereference. I just started that process by
> converting the one reader I needed: ndo_get_stats64. Not only is it nice
> to not hold rtnl_mutex in .ndo_get_stats64, but it is also in fact
> forbidden to do so (since top-level callers may hold netif_lists_lock,
> which is a sub-lock of the rtnl_mutex, and therefore this would cause a
> lock inversion and a deadlock).
> 
> To solve the nesting problem, the simple way is to not hold any locks
> when recursing into the slave netdev operation, which is exactly the
> approach that we take. We can "cheat" and use dev_hold to take a
> reference on the slave net_device, which is enough to ensure that
> netdev_wait_allrefs() waits until we finish, and the kernel won't fault.
> However, the slave structure might no longer be valid, just its
> associated net_device. That isn't a biggie. We just need to do some more
> work to ensure that the slave exists after we took the statistics, and
> if it still does, reapply the logic from Andy's commit 5f0c5f73e5ef.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> Changes in v4:
> Now there is code to propagate errors.
> 
> Changes in v3:
> - Added kfree(dev_stats);
> - Removed the now useless stats_lock (bond->bond_stats and
>   slave->slave_stats are now protected by bond->slaves_lock)
> 
> Changes in v2:
> Switched to the new scheme of holding just a refcnt to the slave
> interfaces while recursing with dev_get_stats.
> 
>  drivers/net/bonding/bond_main.c | 113 ++++++++++++++------------------
>  include/net/bonding.h           |  49 +++++++++++++-
>  2 files changed, 99 insertions(+), 63 deletions(-)
> 
[snip]
> +static inline int bond_get_slave_arr(struct bonding *bond,
> +				     struct net_device ***slaves,
> +				     int *num_slaves)
> +{
> +	struct net *net = dev_net(bond->dev);
> +	struct list_head *iter;
> +	struct slave *slave;
> +	int i = 0;
> +
> +	mutex_lock(&bond->slaves_lock);
> +
> +	*slaves = kcalloc(bond->slave_cnt, sizeof(*slaves), GFP_KERNEL);
> +	if (!(*slaves)) {
> +		netif_lists_unlock(net);
> +		return -ENOMEM;
> +	}

The error path looks wrong, you unlock netif_lists and return with slaves_lock held.

Cheers,
 Nik

> +
> +	bond_for_each_slave(bond, slave, iter) {
> +		dev_hold(slave->dev);
> +		*slaves[i++] = slave->dev;
> +	}
> +
> +	*num_slaves = bond->slave_cnt;
> +
> +	mutex_unlock(&bond->slaves_lock);
> +
> +	return 0;
> +}
> +
> +static inline void bond_put_slave_arr(struct net_device **slaves,
> +				      int num_slaves)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_slaves; i++)
> +		dev_put(slaves[i]);
> +
> +	kfree(slaves);
> +}
> +
>  #define BOND_PRI_RESELECT_ALWAYS	0
>  #define BOND_PRI_RESELECT_BETTER	1
>  #define BOND_PRI_RESELECT_FAILURE	2
> 


  reply	other threads:[~2021-01-08  9:59 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-08  0:19 [PATCH v4 net-next 00/18] Make .ndo_get_stats64 sleepable Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 01/18] net: mark dev_base_lock for deprecation Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 02/18] net: introduce a mutex for the netns interface lists Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 03/18] net: procfs: hold netif_lists_lock when retrieving device statistics Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 04/18] net: sysfs: don't hold dev_base_lock while " Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 05/18] s390/appldata_net_sum: hold the netdev lists lock when " Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 06/18] parisc/led: reindent the code that gathers " Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 07/18] parisc/led: hold the netdev lists lock when retrieving " Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 08/18] net: make dev_get_stats return void Vladimir Oltean
2021-01-08 10:14   ` Eric Dumazet
2021-01-08 10:31     ` Vladimir Oltean
2021-01-08 10:38       ` Eric Dumazet
2021-01-08  0:19 ` [PATCH v4 net-next 09/18] net: allow ndo_get_stats64 to return an int error code Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 10/18] scsi: fcoe: propagate errors from dev_get_stats Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 11/18] net: openvswitch: " Vladimir Oltean
2021-01-08  0:19 ` [PATCH v4 net-next 12/18] net: " Vladimir Oltean
2021-01-08  0:20 ` [PATCH v4 net-next 13/18] net: terminate " Vladimir Oltean
2021-01-08  0:20 ` [PATCH v4 net-next 14/18] net: openvswitch: ensure dev_get_stats can sleep Vladimir Oltean
2021-01-08  0:20 ` [PATCH v4 net-next 15/18] net: net_failover: ensure .ndo_get_stats64 " Vladimir Oltean
2021-01-08  0:20 ` [PATCH v4 net-next 16/18] net: bonding: " Vladimir Oltean
2021-01-08  9:58   ` Nikolay Aleksandrov [this message]
2021-01-08  0:20 ` [PATCH v4 net-next 17/18] net: mark ndo_get_stats64 as being able to sleep Vladimir Oltean
2021-01-08  0:20 ` [PATCH v4 net-next 18/18] net: remove obsolete comments about ndo_get_stats64 context from eth drivers Vladimir Oltean

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2a790830-9957-2985-2f5e-e87e6693a723@nvidia.com \
    --to=nikolay@nvidia.com \
    --cc=andrew@lunn.ch \
    --cc=andy@greyhouse.net \
    --cc=ap420073@gmail.com \
    --cc=arnd@arndb.de \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=f.fainelli@gmail.com \
    --cc=fw@strlen.de \
    --cc=george.mccollister@gmail.com \
    --cc=j.vosburgh@gmail.com \
    --cc=jiri@mellanox.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=o.rempel@pengutronix.de \
    --cc=olteanv@gmail.com \
    --cc=stephen@networkplumber.org \
    --cc=vfalico@gmail.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).