From: Vladimir Oltean <olteanv@gmail.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
Eric Dumazet <eric.dumazet@gmail.com>,
Jakub Kicinski <kuba@kernel.org>, netdev <netdev@vger.kernel.org>,
Paul Gortmaker <paul.gortmaker@windriver.com>,
Jiri Benc <jbenc@redhat.com>, Or Gerlitz <ogerlitz@mellanox.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jamal Hadi Salim <jhs@mojatatu.com>, Andrew Lunn <andrew@lunn.ch>,
Florian Fainelli <f.fainelli@gmail.com>
Subject: Re: Correct usage of dev_base_lock in 2020
Date: Mon, 30 Nov 2020 23:53:22 +0200 [thread overview]
Message-ID: <20201130215322.7arp3scumobdnvtz@skbuf> (raw)
In-Reply-To: <CANn89iJGA8qWBJ97nnNGNOuLNUYF5WPnL+qi722KYCD7kvKyCg@mail.gmail.com>
On Mon, Nov 30, 2020 at 10:46:00PM +0100, Eric Dumazet wrote:
> You can not use dev_base_lock() or RCU and call an ndo_get_stats64()
> that could sleep.
>
> You can not for example start changing bonding, since bond_get_stats()
> could be called from non-sleepable context (net/core/net-procfs.c)
>
> I am still referring to your patch adding :
>
> + if (!rtnl_locked)
> + rtnl_lock();
>
> This is all I said.
Ah, ok, well I didn't show you all the patches, did I?
-----------------------------[cut here]-----------------------------
From d62c65ef6cb357e1b8c5a4ab189718e157a569ae Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Thu, 26 Nov 2020 23:08:57 +0200
Subject: [PATCH] net: procfs: retrieve device statistics under RTNL, not RCU
In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.
The /proc/net/dev file uses an RCU read-side critical section to ensure
the integrity of the list of network interfaces, because it iterates
through all net devices in the netns to show their statistics.
We still need some protection against an interface registering or
deregistering, and the writer-side lock, the RTNL mutex, is fine for
that, because it offers sleepable context.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
net/core/net-procfs.c | 11 ++++++-----
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index c714e6a9dad4..1429e8c066d8 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/netdevice.h>
#include <linux/proc_fs.h>
+#include <linux/rtnetlink.h>
#include <linux/seq_file.h>
#include <net/wext.h>
@@ -21,7 +22,7 @@ static inline struct net_device *dev_from_same_bucket(struct seq_file *seq, loff
unsigned int count = 0, offset = get_offset(*pos);
h = &net->dev_index_head[get_bucket(*pos)];
- hlist_for_each_entry_rcu(dev, h, index_hlist) {
+ hlist_for_each_entry(dev, h, index_hlist) {
if (++count == offset)
return dev;
}
@@ -51,9 +52,9 @@ static inline struct net_device *dev_from_bucket(struct seq_file *seq, loff_t *p
* in detail.
*/
static void *dev_seq_start(struct seq_file *seq, loff_t *pos)
- __acquires(RCU)
+ __acquires(rtnl_mutex)
{
- rcu_read_lock();
+ rtnl_lock();
if (!*pos)
return SEQ_START_TOKEN;
@@ -70,9 +71,9 @@ static void *dev_seq_next(struct seq_file *seq, void *v, loff_t *pos)
}
static void dev_seq_stop(struct seq_file *seq, void *v)
- __releases(RCU)
+ __releases(rtnl_mutex)
{
- rcu_read_unlock();
+ rtnl_unlock();
}
static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
-----------------------------[cut here]-----------------------------
and:
-----------------------------[cut here]-----------------------------
From 0c51569116b3844d0d99831697a8e4134814d50e Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Mon, 30 Nov 2020 02:51:01 +0200
Subject: [PATCH] net: sysfs: retrieve device statistics unlocked
In the effort of making .ndo_get_stats64 be able to sleep, we need to
ensure the callers of dev_get_stats do not use atomic context.
I need to preface this by saying that I have no idea why netstat_show
takes the dev_base_lock rwlock. Two things are for certain:
(a) it's not due to dev_isalive requiring it for some mystical reason,
because broadcast_show() also calls dev_isalive() and has had no
problem existing since the beginning of git.
(b) the dev_get_stats function definitely does not need dev_base_lock
protection either. In fact, holding the dev_base_lock is the entire
problem here, because we want to make dev_get_stats sleepable, and
holding a rwlock gives us atomic context.
So since no protection seems to be necessary, just run unlocked while
retrieving the /sys/class/net/eth0/statistics/* values.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
net/core/net-sysfs.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 999b70c59761..0782a476b424 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -585,14 +585,13 @@ static ssize_t netstat_show(const struct device *d,
WARN_ON(offset > sizeof(struct rtnl_link_stats64) ||
offset % sizeof(u64) != 0);
- read_lock(&dev_base_lock);
if (dev_isalive(dev)) {
struct rtnl_link_stats64 temp;
const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
ret = sprintf(buf, fmt_u64, *(u64 *)(((u8 *)stats) + offset));
}
- read_unlock(&dev_base_lock);
+
return ret;
}
-----------------------------[cut here]-----------------------------
In all fairness, I think there is precedent in the kernel for changing
so much RCU-protected code to use RTNL. I can't recall the exact link
now, except for this one:
https://patchwork.ozlabs.org/project/netdev/patch/1410306738-18036-2-git-send-email-xiyou.wangcong@gmail.com/,
but you and Cong have changed a lot of RCU-protected accesses in IPv6,
because the read-side critical section was taking too much time and was
not sleepable/preemptible.
Now, I do agree that there's only so much we can keep adding to the RTNL
mutex. I guess somebody needs to start the migration towards a different
mutex. I'll prepare some patches then, and rework the ones that I have.
next prev parent reply other threads:[~2020-11-30 21:54 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20201129182435.jgqfjbekqmmtaief@skbuf>
2020-11-29 20:58 ` Correct usage of dev_base_lock in 2020 Vladimir Oltean
2020-11-30 5:12 ` Stephen Hemminger
2020-11-30 10:41 ` Eric Dumazet
2020-11-30 18:14 ` Jakub Kicinski
2020-11-30 18:30 ` Eric Dumazet
2020-11-30 18:48 ` Vladimir Oltean
2020-11-30 19:00 ` Eric Dumazet
2020-11-30 19:03 ` Vladimir Oltean
2020-11-30 19:22 ` Eric Dumazet
2020-11-30 19:32 ` Vladimir Oltean
2020-11-30 21:41 ` Florian Fainelli
2020-11-30 19:46 ` Vladimir Oltean
2020-11-30 20:18 ` Eric Dumazet
2020-11-30 20:21 ` Stephen Hemminger
2020-11-30 20:26 ` Vladimir Oltean
2020-11-30 20:29 ` Eric Dumazet
2020-11-30 20:36 ` Vladimir Oltean
2020-11-30 20:43 ` Eric Dumazet
2020-11-30 20:50 ` Vladimir Oltean
2020-11-30 21:00 ` Eric Dumazet
2020-11-30 21:11 ` Vladimir Oltean
2020-11-30 21:46 ` Eric Dumazet
2020-11-30 21:53 ` Vladimir Oltean [this message]
2020-11-30 22:20 ` Eric Dumazet
2020-11-30 22:41 ` Vladimir Oltean
2020-12-01 14:42 ` Pablo Neira Ayuso
2020-12-01 18:58 ` Vladimir Oltean
2020-12-10 4:32 ` [PATCH] net: bonding: retrieve device statistics under RTNL, not RCU kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201130215322.7arp3scumobdnvtz@skbuf \
--to=olteanv@gmail.com \
--cc=andrew@lunn.ch \
--cc=edumazet@google.com \
--cc=eric.dumazet@gmail.com \
--cc=f.fainelli@gmail.com \
--cc=jbenc@redhat.com \
--cc=jhs@mojatatu.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=ogerlitz@mellanox.com \
--cc=paul.gortmaker@windriver.com \
--cc=stephen@networkplumber.org \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).