From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>
Subject: [REVIEW][PATCH] proc/sysctl: Don't grab i_lock under sysctl_lock.
Date: Tue, 21 Feb 2017 14:41:34 +1300 [thread overview]
Message-ID: <87a89gqq0h.fsf_-_@xmission.com> (raw)
In-Reply-To: <20170219084202.GJ29622@ZenIV.linux.org.uk> (Al Viro's message of "Sun, 19 Feb 2017 08:42:12 +0000")
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> This patch has locking problem. I've got lockdep splat under LTP.
>
> [ 6633.115456] ======================================================
> [ 6633.115502] [ INFO: possible circular locking dependency detected ]
> [ 6633.115553] 4.9.10-debug+ #9 Tainted: G L
> [ 6633.115584] -------------------------------------------------------
> [ 6633.115627] ksm02/284980 is trying to acquire lock:
> [ 6633.115659] (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
> [ 6633.115834] but task is already holding lock:
> [ 6633.115882] (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
> [ 6633.116026] which lock already depends on the new lock.
> [ 6633.116026]
> [ 6633.116080]
> [ 6633.116080] the existing dependency chain (in reverse order) is:
> [ 6633.116117]
> -> #2 (sysctl_lock){+.+...}:
> -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
> -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
>
> d_lock nests inside i_lock
> sysctl_lock nests inside d_lock in d_compare
>
> This patch adds i_lock nesting inside sysctl_lock.
Al Viro <viro@ZenIV.linux.org.uk> replied:
> Once ->unregistering is set, you can drop sysctl_lock just fine. So I'd
> try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
> drop sysctl_lock() before it and regain after. Make sure that no inodes
> are added to the list ones ->unregistering has been set and use RCU list
> primitives for modifying the inode list, with sysctl_lock still used to
> serialize its modifications.
>
> Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
> igrab() is safe there. Since we don't drop inode reference until after we'd
> passed beyond it in the list, list_for_each_entry_rcu() should be fine.
I agree with Al Viro's analsysis of the situtation.
Fixes: 802e348c6b77 ("proc/sysctl: prune stale dentries during unregistering")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
This is my cleaned up version of Al Viro's proposed fix.
I have tested it and the lockdep warnings go away, and
I have fixed a few trivial to ensure things work as intended.
Unless someone sees a problem I am going to add this fix to my tree and
then send a pull request to Linus.
fs/proc/proc_sysctl.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 8efb1e10b025..3e64c6502dc8 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -266,21 +266,19 @@ static void proc_sys_prune_dcache(struct ctl_table_header *head)
struct inode *inode, *prev = NULL;
struct proc_inode *ei;
- list_for_each_entry(ei, &head->inodes, sysctl_inodes) {
+ rcu_read_lock();
+ list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) {
inode = igrab(&ei->vfs_inode);
if (inode) {
- spin_unlock(&sysctl_lock);
+ rcu_read_unlock();
iput(prev);
prev = inode;
d_prune_aliases(inode);
- spin_lock(&sysctl_lock);
+ rcu_read_lock();
}
}
- if (prev) {
- spin_unlock(&sysctl_lock);
- iput(prev);
- spin_lock(&sysctl_lock);
- }
+ rcu_read_unlock();
+ iput(prev);
}
/* called under sysctl_lock, will reacquire if has to wait */
@@ -296,10 +294,10 @@ static void start_unregistering(struct ctl_table_header *p)
p->unregistering = &wait;
spin_unlock(&sysctl_lock);
wait_for_completion(&wait);
- spin_lock(&sysctl_lock);
} else {
/* anything non-NULL; we'll never dereference it */
p->unregistering = ERR_PTR(-EINVAL);
+ spin_unlock(&sysctl_lock);
}
/*
* Prune dentries for unregistered sysctls: namespaced sysctls
@@ -310,6 +308,7 @@ static void start_unregistering(struct ctl_table_header *p)
* do not remove from the list until nobody holds it; walking the
* list in do_sysctl() relies on that.
*/
+ spin_lock(&sysctl_lock);
erase_header(p);
}
@@ -455,11 +454,17 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
inode->i_ino = get_next_ino();
ei = PROC_I(inode);
- ei->sysctl = head;
- ei->sysctl_entry = table;
spin_lock(&sysctl_lock);
- list_add(&ei->sysctl_inodes, &head->inodes);
+ if (unlikely(head->unregistering)) {
+ spin_unlock(&sysctl_lock);
+ iput(inode);
+ inode = NULL;
+ goto out;
+ }
+ ei->sysctl = head;
+ ei->sysctl_entry = table;
+ list_add_rcu(&ei->sysctl_inodes, &head->inodes);
head->count++;
spin_unlock(&sysctl_lock);
@@ -487,7 +492,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head)
{
spin_lock(&sysctl_lock);
- list_del(&PROC_I(inode)->sysctl_inodes);
+ list_del_rcu(&PROC_I(inode)->sysctl_inodes);
if (!--head->count)
kfree_rcu(head, rcu);
spin_unlock(&sysctl_lock);
--
2.10.1
next prev parent reply other threads:[~2017-02-21 1:46 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-08 10:48 [PATCH] proc/sysctl: drop unregistered stale dentries as soon as possible Konstantin Khlebnikov
2017-02-08 21:48 ` Andrew Morton
2017-02-09 3:53 ` Al Viro
2017-02-09 7:36 ` Konstantin Khlebnikov
2017-02-09 8:40 ` Al Viro
2017-02-10 7:35 ` [PATCH] proc/sysctl: prune stale dentries during unregistering Konstantin Khlebnikov
2017-02-10 7:47 ` Al Viro
2017-02-10 7:54 ` Konstantin Khlebnikov
2017-02-13 9:54 ` Eric W. Biederman
2017-02-18 18:55 ` Konstantin Khlebnikov
2017-02-19 8:42 ` Al Viro
2017-02-21 1:41 ` Eric W. Biederman [this message]
2017-02-21 8:40 ` [REVIEW][PATCH] proc/sysctl: Don't grab i_lock under sysctl_lock Konstantin Khlebnikov
2017-02-21 19:29 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a89gqq0h.fsf_-_@xmission.com \
--to=ebiederm@xmission.com \
--cc=akpm@linux-foundation.org \
--cc=khlebnikov@yandex-team.ru \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox