* [PATCH 0/2] sysctl: lockdep support.
From: Eric W. Biederman @ 2009-03-21  7:39 UTC
  To: Andrew Morton
  Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel,
	Gautham R Shenoy, Peter Zijlstra, Alexey Dobriyan, netdev


The problem:  There is a class of deadlocks that we currently
have in the kernel that lockdep does not recognize. 

In particular with the network stack we can have:

rtnl_lock();                         use_table();
unregister_netdevice();              rtnl_lock();
unregister_sysctl_table();           ....
wait_for_completion();               rtnl_lock();
....                                 unuse_table()
rtnl_unlock();                       complete();



We never make it to the lines labeled "....".
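For illustration, here is a minimal userspace analogue of the same shape
(pthreads standing in for the kernel primitives; all of the names are
invented for the sketch, it is not kernel code):

/*
 * "big_lock" plays rtnl_lock; the counter plays the sysctl use count.
 * If the unregister thread wins the race to big_lock, neither thread
 * can make progress: one waits for the count to reach zero, the other
 * waits for the lock.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cnt_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cnt_zero = PTHREAD_COND_INITIALIZER;
static int use_count = 1;	/* handler thread already "uses" the table */

static void *unregister_thread(void *arg)
{
	pthread_mutex_lock(&big_lock);		/* rtnl_lock() */
	pthread_mutex_lock(&cnt_lock);
	while (use_count)			/* wait_for_completion() */
		pthread_cond_wait(&cnt_zero, &cnt_lock);
	pthread_mutex_unlock(&cnt_lock);
	pthread_mutex_unlock(&big_lock);	/* never reached on deadlock */
	return arg;
}

static void *handler_thread(void *arg)
{
	pthread_mutex_lock(&big_lock);		/* rtnl_lock(): blocks if A holds it */
	pthread_mutex_unlock(&big_lock);
	pthread_mutex_lock(&cnt_lock);
	if (!--use_count)			/* unuse_table() -> complete() */
		pthread_cond_signal(&cnt_zero);
	pthread_mutex_unlock(&cnt_lock);
	return arg;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, unregister_thread, NULL);
	pthread_create(&b, NULL, handler_thread, NULL);
	pthread_join(a, NULL);		/* hangs whenever thread a wins the race */
	pthread_join(b, NULL);
	puts("no deadlock this run");
	return 0;
}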

The following patches treat the sysctl use count as a reader/writer
lock for the purposes of lockdep.

The code works but I'm not certain I have plugged into lockdep quite
correctly.

Eric



* [PATCH 1/2] sysctl: Don't take the use count of multiple heads at a time.
From: Eric W. Biederman @ 2009-03-21  7:40 UTC
  To: Andrew Morton
  Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel,
	Gautham R Shenoy, Peter Zijlstra, Alexey Dobriyan, netdev


The current code works fine and is not actually buggy, but it does
prevent using lockdep to check for ordering problems between
refcounting and lock holding: holding the use counts of several
sysctl heads at once would look like lock recursion.  So ensure that
we only hold a single sysctl_head's use count at a time, allowing
lockdep to complain only about real problems.

Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
---
 fs/proc/proc_sysctl.c |   24 ++++++++++--------------
 1 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 94fcfff..46eb34c 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -79,7 +79,6 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 {
 	struct ctl_table_header *head = grab_header(dir);
 	struct ctl_table *table = PROC_I(dir)->sysctl_entry;
-	struct ctl_table_header *h = NULL;
 	struct qstr *name = &dentry->d_name;
 	struct ctl_table *p;
 	struct inode *inode;
@@ -97,10 +96,11 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 
 	p = find_in_table(table, name);
 	if (!p) {
-		for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
-			if (h->attached_to != table)
+		sysctl_head_finish(head);
+		for (head = sysctl_head_next(NULL); head; head = sysctl_head_next(head)) {
+			if (head->attached_to != table)
 				continue;
-			p = find_in_table(h->attached_by, name);
+			p = find_in_table(head->attached_by, name);
 			if (p)
 				break;
 		}
@@ -110,9 +110,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
 		goto out;
 
 	err = ERR_PTR(-ENOMEM);
-	inode = proc_sys_make_inode(dir->i_sb, h ? h : head, p);
-	if (h)
-		sysctl_head_finish(h);
+	inode = proc_sys_make_inode(dir->i_sb, head, p);
 
 	if (!inode)
 		goto out;
@@ -243,7 +241,6 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	struct inode *inode = dentry->d_inode;
 	struct ctl_table_header *head = grab_header(inode);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
-	struct ctl_table_header *h = NULL;
 	unsigned long pos;
 	int ret = -EINVAL;
 
@@ -277,14 +274,13 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
 	if (ret)
 		goto out;
 
-	for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
-		if (h->attached_to != table)
+	sysctl_head_finish(head);
+	for (head = sysctl_head_next(NULL); head; head = sysctl_head_next(head)) {
+		if (head->attached_to != table)
 			continue;
-		ret = scan(h, h->attached_by, &pos, filp, dirent, filldir);
-		if (ret) {
-			sysctl_head_finish(h);
+		ret = scan(head, head->attached_by, &pos, filp, dirent, filldir);
+		if (ret)
 			break;
-		}
 	}
 	ret = 1;
 out:
-- 
1.6.1.2.350.g88cc



* [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Eric W. Biederman @ 2009-03-21  7:42 UTC
  To: Andrew Morton
  Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel,
	Gautham R Shenoy, Peter Zijlstra, Alexey Dobriyan, netdev


It is possible to get lock ordering deadlocks between taking locks
and waiting for the sysctl use count to drop to zero.  We have
recently observed one of these in the networking code.

So teach the sysctl code how to speak lockdep so the kernel
can warn about these kinds of rare issues proactively.

Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
---
 include/linux/sysctl.h |    4 ++
 kernel/sysctl.c        |  108 ++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 91 insertions(+), 21 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 39d471d..ec9b1dd 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -28,6 +28,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/compiler.h>
+#include <linux/lockdep.h>
 
 struct file;
 struct completion;
@@ -1087,6 +1088,9 @@ struct ctl_table_header
 	struct ctl_table *attached_by;
 	struct ctl_table *attached_to;
 	struct ctl_table_header *parent;
+#ifdef CONFIG_PROVE_LOCKING
+	struct lockdep_map dep_map;
+#endif
 };
 
 /* struct ctl_path describes where in the hierarchy a table is added */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c5ef44f..ea8cc39 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1454,12 +1454,63 @@ static struct ctl_table dev_table[] = {
 
 static DEFINE_SPINLOCK(sysctl_lock);
 
+#ifndef CONFIG_PROVE_LOCKING
+
+# define lock_sysctl() spin_lock(&sysctl_lock)
+# define unlock_sysctl() spin_unlock(&sysctl_lock)
+
+static inline void table_acquire_use(struct ctl_table_header *hdr) { }
+static inline void table_release_use(struct ctl_table_header *hdr) { }
+static inline void table_acquire(struct ctl_table_header *hdr) { }
+static inline void table_contended(struct ctl_table_header *hdr) { }
+static inline void table_acquired(struct ctl_table_header *hdr) { }
+static inline void table_release(struct ctl_table_header *hdr) { }
+
+#else	/* CONFIG_PROVE_LOCKING */
+
+#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
+#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)
+
+static inline void table_acquire_use(struct ctl_table_header *hdr)
+{
+	lock_acquire(&hdr->dep_map, 0, 0, 1, 2, NULL, _RET_IP_);
+	lock_acquired(&hdr->dep_map, _RET_IP_);
+}
+
+static inline void table_release_use(struct ctl_table_header *hdr)
+{
+	lock_release(&hdr->dep_map, 0, _RET_IP_);
+}
+
+static inline void table_acquire(struct ctl_table_header *hdr)
+{
+	lock_acquire(&hdr->dep_map, 0, 0, 0, 2, NULL, _RET_IP_);
+}
+
+static inline void table_contended(struct ctl_table_header *hdr)
+{
+	lock_contended(&hdr->dep_map, _RET_IP_);
+}
+
+static inline void table_acquired(struct ctl_table_header *hdr)
+{
+	lock_acquired(&hdr->dep_map, _RET_IP_);
+}
+
+static inline void table_release(struct ctl_table_header *hdr)
+{
+	lock_release(&hdr->dep_map, 0, _RET_IP_);
+}
+
+#endif /* CONFIG_PROVE_LOCKING */
+
 /* called under sysctl_lock */
 static int use_table(struct ctl_table_header *p)
 {
 	if (unlikely(p->unregistering))
 		return 0;
 	p->used++;
+	table_acquire_use(p);
 	return 1;
 }
 
@@ -1469,6 +1520,8 @@ static void unuse_table(struct ctl_table_header *p)
 	if (!--p->used)
 		if (unlikely(p->unregistering))
 			complete(p->unregistering);
+
+	table_release_use(p);
 }
 
 /* called under sysctl_lock, will reacquire if has to wait */
@@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
 	 * if p->used is 0, nobody will ever touch that entry again;
 	 * we'll eliminate all paths to it before dropping sysctl_lock
 	 */
+	table_acquire(p);
 	if (unlikely(p->used)) {
 		struct completion wait;
+		table_contended(p);
+
 		init_completion(&wait);
 		p->unregistering = &wait;
-		spin_unlock(&sysctl_lock);
+		unlock_sysctl();
 		wait_for_completion(&wait);
-		spin_lock(&sysctl_lock);
+		lock_sysctl();
 	} else {
 		/* anything non-NULL; we'll never dereference it */
 		p->unregistering = ERR_PTR(-EINVAL);
 	}
+	table_acquired(p);
+
 	/*
 	 * do not remove from the list until nobody holds it; walking the
 	 * list in do_sysctl() relies on that.
 	 */
 	list_del_init(&p->ctl_entry);
+
+	table_release(p);
 }
 
 void sysctl_head_get(struct ctl_table_header *head)
 {
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	head->count++;
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 void sysctl_head_put(struct ctl_table_header *head)
 {
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	if (!--head->count)
 		kfree(head);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *head)
 {
 	if (!head)
 		BUG();
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	if (!use_table(head))
 		head = ERR_PTR(-ENOENT);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 	return head;
 }
 
@@ -1526,9 +1586,9 @@ void sysctl_head_finish(struct ctl_table_header *head)
 {
 	if (!head)
 		return;
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	unuse_table(head);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 static struct ctl_table_set *
@@ -1555,7 +1615,7 @@ struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
 	struct ctl_table_header *head;
 	struct list_head *tmp;
 
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	if (prev) {
 		head = prev;
 		tmp = &prev->ctl_entry;
@@ -1568,7 +1628,7 @@ struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
 
 		if (!use_table(head))
 			goto next;
-		spin_unlock(&sysctl_lock);
+		unlock_sysctl();
 		return head;
 	next:
 		root = head->root;
@@ -1587,7 +1647,7 @@ struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
 		tmp = header_list->next;
 	}
 out:
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 	return NULL;
 }
 
@@ -1598,9 +1658,9 @@ struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
 
 void register_sysctl_root(struct ctl_table_root *root)
 {
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	list_add_tail(&root->root_list, &sysctl_table_root.root_list);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 #ifdef CONFIG_SYSCTL_SYSCALL
@@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
 		return NULL;
 	}
 #endif
-	spin_lock(&sysctl_lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	{
+		static struct lock_class_key __key;
+		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
+	}
+#endif	
+	lock_sysctl();
 	header->set = lookup_header_set(root, namespaces);
 	header->attached_by = header->ctl_table;
 	header->attached_to = root_table;
@@ -1966,7 +2032,7 @@ struct ctl_table_header *__register_sysctl_paths(
 	}
 	header->parent->count++;
 	list_add_tail(&header->ctl_entry, &header->set->list);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 
 	return header;
 }
@@ -2018,7 +2084,7 @@ void unregister_sysctl_table(struct ctl_table_header * header)
 	if (header == NULL)
 		return;
 
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	start_unregistering(header);
 	if (!--header->parent->count) {
 		WARN_ON(1);
@@ -2026,21 +2092,21 @@ void unregister_sysctl_table(struct ctl_table_header * header)
 	}
 	if (!--header->count)
 		kfree(header);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 }
 
 int sysctl_is_seen(struct ctl_table_header *p)
 {
 	struct ctl_table_set *set = p->set;
 	int res;
-	spin_lock(&sysctl_lock);
+	lock_sysctl();
 	if (p->unregistering)
 		res = 0;
 	else if (!set->is_seen)
 		res = 1;
 	else
 		res = set->is_seen(set);
-	spin_unlock(&sysctl_lock);
+	unlock_sysctl();
 	return res;
 }
 
-- 
1.6.1.2.350.g88cc



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Andrew Morton @ 2009-03-30 22:26 UTC
  To: Eric W. Biederman
  Cc: mingo, jbeulich, tglx, mingo, hpa, linux-kernel, ego, peterz,
	adobriyan, netdev


So I merged these two patches.  I designated them as to-be-merged via
Alexey, with a Cc:Peter (IOW: wakey wakey ;))

Regarding this:

> From: ebiederm@xmission.com (Eric W. Biederman)
> ...
> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>

It doesn't make a lot of sense to have a different Author: address,
IMO.  So I rewrote the From: address to be ebiederm@aristanetworks.com,
OK?

But it would be better if you were to do this, so please do put an
explicit From: line at the top of your changelogs.


* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Eric W. Biederman @ 2009-03-30 22:53 UTC
  To: Andrew Morton
  Cc: mingo, jbeulich, tglx, mingo, hpa, linux-kernel, ego, peterz,
	adobriyan, netdev

Andrew Morton <akpm@linux-foundation.org> writes:

> So I merged these two patches.  I designated them as to-be-merged via
> Alexey, with a Cc:Peter (IOW: wakey wakey ;))

It is more sysctl than proc but whichever.  As long as people aren't
blindsided at the last minute I'm fine.

> Regarding this:
>
>> From: ebiederm@xmission.com (Eric W. Biederman)
>> ...
>> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
>
> It doesn't make a lot of sense to have a different Author: address,
> IMO.  So I rewrote the From: address to be ebiederm@aristanetworks.com,
> OK?

From my perspective it does.  It is much easier to use my personal
email setup for doing kernel work.  But I need to indicate I am
signing off as an employee of AristaNetworks.  Saying aristanetworks
in my signed-off-by seems a little more official.

Eric



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Andrew Morton @ 2009-03-30 23:18 UTC
  To: Eric W. Biederman
  Cc: mingo, jbeulich, tglx, mingo, hpa, linux-kernel, ego, peterz,
	adobriyan, netdev

On Mon, 30 Mar 2009 15:53:04 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> > So I merged these two patches.  I designated them as to-be-merged via
> > Alexey, with a Cc:Peter (IOW: wakey wakey ;))
> 
> It is more sysctl than proc but whichever.  As long as people aren't
> blindsided at the last minute I'm fine.

Yep.  But keeping all fs/proc/ changes in Alexey's tree keeps things
neat and is part of my secret maintainer-impressment plot.

> > Regarding this:
> >
> >> From: ebiederm@xmission.com (Eric W. Biederman)
> >> ...
> >> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
> >
> > It doesn't make a lot of sense to have a different Author: address,
> > IMO.  So I rewrote the From: address to be ebiederm@aristanetworks.com,
> > OK?
> 
> From my perspective it does.  It is much easier to use my personal
> email setup for doing kernel work.  But I need to indicate I am
> signing off as an employee of AristaNetworks.  Saying aristanetworks
> in my signed-off-by seems a little more official.

OK, fine, but to avoid confusion and to overtly announce this
preference, please do add the From: Line?

Plus that saves me from having to edit
"ebiederm@xmission.com (Eric W. Biederman)" to
"Eric W. Biederman <ebiederm@xmission.com>"

a zillion times ;)




* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Eric W. Biederman @ 2009-03-30 23:50 UTC
  To: Andrew Morton
  Cc: mingo, jbeulich, tglx, mingo, hpa, linux-kernel, ego, peterz,
	adobriyan, netdev

Andrew Morton <akpm@linux-foundation.org> writes:

> On Mon, 30 Mar 2009 15:53:04 -0700
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> Andrew Morton <akpm@linux-foundation.org> writes:
>> 
>> > So I merged these two patches.  I designated them as to-be-merged via
>> > Alexey, with a Cc:Peter (IOW: wakey wakey ;))
>> 
>> It is more sysctl than proc but whichever.  As long as people aren't
>> blind sided at the last minute I'm fine.
>
> Yep.  But keeping all fs/proc/ changes in Alexey's tree keeps things
> neat and is part of my secret maintainer-impressment plot.

Sounds like a plan.  You work on that.  I will work on undermining
the plot by breaking /proc into its several separate filesystems.

> OK, fine, but to avoid confusion and to overtly announce this
> preference, please do add the From: Line?

Sounds like a plan.  In truth I have been deleting that line from my
patches anyway....  It just seemed silly to have it in there when
the real From matched.

Eric



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Peter Zijlstra @ 2009-03-31  8:10 UTC
  To: Andrew Morton
  Cc: Eric W. Biederman, mingo, jbeulich, tglx, mingo, hpa,
	linux-kernel, ego, adobriyan, netdev

On Mon, 2009-03-30 at 15:26 -0700, Andrew Morton wrote:
> So I merged these two patches.  I designated them as to-be-merged via
> Alexey, with a Cc:Peter (IOW: wakey wakey ;))

Ah, I was under the impression they weren't quite finished yet,
apparently I was mistaken.

Eric, didn't you need recursive locks fixed and such?

Let me have a look at them.


* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Peter Zijlstra @ 2009-03-31  8:17 UTC
  To: Eric W. Biederman
  Cc: Andrew Morton, Ingo Molnar, Jan Beulich, tglx, mingo, hpa,
	linux-kernel, Gautham R Shenoy, Alexey Dobriyan, netdev

On Sat, 2009-03-21 at 00:42 -0700, Eric W. Biederman wrote:
> It is possible to get lock ordering deadlocks between taking locks
> and waiting for the sysctl use count to drop to zero.  We have
> recently observed one of these in the networking code.
> 
> So teach the sysctl code how to speak lockdep so the kernel
> can warn about these kinds of rare issues proactively.

It would be very good to extend this changelog with a more detailed
explanation of the deadlock in question.

Let me see if I got it right:

We're holding a lock, while waiting for the refcount to drop to 0.
Dropping that refcount is blocked on that lock.

Something like that?

> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
> ---
>  include/linux/sysctl.h |    4 ++
>  kernel/sysctl.c        |  108 ++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 91 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 39d471d..ec9b1dd 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -28,6 +28,7 @@
>  #include <linux/kernel.h>
>  #include <linux/types.h>
>  #include <linux/compiler.h>
> +#include <linux/lockdep.h>
>  
>  struct file;
>  struct completion;
> @@ -1087,6 +1088,9 @@ struct ctl_table_header
>  	struct ctl_table *attached_by;
>  	struct ctl_table *attached_to;
>  	struct ctl_table_header *parent;
> +#ifdef CONFIG_PROVE_LOCKING
> +	struct lockdep_map dep_map;
> +#endif
>  };
>  
>  /* struct ctl_path describes where in the hierarchy a table is added */
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c5ef44f..ea8cc39 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1454,12 +1454,63 @@ static struct ctl_table dev_table[] = {
>  
>  static DEFINE_SPINLOCK(sysctl_lock);
>  
> +#ifndef CONFIG_PROVE_LOCKING
> +
> +# define lock_sysctl() spin_lock(&sysctl_lock)
> +# define unlock_sysctl() spin_unlock(&sysctl_lock)
> +
> +static inline void table_acquire_use(struct ctl_table_header *hdr) { }
> +static inline void table_release_use(struct ctl_table_header *hdr) { }
> +static inline void table_acquire(struct ctl_table_header *hdr) { }
> +static inline void table_contended(struct ctl_table_header *hdr) { }
> +static inline void table_acquired(struct ctl_table_header *hdr) { }
> +static inline void table_release(struct ctl_table_header *hdr) { }
> +
> +#else	/* CONFIG_PROVE_LOCKING */
> +
> +#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
> +#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)

Uhmm, Please explain that -- without a proper explanation this is a NAK.

> +static inline void table_acquire_use(struct ctl_table_header *hdr)
> +{
> +	lock_acquire(&hdr->dep_map, 0, 0, 1, 2, NULL, _RET_IP_);
> +	lock_acquired(&hdr->dep_map, _RET_IP_);
> +}
> +
> +static inline void table_release_use(struct ctl_table_header *hdr)
> +{
> +	lock_release(&hdr->dep_map, 0, _RET_IP_);
> +}
> +
> +static inline void table_acquire(struct ctl_table_header *hdr)
> +{
> +	lock_acquire(&hdr->dep_map, 0, 0, 0, 2, NULL, _RET_IP_);
> +}
> +
> +static inline void table_contended(struct ctl_table_header *hdr)
> +{
> +	lock_contended(&hdr->dep_map, _RET_IP_);
> +}
> +
> +static inline void table_acquired(struct ctl_table_header *hdr)
> +{
> +	lock_acquired(&hdr->dep_map, _RET_IP_);
> +}
> +
> +static inline void table_release(struct ctl_table_header *hdr)
> +{
> +	lock_release(&hdr->dep_map, 0, _RET_IP_);
> +}
> +
> +#endif /* CONFIG_PROVE_LOCKING */
> +
>  /* called under sysctl_lock */
>  static int use_table(struct ctl_table_header *p)
>  {
>  	if (unlikely(p->unregistering))
>  		return 0;
>  	p->used++;
> +	table_acquire_use(p);
>  	return 1;
>  }
>  
> @@ -1469,6 +1520,8 @@ static void unuse_table(struct ctl_table_header *p)
>  	if (!--p->used)
>  		if (unlikely(p->unregistering))
>  			complete(p->unregistering);
> +
> +	table_release_use(p);
>  }
>  
>  /* called under sysctl_lock, will reacquire if has to wait */
> @@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
>  	 * if p->used is 0, nobody will ever touch that entry again;
>  	 * we'll eliminate all paths to it before dropping sysctl_lock
>  	 */
> +	table_acquire(p);
>  	if (unlikely(p->used)) {
>  		struct completion wait;
> +		table_contended(p);
> +
>  		init_completion(&wait);
>  		p->unregistering = &wait;
> -		spin_unlock(&sysctl_lock);
> +		unlock_sysctl();
>  		wait_for_completion(&wait);
> -		spin_lock(&sysctl_lock);
> +		lock_sysctl();
>  	} else {
>  		/* anything non-NULL; we'll never dereference it */
>  		p->unregistering = ERR_PTR(-EINVAL);
>  	}
> +	table_acquired(p);
> +
>  	/*
>  	 * do not remove from the list until nobody holds it; walking the
>  	 * list in do_sysctl() relies on that.
>  	 */
>  	list_del_init(&p->ctl_entry);
> +
> +	table_release(p);
>  }
>  

> @@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
>  		return NULL;
>  	}
>  #endif
> -	spin_lock(&sysctl_lock);
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +	{
> +		static struct lock_class_key __key;
> +		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
> +	}
> +#endif	

This means every sysctl thingy gets the same class, is that
intended/desired?



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Eric W. Biederman @ 2009-03-31  8:47 UTC
  To: Peter Zijlstra
  Cc: Andrew Morton, mingo, jbeulich, tglx, mingo, hpa, linux-kernel,
	ego, adobriyan, netdev

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, 2009-03-30 at 15:26 -0700, Andrew Morton wrote:
>> So I merged these two patches.  I designated them as to-be-merged via
>> Alexey, with a Cc:Peter (IOW: wakey wakey ;))
>
> Ah, I was under the impression they weren't quite finished yet,
> apparently I was mistaken.
>
> Eric, didn't you need recursive locks fixed and such?

In theory.  In practice I removed the one case that could legitimately
recurse.

> Let me have a look at them.


* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Eric W. Biederman @ 2009-03-31 13:40 UTC
  To: Peter Zijlstra
  Cc: Andrew Morton, Ingo Molnar, Jan Beulich, tglx, mingo, hpa,
	linux-kernel, Gautham R Shenoy, Alexey Dobriyan, netdev

Peter Zijlstra <peterz@infradead.org> writes:

> On Sat, 2009-03-21 at 00:42 -0700, Eric W. Biederman wrote:
>> It is possible to get lock ordering deadlocks between taking locks
>> and waiting for the sysctl use count to drop to zero.  We have
>> recently observed one of these in the networking code.
>> 
>> So teach the sysctl code how to speak lockdep so the kernel
>> can warn about these kinds of rare issues proactively.
>
> It would be very good to extend this changelog with a more detailed
> explanation of the deadlock in question.
>
> Let me see if I got it right:
>
> We're holding a lock, while waiting for the refcount to drop to 0.
> Dropping that refcount is blocked on that lock.
>
> Something like that?

Exactly.

I must have written an explanation so many times that it got
lost when I wrote that commit message.

In particular the problem can be seen with /proc/sys/net/ipv4/conf/*/forwarding.

The problem is that the handler for forwarding takes the rtnl_lock
with the reference count held.

Then we call unregister_sysctl_table under the rtnl_lock,
which waits for the reference count to go to zero.


>> Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
>> ---
>>  include/linux/sysctl.h |    4 ++
>>  kernel/sysctl.c        |  108 ++++++++++++++++++++++++++++++++++++++---------
>>  2 files changed, 91 insertions(+), 21 deletions(-)
>> 
>> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
>> index 39d471d..ec9b1dd 100644
>> --- a/include/linux/sysctl.h
>> +++ b/include/linux/sysctl.h
>> @@ -28,6 +28,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/types.h>
>>  #include <linux/compiler.h>
>> +#include <linux/lockdep.h>
>>  
>>  struct file;
>>  struct completion;
>> @@ -1087,6 +1088,9 @@ struct ctl_table_header
>>  	struct ctl_table *attached_by;
>>  	struct ctl_table *attached_to;
>>  	struct ctl_table_header *parent;
>> +#ifdef CONFIG_PROVE_LOCKING
>> +	struct lockdep_map dep_map;
>> +#endif
>>  };
>>  
>>  /* struct ctl_path describes where in the hierarchy a table is added */
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index c5ef44f..ea8cc39 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -1454,12 +1454,63 @@ static struct ctl_table dev_table[] = {
>>  
>>  static DEFINE_SPINLOCK(sysctl_lock);
>>  
>> +#ifndef CONFIG_PROVE_LOCKING
>> +
>> +# define lock_sysctl() spin_lock(&sysctl_lock)
>> +# define unlock_sysctl() spin_unlock(&sysctl_lock)
>> +
>> +static inline void table_acquire_use(struct ctl_table_header *hdr) { }
>> +static inline void table_release_use(struct ctl_table_header *hdr) { }
>> +static inline void table_acquire(struct ctl_table_header *hdr) { }
>> +static inline void table_contended(struct ctl_table_header *hdr) { }
>> +static inline void table_acquired(struct ctl_table_header *hdr) { }
>> +static inline void table_release(struct ctl_table_header *hdr) { }
>> +
>> +#else	/* CONFIG_PROVE_LOCKING */
>> +
>> +#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
>> +#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)
>
> Uhmm, Please explain that -- without a proper explanation this is a NAK.

If the refcount is to be considered a lock, sysctl_lock must be considered
the internals of that lock; lockdep gets extremely confused otherwise.
Since the spinlock is static to this file I'm not especially worried
about it.

>> +static inline void table_acquire_use(struct ctl_table_header *hdr)
>> +{
>> +	lock_acquire(&hdr->dep_map, 0, 0, 1, 2, NULL, _RET_IP_);
>> +	lock_acquired(&hdr->dep_map, _RET_IP_);
>> +}
>> +
>> +static inline void table_release_use(struct ctl_table_header *hdr)
>> +{
>> +	lock_release(&hdr->dep_map, 0, _RET_IP_);
>> +}
>> +
>> +static inline void table_acquire(struct ctl_table_header *hdr)
>> +{
>> +	lock_acquire(&hdr->dep_map, 0, 0, 0, 2, NULL, _RET_IP_);
>> +}
>> +
>> +static inline void table_contended(struct ctl_table_header *hdr)
>> +{
>> +	lock_contended(&hdr->dep_map, _RET_IP_);
>> +}
>> +
>> +static inline void table_acquired(struct ctl_table_header *hdr)
>> +{
>> +	lock_acquired(&hdr->dep_map, _RET_IP_);
>> +}
>> +
>> +static inline void table_release(struct ctl_table_header *hdr)
>> +{
>> +	lock_release(&hdr->dep_map, 0, _RET_IP_);
>> +}
>> +
>> +#endif /* CONFIG_PROVE_LOCKING */
>> +
>>  /* called under sysctl_lock */
>>  static int use_table(struct ctl_table_header *p)
>>  {
>>  	if (unlikely(p->unregistering))
>>  		return 0;
>>  	p->used++;
>> +	table_acquire_use(p);
>>  	return 1;
>>  }
>>  
>> @@ -1469,6 +1520,8 @@ static void unuse_table(struct ctl_table_header *p)
>>  	if (!--p->used)
>>  		if (unlikely(p->unregistering))
>>  			complete(p->unregistering);
>> +
>> +	table_release_use(p);
>>  }
>>  
>>  /* called under sysctl_lock, will reacquire if has to wait */
>> @@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
>>  	 * if p->used is 0, nobody will ever touch that entry again;
>>  	 * we'll eliminate all paths to it before dropping sysctl_lock
>>  	 */
>> +	table_acquire(p);
>>  	if (unlikely(p->used)) {
>>  		struct completion wait;
>> +		table_contended(p);
>> +
>>  		init_completion(&wait);
>>  		p->unregistering = &wait;
>> -		spin_unlock(&sysctl_lock);
>> +		unlock_sysctl();
>>  		wait_for_completion(&wait);
>> -		spin_lock(&sysctl_lock);
>> +		lock_sysctl();
>>  	} else {
>>  		/* anything non-NULL; we'll never dereference it */
>>  		p->unregistering = ERR_PTR(-EINVAL);
>>  	}
>> +	table_acquired(p);
>> +
>>  	/*
>>  	 * do not remove from the list until nobody holds it; walking the
>>  	 * list in do_sysctl() relies on that.
>>  	 */
>>  	list_del_init(&p->ctl_entry);
>> +
>> +	table_release(p);
>>  }
>>  
>
>> @@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
>>  		return NULL;
>>  	}
>>  #endif
>> -	spin_lock(&sysctl_lock);
>> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
>> +	{
>> +		static struct lock_class_key __key;
>> +		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
>> +	}
>> +#endif	
>
> This means every sysctl thingy gets the same class, is that
> intended/desired?

There is only one place we initialize it, and as far as I know really
only one place we take it.  Which is the definition of a lockdep
class as far as I know.
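For illustration, the class is keyed by the static lock_class_key passed
to lockdep_init_map(); every map initialized with the same key is merged
into one class.  A sketch (the keys and header names here are invented
for the example, not taken from the patch):

static struct lock_class_key net_sysctl_key;	/* hypothetical key */
static struct lock_class_key fs_sysctl_key;	/* hypothetical key */

/* hdr_a and hdr_b share one class; hdr_c gets its own.  Keys must
 * live in static storage for lockdep to use them. */
lockdep_init_map(&hdr_a->dep_map, "sysctl_used:net", &net_sysctl_key, 0);
lockdep_init_map(&hdr_b->dep_map, "sysctl_used:net", &net_sysctl_key, 0);
lockdep_init_map(&hdr_c->dep_map, "sysctl_used:fs",  &fs_sysctl_key,  0);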

Eric



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Peter Zijlstra @ 2009-03-31 15:35 UTC
  To: Eric W. Biederman
  Cc: Andrew Morton, Ingo Molnar, Jan Beulich, tglx, mingo, hpa,
	linux-kernel, Gautham R Shenoy, Alexey Dobriyan, netdev

On Tue, 2009-03-31 at 06:40 -0700, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Sat, 2009-03-21 at 00:42 -0700, Eric W. Biederman wrote:
> >> It is possible to get lock ordering deadlocks between taking locks
> >> and waiting for the sysctl use count to drop to zero.  We have
> >> recently observed one of these in the networking code.
> >> 
> >> So teach the sysctl code how to speak lockdep so the kernel
> >> can warn about these kinds of rare issues proactively.
> >
> > It would be very good to extend this changelog with a more detailed
> > explanation of the deadlock in question.
> >
> > Let me see if I got it right:
> >
> > We're holding a lock, while waiting for the refcount to drop to 0.
> > Dropping that refcount is blocked on that lock.
> >
> > Something like that?
> 
> Exactly.
> 
> I must have written an explanation so many times that it got
> lost when I wrote that commit message.
> 
> In particular the problem can be seen with /proc/sys/net/ipv4/conf/*/forwarding.
> 
> The problem is that the handler for forwarding takes the rtnl_lock
> with the reference count held.
> 
> Then we call unregister_sysctl_table under the rtnl_lock,
> which waits for the reference count to go to zero.

> >> +
> >> +#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
> >> +#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)
> >
> > Uhmm, Please explain that -- without a proper explanation this is a NAK.
> 
> If the refcount is to be considered a lock, sysctl_lock must be considered
> the internals of that lock; lockdep gets extremely confused otherwise.
> Since the spinlock is static to this file I'm not especially worried
> about it.

Usually lock-internal locks still get lockdep coverage.  Let's see if we
can find a way for this to be true even here. I suspect the below to
cause the issue:

> >>  /* called under sysctl_lock, will reacquire if has to wait */
> >> @@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
> >>  	 * if p->used is 0, nobody will ever touch that entry again;
> >>  	 * we'll eliminate all paths to it before dropping sysctl_lock
> >>  	 */
> >> +	table_acquire(p);
> >>  	if (unlikely(p->used)) {
> >>  		struct completion wait;
> >> +		table_contended(p);
> >> +
> >>  		init_completion(&wait);
> >>  		p->unregistering = &wait;
> >> -		spin_unlock(&sysctl_lock);
> >> +		unlock_sysctl();
> >>  		wait_for_completion(&wait);
> >> -		spin_lock(&sysctl_lock);
> >> +		lock_sysctl();
> >>  	} else {
> >>  		/* anything non-NULL; we'll never dereference it */
> >>  		p->unregistering = ERR_PTR(-EINVAL);
> >>  	}
> >> +	table_acquired(p);
> >> +
> >>  	/*
> >>  	 * do not remove from the list until nobody holds it; walking the
> >>  	 * list in do_sysctl() relies on that.
> >>  	 */
> >>  	list_del_init(&p->ctl_entry);
> >> +
> >> +	table_release(p);
> >>  }

There you acquire the table while holding the spinlock, generating
sysctl_lock -> table_lock; however, you then release the sysctl_lock and
re-acquire it, generating table_lock -> sysctl_lock.

Humm, can't we write that differently?


> >> @@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
> >>  		return NULL;
> >>  	}
> >>  #endif
> >> -	spin_lock(&sysctl_lock);
> >> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> >> +	{
> >> +		static struct lock_class_key __key;
> >> +		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
> >> +	}
> >> +#endif	
> >
> > This means every sysctl thingy gets the same class, is that
> > intended/desired?
> 
> There is only one place we initialize it, and as far as I know really
> only one place we take it.  Which is the definition of a lockdep
> class as far as I know.

Indeed, just checking.


* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Eric W. Biederman @ 2009-03-31 22:44 UTC
  To: Peter Zijlstra
  Cc: Andrew Morton, Ingo Molnar, Jan Beulich, tglx, mingo, hpa,
	linux-kernel, Gautham R Shenoy, Alexey Dobriyan, netdev

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, 2009-03-31 at 06:40 -0700, Eric W. Biederman wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > On Sat, 2009-03-21 at 00:42 -0700, Eric W. Biederman wrote:
>> >> It is possible to get lock ordering deadlocks between taking locks
>> >> and waiting for the sysctl use count to drop to zero.  We have
>> >> recently observed one of these in the networking code.
>> >> 
>> >> So teach the sysctl code how to speak lockdep so the kernel
>> >> can warn about these kinds of rare issues proactively.
>> >
>> > It would be very good to extend this changelog with a more detailed
>> > explanation of the deadlock in question.
>> >
>> > Let me see if I got it right:
>> >
>> > We're holding a lock, while waiting for the refcount to drop to 0.
>> > Dropping that refcount is blocked on that lock.
>> >
>> > Something like that?
>> 
>> Exactly.
>> 
>> I must have written an explanation so many times that it got
>> lost when I wrote that commit message.
>> 
>> In particular the problem can be seen with /proc/sys/net/ipv4/conf/*/forwarding.
>> 
>> The problem is that the handler for forwarding takes the rtnl_lock
>> with the reference count held.
>> 
>> Then we call unregister_sysctl_table under the rtnl_lock,
>> which waits for the reference count to go to zero.
>
>> >> +
>> >> +#  define lock_sysctl() __raw_spin_lock(&sysctl_lock.raw_lock)
>> >> +#  define unlock_sysctl() __raw_spin_unlock(&sysctl_lock.raw_lock)
>> >
>> > Uhmm, Please explain that -- without a proper explanation this is a NAK.
>> 
>> If the refcount is to be considered a lock, sysctl_lock must be considered
>> the internals of that lock; lockdep gets extremely confused otherwise.
>> Since the spinlock is static to this file I'm not especially worried
>> about it.
>
> Usually lock-internal locks still get lockdep coverage.  Let's see if we
> can find a way for this to be true even here. I suspect the below to
> cause the issue:
>
>> >>  /* called under sysctl_lock, will reacquire if has to wait */
>> >> @@ -1478,47 +1531,54 @@ static void start_unregistering(struct ctl_table_header *p)
>> >>  	 * if p->used is 0, nobody will ever touch that entry again;
>> >>  	 * we'll eliminate all paths to it before dropping sysctl_lock
>> >>  	 */
>> >> +	table_acquire(p);
>> >>  	if (unlikely(p->used)) {
>> >>  		struct completion wait;
>> >> +		table_contended(p);
>> >> +
>> >>  		init_completion(&wait);
>> >>  		p->unregistering = &wait;
>> >> -		spin_unlock(&sysctl_lock);
>> >> +		unlock_sysctl();
>> >>  		wait_for_completion(&wait);
>> >> -		spin_lock(&sysctl_lock);
>> >> +		lock_sysctl();
>> >>  	} else {
>> >>  		/* anything non-NULL; we'll never dereference it */
>> >>  		p->unregistering = ERR_PTR(-EINVAL);
>> >>  	}
>> >> +	table_acquired(p);
>> >> +
>> >>  	/*
>> >>  	 * do not remove from the list until nobody holds it; walking the
>> >>  	 * list in do_sysctl() relies on that.
>> >>  	 */
>> >>  	list_del_init(&p->ctl_entry);
>> >> +
>> >> +	table_release(p);
>> >>  }
>
> There you acquire the table while holding the spinlock, generating
> sysctl_lock -> table_lock; however, you then release the sysctl_lock and
> re-acquire it, generating table_lock -> sysctl_lock.
>
> Humm, can't we write that differently?

As best I can tell, that is an artifact of sysctl_lock being used
to implement table_lock.  For the case you point out, I could
probably make it work by playing with where I claim the lock is
acquired.

__sysctl_head_next on the read side is trickier.
We come in with table_lock held for read.
We grab sysctl_lock.
We release table_lock (aka the reference count is decremented)
We grab table_lock on the next table (aka the reference count is incremented)
We release sysctl_lock

If we generate lockdep annotations for that it would seem to transition
through the states:
table_lock
table_lock -> sysctl_lock
sysctl_lock
sysctl_lock -> table_lock
table_lock

Short of saying table_lock is an implementation detail used to make
certain operations atomic, I do not see how to model this case.
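Reduced to a sketch (simplified from __sysctl_head_next() in the patch
above; next_head() and find_next() are stand-ins invented for the
illustration, not existing helpers), the pattern is:

/* Hand-over-hand: drop the use count of the previous head and take the
 * next one inside a single sysctl_lock section, so the iteration never
 * lets go of both "locks" at once. */
struct ctl_table_header *next_head(struct ctl_table_header *prev)
{
	struct ctl_table_header *next;

	lock_sysctl();			/* the internal sysctl_lock */
	if (prev)
		unuse_table(prev);	/* "release" table_lock on prev */
	next = find_next(prev);		/* stand-in for the real list walk */
	if (next)
		use_table(next);	/* "acquire" table_lock on next */
	unlock_sysctl();
	return next;
}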

Let me take a slightly simpler case and ask how that gets modeled.
Looking at rwsem: all of the annotations are outside of the
spin_lock.  So in some sense we are sloppy and fib to lockdep
about when we acquire/release a lock.  In another sense
we are simply respecting the abstraction.
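For reference, the rwsem pattern referred to here looked roughly like
this at the time (paraphrased from kernel/rwsem.c; the lockdep
annotation sits outside the implementation, which takes its own
internal wait_lock without annotating it against the rwsem class):

void __sched down_read(struct rw_semaphore *sem)
{
	might_sleep();
	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);

	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
}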

I guess I can take a look and see if I can model things in a slightly
more lossy fashion so I don't need to do the __raw_spin_lock thing.


>> >> @@ -1951,7 +2011,13 @@ struct ctl_table_header *__register_sysctl_paths(
>> >>  		return NULL;
>> >>  	}
>> >>  #endif
>> >> -	spin_lock(&sysctl_lock);
>> >> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
>> >> +	{
>> >> +		static struct lock_class_key __key;
>> >> +		lockdep_init_map(&header->dep_map, "sysctl_used", &__key, 0);
>> >> +	}
>> >> +#endif	
>> >
>> > This means every sysctl thingy gets the same class, is that
>> > intended/desired?
>> 
>> There is only one place we initialize it, and as far as I know really
>> only one place we take it.  Which is the definition of a lockdep
>> class as far as I know.
>
> Indeed, just checking.

The only difference I can possibly see is read side versus write side.
Or in my case refcount side versus wait side.

Eric



* Re: [PATCH 2/2] sysctl:  lockdep support for sysctl reference counting.
From: Andrew Morton @ 2009-04-10  9:18 UTC
  To: Eric W. Biederman
  Cc: Ingo Molnar, Jan Beulich, tglx, mingo, hpa, linux-kernel,
	Gautham R Shenoy, Peter Zijlstra, Alexey Dobriyan, netdev

On Sat, 21 Mar 2009 00:42:19 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote:

> It is possible to get lock ordering deadlocks between taking locks
> and waiting for the sysctl use count to drop to zero.  We have
> recently observed one of these in the networking code.
> 
> So teach the sysctl code how to speak lockdep so the kernel
> can warn about these kinds of rare issues proactively.

`make headers_check':

/usr/src/25/usr/include/linux/sysctl.h:31: included file 'linux/lockdep.h' is not exported
make[2]: *** [/usr/src/25/usr/include/linux/.check] Error 1

It looks like we'll need version 2 on these patches.
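One plausible v2 fix (an assumption about the likely approach, not the
actual follow-up patch): keep the include out of the exported portion
of linux/sysctl.h, since headers_check only sees what survives the
__KERNEL__ guards on export:

#ifdef __KERNEL__
#include <linux/lockdep.h>
#endif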
