* [cgroup or VFS ?] INFO: possible recursive locking detected
@ 2009-01-05 3:23 Li Zefan
2009-01-08 3:45 ` Li Zefan
0 siblings, 1 reply; 12+ messages in thread
From: Li Zefan @ 2009-01-05 3:23 UTC (permalink / raw)
To: LKML; +Cc: Andrew Morton, Paul Menage, Al Viro, Arjan van de Ven
Thread 1:
for ((; ;))
{
mount -t cpuset xxx /mnt > /dev/null 2>&1
cat /mnt/cpus > /dev/null 2>&1
umount /mnt > /dev/null 2>&1
}
Thread 2:
for ((; ;))
{
mount -t cpuset xxx /mnt > /dev/null 2>&1
umount /mnt > /dev/null 2>&1
}
(Note: It is irrelevant which cgroup subsys is used.)
After a while a lockdep warning showed up:
=============================================
[ INFO: possible recursive locking detected ]
2.6.28 #479
---------------------------------------------
mount/13554 is trying to acquire lock:
(&type->s_umount_key#19){--..}, at: [<c049d888>] sget+0x5e/0x321
but task is already holding lock:
(&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
other info that might help us debug this:
1 lock held by mount/13554:
#0: (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
stack backtrace:
Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
Call Trace:
[<c044ad2e>] validate_chain+0x4c6/0xbbd
[<c044ba9b>] __lock_acquire+0x676/0x700
[<c044bb82>] lock_acquire+0x5d/0x7a
[<c049d888>] ? sget+0x5e/0x321
[<c061b9b8>] down_write+0x34/0x50
[<c049d888>] ? sget+0x5e/0x321
[<c049d888>] sget+0x5e/0x321
[<c045a2e7>] ? cgroup_set_super+0x0/0x3e
[<c045959f>] ? cgroup_test_super+0x0/0x2f
[<c045bcea>] cgroup_get_sb+0x98/0x2e7
[<c045cfb6>] cpuset_get_sb+0x4a/0x5f
[<c049dfa4>] vfs_kern_mount+0x40/0x7b
[<c049e02d>] do_kern_mount+0x37/0xbf
[<c04af4a0>] do_mount+0x5c3/0x61a
[<c04addd2>] ? copy_mount_options+0x2c/0x111
[<c04af560>] sys_mount+0x69/0xa0
[<c0403251>] sysenter_do_call+0x12/0x31
The cause is that after alloc_super() and the retry, an old entry is found
in the fs_supers list, so grab_super(old) is called, but both functions
take the s_umount lock:
struct super_block *sget(...)
{
	...
retry:
	spin_lock(&sb_lock);
	if (test) {
		list_for_each_entry(old, &type->fs_supers, s_instances) {
			if (!test(old, data))
				continue;
			if (!grab_super(old))   <--- 2nd: down_write(&old->s_umount);
				goto retry;
			if (s)
				destroy_super(s);
			return old;
		}
	}
	if (!s) {
		spin_unlock(&sb_lock);
		s = alloc_super(type);  <--- 1st: down_write(&s->s_umount)
		if (!s)
			return ERR_PTR(-ENOMEM);
		goto retry;
	}
	...
}
It seems like a false positive, and it seems it is the VFS rather than
cgroup that needs to be fixed?
And I noticed this commit:
commit 897c6ff9568bcb102ffc6b465ebe1def0cba829d
Author: Arjan van de Ven <arjan@infradead.org>
Date: Mon Jul 3 00:25:28 2006 -0700
    [PATCH] lockdep: annotate sb ->s_umount

    The s_umount rwsem needs to be classified as per-superblock since it's
    perfectly legit to keep multiple of those recursively in the VFS locking
    rules.

    Has no effect on non-lockdep kernels.
The changelog said s_umount needs to be classified as per-sb, but it
actually made it per-filesystem. And there is no way to mark all instances
of a given lock as distinct.
^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-01-05  3:23 [cgroup or VFS ?] INFO: possible recursive locking detected Li Zefan
@ 2009-01-08  3:45 ` Li Zefan
  2009-02-09 11:23   ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Li Zefan @ 2009-01-08  3:45 UTC (permalink / raw)
  To: Al Viro; +Cc: LKML, Andrew Morton, Paul Menage, Arjan van de Ven, linux-fsdevel

CC: linux-fsdevel@vger.kernel.org

> The cause is that after alloc_super() and the retry, an old entry is found
> in the fs_supers list, so grab_super(old) is called, but both functions
> take the s_umount lock:

Hi Al Viro,

I hacked into the kernel with the patch below (I think it's ok for me
to comment out bdev->bd_mount_sem for testing):

========================
--- a/fs/super.c
+++ b/fs/super.c
@@ -338,6 +338,7 @@ struct super_block *sget(struct file_system_type *type,
 	struct super_block *s = NULL;
 	struct super_block *old;
 	int err;
+	static int count;

 retry:
 	spin_lock(&sb_lock);
@@ -354,6 +355,10 @@ retry:
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
+		if (!strcmp(type->name, "ext3")) {
+			if (count++ % 2 == 0)
+				msleep(150);
+		}
 		s = alloc_super(type);
 		if (!s)
 			return ERR_PTR(-ENOMEM);
@@ -770,9 +775,9 @@ int get_sb_bdev(struct file_system_type *fs_type,
 	 * will protect the lockfs code from trying to start a snapshot
 	 * while we are mounting
 	 */
-	down(&bdev->bd_mount_sem);
+//	down(&bdev->bd_mount_sem);
 	s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
-	up(&bdev->bd_mount_sem);
+//	up(&bdev->bd_mount_sem);
 	if (IS_ERR(s))
 		goto error_s;
========================

And ran 2 threads:

for ((; ;))  # thread 1
{
	mount -t ext3 /dev/sda9 /mnt1
	umount /mnt1
}

for ((; ;))  # thread 2
{
	mount -t ext3 /dev/sda9 /mnt2
	umount /mnt2
}

And I got the same lockdep warning immediately, so I think it's
VFS's issue.
=============================================
[ INFO: possible recursive locking detected ]
2.6.28-mc #497
---------------------------------------------
mount/3103 is trying to acquire lock:
 (&type->s_umount_key#15){----}, at: [<c04a2a56>] sget+0x58/0x33d

but task is already holding lock:
 (&type->s_umount_key#15){----}, at: [<c04a2c01>] sget+0x203/0x33d

other info that might help us debug this:
1 lock held by mount/3103:
 #0:  (&type->s_umount_key#15){----}, at: [<c04a2c01>] sget+0x203/0x33d

stack backtrace:
Pid: 3103, comm: mount Not tainted 2.6.28-mc #497
Call Trace:
 [<c044e01a>] validate_chain+0x4c6/0xbbd
 [<c043679c>] ? lock_timer_base+0x24/0x43
 [<c044ed87>] __lock_acquire+0x676/0x700
 [<c044ee6e>] lock_acquire+0x5d/0x7a
 [<c04a2a56>] ? sget+0x58/0x33d
 [<c06223d8>] down_write+0x34/0x50
 [<c04a2a56>] ? sget+0x58/0x33d
 [<c04a2a56>] sget+0x58/0x33d
 [<c04a26f8>] ? set_bdev_super+0x0/0x17
 [<c04a270f>] ? test_bdev_super+0x0/0x16
 [<c04a3510>] get_sb_bdev+0x52/0x125
 [<c04b3d00>] ? alloc_vfsmnt+0x71/0xe8
 [<c0488e9a>] ? kstrdup+0x31/0x53
 [<f80ac930>] ext3_get_sb+0x18/0x1a [ext3]
 [<f80adef0>] ? ext3_fill_super+0x0/0x1438 [ext3]
 [<c04a3195>] vfs_kern_mount+0x40/0x7b
 [<c04a321e>] do_kern_mount+0x37/0xbf
 [<c04b4786>] do_mount+0x5dc/0x633
 [<c04b311e>] ? copy_mount_options+0x2c/0x111
 [<c04b4846>] sys_mount+0x69/0xa0
 [<c0403351>] sysenter_do_call+0x12/0x31

> struct super_block *sget(...)
> {
> 	...
> retry:
> 	spin_lock(&sb_lock);
> 	if (test) {
> 		list_for_each_entry(old, &type->fs_supers, s_instances) {
> 			if (!test(old, data))
> 				continue;
> 			if (!grab_super(old))   <--- 2nd: down_write(&old->s_umount);
> 				goto retry;
> 			if (s)
> 				destroy_super(s);
> 			return old;
> 		}
> 	}
> 	if (!s) {
> 		spin_unlock(&sb_lock);
> 		s = alloc_super(type);  <--- 1st: down_write(&s->s_umount)
> 		if (!s)
> 			return ERR_PTR(-ENOMEM);
> 		goto retry;
> 	}
> 	...
> }
>
> It seems like a false positive, and it seems it is the VFS rather than
> cgroup that needs to be fixed?
>
> And I noticed this commit:
>
> commit 897c6ff9568bcb102ffc6b465ebe1def0cba829d
> Author: Arjan van de Ven <arjan@infradead.org>
> Date:   Mon Jul 3 00:25:28 2006 -0700
>
>     [PATCH] lockdep: annotate sb ->s_umount
>
>     The s_umount rwsem needs to be classified as per-superblock since it's
>     perfectly legit to keep multiple of those recursively in the VFS locking
>     rules.
>
>     Has no effect on non-lockdep kernels.
>
> The changelog said s_umount needs to be classified as per-sb, but it
> actually made it per-filesystem. And there is no way to mark all instances
> of a given lock as distinct.
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-01-08  3:45 ` Li Zefan
@ 2009-02-09 11:23   ` Al Viro
  2009-02-09 11:38     ` Li Zefan
  2009-02-09 11:48     ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2009-02-09 11:23 UTC (permalink / raw)
  To: Li Zefan
  Cc: LKML, Andrew Morton, Paul Menage, Arjan van de Ven, linux-fsdevel

On Thu, Jan 08, 2009 at 11:45:43AM +0800, Li Zefan wrote:
> Hi Al Viro,
>
> I hacked into the kernel with the patch below (I think it's ok for me
> to comment out bdev->bd_mount_sem for testing):
>
> And ran 2 threads:
> for ((; ;))  # thread 1
> {
> 	mount -t ext3 /dev/sda9 /mnt1
> 	umount /mnt1
> }
>
> for ((; ;))  # thread 2
> {
> 	mount -t ext3 /dev/sda9 /mnt2
> 	umount /mnt2
> }
>
> And I got the same lockdep warning immediately, so I think it's
> VFS's issue.

It's a lockdep issue, actually.  It _is_ a false positive; we could get rid
of that if we took destroy_super(s); just before grab_super(), but I really
do not believe that there's any point.

Frankly, I'd rather see if there's any way to teach lockdep that this instance
of lock is getting initialized into "one writer" state and that yes, we know
that it's not visible to anyone, so doing that is safe, TYVM, even though
we are under spinlock.  Then take that sucker to just before set().

In any case, I really do not believe that it might have anything to do with
the WARN_ON() from another thread...

Comments?
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-02-09 11:23   ` Al Viro
@ 2009-02-09 11:38     ` Li Zefan
  2009-02-09 11:48     ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Li Zefan @ 2009-02-09 11:38 UTC (permalink / raw)
  To: Al Viro; +Cc: LKML, Andrew Morton, Paul Menage, Arjan van de Ven, linux-fsdevel

Al Viro wrote:
> On Thu, Jan 08, 2009 at 11:45:43AM +0800, Li Zefan wrote:
>> Hi Al Viro,
>>
>> I hacked into the kernel with the patch below (I think it's ok for me
>> to comment out bdev->bd_mount_sem for testing):
>>
>> And ran 2 threads:
>> for ((; ;))  # thread 1
>> {
>> 	mount -t ext3 /dev/sda9 /mnt1
>> 	umount /mnt1
>> }
>>
>> for ((; ;))  # thread 2
>> {
>> 	mount -t ext3 /dev/sda9 /mnt2
>> 	umount /mnt2
>> }
>>
>> And I got the same lockdep warning immediately, so I think it's
>> VFS's issue.
>
> It's a lockdep issue, actually.  It _is_ a false positive; we could get rid

Yes, I believed it was a false positive when I looked into this issue.

> of that if we took destroy_super(s); just before grab_super(), but I really
> do not believe that there's any point.
>
> Frankly, I'd rather see if there's any way to teach lockdep that this instance
> of lock is getting initialized into "one writer" state and that yes, we know
> that it's not visible to anyone, so doing that is safe, TYVM, even though
> we are under spinlock.  Then take that sucker to just before set().

It would be nice if we could do it that way.

> In any case, I really do not believe that it might have anything to do with
> the WARN_ON() from another thread...

Agreed. I don't think they are related, and that's why I sent 2 different
reports.
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-02-09 11:23   ` Al Viro
  2009-02-09 11:38     ` Li Zefan
@ 2009-02-09 11:48     ` Peter Zijlstra
  2009-02-10  3:06       ` Li Zefan
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2009-02-09 11:48 UTC (permalink / raw)
  To: Al Viro
  Cc: Li Zefan, LKML, Andrew Morton, Paul Menage, Arjan van de Ven,
	linux-fsdevel

On Mon, 2009-02-09 at 11:23 +0000, Al Viro wrote:
> On Thu, Jan 08, 2009 at 11:45:43AM +0800, Li Zefan wrote:
> > Hi Al Viro,
> >
> > I hacked into the kernel with the patch below (I think it's ok for me
> > to comment out bdev->bd_mount_sem for testing):
> >
> > And ran 2 threads:
> > for ((; ;))  # thread 1
> > {
> > 	mount -t ext3 /dev/sda9 /mnt1
> > 	umount /mnt1
> > }
> >
> > for ((; ;))  # thread 2
> > {
> > 	mount -t ext3 /dev/sda9 /mnt2
> > 	umount /mnt2
> > }
> >
> > And I got the same lockdep warning immediately, so I think it's
> > VFS's issue.
>
> It's a lockdep issue, actually.  It _is_ a false positive; we could get rid
> of that if we took destroy_super(s); just before grab_super(), but I really
> do not believe that there's any point.
>
> Frankly, I'd rather see if there's any way to teach lockdep that this instance
> of lock is getting initialized into "one writer" state and that yes, we know
> that it's not visible to anyone, so doing that is safe, TYVM, even though
> we are under spinlock.  Then take that sucker to just before set().
>
> In any case, I really do not believe that it might have anything to do with
> the WARN_ON() from another thread...
>
> Comments?

It seems to me we can simply put the new s_umount instance in a
different subclass.  It's a bit unusual to use _nested for the outer lock,
but lockdep doesn't particularly care about subclass order.

If there's any issue with the callers of sget() assuming the s_umount
lock being of subclass 0, then there is another annotation we can use to
fix that, but let's not bother with that if this is sufficient.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 fs/super.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 645e540..34ddc86 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -82,7 +82,22 @@ static struct super_block *alloc_super(struct file_system_type *type)
 		 * lock ordering than usbfs:
 		 */
 		lockdep_set_class(&s->s_lock, &type->s_lock_key);
-		down_write(&s->s_umount);
+		/*
+		 * sget() can have s_umount recursion.
+		 *
+		 * When it cannot find a suitable sb, it allocates a new
+		 * one (this one), and tries again to find a suitable old
+		 * one.
+		 *
+		 * In case that succeeds, it will acquire the s_umount
+		 * lock of the old one. Since these are clearly distrinct
+		 * locks, and this object isn't exposed yet, there's no
+		 * risk of deadlocks.
+		 *
+		 * Annotate this by putting this lock in a different
+		 * subclass.
+		 */
+		down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING);
 		s->s_count = S_BIAS;
 		atomic_set(&s->s_active, 1);
 		mutex_init(&s->s_vfs_rename_mutex);
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-02-09 11:48     ` Peter Zijlstra
@ 2009-02-10  3:06       ` Li Zefan
  2009-02-10  4:37         ` Al Viro
  2009-02-10  8:32         ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Li Zefan @ 2009-02-10  3:06 UTC (permalink / raw)
  To: Peter Zijlstra, Al Viro
  Cc: LKML, Andrew Morton, Paul Menage, Arjan van de Ven, linux-fsdevel

> It seems to me we can simply put the new s_umount instance in a
> different subclass.  It's a bit unusual to use _nested for the outer lock,
> but lockdep doesn't particularly care about subclass order.
>
> If there's any issue with the callers of sget() assuming the s_umount
> lock being of subclass 0, then there is another annotation we can use to
> fix that, but let's not bother with that if this is sufficient.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

Tested-by: Li Zefan <lizf@cn.fujitsu.com>

Thanks!

a minor comment

> +	 * lock of the old one. Since these are clearly distrinct

s/distrinct/distinct

BTW, I found another bug in current code:

From: Li Zefan <lizf@cn.fujitsu.com>
Date: Tue, 10 Feb 2009 10:55:53 +0800
Subject: [PATCH] vfs: add missing unlock in sget()

We should release s->s_umount before calling destroy_super(s).

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 fs/super.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 61dce00..8bdf981 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -356,8 +356,10 @@ retry:
 				continue;
 			if (!grab_super(old))
 				goto retry;
-			if (s)
+			if (s) {
+				up_write(&s->s_umount);
 				destroy_super(s);
+			}
 			return old;
 		}
 	}
--
1.5.4.rc3
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-02-10  3:06       ` Li Zefan
@ 2009-02-10  4:37         ` Al Viro
  2009-02-10  5:19           ` Li Zefan
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2009-02-10  4:37 UTC (permalink / raw)
  To: Li Zefan
  Cc: Peter Zijlstra, LKML, Andrew Morton, Paul Menage, Arjan van de Ven,
	linux-fsdevel

On Tue, Feb 10, 2009 at 11:06:39AM +0800, Li Zefan wrote:

> BTW, I found another bug in current code:

Why is that a bug?  If _anything_ had been trying to acquire the
rwsem in question, it would be fscked anyway.  Not to mention
that nothing could have ever seen that struct super_block in this
particular case, as a general rule
	* if something might be blocked on your mutex/rwsem or spinning
on a spinlock, releasing it before you free the object won't save your
arse.
	You have no promise whatsoever that whoever's been trying to
get the lock in question will even get out of the locking primitive
before the memory that contains the lock gets freed.  In case of superblocks
in general, you don't free them until ->s_count hits zero.  At that point
anything as much as remembering the address of that superblock is already
FUBAR.
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-02-10  4:37         ` Al Viro
@ 2009-02-10  5:19           ` Li Zefan
  2009-02-10  6:07             ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Li Zefan @ 2009-02-10  5:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Zijlstra, LKML, Andrew Morton, Paul Menage, Arjan van de Ven,
	linux-fsdevel

Al Viro wrote:
> On Tue, Feb 10, 2009 at 11:06:39AM +0800, Li Zefan wrote:
>
>> BTW, I found another bug in current code:
>
> Why is that a bug?  If _anything_ had been trying to acquire the
> rwsem in question, it would be fscked anyway.  Not to mention
> that nothing could have ever seen that struct super_block in this
> particular case, as a general rule
> 	* if something might be blocked on your mutex/rwsem or spinning
> on a spinlock, releasing it before you free the object won't save your
> arse.
> 	You have no promise whatsoever that whoever's been trying to
> get the lock in question will even get out of the locking primitive
> before the memory that contains the lock gets freed.  In case of superblocks
> in general, you don't free them until ->s_count hits zero.  At that point
> anything as much as remembering the address of that superblock is already
> FUBAR.

This is not the general case.  This sb won't be seen by anyone, and
destroy_super() is called on a sb with ->s_count == 1 and ->s_umount held.

Actually I ran into this after Peter's patch was applied:

=========================
[ BUG: held lock freed! ]
-------------------------
mount/13413 is freeing memory e767a800-e767abff, with a lock still held there!
 (&type->s_umount_key#29/1){--..}, at: [<c04a4104>] sget+0x1ea/0x324
2 locks held by mount/13413:
 #0:  (&type->s_umount_key#29/1){--..}, at: [<c04a4104>] sget+0x1ea/0x324
 #1:  (&type->s_umount_key#30){--..}, at: [<c04a3f72>] sget+0x58/0x324

stack backtrace:
Pid: 13413, comm: mount Not tainted 2.6.29-rc4 #548
Call Trace:
 [<c044d865>] debug_check_no_locks_freed+0xc9/0x105
 [<c049c86d>] kfree+0x82/0xd1
 [<c04a41e0>] ? sget+0x2c6/0x324
 [<c04a41e0>] sget+0x2c6/0x324
 [<c045dda9>] ? cgroup_set_super+0x0/0x3e
 [<c045cf6f>] ? cgroup_test_super+0x0/0x2f
 [<c045f840>] cgroup_get_sb+0x8d/0x284
 [<c0489216>] ? kstrdup+0x31/0x53
 [<c04a46aa>] vfs_kern_mount+0x40/0x7b
 [<c04a4733>] do_kern_mount+0x37/0xbf
 [<c04b5dc6>] do_mount+0x5c4/0x61b
 [<c04b477a>] ? copy_mount_options+0x2c/0x111
 [<c04b5e86>] sys_mount+0x69/0xa0
 [<c0403351>] sysenter_do_call+0x12/0x31
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-02-10  5:19           ` Li Zefan
@ 2009-02-10  6:07             ` Al Viro
  2009-02-10  9:25               ` Li Zefan
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2009-02-10  6:07 UTC (permalink / raw)
  To: Li Zefan
  Cc: Peter Zijlstra, LKML, Andrew Morton, Paul Menage, Arjan van de Ven,
	linux-fsdevel

On Tue, Feb 10, 2009 at 01:19:17PM +0800, Li Zefan wrote:
> > You have no promise whatsoever that whoever's been trying to
> > get the lock in question will even get out of the locking primitive
> > before the memory that contains the lock gets freed.  In case of superblocks
> > in general, you don't free them until ->s_count hits zero.  At that point
> > anything as much as remembering the address of that superblock is already
> > FUBAR.
>
> This is not the general case.  This sb won't be seen by anyone, and
> destroy_super() is called on a sb with ->s_count == 1 and ->s_umount held.

... so in this case we have an even stronger warranty of everything being
OK with freeing it while locked.  "Nothing has ever seen its address"
means that the entire struct contents is fair game...

As for the other question, you are leaving a reference to root hanging from
a superblock still on the list (grab_super() will fail on it, but that's it)
and you have code that might look into the damn thing (the test callback you
pass to sget()).  Dereferencing pointers to freed objects is not nice, to
put it mildly...

BTW, which dentries are going to stick around until that point?
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-02-10  6:07             ` Al Viro
@ 2009-02-10  9:25               ` Li Zefan
  2009-02-12  6:14                 ` Li Zefan
  0 siblings, 1 reply; 12+ messages in thread
From: Li Zefan @ 2009-02-10  9:25 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Zijlstra, LKML, Andrew Morton, Paul Menage, Arjan van de Ven,
	linux-fsdevel

Al Viro wrote:
> On Tue, Feb 10, 2009 at 01:19:17PM +0800, Li Zefan wrote:
>>> You have no promise whatsoever that whoever's been trying to
>>> get the lock in question will even get out of the locking primitive
>>> before the memory that contains the lock gets freed.  In case of superblocks
>>> in general, you don't free them until ->s_count hits zero.  At that point
>>> anything as much as remembering the address of that superblock is already
>>> FUBAR.
>>>
>> This is not the general case.  This sb won't be seen by anyone, and
>> destroy_super() is called on a sb with ->s_count == 1 and ->s_umount held.
>
> ... so in this case we have an even stronger warranty of everything being
> OK with freeing it while locked.  "Nothing has ever seen its address"
> means that the entire struct contents is fair game...

Yes, this won't cause bad things, but I think it's better to keep
lock/unlock consistent, and we have to make lockdep happy.

> As for the other question, you are leaving a reference to root hanging from
> a superblock still on the list (grab_super() will fail on it, but that's it)
> and you have code that might look into the damn thing (the test callback you
> pass to sget()).  Dereferencing pointers to freed objects is not nice, to
> put it mildly...

It's clear to me now, thanks for the explanation.  Though I failed to
trigger this bug, I managed to trigger it if I set sb->s_fs_info to NULL
just after kfree(root).

> BTW, which dentries are going to stick around until that point?

Not sure if I got what you mean.  cgroup_kill_sb() will be called only if
there are no sub-dirs.
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-02-10  9:25               ` Li Zefan
@ 2009-02-12  6:14                 ` Li Zefan
  0 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2009-02-12  6:14 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Zijlstra, LKML, Andrew Morton, Paul Menage, Arjan van de Ven,
	linux-fsdevel

>> BTW, which dentries are going to stick around until that point?
>>
>
> Not sure if I got what you mean.  cgroup_kill_sb() will be called only if
> there are no sub-dirs.

Though at this point there won't be any sub-dirs, we still have the dentries
of all the control files (normal files) in the root dir.
* Re: [cgroup or VFS ?] INFO: possible recursive locking detected
  2009-02-10  3:06       ` Li Zefan
  2009-02-10  4:37         ` Al Viro
@ 2009-02-10  8:32         ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2009-02-10  8:32 UTC (permalink / raw)
  To: Li Zefan
  Cc: Al Viro, LKML, Andrew Morton, Paul Menage, Arjan van de Ven,
	linux-fsdevel

On Tue, 2009-02-10 at 11:06 +0800, Li Zefan wrote:
> > It seems to me we can simply put the new s_umount instance in a
> > different subclass.  It's a bit unusual to use _nested for the outer lock,
> > but lockdep doesn't particularly care about subclass order.
> >
> > If there's any issue with the callers of sget() assuming the s_umount
> > lock being of subclass 0, then there is another annotation we can use to
> > fix that, but let's not bother with that if this is sufficient.
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>
> Tested-by: Li Zefan <lizf@cn.fujitsu.com>
>
> Thanks!
>
> a minor comment
>
> > +	 * lock of the old one. Since these are clearly distrinct
>
> s/distrinct/distinct

Yes, someone else was kind enough to point that out as well :-)

Al, do you want a resend, or will you fix that up when you add the patch
to your queue?

> BTW, I found another bug in current code:

Yep, looks good; freeing held locks makes lockdep unhappy.

> From: Li Zefan <lizf@cn.fujitsu.com>
> Date: Tue, 10 Feb 2009 10:55:53 +0800
> Subject: [PATCH] vfs: add missing unlock in sget()
>
> We should release s->s_umount before calling destroy_super(s).
>
> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
> ---