* Re: [tree] latest kill-the-BKL tree, v12 [not found] ` <a4423d670904151558r4252c7eamd115793fb36a9163@mail.gmail.com> @ 2009-04-15 23:07 ` Ingo Molnar 2009-04-15 23:13 ` Trond Myklebust 2009-04-15 23:35 ` Frederic Weisbecker 0 siblings, 2 replies; 6+ messages in thread From: Ingo Molnar @ 2009-04-15 23:07 UTC (permalink / raw) To: Alexander Beregalov, Peter Zijlstra, linux-nfs, netdev Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Jeff Mahoney, ReiserFS Development List, Chris Mason * Alexander Beregalov <a.beregalov@gmail.com> wrote: > 2009/4/14 Ingo Molnar <mingo@elte.hu>: > > > > * Alexander Beregalov <a.beregalov@gmail.com> wrote: > > > >> On Tue, Apr 14, 2009 at 05:34:22AM +0200, Frederic Weisbecker wrote: > >> > Ingo, > >> > > >> > This small patchset fixes some deadlocks I've faced after trying > >> > some pressures with dbench on a reiserfs partition. > >> > > >> > There is still some work pending such as adding some checks to ensure we > >> > _always_ release the lock before sleeping, as you suggested. > >> > Also I have to fix a lockdep warning reported by Alessio Igor Bogani. > >> > And also some optimizations.... > >> > > >> > Thanks, > >> > Frederic. > >> > > >> > Frederic Weisbecker (3): > >> > kill-the-BKL/reiserfs: provide a tool to lock only once the write lock > >> > kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file > >> > kill-the-BKL/reiserfs: only acquire the write lock once in > >> > reiserfs_dirty_inode > >> > > >> > fs/reiserfs/inode.c | 10 +++++++--- > >> > fs/reiserfs/lock.c | 26 ++++++++++++++++++++++++++ > >> > fs/reiserfs/super.c | 15 +++++++++------ > >> > include/linux/reiserfs_fs.h | 2 ++ > >> > 4 files changed, 44 insertions(+), 9 deletions(-) > >> > > >> > >> Hi > >> > >> The same test - dbench on reiserfs on loop on sparc64. > >> > >> [ INFO: possible circular locking dependency detected ] > >> 2.6.30-rc1-00457-gb21597d-dirty #2 > > > > I'm wondering ... 
your version hash suggests you used vanilla > > upstream as a base for your test. There's a string of other fixes > > from Frederic in tip:core/kill-the-BKL branch, have you picked them > > all up when you did your testing? > > > > The most coherent way to test this would be to pick up the latest > > core/kill-the-BKL git tree from: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core/kill-the-BKL > > > > I did not know about this branch, now I am testing it and there is > no more problem with that testcase (dbench). > > I will continue testing. thanks for testing it! It seems reiserfs with Frederic's changes appears to be more stable now on your system. I saw your NFS circular locking kill-the-BKL problem report on LKML - also attached below. Hopefully someone on the Cc: list with NFS experience can point out the BKL assumption that is causing this. Ingo ----- Forwarded message from Alexander Beregalov <a.beregalov@gmail.com> ----- Date: Wed, 15 Apr 2009 22:08:01 +0400 From: Alexander Beregalov <a.beregalov@gmail.com> To: linux-kernel <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>, linux-nfs@vger.kernel.org Subject: [core/kill-the-BKL] nfs3: possible circular locking dependency Hi I have pulled core/kill-the-BKL on top of 2.6.30-rc2. device: '0:18': device_add ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.30-rc2-00057-g30aa902-dirty #5 ------------------------------------------------------- mount.nfs/1740 is trying to acquire lock: (kernel_mutex){+.+.+.}, at: [<00000000006f32dc>] lock_kernel+0x28/0x3c but task is already holding lock: (&type->s_umount_key#24/1){+.+.+.}, at: [<00000000004b88a0>] sget+0x228/0x36c which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #1 (&type->s_umount_key#24/1){+.+.+.}: [<00000000004776d0>] lock_acquire+0x5c/0x74 [<0000000000469f5c>] down_write_nested+0x38/0x50 [<00000000004b88a0>] sget+0x228/0x36c [<00000000005688fc>] nfs_get_sb+0x80c/0xa7c [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4 [<00000000004b7f84>] do_kern_mount+0x30/0xcc [<00000000004cf300>] do_mount+0x7c8/0x80c [<00000000004ed2a4>] compat_sys_mount+0x224/0x274 [<0000000000406154>] linux_sparc_syscall32+0x34/0x40 -> #0 (kernel_mutex){+.+.+.}: [<00000000004776d0>] lock_acquire+0x5c/0x74 [<00000000006f0ebc>] mutex_lock_nested+0x48/0x380 [<00000000006f32dc>] lock_kernel+0x28/0x3c [<00000000006d20ec>] rpc_wait_bit_killable+0x64/0x8c [<00000000006f0620>] __wait_on_bit+0x64/0xc0 [<00000000006f06e4>] out_of_line_wait_on_bit+0x68/0x7c [<00000000006d2938>] __rpc_execute+0x150/0x2b4 [<00000000006d2ac0>] rpc_execute+0x24/0x34 [<00000000006cc338>] rpc_run_task+0x64/0x74 [<00000000006cc474>] rpc_call_sync+0x58/0x7c [<00000000005717b0>] nfs3_rpc_wrapper+0x24/0xa0 [<0000000000572024>] do_proc_get_root+0x6c/0x10c [<00000000005720dc>] nfs3_proc_get_root+0x18/0x5c [<000000000056401c>] nfs_get_root+0x34/0x17c [<0000000000568adc>] nfs_get_sb+0x9ec/0xa7c [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4 [<00000000004b7f84>] do_kern_mount+0x30/0xcc [<00000000004cf300>] do_mount+0x7c8/0x80c [<00000000004ed2a4>] compat_sys_mount+0x224/0x274 [<0000000000406154>] linux_sparc_syscall32+0x34/0x40 other info that might help us debug this: 1 lock held by mount.nfs/1740: #0: (&type->s_umount_key#24/1){+.+.+.}, at: [<00000000004b88a0>] sget+0x228/0x36c stack backtrace: Call Trace: [00000000004755ac] print_circular_bug_tail+0xfc/0x10c [0000000000476e24] __lock_acquire+0x12f0/0x1b40 [00000000004776d0] lock_acquire+0x5c/0x74 [00000000006f0ebc] mutex_lock_nested+0x48/0x380 [00000000006f32dc] lock_kernel+0x28/0x3c [00000000006d20ec] rpc_wait_bit_killable+0x64/0x8c [00000000006f0620] __wait_on_bit+0x64/0xc0 
[00000000006f06e4] out_of_line_wait_on_bit+0x68/0x7c [00000000006d2938] __rpc_execute+0x150/0x2b4 [00000000006d2ac0] rpc_execute+0x24/0x34 [00000000006cc338] rpc_run_task+0x64/0x74 [00000000006cc474] rpc_call_sync+0x58/0x7c [00000000005717b0] nfs3_rpc_wrapper+0x24/0xa0 [0000000000572024] do_proc_get_root+0x6c/0x10c [00000000005720dc] nfs3_proc_get_root+0x18/0x5c [000000000056401c] nfs_get_root+0x34/0x17c device: '0:19': device_add ----- End forwarded message ----- -- To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [tree] latest kill-the-BKL tree, v12 2009-04-15 23:07 ` [tree] latest kill-the-BKL tree, v12 Ingo Molnar @ 2009-04-15 23:13 ` Trond Myklebust 2009-04-15 23:35 ` Frederic Weisbecker 1 sibling, 0 replies; 6+ messages in thread From: Trond Myklebust @ 2009-04-15 23:13 UTC (permalink / raw) To: Ingo Molnar Cc: Alexander Beregalov, Peter Zijlstra, linux-nfs, netdev, Frederic Weisbecker, LKML, Alessio Igor Bogani, Jeff Mahoney, ReiserFS Development List, Chris Mason On Thu, 2009-04-16 at 01:07 +0200, Ingo Molnar wrote: > * Alexander Beregalov <a.beregalov@gmail.com> wrote: > > > 2009/4/14 Ingo Molnar <mingo@elte.hu>: > > > > > > * Alexander Beregalov <a.beregalov@gmail.com> wrote: > > > > > >> On Tue, Apr 14, 2009 at 05:34:22AM +0200, Frederic Weisbecker wrote: > > >> > Ingo, > > >> > > > >> > This small patchset fixes some deadlocks I've faced after trying > > >> > some pressures with dbench on a reiserfs partition. > > >> > > > >> > There is still some work pending such as adding some checks to ensure we > > >> > _always_ release the lock before sleeping, as you suggested. > > >> > Also I have to fix a lockdep warning reported by Alessio Igor Bogani. > > >> > And also some optimizations.... > > >> > > > >> > Thanks, > > >> > Frederic. > > >> > > > >> > Frederic Weisbecker (3): > > >> > kill-the-BKL/reiserfs: provide a tool to lock only once the write lock > > >> > kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file > > >> > kill-the-BKL/reiserfs: only acquire the write lock once in > > >> > reiserfs_dirty_inode > > >> > > > >> > fs/reiserfs/inode.c | 10 +++++++--- > > >> > fs/reiserfs/lock.c | 26 ++++++++++++++++++++++++++ > > >> > fs/reiserfs/super.c | 15 +++++++++------ > > >> > include/linux/reiserfs_fs.h | 2 ++ > > >> > 4 files changed, 44 insertions(+), 9 deletions(-) > > >> > > > >> > > >> Hi > > >> > > >> The same test - dbench on reiserfs on loop on sparc64. 
> > >> > > >> [ INFO: possible circular locking dependency detected ] > > >> 2.6.30-rc1-00457-gb21597d-dirty #2 > > > > > > I'm wondering ... your version hash suggests you used vanilla > > > upstream as a base for your test. There's a string of other fixes > > > from Frederic in tip:core/kill-the-BKL branch, have you picked them > > > all up when you did your testing? > > > > > > The most coherent way to test this would be to pick up the latest > > > core/kill-the-BKL git tree from: > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core/kill-the-BKL > > > > > > > I did not know about this branch, now I am testing it and there is > > no more problem with that testcase (dbench). > > > > I will continue testing. > > thanks for testing it! It seems reiserfs with Frederic's changes > appears to be more stable now on your system. > > I saw your NFS circular locking kill-the-BKL problem report on LKML > - also attached below. > > Hopefully someone on the Cc: list with NFS experience can point out > the BKL assumption that is causing this. I have no idea what Alexander is seeing. There should be no BKL dependencies at all left in the RPC client code. Most of the NFS client code is clean too, with only the posix lock code, and the NFSv4 callback server remaining... Cheers Trond ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [tree] latest kill-the-BKL tree, v12 2009-04-15 23:07 ` [tree] latest kill-the-BKL tree, v12 Ingo Molnar 2009-04-15 23:13 ` Trond Myklebust @ 2009-04-15 23:35 ` Frederic Weisbecker 2009-04-16 8:51 ` Ingo Molnar 1 sibling, 1 reply; 6+ messages in thread From: Frederic Weisbecker @ 2009-04-15 23:35 UTC (permalink / raw) To: Ingo Molnar Cc: Alexander Beregalov, Peter Zijlstra, linux-nfs, netdev, LKML, Alessio Igor Bogani, Jeff Mahoney, ReiserFS Development List, Chris Mason On Thu, Apr 16, 2009 at 01:07:36AM +0200, Ingo Molnar wrote: > > * Alexander Beregalov <a.beregalov@gmail.com> wrote: > > > 2009/4/14 Ingo Molnar <mingo@elte.hu>: > > > > > > * Alexander Beregalov <a.beregalov@gmail.com> wrote: > > > > > >> On Tue, Apr 14, 2009 at 05:34:22AM +0200, Frederic Weisbecker wrote: > > >> > Ingo, > > >> > > > >> > This small patchset fixes some deadlocks I've faced after trying > > >> > some pressures with dbench on a reiserfs partition. > > >> > > > >> > There is still some work pending such as adding some checks to ensure we > > >> > _always_ release the lock before sleeping, as you suggested. > > >> > Also I have to fix a lockdep warning reported by Alessio Igor Bogani. > > >> > And also some optimizations.... > > >> > > > >> > Thanks, > > >> > Frederic. > > >> > > > >> > Frederic Weisbecker (3): > > >> > kill-the-BKL/reiserfs: provide a tool to lock only once the write lock > > >> > kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file > > >> > kill-the-BKL/reiserfs: only acquire the write lock once in > > >> > reiserfs_dirty_inode > > >> > > > >> > fs/reiserfs/inode.c | 10 +++++++--- > > >> > fs/reiserfs/lock.c | 26 ++++++++++++++++++++++++++ > > >> > fs/reiserfs/super.c | 15 +++++++++------ > > >> > include/linux/reiserfs_fs.h | 2 ++ > > >> > 4 files changed, 44 insertions(+), 9 deletions(-) > > >> > > > >> > > >> Hi > > >> > > >> The same test - dbench on reiserfs on loop on sparc64. 
> > >> > > >> [ INFO: possible circular locking dependency detected ] > > >> 2.6.30-rc1-00457-gb21597d-dirty #2 > > > > > > I'm wondering ... your version hash suggests you used vanilla > > > upstream as a base for your test. There's a string of other fixes > > > from Frederic in tip:core/kill-the-BKL branch, have you picked them > > > all up when you did your testing? > > > > > > The most coherent way to test this would be to pick up the latest > > > core/kill-the-BKL git tree from: > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core/kill-the-BKL > > > > > > > I did not know about this branch, now I am testing it and there is > > no more problem with that testcase (dbench). > > > > I will continue testing. > > thanks for testing it! It seems reiserfs with Frederic's changes > appears to be more stable now on your system. Yeah, thanks a lot for this testing! > I saw your NFS circular locking kill-the-BKL problem report on LKML > - also attached below. > > Hopefully someone on the Cc: list with NFS experience can point out > the BKL assumption that is causing this. > > Ingo > > ----- Forwarded message from Alexander Beregalov <a.beregalov@gmail.com> ----- > > Date: Wed, 15 Apr 2009 22:08:01 +0400 > From: Alexander Beregalov <a.beregalov@gmail.com> > To: linux-kernel <linux-kernel@vger.kernel.org>, > Ingo Molnar <mingo@elte.hu>, linux-nfs@vger.kernel.org > Subject: [core/kill-the-BKL] nfs3: possible circular locking dependency > > Hi > > I have pulled core/kill-the-BKL on top of 2.6.30-rc2. 
> > device: '0:18': device_add > > ======================================================= > [ INFO: possible circular locking dependency detected ] > 2.6.30-rc2-00057-g30aa902-dirty #5 > ------------------------------------------------------- > mount.nfs/1740 is trying to acquire lock: > (kernel_mutex){+.+.+.}, at: [<00000000006f32dc>] lock_kernel+0x28/0x3c > > but task is already holding lock: > (&type->s_umount_key#24/1){+.+.+.}, at: [<00000000004b88a0>] sget+0x228/0x36c > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #1 (&type->s_umount_key#24/1){+.+.+.}: > [<00000000004776d0>] lock_acquire+0x5c/0x74 > [<0000000000469f5c>] down_write_nested+0x38/0x50 > [<00000000004b88a0>] sget+0x228/0x36c > [<00000000005688fc>] nfs_get_sb+0x80c/0xa7c > [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4 > [<00000000004b7f84>] do_kern_mount+0x30/0xcc > [<00000000004cf300>] do_mount+0x7c8/0x80c > [<00000000004ed2a4>] compat_sys_mount+0x224/0x274 > [<0000000000406154>] linux_sparc_syscall32+0x34/0x40 > > -> #0 (kernel_mutex){+.+.+.}: > [<00000000004776d0>] lock_acquire+0x5c/0x74 > [<00000000006f0ebc>] mutex_lock_nested+0x48/0x380 > [<00000000006f32dc>] lock_kernel+0x28/0x3c > [<00000000006d20ec>] rpc_wait_bit_killable+0x64/0x8c > [<00000000006f0620>] __wait_on_bit+0x64/0xc0 > [<00000000006f06e4>] out_of_line_wait_on_bit+0x68/0x7c > [<00000000006d2938>] __rpc_execute+0x150/0x2b4 > [<00000000006d2ac0>] rpc_execute+0x24/0x34 > [<00000000006cc338>] rpc_run_task+0x64/0x74 > [<00000000006cc474>] rpc_call_sync+0x58/0x7c > [<00000000005717b0>] nfs3_rpc_wrapper+0x24/0xa0 > [<0000000000572024>] do_proc_get_root+0x6c/0x10c > [<00000000005720dc>] nfs3_proc_get_root+0x18/0x5c > [<000000000056401c>] nfs_get_root+0x34/0x17c > [<0000000000568adc>] nfs_get_sb+0x9ec/0xa7c > [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4 > [<00000000004b7f84>] do_kern_mount+0x30/0xcc > [<00000000004cf300>] do_mount+0x7c8/0x80c > [<00000000004ed2a4>] 
compat_sys_mount+0x224/0x274
>        [<0000000000406154>] linux_sparc_syscall32+0x34/0x40

This is still the dependency between the BKL and s_umount_key that was
reported recently. I wonder whether this is a problem in the fs layer.
I should investigate it.

Thanks.

> other info that might help us debug this:
>
> 1 lock held by mount.nfs/1740:
>  #0:  (&type->s_umount_key#24/1){+.+.+.}, at: [<00000000004b88a0>]
> sget+0x228/0x36c
>
> stack backtrace:
> Call Trace:
>  [00000000004755ac] print_circular_bug_tail+0xfc/0x10c
>  [0000000000476e24] __lock_acquire+0x12f0/0x1b40
>  [00000000004776d0] lock_acquire+0x5c/0x74
>  [00000000006f0ebc] mutex_lock_nested+0x48/0x380
>  [00000000006f32dc] lock_kernel+0x28/0x3c
>  [00000000006d20ec] rpc_wait_bit_killable+0x64/0x8c
>  [00000000006f0620] __wait_on_bit+0x64/0xc0
>  [00000000006f06e4] out_of_line_wait_on_bit+0x68/0x7c
>  [00000000006d2938] __rpc_execute+0x150/0x2b4
>  [00000000006d2ac0] rpc_execute+0x24/0x34
>  [00000000006cc338] rpc_run_task+0x64/0x74
>  [00000000006cc474] rpc_call_sync+0x58/0x7c
>  [00000000005717b0] nfs3_rpc_wrapper+0x24/0xa0
>  [0000000000572024] do_proc_get_root+0x6c/0x10c
>  [00000000005720dc] nfs3_proc_get_root+0x18/0x5c
>  [000000000056401c] nfs_get_root+0x34/0x17c
> device: '0:19': device_add
>
> ----- End forwarded message -----
* Re: [tree] latest kill-the-BKL tree, v12 2009-04-15 23:35 ` Frederic Weisbecker @ 2009-04-16 8:51 ` Ingo Molnar [not found] ` <20090416085153.GC9813-X9Un+BFzKDI@public.gmane.org> 2009-04-16 16:40 ` Frederic Weisbecker 0 siblings, 2 replies; 6+ messages in thread From: Ingo Molnar @ 2009-04-16 8:51 UTC (permalink / raw) To: Frederic Weisbecker Cc: Alexander Beregalov, Peter Zijlstra, linux-nfs, netdev, LKML, Alessio Igor Bogani, Jeff Mahoney, ReiserFS Development List, Chris Mason * Frederic Weisbecker <fweisbec@gmail.com> wrote: > On Thu, Apr 16, 2009 at 01:07:36AM +0200, Ingo Molnar wrote: > > > > * Alexander Beregalov <a.beregalov@gmail.com> wrote: > > > > > 2009/4/14 Ingo Molnar <mingo@elte.hu>: > > > > > > > > * Alexander Beregalov <a.beregalov@gmail.com> wrote: > > > > > > > >> On Tue, Apr 14, 2009 at 05:34:22AM +0200, Frederic Weisbecker wrote: > > > >> > Ingo, > > > >> > > > > >> > This small patchset fixes some deadlocks I've faced after trying > > > >> > some pressures with dbench on a reiserfs partition. > > > >> > > > > >> > There is still some work pending such as adding some checks to ensure we > > > >> > _always_ release the lock before sleeping, as you suggested. > > > >> > Also I have to fix a lockdep warning reported by Alessio Igor Bogani. > > > >> > And also some optimizations.... > > > >> > > > > >> > Thanks, > > > >> > Frederic. 
> > > >> > > > > >> > Frederic Weisbecker (3): > > > >> > kill-the-BKL/reiserfs: provide a tool to lock only once the write lock > > > >> > kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file > > > >> > kill-the-BKL/reiserfs: only acquire the write lock once in > > > >> > reiserfs_dirty_inode > > > >> > > > > >> > fs/reiserfs/inode.c | 10 +++++++--- > > > >> > fs/reiserfs/lock.c | 26 ++++++++++++++++++++++++++ > > > >> > fs/reiserfs/super.c | 15 +++++++++------ > > > >> > include/linux/reiserfs_fs.h | 2 ++ > > > >> > 4 files changed, 44 insertions(+), 9 deletions(-) > > > >> > > > > >> > > > >> Hi > > > >> > > > >> The same test - dbench on reiserfs on loop on sparc64. > > > >> > > > >> [ INFO: possible circular locking dependency detected ] > > > >> 2.6.30-rc1-00457-gb21597d-dirty #2 > > > > > > > > I'm wondering ... your version hash suggests you used vanilla > > > > upstream as a base for your test. There's a string of other fixes > > > > from Frederic in tip:core/kill-the-BKL branch, have you picked them > > > > all up when you did your testing? > > > > > > > > The most coherent way to test this would be to pick up the latest > > > > core/kill-the-BKL git tree from: > > > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core/kill-the-BKL > > > > > > > > > > I did not know about this branch, now I am testing it and there is > > > no more problem with that testcase (dbench). > > > > > > I will continue testing. > > > > thanks for testing it! It seems reiserfs with Frederic's changes > > appears to be more stable now on your system. > > > > > Yeah, thanks a lot for this testing! > > > > > I saw your NFS circular locking kill-the-BKL problem report on LKML > > - also attached below. > > > > Hopefully someone on the Cc: list with NFS experience can point out > > the BKL assumption that is causing this. 
> > > > Ingo > > > > ----- Forwarded message from Alexander Beregalov <a.beregalov@gmail.com> ----- > > > > Date: Wed, 15 Apr 2009 22:08:01 +0400 > > From: Alexander Beregalov <a.beregalov@gmail.com> > > To: linux-kernel <linux-kernel@vger.kernel.org>, > > Ingo Molnar <mingo@elte.hu>, linux-nfs@vger.kernel.org > > Subject: [core/kill-the-BKL] nfs3: possible circular locking dependency > > > > Hi > > > > I have pulled core/kill-the-BKL on top of 2.6.30-rc2. > > > > device: '0:18': device_add > > > > ======================================================= > > [ INFO: possible circular locking dependency detected ] > > 2.6.30-rc2-00057-g30aa902-dirty #5 > > ------------------------------------------------------- > > mount.nfs/1740 is trying to acquire lock: > > (kernel_mutex){+.+.+.}, at: [<00000000006f32dc>] lock_kernel+0x28/0x3c > > > > but task is already holding lock: > > (&type->s_umount_key#24/1){+.+.+.}, at: [<00000000004b88a0>] sget+0x228/0x36c > > > > which lock already depends on the new lock. 
> > > > > > the existing dependency chain (in reverse order) is: > > > > -> #1 (&type->s_umount_key#24/1){+.+.+.}: > > [<00000000004776d0>] lock_acquire+0x5c/0x74 > > [<0000000000469f5c>] down_write_nested+0x38/0x50 > > [<00000000004b88a0>] sget+0x228/0x36c > > [<00000000005688fc>] nfs_get_sb+0x80c/0xa7c > > [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4 > > [<00000000004b7f84>] do_kern_mount+0x30/0xcc > > [<00000000004cf300>] do_mount+0x7c8/0x80c > > [<00000000004ed2a4>] compat_sys_mount+0x224/0x274 > > [<0000000000406154>] linux_sparc_syscall32+0x34/0x40 > > > > -> #0 (kernel_mutex){+.+.+.}: > > [<00000000004776d0>] lock_acquire+0x5c/0x74 > > [<00000000006f0ebc>] mutex_lock_nested+0x48/0x380 > > [<00000000006f32dc>] lock_kernel+0x28/0x3c > > [<00000000006d20ec>] rpc_wait_bit_killable+0x64/0x8c > > [<00000000006f0620>] __wait_on_bit+0x64/0xc0 > > [<00000000006f06e4>] out_of_line_wait_on_bit+0x68/0x7c > > [<00000000006d2938>] __rpc_execute+0x150/0x2b4 > > [<00000000006d2ac0>] rpc_execute+0x24/0x34 > > [<00000000006cc338>] rpc_run_task+0x64/0x74 > > [<00000000006cc474>] rpc_call_sync+0x58/0x7c > > [<00000000005717b0>] nfs3_rpc_wrapper+0x24/0xa0 > > [<0000000000572024>] do_proc_get_root+0x6c/0x10c > > [<00000000005720dc>] nfs3_proc_get_root+0x18/0x5c > > [<000000000056401c>] nfs_get_root+0x34/0x17c > > [<0000000000568adc>] nfs_get_sb+0x9ec/0xa7c > > [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4 > > [<00000000004b7f84>] do_kern_mount+0x30/0xcc > > [<00000000004cf300>] do_mount+0x7c8/0x80c > > [<00000000004ed2a4>] compat_sys_mount+0x224/0x274 > > [<0000000000406154>] linux_sparc_syscall32+0x34/0x40 > > > > > This is still the dependency between bkl and s_umount_key that has > been reported recently. I wonder if this is not a problem in the > fs layer. I should investigate on it. 
The problem seems to be that this NFS call context:

 -> #0 (kernel_mutex){+.+.+.}:
        [<00000000004776d0>] lock_acquire+0x5c/0x74
        [<00000000006f0ebc>] mutex_lock_nested+0x48/0x380
        [<00000000006f32dc>] lock_kernel+0x28/0x3c
        [<00000000006d20ec>] rpc_wait_bit_killable+0x64/0x8c
        [<00000000006f0620>] __wait_on_bit+0x64/0xc0
        [<00000000006f06e4>] out_of_line_wait_on_bit+0x68/0x7c
        [<00000000006d2938>] __rpc_execute+0x150/0x2b4
        [<00000000006d2ac0>] rpc_execute+0x24/0x34
        [<00000000006cc338>] rpc_run_task+0x64/0x74
        [<00000000006cc474>] rpc_call_sync+0x58/0x7c
        [<00000000005717b0>] nfs3_rpc_wrapper+0x24/0xa0
        [<0000000000572024>] do_proc_get_root+0x6c/0x10c
        [<00000000005720dc>] nfs3_proc_get_root+0x18/0x5c
        [<000000000056401c>] nfs_get_root+0x34/0x17c
        [<0000000000568adc>] nfs_get_sb+0x9ec/0xa7c
        [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4
        [<00000000004b7f84>] do_kern_mount+0x30/0xcc
        [<00000000004cf300>] do_mount+0x7c8/0x80c
        [<00000000004ed2a4>] compat_sys_mount+0x224/0x274
        [<0000000000406154>] linux_sparc_syscall32+0x34/0x40

can be called with the BKL held - and then it schedule()s with the
BKL held, creating dependencies. I did the quick hack below (a year
ago! :-) but indeed that's probably wrong: we just drop and then
re-acquire the BKL at a very low level - inverting the dependency
chain.

It's not a problem of the NFS code, it's the problem of
vfs_kern_mount taking the BKL.

Maybe it would be better if nfs_get_sb() dropped the BKL (knowing
that it's called with the BKL held) - since it does not rely on the
BKL? Not rpc_wait_bit_killable().

	Ingo

-------------->
From 352e0d25def53e6b36234e4dc2083ca7f5d712a9 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 14 May 2008 17:31:41 +0200
Subject: [PATCH] remove the BKL: restructure NFS code

the naked schedule() in rpc_wait_bit_killable() caused the BKL to
be auto-dropped in the past.

avoid the immediate hang in such code. Note that this still leaves
some other locking dependencies to be sorted out in the NFS code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 net/sunrpc/sched.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 6eab9bf..e12e571 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -224,9 +224,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
 
 static int rpc_wait_bit_killable(void *word)
 {
+	int bkl = kernel_locked();
+
 	if (fatal_signal_pending(current))
 		return -ERESTARTSYS;
+	if (bkl)
+		unlock_kernel();
 	schedule();
+	if (bkl)
+		lock_kernel();
 	return 0;
 }
 
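To illustrate why dropping and re-acquiring the BKL this low in the call chain inverts the ordering, here is a toy userspace model in C. It is only a sketch: it is not kernel code and not how lockdep is implemented. The names `struct model`, `acquire()`, `release()`, `wait_dropping_bkl()` and `mount_path()` are made up for this example. It tracks only direct pairwise ordering between two locks, and it assumes (as described in this thread) that the mount path enters with the BKL held and takes s_umount in sget() before the RPC wait.

```c
#include <assert.h>
#include <string.h>

/* Toy model of a lock-ordering check. Hypothetical names, not kernel API. */
enum { BKL, S_UMOUNT, NLOCKS };

struct model {
	int held[NLOCKS];           /* 1 while the lock is held */
	int before[NLOCKS][NLOCKS]; /* before[a][b]: b was taken while a was held */
	int inversion;              /* set once both a->b and b->a were recorded */
};

static void model_init(struct model *m)
{
	memset(m, 0, sizeof(*m));
}

/* Take a lock and record an ordering edge from every lock already held. */
static void acquire(struct model *m, int lock)
{
	assert(lock >= 0 && lock < NLOCKS);
	for (int h = 0; h < NLOCKS; h++) {
		if (!m->held[h] || h == lock)
			continue;
		m->before[h][lock] = 1;
		if (m->before[lock][h])
			m->inversion = 1; /* opposite order was seen earlier */
	}
	m->held[lock] = 1;
}

static void release(struct model *m, int lock)
{
	assert(lock >= 0 && lock < NLOCKS);
	m->held[lock] = 0;
}

/* Same shape as the hack in the patch: drop the BKL around the sleep. */
static void wait_dropping_bkl(struct model *m)
{
	int bkl = m->held[BKL];

	if (bkl)
		release(m, BKL);
	/* schedule() would run here */
	if (bkl)
		acquire(m, BKL); /* other locks may still be held at this point */
}

/*
 * Assumed mount path: the caller may hold the BKL, sget() takes s_umount,
 * then the RPC layer waits. Returns 1 if an ordering inversion was recorded.
 */
static int mount_path(int caller_holds_bkl)
{
	struct model m;

	model_init(&m);
	if (caller_holds_bkl)
		acquire(&m, BKL);
	acquire(&m, S_UMOUNT);  /* records BKL -> s_umount if the BKL is held */
	wait_dropping_bkl(&m);  /* re-acquire records s_umount -> BKL */
	release(&m, S_UMOUNT);
	if (caller_holds_bkl)
		release(&m, BKL);
	return m.inversion;
}
```

Under these assumptions `mount_path(1)` records both BKL -> s_umount and s_umount -> BKL, which has the same shape as the report above, while `mount_path(0)` (the caller no longer taking the BKL) records no ordering between the two locks at all.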
[parent not found: <20090416085153.GC9813-X9Un+BFzKDI@public.gmane.org>]
* Re: [tree] latest kill-the-BKL tree, v12 [not found] ` <20090416085153.GC9813-X9Un+BFzKDI@public.gmane.org> @ 2009-04-16 14:35 ` Alessio Igor Bogani 0 siblings, 0 replies; 6+ messages in thread From: Alessio Igor Bogani @ 2009-04-16 14:35 UTC (permalink / raw) To: Ingo Molnar Cc: Frederic Weisbecker, Alexander Beregalov, Peter Zijlstra, linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, LKML, Jeff Mahoney, ReiserFS Development List, Chris Mason

Dear Sir Molnar,

2009/4/16 Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>:
[...]
>> This is still the dependency between bkl and s_umount_key that has
>> been reported recently. I wonder if this is not a problem in the
>> fs layer. I should investigate on it.
>
> The problem seem to be that this NFS call context:
>
>  -> #0 (kernel_mutex){+.+.+.}:
>        [<00000000004776d0>] lock_acquire+0x5c/0x74
>        [<00000000006f0ebc>] mutex_lock_nested+0x48/0x380
>        [<00000000006f32dc>] lock_kernel+0x28/0x3c
>        [<00000000006d20ec>] rpc_wait_bit_killable+0x64/0x8c
>        [<00000000006f0620>] __wait_on_bit+0x64/0xc0
>        [<00000000006f06e4>] out_of_line_wait_on_bit+0x68/0x7c
>        [<00000000006d2938>] __rpc_execute+0x150/0x2b4
>        [<00000000006d2ac0>] rpc_execute+0x24/0x34
>        [<00000000006cc338>] rpc_run_task+0x64/0x74
>        [<00000000006cc474>] rpc_call_sync+0x58/0x7c
>        [<00000000005717b0>] nfs3_rpc_wrapper+0x24/0xa0
>        [<0000000000572024>] do_proc_get_root+0x6c/0x10c
>        [<00000000005720dc>] nfs3_proc_get_root+0x18/0x5c
>        [<000000000056401c>] nfs_get_root+0x34/0x17c
>        [<0000000000568adc>] nfs_get_sb+0x9ec/0xa7c
>        [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4
>        [<00000000004b7f84>] do_kern_mount+0x30/0xcc
>        [<00000000004cf300>] do_mount+0x7c8/0x80c
>        [<00000000004ed2a4>] compat_sys_mount+0x224/0x274
>        [<0000000000406154>] linux_sparc_syscall32+0x34/0x40

The proposed patch that I just sent
(http://marc.info/?l=linux-kernel&m=123989213917572&w=2) seems to fix
the lock dependency.
I don't know whether it is the right way to solve the problem in every case, but it works on my laptop, at least.

Ciao,
Alessio
* Re: [tree] latest kill-the-BKL tree, v12 2009-04-16 8:51 ` Ingo Molnar [not found] ` <20090416085153.GC9813-X9Un+BFzKDI@public.gmane.org> @ 2009-04-16 16:40 ` Frederic Weisbecker 1 sibling, 0 replies; 6+ messages in thread From: Frederic Weisbecker @ 2009-04-16 16:40 UTC (permalink / raw) To: Ingo Molnar Cc: Alexander Beregalov, Peter Zijlstra, linux-nfs, netdev, LKML, Alessio Igor Bogani, Jeff Mahoney, ReiserFS Development List, Chris Mason On Thu, Apr 16, 2009 at 10:51:53AM +0200, Ingo Molnar wrote: > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > > On Thu, Apr 16, 2009 at 01:07:36AM +0200, Ingo Molnar wrote: > > > > > > * Alexander Beregalov <a.beregalov@gmail.com> wrote: > > > > > > > 2009/4/14 Ingo Molnar <mingo@elte.hu>: > > > > > > > > > > * Alexander Beregalov <a.beregalov@gmail.com> wrote: > > > > > > > > > >> On Tue, Apr 14, 2009 at 05:34:22AM +0200, Frederic Weisbecker wrote: > > > > >> > Ingo, > > > > >> > > > > > >> > This small patchset fixes some deadlocks I've faced after trying > > > > >> > some pressures with dbench on a reiserfs partition. > > > > >> > > > > > >> > There is still some work pending such as adding some checks to ensure we > > > > >> > _always_ release the lock before sleeping, as you suggested. > > > > >> > Also I have to fix a lockdep warning reported by Alessio Igor Bogani. > > > > >> > And also some optimizations.... > > > > >> > > > > > >> > Thanks, > > > > >> > Frederic. 
> > > > >> > > > > > >> > Frederic Weisbecker (3): > > > > >> > kill-the-BKL/reiserfs: provide a tool to lock only once the write lock > > > > >> > kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file > > > > >> > kill-the-BKL/reiserfs: only acquire the write lock once in > > > > >> > reiserfs_dirty_inode > > > > >> > > > > > >> > fs/reiserfs/inode.c | 10 +++++++--- > > > > >> > fs/reiserfs/lock.c | 26 ++++++++++++++++++++++++++ > > > > >> > fs/reiserfs/super.c | 15 +++++++++------ > > > > >> > include/linux/reiserfs_fs.h | 2 ++ > > > > >> > 4 files changed, 44 insertions(+), 9 deletions(-) > > > > >> > > > > > >> > > > > >> Hi > > > > >> > > > > >> The same test - dbench on reiserfs on loop on sparc64. > > > > >> > > > > >> [ INFO: possible circular locking dependency detected ] > > > > >> 2.6.30-rc1-00457-gb21597d-dirty #2 > > > > > > > > > > I'm wondering ... your version hash suggests you used vanilla > > > > > upstream as a base for your test. There's a string of other fixes > > > > > from Frederic in tip:core/kill-the-BKL branch, have you picked them > > > > > all up when you did your testing? > > > > > > > > > > The most coherent way to test this would be to pick up the latest > > > > > core/kill-the-BKL git tree from: > > > > > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core/kill-the-BKL > > > > > > > > > > > > > I did not know about this branch, now I am testing it and there is > > > > no more problem with that testcase (dbench). > > > > > > > > I will continue testing. > > > > > > thanks for testing it! It seems reiserfs with Frederic's changes > > > appears to be more stable now on your system. > > > > > > > > > > Yeah, thanks a lot for this testing! > > > > > > > > > I saw your NFS circular locking kill-the-BKL problem report on LKML > > > - also attached below. > > > > > > Hopefully someone on the Cc: list with NFS experience can point out > > > the BKL assumption that is causing this. 
> > > > > > Ingo > > > > > > ----- Forwarded message from Alexander Beregalov <a.beregalov@gmail.com> ----- > > > > > > Date: Wed, 15 Apr 2009 22:08:01 +0400 > > > From: Alexander Beregalov <a.beregalov@gmail.com> > > > To: linux-kernel <linux-kernel@vger.kernel.org>, > > > Ingo Molnar <mingo@elte.hu>, linux-nfs@vger.kernel.org > > > Subject: [core/kill-the-BKL] nfs3: possible circular locking dependency > > > > > > Hi > > > > > > I have pulled core/kill-the-BKL on top of 2.6.30-rc2. > > > > > > device: '0:18': device_add > > > > > > ======================================================= > > > [ INFO: possible circular locking dependency detected ] > > > 2.6.30-rc2-00057-g30aa902-dirty #5 > > > ------------------------------------------------------- > > > mount.nfs/1740 is trying to acquire lock: > > > (kernel_mutex){+.+.+.}, at: [<00000000006f32dc>] lock_kernel+0x28/0x3c > > > > > > but task is already holding lock: > > > (&type->s_umount_key#24/1){+.+.+.}, at: [<00000000004b88a0>] sget+0x228/0x36c > > > > > > which lock already depends on the new lock. 
> > >
> > >
> > > the existing dependency chain (in reverse order) is:
> > >
> > > -> #1 (&type->s_umount_key#24/1){+.+.+.}:
> > >        [<00000000004776d0>] lock_acquire+0x5c/0x74
> > >        [<0000000000469f5c>] down_write_nested+0x38/0x50
> > >        [<00000000004b88a0>] sget+0x228/0x36c
> > >        [<00000000005688fc>] nfs_get_sb+0x80c/0xa7c
> > >        [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4
> > >        [<00000000004b7f84>] do_kern_mount+0x30/0xcc
> > >        [<00000000004cf300>] do_mount+0x7c8/0x80c
> > >        [<00000000004ed2a4>] compat_sys_mount+0x224/0x274
> > >        [<0000000000406154>] linux_sparc_syscall32+0x34/0x40
> > >
> > > -> #0 (kernel_mutex){+.+.+.}:
> > >        [<00000000004776d0>] lock_acquire+0x5c/0x74
> > >        [<00000000006f0ebc>] mutex_lock_nested+0x48/0x380
> > >        [<00000000006f32dc>] lock_kernel+0x28/0x3c
> > >        [<00000000006d20ec>] rpc_wait_bit_killable+0x64/0x8c
> > >        [<00000000006f0620>] __wait_on_bit+0x64/0xc0
> > >        [<00000000006f06e4>] out_of_line_wait_on_bit+0x68/0x7c
> > >        [<00000000006d2938>] __rpc_execute+0x150/0x2b4
> > >        [<00000000006d2ac0>] rpc_execute+0x24/0x34
> > >        [<00000000006cc338>] rpc_run_task+0x64/0x74
> > >        [<00000000006cc474>] rpc_call_sync+0x58/0x7c
> > >        [<00000000005717b0>] nfs3_rpc_wrapper+0x24/0xa0
> > >        [<0000000000572024>] do_proc_get_root+0x6c/0x10c
> > >        [<00000000005720dc>] nfs3_proc_get_root+0x18/0x5c
> > >        [<000000000056401c>] nfs_get_root+0x34/0x17c
> > >        [<0000000000568adc>] nfs_get_sb+0x9ec/0xa7c
> > >        [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4
> > >        [<00000000004b7f84>] do_kern_mount+0x30/0xcc
> > >        [<00000000004cf300>] do_mount+0x7c8/0x80c
> > >        [<00000000004ed2a4>] compat_sys_mount+0x224/0x274
> > >        [<0000000000406154>] linux_sparc_syscall32+0x34/0x40
> > >
> > >
> >
> > This is still the dependency between bkl and s_umount_key that has
> > been reported recently. I wonder if this is not a problem in the
> > fs layer. I should investigate on it.
>
> The problem seems to be that this NFS call context:
>
> -> #0 (kernel_mutex){+.+.+.}:
>        [<00000000004776d0>] lock_acquire+0x5c/0x74
>        [<00000000006f0ebc>] mutex_lock_nested+0x48/0x380
>        [<00000000006f32dc>] lock_kernel+0x28/0x3c
>        [<00000000006d20ec>] rpc_wait_bit_killable+0x64/0x8c
>        [<00000000006f0620>] __wait_on_bit+0x64/0xc0
>        [<00000000006f06e4>] out_of_line_wait_on_bit+0x68/0x7c
>        [<00000000006d2938>] __rpc_execute+0x150/0x2b4
>        [<00000000006d2ac0>] rpc_execute+0x24/0x34
>        [<00000000006cc338>] rpc_run_task+0x64/0x74
>        [<00000000006cc474>] rpc_call_sync+0x58/0x7c
>        [<00000000005717b0>] nfs3_rpc_wrapper+0x24/0xa0
>        [<0000000000572024>] do_proc_get_root+0x6c/0x10c
>        [<00000000005720dc>] nfs3_proc_get_root+0x18/0x5c
>        [<000000000056401c>] nfs_get_root+0x34/0x17c
>        [<0000000000568adc>] nfs_get_sb+0x9ec/0xa7c
>        [<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4
>        [<00000000004b7f84>] do_kern_mount+0x30/0xcc
>        [<00000000004cf300>] do_mount+0x7c8/0x80c
>        [<00000000004ed2a4>] compat_sys_mount+0x224/0x274
>        [<0000000000406154>] linux_sparc_syscall32+0x34/0x40
>
> Can be called with the BKL held - and then it schedule()s with the
> BKL held, creating dependencies. I did the quick hack below (a year
> ago! :-) but indeed that's probably wrong: we just drop and then
> re-acquire the BKL at a very low level - inverting the dependency
> chain.

Indeed, the problem remains if we do that :-)

> It's not a problem of the NFS code, it's the problem of
> vfs_kern_mount taking the BKL.

Yes, and I think Alessio's idea of removing the BKL at this level
is the right way. Even though this patch is still being discussed,
I think it opened the right direction to dig.

> Maybe it would be better if nfs_get_sb() dropped the BKL (knowing
> that it's called with the BKL held) - since it does not rely on the
> BKL? Not rpc_wait_bit_killable().

I wonder if it is not dropped because it implicitly protects something
else. Maybe simply concurrent accesses to the superblock?

Frederic.
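
To make the cycle in the report above concrete, here is a minimal userspace sketch of a lock-order check. It is not lockdep's actual implementation, and the names (add_dependency, reachable, the two enum constants) are made up for illustration. It only records "lock B was taken while lock A was held" edges and reports when a new edge would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 8

/* Illustrative lock classes; hypothetical names, not kernel symbols. */
enum lock_class { KERNEL_MUTEX, S_UMOUNT };

/* dep[a][b] is true once lock b has been taken while lock a was held. */
static bool dep[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "took 'taken' while holding 'held'".
 * Returns true if 'held' was already reachable from 'taken', i.e. the
 * new edge closes a cycle and the two orders can deadlock.
 */
static bool add_dependency(int held, int taken)
{
	bool seen[MAX_LOCKS] = { false };
	bool closes_cycle;

	assert(held >= 0 && held < MAX_LOCKS);
	assert(taken >= 0 && taken < MAX_LOCKS);

	closes_cycle = reachable(taken, held, seen);
	dep[held][taken] = true;
	return closes_cycle;
}
```

Assuming, as discussed above, that the mount path holds the BKL when sget() takes s_umount, the first call add_dependency(KERNEL_MUTEX, S_UMOUNT) records the existing order and returns false. The re-acquire in rpc_wait_bit_killable() then corresponds to add_dependency(S_UMOUNT, KERNEL_MUTEX), which returns true: the same two locks taken in the opposite order.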
> 	Ingo
>
> -------------->
> From 352e0d25def53e6b36234e4dc2083ca7f5d712a9 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@elte.hu>
> Date: Wed, 14 May 2008 17:31:41 +0200
> Subject: [PATCH] remove the BKL: restructure NFS code
>
> the naked schedule() in rpc_wait_bit_killable() caused the BKL to
> be auto-dropped in the past.
>
> avoid the immediate hang in such code. Note that this still leaves
> some other locking dependencies to be sorted out in the NFS code.
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  net/sunrpc/sched.c | 6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 6eab9bf..e12e571 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -224,9 +224,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>
>  static int rpc_wait_bit_killable(void *word)
>  {
> +	int bkl = kernel_locked();
> +
>  	if (fatal_signal_pending(current))
>  		return -ERESTARTSYS;
> +	if (bkl)
> +		unlock_kernel();
>  	schedule();
> +	if (bkl)
> +		lock_kernel();

Yeah, as you said, it may not drop but invert the dependency.

>  	return 0;
>  }
>
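
The commit message above mentions that a naked schedule() used to auto-drop the BKL. A rough single-threaded model of that behaviour, under the assumption that each task keeps a nesting depth with -1 meaning "not held", could look like the sketch below. All names carrying a _sketch suffix are hypothetical stand-ins, not the kernel's functions, and nothing here actually sleeps or locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; -1 means "does not hold the lock". */
struct task_sketch {
	int lock_depth;
};

/* Current owner of the modelled big lock, or NULL when it is free. */
static struct task_sketch *bkl_owner;

static void lock_kernel_sketch(struct task_sketch *t)
{
	if (++t->lock_depth == 0) {
		/* Outermost acquisition: the lock must be free in this model. */
		assert(bkl_owner == NULL);
		bkl_owner = t;
	}
}

static void unlock_kernel_sketch(struct task_sketch *t)
{
	assert(bkl_owner == t && t->lock_depth >= 0);
	if (--t->lock_depth < 0)
		bkl_owner = NULL;
}

/*
 * Model of a sleep with auto-drop: the lock is released for the duration
 * of the sleep, 'while_asleep' stands in for whatever other tasks do
 * meanwhile, and the lock is taken back before returning. The saved
 * depth is left untouched, so nesting survives the sleep.
 */
static void schedule_sketch(struct task_sketch *t,
			    void (*while_asleep)(void *), void *arg)
{
	bool held = t->lock_depth >= 0;

	if (held)
		bkl_owner = NULL;
	if (while_asleep)
		while_asleep(arg);
	if (held) {
		assert(bkl_owner == NULL);
		bkl_owner = t;
	}
}

/* Helper usable as 'while_asleep': another task takes and drops the lock. */
static void other_task_runs(void *arg)
{
	struct task_sketch *other = arg;

	lock_kernel_sketch(other);
	unlock_kernel_sketch(other);
}
```

The point of the model is the last step of schedule_sketch(): the lock is re-taken while the caller still holds whatever else it held before sleeping. In the trace above that "whatever else" is s_umount, which is where the inverted order comes from.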
Thread overview: 6+ messages
[not found] <1239680065-25013-1-git-send-email-fweisbec@gmail.com>
[not found] ` <20090414045109.GA26908@orion>
[not found] ` <20090414090146.GH27003@elte.hu>
[not found] ` <a4423d670904151558r4252c7eamd115793fb36a9163@mail.gmail.com>
2009-04-15 23:07 ` [tree] latest kill-the-BKL tree, v12 Ingo Molnar
2009-04-15 23:13 ` Trond Myklebust
2009-04-15 23:35 ` Frederic Weisbecker
2009-04-16 8:51 ` Ingo Molnar
[not found] ` <20090416085153.GC9813-X9Un+BFzKDI@public.gmane.org>
2009-04-16 14:35 ` Alessio Igor Bogani
2009-04-16 16:40 ` Frederic Weisbecker