From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Al Viro <viro@ZenIV.linux.org.uk>,
Alessio Igor Bogani <abogani@texware.it>,
Alexander Viro <viro@ftp.linux.org.uk>,
Frederic Weisbecker <fweisbec@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Jonathan Corbet <corbet@lwn.net>
Subject: Re: [PATCH -tip] remove the BKL: Replace BKL in mount/umount syscalls with a mutex
Date: Fri, 17 Apr 2009 20:34:43 +0200
Message-ID: <20090417183443.GA27120@elte.hu>
In-Reply-To: <alpine.LFD.2.00.0904171010470.4042@localhost.localdomain>
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, 17 Apr 2009, Peter Zijlstra wrote:
> >
> > Anyway, it seems quite clear that the first thing is to push the current
> > BKL usage down into the filesystems -- which should be somewhat
> > straight-forward.
>
> Yes, if somebody sends the obvious mechanical patch, we can apply that
> easily. Then, most common filesystems can probably remove the BKL
> trivially by maintainers that know that they don't do anything at all with
> it.
>
> Of course, right now we do hold the BKL over _multiple_ downcalls, so in
> that sense it's not actually totally 100% correct and straightforward to
> just move it down. Eg in the generic_shutdown_super() case we do
>
>	lock_kernel();
>	->write_super();
>	->put_super();
>	invalidate_inodes();
>	unlock_kernel();
>
> and obviously if we split it up so that we push a lock_kernel()
> into both, we end up unlocking in between. I doubt anything cares,
> but it's still a technical difference.
>
> There are similar issues with 'remount' holding the BKL over
> longer sequences.
>
> Btw, the superblock code really does seem to depend on
> lock_kernel. Those "sb->s_flags" accesses are literally not
> protected by anything else afaik.
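The "unlocking in between" point can be made concrete with a minimal userspace sketch. This is not kernel code: the BKL is reduced to a hypothetical nesting counter (model_lock_kernel()/model_unlock_kernel()), and the three downcalls are empty stand-ins. The sketch counts how often the lock becomes completely free, i.e. how many windows another task would get to run under the lock.

```c
#include <assert.h>

/* Hypothetical userspace model of the BKL: only a nesting depth. */
static int bkl_depth;
/* Number of times the lock went from held to completely free. */
static int bkl_free_windows;

static void model_reset(void)
{
	bkl_depth = 0;
	bkl_free_windows = 0;
}

static void model_lock_kernel(void)
{
	bkl_depth++;
}

static void model_unlock_kernel(void)
{
	assert(bkl_depth > 0);
	if (--bkl_depth == 0)
		bkl_free_windows++;	/* another task could take the lock here */
}

/* Empty stand-ins for ->write_super(), ->put_super(), invalidate_inodes(). */
static void stub_write_super(void) { }
static void stub_put_super(void) { }
static void stub_invalidate_inodes(void) { }

/* Current shape: one critical section spans all three calls. */
static void shutdown_held_across(void)
{
	model_lock_kernel();
	stub_write_super();
	stub_put_super();
	stub_invalidate_inodes();
	model_unlock_kernel();
}

/* Pushed-down shape: each filesystem method takes the lock itself. */
static void fs_write_super_locked(void)
{
	model_lock_kernel();
	stub_write_super();
	model_unlock_kernel();
}

static void fs_put_super_locked(void)
{
	model_lock_kernel();
	stub_put_super();
	model_unlock_kernel();
}

static void shutdown_pushed_down(void)
{
	fs_write_super_locked();
	fs_put_super_locked();
	model_lock_kernel();
	stub_invalidate_inodes();
	model_unlock_kernel();
}
```

shutdown_held_across() leaves one free window, at the very end; shutdown_pushed_down() leaves three, two of which fall between the downcalls. Those two extra gaps are the "technical difference" in question.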
The very narrow case we want to solve is this place in the NFS code
that calls schedule() with the BKL held:
[<00000000006d20ec>] rpc_wait_bit_killable+0x64/0x8c
[<00000000006f0620>] __wait_on_bit+0x64/0xc0
[<00000000006f06e4>] out_of_line_wait_on_bit+0x68/0x7c
[<00000000006d2938>] __rpc_execute+0x150/0x2b4
[<00000000006d2ac0>] rpc_execute+0x24/0x34
[<00000000006cc338>] rpc_run_task+0x64/0x74
[<00000000006cc474>] rpc_call_sync+0x58/0x7c
[<00000000005717b0>] nfs3_rpc_wrapper+0x24/0xa0
[<0000000000572024>] do_proc_get_root+0x6c/0x10c
[<00000000005720dc>] nfs3_proc_get_root+0x18/0x5c
[<000000000056401c>] nfs_get_root+0x34/0x17c
[<0000000000568adc>] nfs_get_sb+0x9ec/0xa7c
[<00000000004b7ec8>] vfs_kern_mount+0x44/0xa4
[<00000000004b7f84>] do_kern_mount+0x30/0xcc
[<00000000004cf300>] do_mount+0x7c8/0x80c
[<00000000004ed2a4>] compat_sys_mount+0x224/0x274
[<0000000000406154>] linux_sparc_syscall32+0x34/0x40
This creates circular locking if the BKL is a plain mutex and that
mutex is dropped there: it is too low-level a place, with many locks
held, so re-acquiring the mutex there inverts the locking dependency.
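The inversion can be illustrated with a tiny lock-order tracker in the spirit of lockdep. This is a hypothetical userspace sketch, not lockdep itself: it only detects direct two-lock (AB-BA) inversions, and LOWLEVEL_LOCK is an invented stand-in for whatever lock is held deep in the RPC wait path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes: the BKL-as-mutex and one low-level lock
 * held deep in the RPC wait path (a stand-in, not a real kernel lock). */
enum { BKL_MUTEX, LOWLEVEL_LOCK, NR_LOCKS };

/* taken_after[a][b] is true once b has been acquired while a was held. */
static bool taken_after[NR_LOCKS][NR_LOCKS];
static bool held[NR_LOCKS];
static bool inversion_seen;

static void order_reset(void)
{
	memset(taken_after, 0, sizeof(taken_after));
	memset(held, 0, sizeof(held));
	inversion_seen = false;
}

static void model_acquire(int lock)
{
	for (int h = 0; h < NR_LOCKS; h++) {
		if (!held[h] || h == lock)
			continue;
		/* Earlier, h was taken while holding lock; now the reverse. */
		if (taken_after[lock][h])
			inversion_seen = true;
		taken_after[h][lock] = true;
	}
	held[lock] = true;
}

static void model_release(int lock)
{
	assert(held[lock]);
	held[lock] = false;
}

/* Drop and re-take the BKL deep down, with LOWLEVEL_LOCK held. */
static void mount_drop_deep(void)
{
	model_acquire(BKL_MUTEX);	/* VFS holds the BKL over ->get_sb() */
	model_acquire(LOWLEVEL_LOCK);	/* records BKL -> LOWLEVEL */
	model_release(BKL_MUTEX);	/* dropped before sleeping... */
	model_acquire(BKL_MUTEX);	/* ...re-taken: LOWLEVEL -> BKL, inverted */
	model_release(LOWLEVEL_LOCK);
	model_release(BKL_MUTEX);
}

/* Drop the BKL at a high level, before any low-level lock is taken. */
static void mount_drop_high(void)
{
	model_acquire(BKL_MUTEX);
	model_release(BKL_MUTEX);	/* e.g. at the top of the fs's get_sb */
	model_acquire(LOWLEVEL_LOCK);
	model_release(LOWLEVEL_LOCK);
	model_acquire(BKL_MUTEX);	/* re-taken with nothing else held */
	model_release(BKL_MUTEX);
}
```

mount_drop_deep() records both BKL -> LOWLEVEL and LOWLEVEL -> BKL and flags the inversion; mount_drop_high() never holds the two together, so no ordering is recorded between them.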
I.e. the NFS code wants to drop the BKL at a high level, in
nfs_get_sb(); the NFS folks have already confirmed that they have no
internal BKL dependencies. Preferably, though, it would never get
called with the BKL held by the VFS layer in the first place.
Of course we could hack around it and add an unlock_kernel()/
lock_kernel() pair to nfs_get_sb(), but we thought we'd be nice
kernel citizens and improve the general situation too :)
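One reason such an unlock_kernel()/lock_kernel() pair is only a workaround: the BKL is recursive (it tracks a per-task nesting depth), so unlock_kernel() really frees it only when the caller holds it exactly once. The following userspace sketch models that; rec_lock_kernel(), rec_unlock_kernel() and hypothetical_get_sb_drop_bkl() are invented names, not the NFS or VFS code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the recursive BKL: a per-task nesting depth. */
static int lock_depth;

static void rec_lock_kernel(void)
{
	lock_depth++;
}

static void rec_unlock_kernel(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

static bool bkl_free(void)
{
	return lock_depth == 0;
}

/*
 * Invented stand-in for a get_sb method that drops the BKL around its
 * blocking work. Returns true if the BKL was actually free meanwhile.
 */
static bool hypothetical_get_sb_drop_bkl(void)
{
	bool was_free;

	rec_unlock_kernel();
	was_free = bkl_free();	/* the blocking RPC would happen here */
	rec_lock_kernel();
	return was_free;
}
```

With a single acquisition by the caller the lock really is released during the blocking work; if any outer caller also holds it, the drop silently achieves nothing. That is one more argument for not calling into the filesystem with the BKL held at all.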
Ingo