From: Neil Brown <neilb@suse.de>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org, chucklever@gmail.com,
Christoph Hellwig <hch@infradead.org>,
linux-mtd@lists.infradead.org, viro@zeniv.linux.org.uk,
linux-fsdevel@vger.kernel.org,
David Woodhouse <dwmw2@infradead.org>
Subject: Re: [RFC] Reinstate NFS exportability for JFFS2.
Date: Tue, 5 Aug 2008 08:37:01 +1000 [thread overview]
Message-ID: <18583.33933.818892.71415@notabene.brown> (raw)
In-Reply-To: message from J. Bruce Fields on Monday August 4
On Monday August 4, bfields@fieldses.org wrote:
> On Mon, Aug 04, 2008 at 11:03:27AM +1000, Neil Brown wrote:
> > - use i_mutex to protect internal changes too, and drop i_mutex while
> > doing internal restructuring. This would need some VFS changes so
> > that dropping i_mutex would be safe. It would require some way to
> > lock an individual dentry. Probably you would lock it under
> > i_mutex by setting a flag bit, wait for the flag on some inode-wide
> > waitqueue, and drop the lock by clearing the flag and waking the
> > waitqueue. And you are never allowed to look at ->d_inode if the
> > lock flag is set.
>
> Isn't there a lot of kernel code that assumes that following ->d_inode
> is safe? Or are you only worried about looking at certain
> filesystem-internal fields of d_inode?
I overstated that restriction a bit.
The way ->d_inode works is that it is set once and then never changed.
A rename doesn't change ->d_inode, it changes the name in the dentry,
and possibly the ->d_parent pointer. An unlink also leaves ->d_inode
alone and just unhashes the dentry. ->d_inode is never cleared until
the dentry is freed.
So ->d_inode starts as NULL, is (possibly) set to a value, and stays
that way.
Currently if the name exists, then ->d_inode will be set non-NULL
under i_mutex and so a NULL value will never be visible outside of
i_mutex. If the name doesn't exist, a NULL value *will* be visible
outside of i_mutex and it could subsequently get (under i_mutex) set
when name is created.
If i_mutex is allowed to be dropped early, then a premature NULL could
be visible in ->d_inode for a name that does exist. This is the case
the needs to be guarded against.
So when I said "you are never allowed to look at ->d_inode ...." I
could have said "you are never allowed to see a NULL in ->d_inode ..."
While lots of places dereference ->d_inode, relative few do it in a
context where ->d_inode could be NULL, and these are probably all in
fs/namei.c or related code. Most of them would be fairly easy to get
to work with dentry locking.
The problem case is rename (as it always is). With rename, d_move
needs to be called after the filesystem has committed to the rename,
but before i_mutex is dropped. That would be awkward for filesystems
that needed to collect garbage or whatever before completing the
rename.
I would probably address this by allowing ->rename to return -EAGAIN
with the meaning that i_mutex was dropped but nothing was committed
to, and that vfs_rename_* should try again from the top. Presumably
->rename will have done some garbage collection or whatever is needed
and the next call to ->rename has a much better chance of going
through without needing to drop the lock. I don't know if livelocks
could be a problem with this approach.
>
> The locking required to keep the filesystem namespace consistent is
> difficult and important, so I think changing it would require an
> extremely careful description of the new design and a commitment from
> some filesystem developers to actually take advantage of it....
An "extremely careful description of the new design" is something we
could happily see more of in the kernel world, isn't it :-)
Agreed.
NeilBrown
next prev parent reply other threads:[~2008-08-04 22:37 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-01 19:42 [RFC] Reinstate NFS exportability for JFFS2 David Woodhouse
2008-05-01 20:48 ` Christoph Hellwig
2008-05-01 22:44 ` David Woodhouse
2008-05-02 1:38 ` Neil Brown
2008-05-02 11:37 ` David Woodhouse
2008-05-02 14:08 ` J. Bruce Fields
2008-07-31 21:54 ` David Woodhouse
2008-08-01 0:16 ` Neil Brown
2008-08-01 0:40 ` David Woodhouse
2008-08-01 0:52 ` David Woodhouse
2008-08-01 0:53 ` Chuck Lever
2008-08-01 1:00 ` David Woodhouse
2008-08-01 1:31 ` Chuck Lever
2008-08-01 8:13 ` David Woodhouse
2008-08-01 13:35 ` David Woodhouse
2008-08-01 13:56 ` David Woodhouse
2008-08-01 16:05 ` Chuck Lever
2008-08-01 16:19 ` David Woodhouse
2008-08-01 17:47 ` Chuck Lever
2008-08-02 18:26 ` J. Bruce Fields
2008-08-02 20:42 ` David Woodhouse
2008-08-02 21:33 ` J. Bruce Fields
2008-08-03 8:39 ` David Woodhouse
2008-08-03 11:56 ` Neil Brown
2008-08-03 17:15 ` Chuck Lever
2008-08-04 1:03 ` Neil Brown
2008-08-04 6:19 ` Chuck Lever
2008-08-05 8:51 ` Dave Chinner
2008-08-05 8:59 ` David Woodhouse
2008-08-05 9:47 ` Dave Chinner
2008-08-05 23:06 ` Neil Brown
2008-08-06 0:08 ` Dave Chinner
2008-08-06 19:56 ` J. Bruce Fields
2008-08-06 20:10 ` David Woodhouse
2008-08-09 16:47 ` David Woodhouse
2008-08-09 19:55 ` David Woodhouse
2008-08-09 20:01 ` [PATCH 1/4] Factor out nfsd_do_readdir() into its own function David Woodhouse
2008-08-09 20:07 ` Christoph Hellwig
2008-08-09 20:02 ` [PATCH 2/4] Copy XFS readdir hack into nfsd code David Woodhouse
2008-08-09 20:08 ` Christoph Hellwig
2008-08-09 20:03 ` [PATCH 3/4] Remove XFS buffered readdir hack David Woodhouse
2008-08-09 20:09 ` Christoph Hellwig
2008-08-09 20:03 ` [PATCH 4/4] Reinstate NFS exportability David Woodhouse
2008-08-09 20:10 ` Christoph Hellwig
2008-08-04 18:41 ` [RFC] Reinstate NFS exportability for JFFS2 J. Bruce Fields
2008-08-04 22:37 ` Neil Brown [this message]
2008-08-17 18:22 ` Andreas Dilger
2008-08-01 2:14 ` Neil Brown
2008-08-01 8:50 ` David Woodhouse
2008-08-01 10:03 ` Al Viro
2008-08-01 23:11 ` Neil Brown
2008-07-31 21:54 ` [PATCH 1/4] Factor out nfsd_do_readdir() into its own function David Woodhouse
2008-07-31 21:54 ` [PATCH 2/4] Copy XFS readdir hack into nfsd code, introduce FS_NO_LOOKUP_IN_READDIR flag David Woodhouse
2008-07-31 21:55 ` [PATCH 3/4] Switch XFS to using FS_NO_LOOKUP_IN_READDIR, remove local readdir hack David Woodhouse
2008-07-31 21:55 ` [PATCH 4/4] [JFFS2] Reinstate NFS exportability David Woodhouse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=18583.33933.818892.71415@notabene.brown \
--to=neilb@suse.de \
--cc=bfields@fieldses.org \
--cc=chucklever@gmail.com \
--cc=dwmw2@infradead.org \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mtd@lists.infradead.org \
--cc=linux-nfs@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox