* Re: Problem during implementing NFS support
2008-07-20 19:11 Problem during implementing NFS support Balaji Rao
@ 2008-07-20 16:43 ` David Woodhouse
2008-07-20 19:36 ` Balaji Rao
2008-07-21 8:10 ` Christoph Hellwig
0 siblings, 2 replies; 6+ messages in thread
From: David Woodhouse @ 2008-07-20 16:43 UTC (permalink / raw)
To: Balaji Rao; +Cc: linux-btrfs, Chris Mason
On Mon, 2008-07-21 at 00:41 +0530, Balaji Rao wrote:
> Hi,
>
> There's a problem in btrfs_readdir that tries to lock a root node with one
> being held. This happens when NFS calls vfs_readdir function with a nfs
> specific filldir function pointer. This filldir function, called with the
> lock held calls btrfs_lookup, which tries to take the same lock. So, it keeps
> waiting on lock_page indefinitely - a deadlock. This is not seen if the inode
> is RAM in which case, lookup is not called.
We've seen precisely this problem (nfs3 readdirplus calling ->lookup
from within ->readdir and deadlocking) in a number of other file
systems.
I have an evil workaround hack for JFFS2 at
http://lists.infradead.org/pipermail/linux-mtd/2007-March/017663.html
> Why can't we allow multiple readers to read a page ?
In the general case we don't allow that because if there's a writer
waiting, they could end up waiting for _ever_ if you keep letting a
constant stream of new readers take the lock.
The only reason the readdirplus code is calling ->lookup() is so that it
can then generate a file handle. It doesn't really _need_ to pull every
inode into the icache as it does; it only needs the fh.
Perhaps we could extend the file system's readdir() method so it returns
the filehandle directly, instead. That might be a cleaner and more
efficient solution all round. Or we could add a ->lookup_fh() method
which only generates the filehandle, perhaps. Although you'd still have
the locking issues with the latter if it can _ever_ be called other than
from readdir().
The way GFS1 (and also XFS iirc) handles it is to build up a complete
list of responses to readdir() in a buffer, drop the lock, and then
iterate over that buffer calling filldir(). I don't much like that
version either.
--
dwmw2
^ permalink raw reply [flat|nested] 6+ messages in thread
* Problem during implementing NFS support
@ 2008-07-20 19:11 Balaji Rao
2008-07-20 16:43 ` David Woodhouse
0 siblings, 1 reply; 6+ messages in thread
From: Balaji Rao @ 2008-07-20 19:11 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason
Hi,
There's a problem in btrfs_readdir that tries to lock a root node with one
being held. This happens when NFS calls vfs_readdir function with a nfs
specific filldir function pointer. This filldir function, called with the
lock held calls btrfs_lookup, which tries to take the same lock. So, it keeps
waiting on lock_page indefinitely - a deadlock. This is not seen if the inode
is RAM in which case, lookup is not called.
Why can't we allow multiple readers to read a page ?
Please clarify.
Balaji Rao
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problem during implementing NFS support
2008-07-20 16:43 ` David Woodhouse
@ 2008-07-20 19:36 ` Balaji Rao
2008-07-20 20:08 ` David Woodhouse
2008-07-21 8:10 ` Christoph Hellwig
1 sibling, 1 reply; 6+ messages in thread
From: Balaji Rao @ 2008-07-20 19:36 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-btrfs, Chris Mason
On Sunday 20 July 2008 10:13:50 pm David Woodhouse wrote:
> On Mon, 2008-07-21 at 00:41 +0530, Balaji Rao wrote:
> > Hi,
> >
> > There's a problem in btrfs_readdir that tries to lock a root node with
> > one being held. This happens when NFS calls vfs_readdir function with a
> > nfs specific filldir function pointer. This filldir function, called with
> > the lock held calls btrfs_lookup, which tries to take the same lock. So,
> > it keeps waiting on lock_page indefinitely - a deadlock. This is not seen
> > if the inode is RAM in which case, lookup is not called.
>
> We've seen precisely this problem (nfs3 readdirplus calling ->lookup
> from within ->readdir and deadlocking) in a number of other file
> systems.
>
oh, ok!
> I have an evil workaround hack for JFFS2 at
> http://lists.infradead.org/pipermail/linux-mtd/2007-March/017663.html
>
The idea of the patch seems correct to me, that once we "own" the lock, an
attempt to take it again should be a nop.
Balaji
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problem during implementing NFS support
2008-07-20 19:36 ` Balaji Rao
@ 2008-07-20 20:08 ` David Woodhouse
0 siblings, 0 replies; 6+ messages in thread
From: David Woodhouse @ 2008-07-20 20:08 UTC (permalink / raw)
To: Balaji Rao; +Cc: linux-btrfs, Chris Mason
On Mon, 2008-07-21 at 01:06 +0530, Balaji Rao wrote:
> The idea of the patch seems correct to me, that once we "own" the
> lock, an attempt to take it again should be a nop.
Well, kind of correct. There are potentially race conditions with the
handling of f->readdir_process, but given that we only ever really
compare it to 'current', it's actually going to turn out OK in practice.
You won't ever actually get a false negative (and deadlock) or a false
positive (and go through lookup() without locking), as far as I can
tell.
But still, it's ugly as sin and I'd much rather come up with a _proper_
fix in the VFS. I was looking at it at one point, and didn't actually
apply that patch to the JFFS2 tree.
Since general btrfs work is now more of a priority for me than getting
JFFS2 to be NFS-exportable was, I think I'll pick up where I left off
with that.
In the meantime, we have an evil hack which at least ought to work for
now.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
--
dwmw2
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problem during implementing NFS support
2008-07-20 16:43 ` David Woodhouse
2008-07-20 19:36 ` Balaji Rao
@ 2008-07-21 8:10 ` Christoph Hellwig
2008-07-21 12:58 ` Chris Mason
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2008-07-21 8:10 UTC (permalink / raw)
To: David Woodhouse; +Cc: Balaji Rao, linux-btrfs, Chris Mason
On Sun, Jul 20, 2008 at 09:43:50AM -0700, David Woodhouse wrote:
> The way GFS1 (and also XFS iirc) handles it is to build up a complete
> list of responses to readdir() in a buffer, drop the lock, and then
> iterate over that buffer calling filldir(). I don't much like that
> version either.
Yes. My prefered mid-term solution would to simply lift that code from
XFS (where it's nicely isolated and all code is prefixed with hack_) to
nfsd so that local users don't aren't penalized for this.
OCFS2 also ran into this lately, so someone at Oracle might already
be working on it. And ubifs will also run into it once it gets nfs
serving support..
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problem during implementing NFS support
2008-07-21 8:10 ` Christoph Hellwig
@ 2008-07-21 12:58 ` Chris Mason
0 siblings, 0 replies; 6+ messages in thread
From: Chris Mason @ 2008-07-21 12:58 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: David Woodhouse, Balaji Rao, linux-btrfs
On Mon, 2008-07-21 at 04:10 -0400, Christoph Hellwig wrote:
> On Sun, Jul 20, 2008 at 09:43:50AM -0700, David Woodhouse wrote:
> > The way GFS1 (and also XFS iirc) handles it is to build up a complete
> > list of responses to readdir() in a buffer, drop the lock, and then
> > iterate over that buffer calling filldir(). I don't much like that
> > version either.
>
> Yes. My prefered mid-term solution would to simply lift that code from
> XFS (where it's nicely isolated and all code is prefixed with hack_) to
> nfsd so that local users don't aren't penalized for this.
>
Reiserfs also uses the local buffer and lock dropping trick, and that's
what I would do for now in btrfs.
But, the extent allocation tree also needs recursive locking, so I'm
going to take a stab at that as soon as I get the pending patch list
merged.
-chris
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-07-21 12:58 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-20 19:11 Problem during implementing NFS support Balaji Rao
2008-07-20 16:43 ` David Woodhouse
2008-07-20 19:36 ` Balaji Rao
2008-07-20 20:08 ` David Woodhouse
2008-07-21 8:10 ` Christoph Hellwig
2008-07-21 12:58 ` Chris Mason
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox