* lifetime of DCACHE_DISCONECTED dentries @ 2010-11-12 18:43 J. Bruce Fields 2010-11-13 11:53 ` Nick Piggin 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2010-11-12 18:43 UTC (permalink / raw) To: linux-nfs, linux-fsdevel DCACHE_DISCONECTED dentries aren't always getting destroyed as soon as I'd have expected in the NFSv4 case. I'm not sure what the right fix is; any ideas? The below at least demonstrates the problem. --b. commit bb125ffbe5fc3f80ac7a5b20f51cc542c175cd49 Author: J. Bruce Fields <bfields@redhat.com> Date: Thu Nov 11 19:22:00 2010 -0500 dput: free DCACHE_DISCONNECTED dentries sooner DCACHE_DISCONECTED dentries are normally left around for the benefit of future nfsd operations. But there's no point keeping them around once the inode has been deleted. Without this patch client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp associated with the nfsd open. d_obtain_alias() doesn't like this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Signed-off-by: J. Bruce Fields <bfields@redhat.com> diff --git a/fs/dcache.c b/fs/dcache.c index 28fa7e5..5132f13 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -241,6 +241,10 @@ repeat: /* Unreachable? Get rid of it */ if (d_unhashed(dentry)) goto kill_it; + if (dentry->d_flags & DCACHE_DISCONNECTED + && dentry->d_inode + && dentry->d_inode->i_nlink == 0) + goto kill_it; if (list_empty(&dentry->d_lru)) { dentry->d_flags |= DCACHE_REFERENCED; dentry_lru_add(dentry); ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: lifetime of DCACHE_DISCONECTED dentries 2010-11-12 18:43 lifetime of DCACHE_DISCONECTED dentries J. Bruce Fields @ 2010-11-13 11:53 ` Nick Piggin 2010-11-15 17:48 ` J. Bruce Fields 0 siblings, 1 reply; 42+ messages in thread From: Nick Piggin @ 2010-11-13 11:53 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, linux-fsdevel On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > DCACHE_DISCONECTED dentries aren't always getting destroyed as soon as > I'd have expected in the NFSv4 case. > > I'm not sure what the right fix is; any ideas? The below at least > demonstrates the problem. > > --b. > > commit bb125ffbe5fc3f80ac7a5b20f51cc542c175cd49 > Author: J. Bruce Fields <bfields@redhat.com> > Date: Thu Nov 11 19:22:00 2010 -0500 > > dput: free DCACHE_DISCONNECTED dentries sooner > > DCACHE_DISCONECTED dentries are normally left around for the benefit of > future nfsd operations. But there's no point keeping them around once > the inode has been deleted. > > Without this patch > > client$ mount -tnfs4 server:/export/ /mnt/ > client$ tail -f /mnt/FOO > ... > server$ df -i /export > server$ rm /export/FOO > (^C the tail -f) > server$ df -i /export > server$ echo 2 >/proc/sys/vm/drop_caches > server$ df -i /export > > the df's will show that the inode is not freed on the filesystem until > the last step, when it could have been freed after killing the client's > tail -f. On-disk data won't be deallocated either, leading to possible > spurious ENOSPC. > > This occurs because when the client does the close, it arrives in a > compound with a putfh and a close, processed like: > > - putfh: look up the filehandle. The only alias found for the > inode will be DCACHE_UNHASHED alias referenced by the filp > associated with the nfsd open. d_obtain_alias() doesn't like > this, so it creates a new DCACHE_DISCONECTED dentry and > returns that instead. This seems to be where the thing goes wrong. It isn't a hashed dentry at this point here, so d_obtain_alias should not be making one. I think the inode i_nlink games are much more appropriate on this side of the equation, rather than the dput side (after all, d_obtain_alias is setting up an alias for the inode). Can you even put the link check into __d_find_alias? - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || !d_unhashed(alias)) { Something like that? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: lifetime of DCACHE_DISCONECTED dentries 2010-11-13 11:53 ` Nick Piggin @ 2010-11-15 17:48 ` J. Bruce Fields [not found] ` <20101115174837.GB10044-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2010-11-15 17:48 UTC (permalink / raw) To: Nick Piggin; +Cc: linux-nfs, linux-fsdevel On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote: > On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > > DCACHE_DISCONECTED dentries aren't always getting destroyed as soon as > > I'd have expected in the NFSv4 case. > > > > I'm not sure what the right fix is; any ideas? The below at least > > demonstrates the problem. > > > > --b. > > > > commit bb125ffbe5fc3f80ac7a5b20f51cc542c175cd49 > > Author: J. Bruce Fields <bfields@redhat.com> > > Date: Thu Nov 11 19:22:00 2010 -0500 > > > > dput: free DCACHE_DISCONNECTED dentries sooner > > > > DCACHE_DISCONECTED dentries are normally left around for the benefit of > > future nfsd operations. But there's no point keeping them around once > > the inode has been deleted. > > > > Without this patch > > > > client$ mount -tnfs4 server:/export/ /mnt/ > > client$ tail -f /mnt/FOO > > ... > > server$ df -i /export > > server$ rm /export/FOO > > (^C the tail -f) > > server$ df -i /export > > server$ echo 2 >/proc/sys/vm/drop_caches > > server$ df -i /export > > > > the df's will show that the inode is not freed on the filesystem until > > the last step, when it could have been freed after killing the client's > > tail -f. On-disk data won't be deallocated either, leading to possible > > spurious ENOSPC. > > > > This occurs because when the client does the close, it arrives in a > > compound with a putfh and a close, processed like: > > > > - putfh: look up the filehandle. The only alias found for the > > inode will be DCACHE_UNHASHED alias referenced by the filp > > associated with the nfsd open. d_obtain_alias() doesn't like > > this, so it creates a new DCACHE_DISCONECTED dentry and > > returns that instead. > > This seems to be where the thing goes wrong. It isn't a hashed dentry at > this point here, so d_obtain_alias should not be making one. Sounds sensible. (But can you think of any actual bugs that will result from trying to add a new hashed dentry in this case?) > I think the inode i_nlink games are much more appropriate on this side of > the equation, rather than the dput side (after all, d_obtain_alias is setting > up an alias for the inode). > > Can you even put the link check into __d_find_alias? > > - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { > + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || > !d_unhashed(alias)) { > > Something like that? The immediate result of that would be for the close rpc (or any rpc's sent after the file was unlinked) to fail with ESTALE. But nfsd already holds an open file in this case, and you could argue that it should be using that from the start. So, we could modify nfsd to add a hash mapping filehandles to the filp's that it knows about, and have nfsd consult that hash before calling dentry_to_fh. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20101115174837.GB10044-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: lifetime of DCACHE_DISCONECTED dentries [not found] ` <20101115174837.GB10044-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2010-11-16 6:45 ` Nick Piggin 2010-11-29 3:56 ` Nick Piggin 0 siblings, 1 reply; 42+ messages in thread From: Nick Piggin @ 2010-11-16 6:45 UTC (permalink / raw) To: J. Bruce Fields Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote: > On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote: >> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <bfields@fieldses.org> wrote: >> > - putfh: look up the filehandle. The only alias found for the >> > inode will be DCACHE_UNHASHED alias referenced by the filp >> > associated with the nfsd open. d_obtain_alias() doesn't like >> > this, so it creates a new DCACHE_DISCONECTED dentry and >> > returns that instead. >> >> This seems to be where the thing goes wrong. It isn't a hashed dentry at >> this point here, so d_obtain_alias should not be making one. > > Sounds sensible. (But can you think of any actual bugs that will result > from trying to add a new hashed dentry in this case?) Well, this one? :) >> I think the inode i_nlink games are much more appropriate on this side of >> the equation, rather than the dput side (after all, d_obtain_alias is setting >> up an alias for the inode). >> >> Can you even put the link check into __d_find_alias? >> >> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { >> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || >> !d_unhashed(alias)) { >> >> Something like that? > > The immediate result of that would be for the close rpc (or any rpc's > sent after the file was unlinked) to fail with ESTALE. Why is that? Seems like it would be a bug, because a hashed dentry may be unhashed at any time concurrently to nfsd operation, so it should be able to tolerate that so long as it has a ref on the inode? > But nfsd already holds an open file in this case, and you could argue > that it should be using that from the start. Yes. > So, we could modify nfsd to add a hash mapping filehandles to the filp's > that it knows about, and have nfsd consult that hash before calling > dentry_to_fh. Could be an option. It would be a pity not just be able to use the alias list. What exactly goes wrong when it gets an unhashed alias back? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: lifetime of DCACHE_DISCONECTED dentries 2010-11-16 6:45 ` Nick Piggin @ 2010-11-29 3:56 ` Nick Piggin 2010-11-29 19:32 ` J. Bruce Fields 0 siblings, 1 reply; 42+ messages in thread From: Nick Piggin @ 2010-11-29 3:56 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, linux-fsdevel On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <npiggin@gmail.com> wrote: > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <bfields@fieldses.org> wrote: >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote: >>> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > >>> > - putfh: look up the filehandle. The only alias found for the >>> > inode will be DCACHE_UNHASHED alias referenced by the filp >>> > associated with the nfsd open. d_obtain_alias() doesn't like >>> > this, so it creates a new DCACHE_DISCONECTED dentry and >>> > returns that instead. >>> >>> This seems to be where the thing goes wrong. It isn't a hashed dentry at >>> this point here, so d_obtain_alias should not be making one. >> >> Sounds sensible. (But can you think of any actual bugs that will result >> from trying to add a new hashed dentry in this case?) > > Well, this one? :) > > >>> I think the inode i_nlink games are much more appropriate on this side of >>> the equation, rather than the dput side (after all, d_obtain_alias is setting >>> up an alias for the inode). >>> >>> Can you even put the link check into __d_find_alias? >>> >>> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { >>> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || >>> !d_unhashed(alias)) { >>> >>> Something like that? >> >> The immediate result of that would be for the close rpc (or any rpc's >> sent after the file was unlinked) to fail with ESTALE. > > Why is that? Seems like it would be a bug, because a hashed dentry may > be unhashed at any time concurrently to nfsd operation, so it should be > able to tolerate that so long as it has a ref on the inode? Ping? Did you work out why nfs fails with ESTALE in that case? It seems to work in my testing (and do the right thing with freeing the inode). -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: lifetime of DCACHE_DISCONECTED dentries 2010-11-29 3:56 ` Nick Piggin @ 2010-11-29 19:32 ` J. Bruce Fields [not found] ` <20101129193248.GA9897-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2010-11-29 19:32 UTC (permalink / raw) To: Nick Piggin; +Cc: linux-nfs, linux-fsdevel On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote: > On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <npiggin@gmail.com> wrote: > > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote: > >>> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > >>> > - putfh: look up the filehandle. The only alias found for the > >>> > inode will be DCACHE_UNHASHED alias referenced by the filp > >>> > associated with the nfsd open. d_obtain_alias() doesn't like > >>> > this, so it creates a new DCACHE_DISCONECTED dentry and > >>> > returns that instead. > >>> > >>> This seems to be where the thing goes wrong. It isn't a hashed dentry at > >>> this point here, so d_obtain_alias should not be making one. > >> > >> Sounds sensible. (But can you think of any actual bugs that will result > >> from trying to add a new hashed dentry in this case?) > > > > Well, this one? :) > > > > > >>> I think the inode i_nlink games are much more appropriate on this side of > >>> the equation, rather than the dput side (after all, d_obtain_alias is setting > >>> up an alias for the inode). > >>> > >>> Can you even put the link check into __d_find_alias? > >>> > >>> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { > >>> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || > >>> !d_unhashed(alias)) { > >>> > >>> Something like that? > >> > >> The immediate result of that would be for the close rpc (or any rpc's > >> sent after the file was unlinked) to fail with ESTALE. > > > > Why is that? Seems like it would be a bug, because a hashed dentry may > > be unhashed at any time concurrently to nfsd operation, so it should be > > able to tolerate that so long as it has a ref on the inode? > > Ping? Did you work out why nfs fails with ESTALE in that case? It seems > to work in my testing (and do the right thing with freeing the inode). Bah, sorry, I read too quickly, got the sense of the test backwards, and thought you were suggesting __d_find_alias() shouldn't return an alias in the i_nlink == 0 case! Yes, agreed, that should solve my problem. But what's the reason for the d_unhashed() check now? Could we get rid of it entirely? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20101129193248.GA9897-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: lifetime of DCACHE_DISCONECTED dentries [not found] ` <20101129193248.GA9897-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2010-11-30 1:00 ` Nick Piggin [not found] ` <AANLkTikwzDJ_q65==uxDsAhp3h8bU7Rkt7U9gVgRAK0D-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: Nick Piggin @ 2010-11-30 1:00 UTC (permalink / raw) To: J. Bruce Fields Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote: > On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote: >> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <npiggin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: >> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <bfields@fieldses.org> wrote: >> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote: >> >>> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <bfields@fieldses.org> wrote: >> > >> >>> > - putfh: look up the filehandle. The only alias found for the >> >>> > inode will be DCACHE_UNHASHED alias referenced by the filp >> >>> > associated with the nfsd open. d_obtain_alias() doesn't like >> >>> > this, so it creates a new DCACHE_DISCONECTED dentry and >> >>> > returns that instead. >> >>> >> >>> This seems to be where the thing goes wrong. It isn't a hashed dentry at >> >>> this point here, so d_obtain_alias should not be making one. >> >> >> >> Sounds sensible. (But can you think of any actual bugs that will result >> >> from trying to add a new hashed dentry in this case?) >> > >> > Well, this one? :) >> > >> > >> >>> I think the inode i_nlink games are much more appropriate on this side of >> >>> the equation, rather than the dput side (after all, d_obtain_alias is setting >> >>> up an alias for the inode). >> >>> >> >>> Can you even put the link check into __d_find_alias? >> >>> >> >>> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { >> >>> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || >> >>> !d_unhashed(alias)) { >> >>> >> >>> Something like that? >> >> >> >> The immediate result of that would be for the close rpc (or any rpc's >> >> sent after the file was unlinked) to fail with ESTALE. >> > >> > Why is that? Seems like it would be a bug, because a hashed dentry may >> > be unhashed at any time concurrently to nfsd operation, so it should be >> > able to tolerate that so long as it has a ref on the inode? >> >> Ping? Did you work out why nfs fails with ESTALE in that case? It seems >> to work in my testing (and do the right thing with freeing the inode). > > Bah, sorry, I read too quickly, got the sense of the test backwards, and > thought you were suggesting __d_find_alias() shouldn't return an alias > in the i_nlink == 0 case! > > Yes, agreed, that should solve my problem. OK, good. > But what's the reason for the d_unhashed() check now? Could we get rid > of it entirely? Well when the inode still has links I think we actually do want any new references to go to hashed dentries. Definitely for d_splice_alias. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <AANLkTikwzDJ_q65==uxDsAhp3h8bU7Rkt7U9gVgRAK0D-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: lifetime of DCACHE_DISCONECTED dentries [not found] ` <AANLkTikwzDJ_q65==uxDsAhp3h8bU7Rkt7U9gVgRAK0D-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2010-11-30 18:39 ` J. Bruce Fields 2010-12-03 22:33 ` [PATCH] nfsd4: allow __d_obtain_alias() to return unhashed dentries J. Bruce Fields 1 sibling, 0 replies; 42+ messages in thread From: J. Bruce Fields @ 2010-11-30 18:39 UTC (permalink / raw) To: Nick Piggin Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Tue, Nov 30, 2010 at 12:00:16PM +1100, Nick Piggin wrote: > On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <bfields-uC3wQj2KruMpug/h7KTFAQ@public.gmane.orgg> wrote: > > On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote: > >> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <npiggin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > >> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > >> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote: > >> >>> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > >> > > >> >>> > - putfh: look up the filehandle. The only alias found for the > >> >>> > inode will be DCACHE_UNHASHED alias referenced by the filp > >> >>> > associated with the nfsd open. d_obtain_alias() doesn't like > >> >>> > this, so it creates a new DCACHE_DISCONECTED dentry and > >> >>> > returns that instead. > >> >>> > >> >>> This seems to be where the thing goes wrong. It isn't a hashed dentry at > >> >>> this point here, so d_obtain_alias should not be making one. > >> >> > >> >> Sounds sensible. (But can you think of any actual bugs that will result > >> >> from trying to add a new hashed dentry in this case?) > >> > > >> > Well, this one? :) > >> > > >> > > >> >>> I think the inode i_nlink games are much more appropriate on this side of > >> >>> the equation, rather than the dput side (after all, d_obtain_alias is setting > >> >>> up an alias for the inode). > >> >>> > >> >>> Can you even put the link check into __d_find_alias? > >> >>> > >> >>> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { > >> >>> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || > >> >>> !d_unhashed(alias)) { > >> >>> > >> >>> Something like that? > >> >> > >> >> The immediate result of that would be for the close rpc (or any rpc's > >> >> sent after the file was unlinked) to fail with ESTALE. > >> > > >> > Why is that? Seems like it would be a bug, because a hashed dentry may > >> > be unhashed at any time concurrently to nfsd operation, so it should be > >> > able to tolerate that so long as it has a ref on the inode? > >> > >> Ping? Did you work out why nfs fails with ESTALE in that case? It seems > >> to work in my testing (and do the right thing with freeing the inode). > > > > Bah, sorry, I read too quickly, got the sense of the test backwards, and > > thought you were suggesting __d_find_alias() shouldn't return an alias > > in the i_nlink == 0 case! > > > > Yes, agreed, that should solve my problem. > > OK, good. > > > But what's the reason for the d_unhashed() check now? Could we get rid > > of it entirely? > > Well when the inode still has links I think we actually do want any new > references to go to hashed dentries. Definitely for d_splice_alias. OK, makes sense. Should we stick a changelog on it and pass it along to someone? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH] nfsd4: allow __d_obtain_alias() to return unhashed dentries [not found] ` <AANLkTikwzDJ_q65==uxDsAhp3h8bU7Rkt7U9gVgRAK0D-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2010-11-30 18:39 ` J. Bruce Fields @ 2010-12-03 22:33 ` J. Bruce Fields 2010-12-13 5:19 ` Nick Piggin 1 sibling, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2010-12-03 22:33 UTC (permalink / raw) To: Alexander Viro Cc: Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Without this patch client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp associated with the nfsd open. d_obtain_alias() doesn't like this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Cc: Nick Piggin <npiggin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- fs/dcache.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) On Tue, Nov 30, 2010 at 12:00:16PM +1100, Nick Piggin wrote: > On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <bfields-uC3wQj2KruMpug/h7KTFAQ@public.gmane.orgg> wrote: > > On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote: > >> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <npiggin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > >> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > >> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote: > >> >>> Can you even put the link check into __d_find_alias? > >> >>> > >> >>> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { > >> >>> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || > >> >>> !d_unhashed(alias)) { > >> >>> > >> >>> Something like that? > >> >> > >> >> The immediate result of that would be for the close rpc (or any rpc's > >> >> sent after the file was unlinked) to fail with ESTALE. > >> > > >> > Why is that? Seems like it would be a bug, because a hashed dentry may > >> > be unhashed at any time concurrently to nfsd operation, so it should be > >> > able to tolerate that so long as it has a ref on the inode? > >> > >> Ping? Did you work out why nfs fails with ESTALE in that case? It seems > >> to work in my testing (and do the right thing with freeing the inode). > > > > Bah, sorry, I read too quickly, got the sense of the test backwards, and > > thought you were suggesting __d_find_alias() shouldn't return an alias > > in the i_nlink == 0 case! > > > > Yes, agreed, that should solve my problem. > > OK, good. > > > But what's the reason for the d_unhashed() check now? Could we get rid > > of it entirely? > > Well when the inode still has links I think we actually do want any new > references to go to hashed dentries. Definitely for d_splice_alias. So here's a version with a changelog; objections? --b. diff --git a/fs/dcache.c b/fs/dcache.c index 23702a9..afa8a0d 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -368,7 +368,7 @@ static struct dentry * __d_find_alias(struct inode *inode, int want_discon) next = tmp->next; prefetch(next); alias = list_entry(tmp, struct dentry, d_alias); - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || !d_unhashed(alias)) { if (IS_ROOT(alias) && (alias->d_flags & DCACHE_DISCONNECTED)) discon_alias = alias; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH] nfsd4: allow __d_obtain_alias() to return unhashed dentries 2010-12-03 22:33 ` [PATCH] nfsd4: allow __d_obtain_alias() to return unhashed dentries J. Bruce Fields @ 2010-12-13 5:19 ` Nick Piggin 2010-12-14 22:01 ` J. Bruce Fields 0 siblings, 1 reply; 42+ messages in thread From: Nick Piggin @ 2010-12-13 5:19 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Alexander Viro, Nick Piggin, linux-nfs, linux-fsdevel On Fri, Dec 03, 2010 at 05:33:27PM -0500, J. Bruce Fields wrote: > From: J. Bruce Fields <bfields@redhat.com> > > Without this patch > > client$ mount -tnfs4 server:/export/ /mnt/ > client$ tail -f /mnt/FOO > ... > server$ df -i /export > server$ rm /export/FOO > (^C the tail -f) > server$ df -i /export > server$ echo 2 >/proc/sys/vm/drop_caches > server$ df -i /export > > the df's will show that the inode is not freed on the filesystem until > the last step, when it could have been freed after killing the client's > tail -f. On-disk data won't be deallocated either, leading to possible > spurious ENOSPC. > > This occurs because when the client does the close, it arrives in a > compound with a putfh and a close, processed like: > > - putfh: look up the filehandle. The only alias found for the > inode will be DCACHE_UNHASHED alias referenced by the filp > associated with the nfsd open. d_obtain_alias() doesn't like > this, so it creates a new DCACHE_DISCONECTED dentry and > returns that instead. > > Nick Piggin suggested fixing this by allowing d_obtain_alias to return > the unhashed dentry that is referenced by the filp, instead of making it > create a new dentry. > > Cc: Nick Piggin <npiggin@gmail.com> > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > --- > fs/dcache.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > On Tue, Nov 30, 2010 at 12:00:16PM +1100, Nick Piggin wrote: > > On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote: > > >> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <npiggin@gmail.com> wrote: > > >> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > > >> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote: > > >> >>> Can you even put the link check into __d_find_alias? > > >> >>> > > >> >>> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { > > >> >>> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || > > >> >>> !d_unhashed(alias)) { > > >> >>> > > >> >>> Something like that? > > >> >> > > >> >> The immediate result of that would be for the close rpc (or any rpc's > > >> >> sent after the file was unlinked) to fail with ESTALE. > > >> > > > >> > Why is that? Seems like it would be a bug, because a hashed dentry may > > >> > be unhashed at any time concurrently to nfsd operation, so it should be > > >> > able to tolerate that so long as it has a ref on the inode? > > >> > > >> Ping? Did you work out why nfs fails with ESTALE in that case? It seems > > >> to work in my testing (and do the right thing with freeing the inode). > > > > > > Bah, sorry, I read too quickly, got the sense of the test backwards, and > > > thought you were suggesting __d_find_alias() shouldn't return an alias > > > in the i_nlink == 0 case! > > > > > > Yes, agreed, that should solve my problem. > > > > OK, good. > > > > > But what's the reason for the d_unhashed() check now? Could we get rid > > > of it entirely? > > > > Well when the inode still has links I think we actually do want any new > > references to go to hashed dentries. Definitely for d_splice_alias. > > So here's a version with a changelog; objections? Not sure where Al's hiding... But I would like to update the comments, and perhaps even a new add a new function here (or new flag to __d_find_alias). AFAIKS, the callers are OK, however I suppose d_splice_alias and d_materialise_unique should not have unlinked inodes at this point, so at least a BUG_ON for them might be a good idea? > > --b. > > diff --git a/fs/dcache.c b/fs/dcache.c > index 23702a9..afa8a0d 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -368,7 +368,7 @@ static struct dentry * __d_find_alias(struct inode *inode, int want_discon) > next = tmp->next; > prefetch(next); > alias = list_entry(tmp, struct dentry, d_alias); > - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { > + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || !d_unhashed(alias)) { > if (IS_ROOT(alias) && > (alias->d_flags & DCACHE_DISCONNECTED)) > discon_alias = alias; -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] nfsd4: allow __d_obtain_alias() to return unhashed dentries 2010-12-13 5:19 ` Nick Piggin @ 2010-12-14 22:01 ` J. Bruce Fields [not found] ` <20101214220102.GM24828-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2010-12-14 22:01 UTC (permalink / raw) To: Nick Piggin; +Cc: Alexander Viro, Nick Piggin, linux-nfs, linux-fsdevel On Mon, Dec 13, 2010 at 04:19:44PM +1100, Nick Piggin wrote: > On Fri, Dec 03, 2010 at 05:33:27PM -0500, J. Bruce Fields wrote: > > From: J. Bruce Fields <bfields@redhat.com> > > > > Without this patch > > > > client$ mount -tnfs4 server:/export/ /mnt/ > > client$ tail -f /mnt/FOO > > ... > > server$ df -i /export > > server$ rm /export/FOO > > (^C the tail -f) > > server$ df -i /export > > server$ echo 2 >/proc/sys/vm/drop_caches > > server$ df -i /export > > > > the df's will show that the inode is not freed on the filesystem until > > the last step, when it could have been freed after killing the client's > > tail -f. On-disk data won't be deallocated either, leading to possible > > spurious ENOSPC. > > > > This occurs because when the client does the close, it arrives in a > > compound with a putfh and a close, processed like: > > > > - putfh: look up the filehandle. The only alias found for the > > inode will be DCACHE_UNHASHED alias referenced by the filp > > associated with the nfsd open. d_obtain_alias() doesn't like > > this, so it creates a new DCACHE_DISCONECTED dentry and > > returns that instead. > > > > Nick Piggin suggested fixing this by allowing d_obtain_alias to return > > the unhashed dentry that is referenced by the filp, instead of making it > > create a new dentry. > > > > Cc: Nick Piggin <npiggin@gmail.com> > > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > > --- > > fs/dcache.c | 2 +- > > 1 files changed, 1 insertions(+), 1 deletions(-) > > > > On Tue, Nov 30, 2010 at 12:00:16PM +1100, Nick Piggin wrote: > > > On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote: > > > >> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <npiggin@gmail.com> wrote: > > > >> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > >> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote: > > > >> >>> Can you even put the link check into __d_find_alias? > > > >> >>> > > > >> >>> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { > > > >> >>> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || > > > >> >>> !d_unhashed(alias)) { > > > >> >>> > > > >> >>> Something like that? > > > >> >> > > > >> >> The immediate result of that would be for the close rpc (or any rpc's > > > >> >> sent after the file was unlinked) to fail with ESTALE. > > > >> > > > > >> > Why is that? Seems like it would be a bug, because a hashed dentry may > > > >> > be unhashed at any time concurrently to nfsd operation, so it should be > > > >> > able to tolerate that so long as it has a ref on the inode? > > > >> > > > >> Ping? Did you work out why nfs fails with ESTALE in that case? It seems > > > >> to work in my testing (and do the right thing with freeing the inode). > > > > > > > > Bah, sorry, I read too quickly, got the sense of the test backwards, and > > > > thought you were suggesting __d_find_alias() shouldn't return an alias > > > > in the i_nlink == 0 case! > > > > > > > > Yes, agreed, that should solve my problem. > > > > > > OK, good. > > > > > > > But what's the reason for the d_unhashed() check now? Could we get rid > > > > of it entirely? > > > > > > Well when the inode still has links I think we actually do want any new > > > references to go to hashed dentries. Definitely for d_splice_alias. > > > > So here's a version with a changelog; objections? > > Not sure where Al's hiding... > > But I would like to update the comments, and perhaps even a new > add a new function here (or new flag to __d_find_alias). > > AFAIKS, the callers are OK, however I suppose d_splice_alias and > d_materialise_unique should not have unlinked inodes at this point, > so at least a BUG_ON for them might be a good idea? That does sound safer. I'm pretty confused by the various __di_splice_alias callers. I'll go search through them and see if I can understand better.... --b. > > > > > --b. > > > > diff --git a/fs/dcache.c b/fs/dcache.c > > index 23702a9..afa8a0d 100644 > > --- a/fs/dcache.c > > +++ b/fs/dcache.c > > @@ -368,7 +368,7 @@ static struct dentry * __d_find_alias(struct inode *inode, int want_discon) > > next = tmp->next; > > prefetch(next); > > alias = list_entry(tmp, struct dentry, d_alias); > > - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { > > + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || !d_unhashed(alias)) { > > if (IS_ROOT(alias) && > > (alias->d_flags & DCACHE_DISCONNECTED)) > > discon_alias = alias; -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20101214220102.GM24828-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* [PATCH] fs/dcache: use standard list macro for d_find_alias [not found] ` <20101214220102.GM24828-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2010-12-17 17:53 ` J. Bruce Fields 2010-12-17 18:00 ` [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries J. Bruce Fields 1 sibling, 0 replies; 42+ messages in thread From: J. Bruce Fields @ 2010-12-17 17:53 UTC (permalink / raw) To: Nick Piggin Cc: Alexander Viro, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- fs/dcache.c | 9 +-------- 1 files changed, 1 insertions(+), 8 deletions(-) While I'm here.... Am I overlooking some reason we're not doing this the easy way? diff --git a/fs/dcache.c b/fs/dcache.c index 23702a9..5ed93cd 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -358,16 +358,9 @@ EXPORT_SYMBOL(dget_locked); static struct dentry * __d_find_alias(struct inode *inode, int want_discon) { - struct list_head *head, *next, *tmp; struct dentry *alias, *discon_alias=NULL; - head = &inode->i_dentry; - next = inode->i_dentry.next; - while (next != head) { - tmp = next; - next = tmp->next; - prefetch(next); - alias = list_entry(tmp, struct dentry, d_alias); + list_for_each_entry(alias, &inode->i_dentry, d_alias) { if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { if (IS_ROOT(alias) && (alias->d_flags & DCACHE_DISCONNECTED)) -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20101214220102.GM24828-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2010-12-17 17:53 ` [PATCH] fs/dcache: use standard list macro for d_find_alias J. Bruce Fields @ 2010-12-17 18:00 ` J. Bruce Fields 2010-12-18 2:01 ` Nick Piggin 1 sibling, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2010-12-17 18:00 UTC (permalink / raw) To: Nick Piggin Cc: Alexander Viro, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Without this patch client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp associated with the nfsd open. d_obtain_alias() doesn't like this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- fs/dcache.c | 26 ++++++++++++++++++++++++-- 1 files changed, 24 insertions(+), 2 deletions(-) On Tue, Dec 14, 2010 at 05:01:02PM -0500, J. Bruce Fields wrote: > On Mon, Dec 13, 2010 at 04:19:44PM +1100, Nick Piggin wrote: > > Not sure where Al's hiding... > > > > But I would like to update the comments, and perhaps even a new > > add a new function here (or new flag to __d_find_alias). > > > > AFAIKS, the callers are OK, however I suppose d_splice_alias and > > d_materialise_unique should not have unlinked inodes at this point, > > so at least a BUG_ON for them might be a good idea? > > That does sound safer. I'm pretty confused by the various > __di_splice_alias callers. I'll go search through them and see if I can > understand better.... I think a new __d_find_alias flag would just make it more confusing. And it looks like all d_obtain_alias needs is something much simpler. This works for me. --b. diff --git a/fs/dcache.c b/fs/dcache.c index 5ed93cd..d5f9da4 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent, return dentry_hashtable + (hash & D_HASHMASK); } +static struct dentry * __d_find_any_alias(struct inode *inode) +{ + struct dentry *alias; + + if (list_empty(&inode->i_dentry)) + return NULL; + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); + __dget_locked(alias); + return alias; +} + +static struct dentry * d_find_any_alias(struct inode *inode) +{ + struct dentry *de; + + spin_lock(&dcache_lock); + de = __d_find_alias(inode, 0); + spin_unlock(&dcache_lock); + return de; +} + + /** * d_obtain_alias - find or allocate a dentry for a given inode * @inode: inode to allocate the dentry for @@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode) if (IS_ERR(inode)) return ERR_CAST(inode); - res = d_find_alias(inode); + res = d_find_any_alias(inode); if (res) goto out_iput; @@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode) tmp->d_parent = tmp; /* make sure dput doesn't croak */ spin_lock(&dcache_lock); - res = __d_find_alias(inode, 0); + res = __d_find_any_alias(inode); if (res) { spin_unlock(&dcache_lock); dput(tmp); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries 2010-12-17 18:00 ` [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries J. Bruce Fields @ 2010-12-18 2:01 ` Nick Piggin 2010-12-18 16:16 ` J. Bruce Fields 0 siblings, 1 reply; 42+ messages in thread From: Nick Piggin @ 2010-12-18 2:01 UTC (permalink / raw) To: J. Bruce Fields Cc: Nick Piggin, Alexander Viro, Nick Piggin, linux-nfs, linux-fsdevel On Fri, Dec 17, 2010 at 01:00:23PM -0500, J. Bruce Fields wrote: > From: J. Bruce Fields <bfields@redhat.com> > > Without this patch > > client$ mount -tnfs4 server:/export/ /mnt/ > client$ tail -f /mnt/FOO > ... > server$ df -i /export > server$ rm /export/FOO > (^C the tail -f) > server$ df -i /export > server$ echo 2 >/proc/sys/vm/drop_caches > server$ df -i /export > > the df's will show that the inode is not freed on the filesystem until > the last step, when it could have been freed after killing the client's > tail -f. On-disk data won't be deallocated either, leading to possible > spurious ENOSPC. > > This occurs because when the client does the close, it arrives in a > compound with a putfh and a close, processed like: > > - putfh: look up the filehandle. The only alias found for the > inode will be DCACHE_UNHASHED alias referenced by the filp > associated with the nfsd open. d_obtain_alias() doesn't like > this, so it creates a new DCACHE_DISCONECTED dentry and > returns that instead. > > Nick Piggin suggested fixing this by allowing d_obtain_alias to return > the unhashed dentry that is referenced by the filp, instead of making it > create a new dentry. > > Leave __d_find_alias() alone to avoid changing behavior of other > callers. > > Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, > hashed or unhashed, disconnected or not, should work. > > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > --- > fs/dcache.c | 26 ++++++++++++++++++++++++-- > 1 files changed, 24 insertions(+), 2 deletions(-) > > On Tue, Dec 14, 2010 at 05:01:02PM -0500, J. Bruce Fields wrote: > > On Mon, Dec 13, 2010 at 04:19:44PM +1100, Nick Piggin wrote: > > > Not sure where Al's hiding... > > > > > > But I would like to update the comments, and perhaps even a new > > > add a new function here (or new flag to __d_find_alias). > > > > > > AFAIKS, the callers are OK, however I suppose d_splice_alias and > > > d_materialise_unique should not have unlinked inodes at this point, > > > so at least a BUG_ON for them might be a good idea? > > > > That does sound safer. I'm pretty confused by the various > > __di_splice_alias callers. I'll go search through them and see if I can > > understand better.... > > I think a new __d_find_alias flag would just make it more confusing. > And it looks like all d_obtain_alias needs is something much simpler. > This works for me. Well I don't really know if this is clearer. By convention, __ prefix version should be same as non prefix version but just take fewer locks. Why do you have the new d_find_any_alias()? > > --b. > > diff --git a/fs/dcache.c b/fs/dcache.c > index 5ed93cd..d5f9da4 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent, > return dentry_hashtable + (hash & D_HASHMASK); > } > > +static struct dentry * __d_find_any_alias(struct inode *inode) > +{ > + struct dentry *alias; > + > + if (list_empty(&inode->i_dentry)) > + return NULL; > + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); > + __dget_locked(alias); > + return alias; > +} > + > +static struct dentry * d_find_any_alias(struct inode *inode) > +{ > + struct dentry *de; > + > + spin_lock(&dcache_lock); > + de = __d_find_alias(inode, 0); > + spin_unlock(&dcache_lock); > + return de; > +} > + > + > /** > * d_obtain_alias - find or allocate a dentry for a given inode > * @inode: inode to allocate the dentry for > @@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode) > if (IS_ERR(inode)) > return ERR_CAST(inode); > > - res = d_find_alias(inode); > + res = d_find_any_alias(inode); > if (res) > goto out_iput; > > @@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode) > tmp->d_parent = tmp; /* make sure dput doesn't croak */ > > spin_lock(&dcache_lock); > - res = __d_find_alias(inode, 0); > + res = __d_find_any_alias(inode); > if (res) { > spin_unlock(&dcache_lock); > dput(tmp); > -- > 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries 2010-12-18 2:01 ` Nick Piggin @ 2010-12-18 16:16 ` J. Bruce Fields [not found] ` <20101218161609.GA22150-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2010-12-18 16:16 UTC (permalink / raw) To: Nick Piggin Cc: Alexander Viro, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Sat, Dec 18, 2010 at 01:01:18PM +1100, Nick Piggin wrote: > On Fri, Dec 17, 2010 at 01:00:23PM -0500, J. Bruce Fields wrote: > > From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > > > Without this patch > > > > client$ mount -tnfs4 server:/export/ /mnt/ > > client$ tail -f /mnt/FOO > > ... > > server$ df -i /export > > server$ rm /export/FOO > > (^C the tail -f) > > server$ df -i /export > > server$ echo 2 >/proc/sys/vm/drop_caches > > server$ df -i /export > > > > the df's will show that the inode is not freed on the filesystem until > > the last step, when it could have been freed after killing the client's > > tail -f. On-disk data won't be deallocated either, leading to possible > > spurious ENOSPC. > > > > This occurs because when the client does the close, it arrives in a > > compound with a putfh and a close, processed like: > > > > - putfh: look up the filehandle. The only alias found for the > > inode will be DCACHE_UNHASHED alias referenced by the filp > > associated with the nfsd open. d_obtain_alias() doesn't like > > this, so it creates a new DCACHE_DISCONECTED dentry and > > returns that instead. > > > > Nick Piggin suggested fixing this by allowing d_obtain_alias to return > > the unhashed dentry that is referenced by the filp, instead of making it > > create a new dentry. > > > > Leave __d_find_alias() alone to avoid changing behavior of other > > callers. > > > > Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, > > hashed or unhashed, disconnected or not, should work. > > > > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > --- > > fs/dcache.c | 26 ++++++++++++++++++++++++-- > > 1 files changed, 24 insertions(+), 2 deletions(-) > > > > On Tue, Dec 14, 2010 at 05:01:02PM -0500, J. Bruce Fields wrote: > > > On Mon, Dec 13, 2010 at 04:19:44PM +1100, Nick Piggin wrote: > > > > Not sure where Al's hiding... > > > > > > > > But I would like to update the comments, and perhaps even a new > > > > add a new function here (or new flag to __d_find_alias). > > > > > > > > AFAIKS, the callers are OK, however I suppose d_splice_alias and > > > > d_materialise_unique should not have unlinked inodes at this point, > > > > so at least a BUG_ON for them might be a good idea? > > > > > > That does sound safer. I'm pretty confused by the various > > > __di_splice_alias callers. I'll go search through them and see if I can > > > understand better.... > > > > I think a new __d_find_alias flag would just make it more confusing. > > And it looks like all d_obtain_alias needs is something much simpler. > > This works for me. > > Well I don't really know if this is clearer. By convention, > __ prefix version should be same as non prefix version but just > take fewer locks. Why do you have the new d_find_any_alias()? Argh, apologies; with typo fixed. --b. commit 5a911af645aae9992d465d57c397237fb35d1f93 Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Date: Fri Dec 17 09:05:04 2010 -0500 fs/dcache: allow __d_obtain_alias() to return unhashed dentries Without this patch client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp associated with the nfsd open. d_obtain_alias() doesn't like this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> diff --git a/fs/dcache.c b/fs/dcache.c index 5ed93cd..5c014a5 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent, return dentry_hashtable + (hash & D_HASHMASK); } +static struct dentry * __d_find_any_alias(struct inode *inode) +{ + struct dentry *alias; + + if (list_empty(&inode->i_dentry)) + return NULL; + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); + __dget_locked(alias); + return alias; +} + +static struct dentry * d_find_any_alias(struct inode *inode) +{ + struct dentry *de; + + spin_lock(&dcache_lock); + de = __d_find_any_alias(inode); + spin_unlock(&dcache_lock); + return de; +} + + /** * d_obtain_alias - find or allocate a dentry for a given inode * @inode: inode to allocate the dentry for @@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode) if (IS_ERR(inode)) return ERR_CAST(inode); - res = d_find_alias(inode); + res = d_find_any_alias(inode); if (res) goto out_iput; @@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode) tmp->d_parent = tmp; /* make sure dput doesn't croak */ spin_lock(&dcache_lock); - res = __d_find_alias(inode, 0); + res = __d_find_any_alias(inode); if (res) { spin_unlock(&dcache_lock); dput(tmp); -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
[parent not found: <20101218161609.GA22150-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20101218161609.GA22150-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2010-12-19 14:53 ` Nick Piggin [not found] ` <AANLkTingRv_gtRSctGzMfYrKg02M_sKj97HSQPRm_mA_-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: Nick Piggin @ 2010-12-19 14:53 UTC (permalink / raw) To: J. Bruce Fields Cc: Nick Piggin, Alexander Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Sun, Dec 19, 2010 at 3:16 AM, J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote: > On Sat, Dec 18, 2010 at 01:01:18PM +1100, Nick Piggin wrote: >> On Fri, Dec 17, 2010 at 01:00:23PM -0500, J. Bruce Fields wrote: >> > From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> >> > On Tue, Dec 14, 2010 at 05:01:02PM -0500, J. Bruce Fields wrote: >> > I think a new __d_find_alias flag would just make it more confusing. >> > And it looks like all d_obtain_alias needs is something much simpler. >> > This works for me. >> >> Well I don't really know if this is clearer. By convention, >> __ prefix version should be same as non prefix version but just >> take fewer locks. Why do you have the new d_find_any_alias()? > > Argh, apologies; with typo fixed. OK that makes a lot more sense :) > --b. > > commit 5a911af645aae9992d465d57c397237fb35d1f93 > Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > Date: Fri Dec 17 09:05:04 2010 -0500 > > fs/dcache: allow __d_obtain_alias() to return unhashed dentries > > Without this patch > > client$ mount -tnfs4 server:/export/ /mnt/ > client$ tail -f /mnt/FOO > ... > server$ df -i /export > server$ rm /export/FOO > (^C the tail -f) > server$ df -i /export > server$ echo 2 >/proc/sys/vm/drop_caches > server$ df -i /export > > the df's will show that the inode is not freed on the filesystem until > the last step, when it could have been freed after killing the client's > tail -f. On-disk data won't be deallocated either, leading to possible > spurious ENOSPC. > > This occurs because when the client does the close, it arrives in a > compound with a putfh and a close, processed like: > > - putfh: look up the filehandle. The only alias found for the > inode will be DCACHE_UNHASHED alias referenced by the filp > associated with the nfsd open. d_obtain_alias() doesn't like > this, so it creates a new DCACHE_DISCONECTED dentry and > returns that instead. > > Nick Piggin suggested fixing this by allowing d_obtain_alias to return > the unhashed dentry that is referenced by the filp, instead of making it > create a new dentry. > > Leave __d_find_alias() alone to avoid changing behavior of other > callers. > > Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, > hashed or unhashed, disconnected or not, should work. Yeah, because when the call returns, the name could be concurrently unhashed, but the nfsd operation must still work (either handle the race or be unaffected by it) presumably. Therefore: hashed status should not matter here. Acked-by: Nick Piggin <npiggin-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> > > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > diff --git a/fs/dcache.c b/fs/dcache.c > index 5ed93cd..5c014a5 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent, > return dentry_hashtable + (hash & D_HASHMASK); > } > > +static struct dentry * __d_find_any_alias(struct inode *inode) > +{ > + struct dentry *alias; > + > + if (list_empty(&inode->i_dentry)) > + return NULL; > + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); > + __dget_locked(alias); > + return alias; > +} > + > +static struct dentry * d_find_any_alias(struct inode *inode) > +{ > + struct dentry *de; > + > + spin_lock(&dcache_lock); > + de = __d_find_any_alias(inode); > + spin_unlock(&dcache_lock); > + return de; > +} > + > + > /** > * d_obtain_alias - find or allocate a dentry for a given inode > * @inode: inode to allocate the dentry for > @@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode) > if (IS_ERR(inode)) > return ERR_CAST(inode); > > - res = d_find_alias(inode); > + res = d_find_any_alias(inode); > if (res) > goto out_iput; > > @@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode) > tmp->d_parent = tmp; /* make sure dput doesn't croak */ > > spin_lock(&dcache_lock); > - res = __d_find_alias(inode, 0); > + res = __d_find_any_alias(inode); > if (res) { > spin_unlock(&dcache_lock); > dput(tmp); > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <AANLkTingRv_gtRSctGzMfYrKg02M_sKj97HSQPRm_mA_-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <AANLkTingRv_gtRSctGzMfYrKg02M_sKj97HSQPRm_mA_-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2010-12-27 23:46 ` J. Bruce Fields [not found] ` <20101227234641.GA22248-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2010-12-27 23:46 UTC (permalink / raw) To: Alexander Viro Cc: Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Without this patch, inodes are not promptly freed on last close of an unlinked file by an nfs client: client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Acked-by: Nick Piggin <npiggin-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- fs/dcache.c | 26 ++++++++++++++++++++++++-- 1 files changed, 24 insertions(+), 2 deletions(-) On Mon, Dec 20, 2010 at 01:53:10AM +1100, Nick Piggin wrote: > On Sun, Dec 19, 2010 at 3:16 AM, J. Bruce Fields <bfields-uC3wQj2KruMpug/h7KTFAQ@public.gmane.orgg> wrote: > > On Sat, Dec 18, 2010 at 01:01:18PM +1100, Nick Piggin wrote: > >> Well I don't really know if this is clearer. By convention, > >> __ prefix version should be same as non prefix version but just > >> take fewer locks. Why do you have the new d_find_any_alias()? > > > > Argh, apologies; with typo fixed. > > OK that makes a lot more sense :) Thanks very much for your help!--b. diff --git a/fs/dcache.c b/fs/dcache.c index 5ed93cd..5c014a5 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent, return dentry_hashtable + (hash & D_HASHMASK); } +static struct dentry * __d_find_any_alias(struct inode *inode) +{ + struct dentry *alias; + + if (list_empty(&inode->i_dentry)) + return NULL; + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); + __dget_locked(alias); + return alias; +} + +static struct dentry * d_find_any_alias(struct inode *inode) +{ + struct dentry *de; + + spin_lock(&dcache_lock); + de = __d_find_any_alias(inode); + spin_unlock(&dcache_lock); + return de; +} + + /** * d_obtain_alias - find or allocate a dentry for a given inode * @inode: inode to allocate the dentry for @@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode) if (IS_ERR(inode)) return ERR_CAST(inode); - res = d_find_alias(inode); + res = d_find_any_alias(inode); if (res) goto out_iput; @@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode) tmp->d_parent = tmp; /* make sure dput doesn't croak */ spin_lock(&dcache_lock); - res = __d_find_alias(inode, 0); + res = __d_find_any_alias(inode); if (res) { spin_unlock(&dcache_lock); dput(tmp); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
[parent not found: <20101227234641.GA22248-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20101227234641.GA22248-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2011-01-18 20:45 ` J. Bruce Fields [not found] ` <20110118204509.GA10903-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2011-01-18 20:45 UTC (permalink / raw) To: Alexander Viro Cc: Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA An updated version attempting to take into account the new locking. (Nick, did I get this right?) --b. From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Date: Fri, 17 Dec 2010 09:05:04 -0500 Subject: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries Without this patch, inodes are not promptly freed on last close of an unlinked file by an nfs client: client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- fs/dcache.c | 26 ++++++++++++++++++++++++-- 1 files changed, 24 insertions(+), 2 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 9f493ee..2849258 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode) } EXPORT_SYMBOL(d_alloc_root); +static struct dentry * __d_find_any_alias(struct inode *inode) +{ + struct dentry *alias; + + if (list_empty(&inode->i_dentry)) + return NULL; + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); + __dget(alias); + return alias; +} + +static struct dentry * d_find_any_alias(struct inode *inode) +{ + struct dentry *de; + + spin_lock(&inode->i_lock); + de = __d_find_any_alias(inode); + spin_unlock(&inode->i_lock); + return de; +} + + /** * d_obtain_alias - find or allocate a dentry for a given inode * @inode: inode to allocate the dentry for @@ -1550,7 +1572,7 @@ struct dentry *d_obtain_alias(struct inode *inode) if (IS_ERR(inode)) return ERR_CAST(inode); - res = d_find_alias(inode); + res = d_find_any_alias(inode); if (res) goto out_iput; @@ -1563,7 +1585,7 @@ struct dentry *d_obtain_alias(struct inode *inode) spin_lock(&inode->i_lock); - res = __d_find_alias(inode, 0); + res = __d_find_any_alias(inode); if (res) { spin_unlock(&inode->i_lock); dput(tmp); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
[parent not found: <20110118204509.GA10903-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20110118204509.GA10903-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2011-01-18 22:02 ` Nick Piggin [not found] ` <AANLkTikL2CDSWQJ1QH_Y4G-j70Vd=VesNMMnYTmMGHC9-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: Nick Piggin @ 2011-01-18 22:02 UTC (permalink / raw) To: J. Bruce Fields Cc: Alexander Viro, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Wed, Jan 19, 2011 at 7:45 AM, J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote: > Nick Piggin suggested fixing this by allowing d_obtain_alias to return > the unhashed dentry that is referenced by the filp, instead of making it > create a new dentry. > > Leave __d_find_alias() alone to avoid changing behavior of other > callers. > > Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, > hashed or unhashed, disconnected or not, should work. > > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > fs/dcache.c | 26 ++++++++++++++++++++++++-- > 1 files changed, 24 insertions(+), 2 deletions(-) > > diff --git a/fs/dcache.c b/fs/dcache.c > index 9f493ee..2849258 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode) > } > EXPORT_SYMBOL(d_alloc_root); > > +static struct dentry * __d_find_any_alias(struct inode *inode) > +{ > + struct dentry *alias; > + > + if (list_empty(&inode->i_dentry)) > + return NULL; > + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); > + __dget(alias); > + return alias; > +} > + > +static struct dentry * d_find_any_alias(struct inode *inode) > +{ > + struct dentry *de; > + > + spin_lock(&inode->i_lock); Yes, i_dentry/d_alias is protected by i_lock, so it looks fine. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <AANLkTikL2CDSWQJ1QH_Y4G-j70Vd=VesNMMnYTmMGHC9-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <AANLkTikL2CDSWQJ1QH_Y4G-j70Vd=VesNMMnYTmMGHC9-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2011-01-18 22:08 ` J. Bruce Fields [not found] ` <20110118220817.GF10903-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2011-01-18 22:08 UTC (permalink / raw) To: Nick Piggin Cc: Alexander Viro, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Wed, Jan 19, 2011 at 09:02:59AM +1100, Nick Piggin wrote: > On Wed, Jan 19, 2011 at 7:45 AM, J. Bruce Fields <bfields-uC3wQj2KruMpug/h7KTFAQ@public.gmane.orgg> wrote: > > @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode) > > } > > EXPORT_SYMBOL(d_alloc_root); > > > > +static struct dentry * __d_find_any_alias(struct inode *inode) > > +{ > > + struct dentry *alias; > > + > > + if (list_empty(&inode->i_dentry)) > > + return NULL; > > + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); > > + __dget(alias); > > + return alias; > > +} > > + > > +static struct dentry * d_find_any_alias(struct inode *inode) > > +{ > > + struct dentry *de; > > + > > + spin_lock(&inode->i_lock); > > Yes, i_dentry/d_alias is protected by i_lock, so it looks fine. OK, thanks; I'm assuming it's OK to re-add your ack in that case. Al, could you apply? (Or if this should go in through an nfsd tree, or some other way, let me know.) --b. From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Date: Fri, 17 Dec 2010 09:05:04 -0500 Subject: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries Without this patch, inodes are not promptly freed on last close of an unlinked file by an nfs client: client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle. The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Acked-by: Nick Piggin <npiggin-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- fs/dcache.c | 26 ++++++++++++++++++++++++-- 1 files changed, 24 insertions(+), 2 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 9f493ee..2849258 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode) } EXPORT_SYMBOL(d_alloc_root); +static struct dentry * __d_find_any_alias(struct inode *inode) +{ + struct dentry *alias; + + if (list_empty(&inode->i_dentry)) + return NULL; + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); + __dget(alias); + return alias; +} + +static struct dentry * d_find_any_alias(struct inode *inode) +{ + struct dentry *de; + + spin_lock(&inode->i_lock); + de = __d_find_any_alias(inode); + spin_unlock(&inode->i_lock); + return de; +} + + /** * d_obtain_alias - find or allocate a dentry for a given inode * @inode: inode to allocate the dentry for @@ -1550,7 +1572,7 @@ struct dentry *d_obtain_alias(struct inode *inode) if (IS_ERR(inode)) return ERR_CAST(inode); - res = d_find_alias(inode); + res = d_find_any_alias(inode); if (res) goto out_iput; @@ -1563,7 +1585,7 @@ struct dentry *d_obtain_alias(struct inode *inode) spin_lock(&inode->i_lock); - res = __d_find_alias(inode, 0); + res = __d_find_any_alias(inode); if (res) { spin_unlock(&inode->i_lock); dput(tmp); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
[parent not found: <20110118220817.GF10903-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20110118220817.GF10903-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2011-03-08 18:13 ` J. Bruce Fields [not found] ` <20110308181320.GA15566-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2011-03-08 18:13 UTC (permalink / raw) To: Nick Piggin Cc: Alexander Viro, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Tue, Jan 18, 2011 at 05:08:17PM -0500, J. Bruce Fields wrote: > On Wed, Jan 19, 2011 at 09:02:59AM +1100, Nick Piggin wrote: > > On Wed, Jan 19, 2011 at 7:45 AM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode) > > > } > > > EXPORT_SYMBOL(d_alloc_root); > > > > > > +static struct dentry * __d_find_any_alias(struct inode *inode) > > > +{ > > > + struct dentry *alias; > > > + > > > + if (list_empty(&inode->i_dentry)) > > > + return NULL; > > > + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); > > > + __dget(alias); > > > + return alias; > > > +} > > > + > > > +static struct dentry * d_find_any_alias(struct inode *inode) > > > +{ > > > + struct dentry *de; > > > + > > > + spin_lock(&inode->i_lock); > > > > Yes, i_dentry/d_alias is protected by i_lock, so it looks fine. > > OK, thanks; I'm assuming it's OK to re-add your ack in that case. Al, > could you apply? (Or if this should go in through an nfsd tree, or some > other way, let me know.) Al, do you have this in your queue to look at? Need me to resend? Or should it take some other route? --b. > > --b. > > From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > Date: Fri, 17 Dec 2010 09:05:04 -0500 > Subject: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries > > Without this patch, inodes are not promptly freed on last close of an > unlinked file by an nfs client: > > client$ mount -tnfs4 server:/export/ /mnt/ > client$ tail -f /mnt/FOO > ... > server$ df -i /export > server$ rm /export/FOO > (^C the tail -f) > server$ df -i /export > server$ echo 2 >/proc/sys/vm/drop_caches > server$ df -i /export > > the df's will show that the inode is not freed on the filesystem until > the last step, when it could have been freed after killing the client's > tail -f. On-disk data won't be deallocated either, leading to possible > spurious ENOSPC. > > This occurs because when the client does the close, it arrives in a > compound with a putfh and a close, processed like: > > - putfh: look up the filehandle. The only alias found for the > inode will be DCACHE_UNHASHED alias referenced by the filp > this, so it creates a new DCACHE_DISCONECTED dentry and > returns that instead. > - close: closes the existing filp, which is destroyed > immediately by dput() since it's DCACHE_UNHASHED. > - end of the compound: release the reference > to the current filehandle, and dput() the new > DCACHE_DISCONECTED dentry, which gets put on the > unused list instead of being destroyed immediately. > > Nick Piggin suggested fixing this by allowing d_obtain_alias to return > the unhashed dentry that is referenced by the filp, instead of making it > create a new dentry. > > Leave __d_find_alias() alone to avoid changing behavior of other > callers. > > Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, > hashed or unhashed, disconnected or not, should work. > > Acked-by: Nick Piggin <npiggin-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > fs/dcache.c | 26 ++++++++++++++++++++++++-- > 1 files changed, 24 insertions(+), 2 deletions(-) > > diff --git a/fs/dcache.c b/fs/dcache.c > index 9f493ee..2849258 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode) > } > EXPORT_SYMBOL(d_alloc_root); > > +static struct dentry * __d_find_any_alias(struct inode *inode) > +{ > + struct dentry *alias; > + > + if (list_empty(&inode->i_dentry)) > + return NULL; > + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias); > + __dget(alias); > + return alias; > +} > + > +static struct dentry * d_find_any_alias(struct inode *inode) > +{ > + struct dentry *de; > + > + spin_lock(&inode->i_lock); > + de = __d_find_any_alias(inode); > + spin_unlock(&inode->i_lock); > + return de; > +} > + > + > /** > * d_obtain_alias - find or allocate a dentry for a given inode > * @inode: inode to allocate the dentry for > @@ -1550,7 +1572,7 @@ struct dentry *d_obtain_alias(struct inode *inode) > if (IS_ERR(inode)) > return ERR_CAST(inode); > > - res = d_find_alias(inode); > + res = d_find_any_alias(inode); > if (res) > goto out_iput; > > @@ -1563,7 +1585,7 @@ struct dentry *d_obtain_alias(struct inode *inode) > > > spin_lock(&inode->i_lock); > - res = __d_find_alias(inode, 0); > + res = __d_find_any_alias(inode); > if (res) { > spin_unlock(&inode->i_lock); > dput(tmp); > -- > 1.7.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20110308181320.GA15566-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20110308181320.GA15566-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2011-03-10 10:58 ` Al Viro [not found] ` <20110310105821.GE22723-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: Al Viro @ 2011-03-10 10:58 UTC (permalink / raw) To: J. Bruce Fields Cc: Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Neil Brown On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote: > Al, do you have this in your queue to look at? Need me to resend? Or > should it take some other route? It's in queue, but I'd be a lot happier if I understood what's going on with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing is, unless I'm missing something we ought to use __d_find_any_alias() there as well. Directories really, _really_ should not have more than one alias. And what we get is really weird: * find (the only) alias * if it doesn't exist, create one (OK, no problem) * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED, move it (also fine, modulo rather useless BUG_ON() in there) * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED, add a new alias and say nothing. The last part looks very strange. I'd been staring at the history of this function and I _think_ it's a rudiment of period when we used that stuff for non-directories as well. It used to be directory-only; then Neil had switched it to non-directories as well (in "Fix disconnected dentries on NFS exports" back in 2004), adding want_discon argument to __d_find_alias() in process and using it instead of open-coded equivalent of __d_find_any_alias(). Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias" he'd made that code directory-only again, at which point we arrived to the current situation. Neil, is there some reason I'm missing that makes __d_find_alias() right in there? And do we still need the second argument in __d_find_alias()? Anyway, Bruce's patch goes in tonight's push, but I'd love to see that mess cleaned up as well or at least explained. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20110310105821.GE22723-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20110310105821.GE22723-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> @ 2011-03-11 4:07 ` NeilBrown [not found] ` <20110311150749.2fa2be66-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: NeilBrown @ 2011-03-11 4:07 UTC (permalink / raw) To: Al Viro Cc: J. Bruce Fields, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> wrote: > On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote: > > > Al, do you have this in your queue to look at? Need me to resend? Or > > should it take some other route? > > It's in queue, but I'd be a lot happier if I understood what's going on > with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing > is, unless I'm missing something we ought to use __d_find_any_alias() > there as well. Directories really, _really_ should not have more than > one alias. And what we get is really weird: > * find (the only) alias > * if it doesn't exist, create one (OK, no problem) > * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED, > move it (also fine, modulo rather useless BUG_ON() in there) > * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED, > add a new alias and say nothing. > > The last part looks very strange. I'd been staring at the history of this > function and I _think_ it's a rudiment of period when we used that stuff for > non-directories as well. It used to be directory-only; then Neil had > switched it to non-directories as well (in "Fix disconnected dentries on NFS > exports" back in 2004), adding want_discon argument to __d_find_alias() in > process and using it instead of open-coded equivalent of __d_find_any_alias(). > Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias" > he'd made that code directory-only again, at which point we arrived to the > current situation. > > Neil, is there some reason I'm missing that makes __d_find_alias() right in > there? And do we still need the second argument in __d_find_alias()? > > Anyway, Bruce's patch goes in tonight's push, but I'd love to see that > mess cleaned up as well or at least explained. ahh.... that takes me back ... :-) Thanks for identifying those patches Al - it saved me a lot of time. What can I say.... the first patch is clearly wrong, for the reasons mentioned in the second patch. And the second patch is wrong, partly because of the reasons you give and partly because it re-introduces the the bug that the first patch aimed to fix. All very sad really. So what should it do. 1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence was only a hint, its absence was a strong statement. If the flag is set, the dentry might not be linked to the root. If it is clear, it definitely is link through to the root. However I think it was used with stronger intent than that. Now it seems to mean a little bit more: If it is set and the dentry is hashed, then it must be on the sb->s_anon list. This is a significant which I never noticed (I haven't been watching). Originally a disconnected dentry would be attached (and hashed) to its parent. Then that parent would get its own parent and so on until it was attached all the way to the root. Only then would be start clearing DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if that is correct. Anyway, the original idea was: 1/ when you create an anonymous dentry for nfsd, set DCACHE_DISCONNECTED. 2/ when nfsd sees DCACHE_DISCONNECTED it makes sure there is a connection to the root, and only clears DCACHE_DISCONNECTED when that connection is certain. 3/ no-one else should look at DCACHE_DISCONNECTED. 2/ directories always have at most one dentry. It might be IS_ROOT(), and it might be DCACHE_DISCONNECTED. non-directories can have multiple dentries, but (and here is where the various patches changes things) if there is an IS_ROOT() dentry, it must be the only hashed dentry. (IS_ROOT dentries are 'hashed' on the s_anon chain). 3/ when nfsd finds an inode, it first tries to get a hashed dentry - any hashed dentry. If necessary (and as a last resort) it will create an anonymous dentry. For directories, an unhashed dentry will do too (I think ... not 100% clear on that just now). This is d_obtain_alias 4/ When nfsd wants the dentry to have a name - either because it is a directory or because 'subtree_check' is set - and finds DCACHE_DISCONNECTED is set is find the parent inode and makes sure it has a non-DCACHE_DISCONNECTED entry (see 3 and 4). Once it finds that the parent has a non-DCACHE_DISCONNECTED dentry, it clears DCACHE_DISCONNECTED on the child. So what do we have: d_obtain_alias: finds a hashed alias (or any alias for a directory) and if there isn't one, creates an IS_ROOT alias, sets DCACHE_DISCONNECTED, and adds to sb->s_anon. This is used by various "get_parent" and "get_dentry" routines as you would expect. ceph uses it in open_root_dentry which seems a bit odd - maybe it is OK. d_splice_alias This is more complicated than it should be - with the complication in __d_find_alias. It's job is to see if the inode already has an IS_ROOT alias. If it does, it should d_move that alias in place of the one it was given, else instantiate the one it was given. It doesn't have to look very hard. A directory will only have one dentry, so that one either IS_ROOT or not. If a non-directory has an IS_ROOT dentry, it will be the only hashed dentry. So there is no question of preferring non-IS_ROOT dentries. If there is one, it will be the only one. I'm a bit uncomfortable that d_splice_alias drops all locks before calling d_move. Normally when d_move is called, both directories are locked in some way. In the case the source is not a directory bit is the state of being anonymous. Theoretically two calls to d_splice_alias on the same inode could try to d_move the same anonymous dentry to two different places, which would be bad. So I think some locking is needed there. d_materialise_unique Not sure what this is for... it looks a lot like d_splice_alias, only it is a lot more complicated. The comments in the code and in the original commit sound exactly like d_splice_alias. One of these two should certainly go. It clears DCACHE_DISCONNECTED which is probably the wrong thing to do. __d_drop assumes that if DCACHE_DISCONNECTED is set then the dentry is hashed to s_anon. This is wrong - d_splice_alias can invalidate this. It should be testing IS_ROOT(). And IS_ROOT will only ever be hashed to s_anon. So I'm suggesting the following patch as a start. I'm still worried about the locking for d_move in d_splice_alias, and I think we can discard d_materialise_unique, but I would want to understand why it is different first. NeilBrown diff --git a/fs/dcache.c b/fs/dcache.c index 2a6bd9a..a6f1884 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -327,7 +327,7 @@ static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent) void __d_drop(struct dentry *dentry) { if (!(dentry->d_flags & DCACHE_UNHASHED)) { - if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED)) { + if (unlikely(IS_ROOT(dentry))) { bit_spin_lock(0, (unsigned long *)&dentry->d_sb->s_anon.first); dentry->d_flags |= DCACHE_UNHASHED; @@ -565,52 +565,29 @@ EXPORT_SYMBOL(dget_parent); /** * d_find_alias - grab a hashed alias of inode * @inode: inode in question - * @want_discon: flag, used by d_splice_alias, to request - * that only a DISCONNECTED alias be returned. + * @want_discon: flag, used by d_splice_alias to request + * that only an IS_ROOT alias be returned. * * If inode has a hashed alias, or is a directory and has any alias, * acquire the reference to alias and return it. Otherwise return NULL. * Notice that if inode is a directory there can be only one alias and * it can be unhashed only if it has no children, or if it is the root * of a filesystem. - * - * If the inode has an IS_ROOT, DCACHE_DISCONNECTED alias, then prefer - * any other hashed alias over that one unless @want_discon is set, - * in which case only return an IS_ROOT, DCACHE_DISCONNECTED alias. + * If want_discon is set, then only return an IS_ROOT alias. */ static struct dentry *__d_find_alias(struct inode *inode, int want_discon) { - struct dentry *alias, *discon_alias; + struct dentry *alias; -again: - discon_alias = NULL; list_for_each_entry(alias, &inode->i_dentry, d_alias) { spin_lock(&alias->d_lock); - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { - if (IS_ROOT(alias) && - (alias->d_flags & DCACHE_DISCONNECTED)) { - discon_alias = alias; - } else if (!want_discon) { - __dget_dlock(alias); - spin_unlock(&alias->d_lock); - return alias; - } - } - spin_unlock(&alias->d_lock); - } - if (discon_alias) { - alias = discon_alias; - spin_lock(&alias->d_lock); - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) { - if (IS_ROOT(alias) && - (alias->d_flags & DCACHE_DISCONNECTED)) { - __dget_dlock(alias); - spin_unlock(&alias->d_lock); - return alias; - } + if ((S_ISDIR(inode->i_mode) || !d_unhashed(alias)) + && (!want_discon || IS_ROOT(alias))) { + __dget_dlock(alias); + spin_unlock(&alias->d_lock); + return alias; } spin_unlock(&alias->d_lock); - goto again; } return NULL; } @@ -1599,9 +1576,9 @@ EXPORT_SYMBOL(d_obtain_alias); * @inode: the inode which may have a disconnected dentry * @dentry: a negative dentry which we want to point to the inode. * - * If inode is a directory and has a 'disconnected' dentry (i.e. IS_ROOT and - * DCACHE_DISCONNECTED), then d_move that in place of the given dentry - * and return it, else simply d_add the inode to the dentry and return NULL. + * If inode and has a 'disconnected' dentry (i.e. IS_ROOT()) then + * d_move that in place of the given dentry and return it, else simply + * d_add the inode to the dentry and return NULL. * * This is needed in the lookup routine of any filesystem that is exportable * (via knfsd) so that we can build dcache paths to directories effectively. @@ -1614,11 +1591,10 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) { struct dentry *new = NULL; - if (inode && S_ISDIR(inode->i_mode)) { + if (inode) { spin_lock(&inode->i_lock); new = __d_find_alias(inode, 1); if (new) { - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED)); spin_unlock(&inode->i_lock); security_d_instantiate(new, inode); d_move(new, dentry); -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
[parent not found: <20110311150749.2fa2be66-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20110311150749.2fa2be66-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2012-02-14 17:03 ` J. Bruce Fields [not found] ` <20120214170300.GA4309-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2012-06-28 13:59 ` J. Bruce Fields 1 sibling, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2012-02-14 17:03 UTC (permalink / raw) To: NeilBrown Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote: > On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> wrote: > > > On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote: > > > > > Al, do you have this in your queue to look at? Need me to resend? Or > > > should it take some other route? > > > > It's in queue, but I'd be a lot happier if I understood what's going on > > with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing > > is, unless I'm missing something we ought to use __d_find_any_alias() > > there as well. Directories really, _really_ should not have more than > > one alias. And what we get is really weird: > > * find (the only) alias > > * if it doesn't exist, create one (OK, no problem) > > * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED, > > move it (also fine, modulo rather useless BUG_ON() in there) > > * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED, > > add a new alias and say nothing. > > > > The last part looks very strange. I'd been staring at the history of this > > function and I _think_ it's a rudiment of period when we used that stuff for > > non-directories as well. It used to be directory-only; then Neil had > > switched it to non-directories as well (in "Fix disconnected dentries on NFS > > exports" back in 2004), adding want_discon argument to __d_find_alias() in > > process and using it instead of open-coded equivalent of __d_find_any_alias(). > > Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias" > > he'd made that code directory-only again, at which point we arrived to the > > current situation. > > > > Neil, is there some reason I'm missing that makes __d_find_alias() right in > > there? And do we still need the second argument in __d_find_alias()? > > > > Anyway, Bruce's patch goes in tonight's push, but I'd love to see that > > mess cleaned up as well or at least explained. Nothing like seeing somebody actually run across a bug here to motivate getting back to this.... So, like Al I'm wondering whether the __d_find_alias in d_splice_alias should be a __d_find_any_alias. Disclaimer: the bug manifested in rhel5, and I haven't looked closely at upstream yet, though it seems like it would have the same problem. In the particular case I'm seeing, the directory already has an alias that is UNHASHED (but not DISCONNECTED). So d_splice_alias adds a second alias. (And we notice because a rename happens to pick up one of each, and the rename locking then tries to take the parent's i_mutex twice (since the rename code think it's dealing with a cross-directory rename)). Using __d_find_any_alias seems to fix the problem. Would rehashing an UNHASHED dentry here violate any important assumptions? > I'm a bit uncomfortable that d_splice_alias drops all locks before > calling d_move. Normally when d_move is called, both directories are > locked in some way. In the case the source is not a directory bit > is the state of being anonymous. Theoretically two calls to > d_splice_alias on the same inode could try to d_move the same > anonymous dentry to two different places, which would be bad. > So I think some locking is needed there. I think d_splice_alias is only ever called from lookup, under the parent's i_mutex? So if we have two tasks in the d_move here, then the two callers must be doing lookups in two different directories, and holding the i_mutex on both of them. The directory can't have been renamed while either caller was holding the i_mutex, so this is really a directory that's returned on lookup from two different parents. That sounds like a bug already. Could it happen on a corrupted filesystem maybe? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20120214170300.GA4309-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20120214170300.GA4309-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2012-02-15 16:56 ` J. Bruce Fields [not found] ` <20120215165633.GE12490-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2012-02-15 16:56 UTC (permalink / raw) To: NeilBrown Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Tue, Feb 14, 2012 at 12:03:00PM -0500, J. Bruce Fields wrote: > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote: > > On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> wrote: > > > > > On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote: > > > > > > > Al, do you have this in your queue to look at? Need me to resend? Or > > > > should it take some other route? > > > > > > It's in queue, but I'd be a lot happier if I understood what's going on > > > with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing > > > is, unless I'm missing something we ought to use __d_find_any_alias() > > > there as well. Directories really, _really_ should not have more than > > > one alias. And what we get is really weird: > > > * find (the only) alias > > > * if it doesn't exist, create one (OK, no problem) > > > * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED, > > > move it (also fine, modulo rather useless BUG_ON() in there) > > > * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED, > > > add a new alias and say nothing. > > > > > > The last part looks very strange. I'd been staring at the history of this > > > function and I _think_ it's a rudiment of period when we used that stuff for > > > non-directories as well. It used to be directory-only; then Neil had > > > switched it to non-directories as well (in "Fix disconnected dentries on NFS > > > exports" back in 2004), adding want_discon argument to __d_find_alias() in > > > process and using it instead of open-coded equivalent of __d_find_any_alias(). > > > Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias" > > > he'd made that code directory-only again, at which point we arrived to the > > > current situation. > > > > > > Neil, is there some reason I'm missing that makes __d_find_alias() right in > > > there? And do we still need the second argument in __d_find_alias()? > > > > > > Anyway, Bruce's patch goes in tonight's push, but I'd love to see that > > > mess cleaned up as well or at least explained. > > Nothing like seeing somebody actually run across a bug here to motivate > getting back to this.... > > So, like Al I'm wondering whether the __d_find_alias in d_splice_alias > should be a __d_find_any_alias. > > Disclaimer: the bug manifested in rhel5, and I haven't looked closely at > upstream yet, though it seems like it would have the same problem. > > In the particular case I'm seeing, the directory already has an alias > that is UNHASHED (but not DISCONNECTED). So d_splice_alias adds a > second alias. The reproducer I was using against rhel5 isn't going to work against upstream, because it depended on the dentry_unhash() call in vfs_rmdir() to create that UNHASHED dentry. Though I think there's still code that can leave UNHASHED dentries around still on the alias list. I'll look some more.... --b. > > (And we notice because a rename happens to pick up one of each, and the > rename locking then tries to take the parent's i_mutex twice (since the > rename code think it's dealing with a cross-directory rename)). > > Using __d_find_any_alias seems to fix the problem. > > Would rehashing an UNHASHED dentry here violate any important assumptions? > > > I'm a bit uncomfortable that d_splice_alias drops all locks before > > calling d_move. Normally when d_move is called, both directories are > > locked in some way. In the case the source is not a directory bit > > is the state of being anonymous. Theoretically two calls to > > d_splice_alias on the same inode could try to d_move the same > > anonymous dentry to two different places, which would be bad. > > So I think some locking is needed there. > > I think d_splice_alias is only ever called from lookup, under the > parent's i_mutex? > > So if we have two tasks in the d_move here, then the two callers must be > doing lookups in two different directories, and holding the i_mutex on > both of them. > > The directory can't have been renamed while either caller was holding > the i_mutex, so this is really a directory that's returned on lookup > from two different parents. > > That sounds like a bug already. Could it happen on a corrupted > filesystem maybe? > > --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20120215165633.GE12490-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20120215165633.GE12490-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2012-02-16 3:06 ` NeilBrown [not found] ` <20120216140603.08cb4900-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2012-02-16 22:30 ` J. Bruce Fields 0 siblings, 2 replies; 42+ messages in thread From: NeilBrown @ 2012-02-16 3:06 UTC (permalink / raw) To: J. Bruce Fields Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 3709 bytes --] On Wed, 15 Feb 2012 11:56:33 -0500 "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote: > On Tue, Feb 14, 2012 at 12:03:00PM -0500, J. Bruce Fields wrote: > > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote: > > > On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> wrote: > > > > > > > On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote: > > > > > > > > > Al, do you have this in your queue to look at? Need me to resend? Or > > > > > should it take some other route? > > > > > > > > It's in queue, but I'd be a lot happier if I understood what's going on > > > > with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing > > > > is, unless I'm missing something we ought to use __d_find_any_alias() > > > > there as well. Directories really, _really_ should not have more than > > > > one alias. And what we get is really weird: > > > > * find (the only) alias > > > > * if it doesn't exist, create one (OK, no problem) > > > > * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED, > > > > move it (also fine, modulo rather useless BUG_ON() in there) > > > > * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED, > > > > add a new alias and say nothing. > > > > > > > > The last part looks very strange. I'd been staring at the history of this > > > > function and I _think_ it's a rudiment of period when we used that stuff for > > > > non-directories as well. It used to be directory-only; then Neil had > > > > switched it to non-directories as well (in "Fix disconnected dentries on NFS > > > > exports" back in 2004), adding want_discon argument to __d_find_alias() in > > > > process and using it instead of open-coded equivalent of __d_find_any_alias(). > > > > Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias" > > > > he'd made that code directory-only again, at which point we arrived to the > > > > current situation. > > > > > > > > Neil, is there some reason I'm missing that makes __d_find_alias() right in > > > > there? And do we still need the second argument in __d_find_alias()? > > > > > > > > Anyway, Bruce's patch goes in tonight's push, but I'd love to see that > > > > mess cleaned up as well or at least explained. > > > > Nothing like seeing somebody actually run across a bug here to motivate > > getting back to this.... > > > > So, like Al I'm wondering whether the __d_find_alias in d_splice_alias > > should be a __d_find_any_alias. > > > > Disclaimer: the bug manifested in rhel5, and I haven't looked closely at > > upstream yet, though it seems like it would have the same problem. > > > > In the particular case I'm seeing, the directory already has an alias > > that is UNHASHED (but not DISCONNECTED). So d_splice_alias adds a > > second alias. > > The reproducer I was using against rhel5 isn't going to work against > upstream, because it depended on the dentry_unhash() call in vfs_rmdir() > to create that UNHASHED dentry. > > Though I think there's still code that can leave UNHASHED dentries > around still on the alias list. I'll look some more.... I was going ask how you managed to get an 'unhashed' dentry which was not DISCONNECTED, and belonged to a directory that could be the subject of d_splice_alias (that implies it has a name). The bug sounds like a race between lookup and rmdir, which should be prevented by i_mutex. I think that using __d_find_any_alias would just be papering over the problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20120216140603.08cb4900-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20120216140603.08cb4900-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2012-02-16 11:51 ` J. Bruce Fields [not found] ` <20120216115133.GA20279-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2012-02-16 11:51 UTC (permalink / raw) To: NeilBrown Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote: > On Wed, 15 Feb 2012 11:56:33 -0500 "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> > wrote: > > > On Tue, Feb 14, 2012 at 12:03:00PM -0500, J. Bruce Fields wrote: > > > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote: > > > > On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> wrote: > > > > > > > > > On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote: > > > > > > > > > > > Al, do you have this in your queue to look at? Need me to resend? Or > > > > > > should it take some other route? > > > > > > > > > > It's in queue, but I'd be a lot happier if I understood what's going on > > > > > with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing > > > > > is, unless I'm missing something we ought to use __d_find_any_alias() > > > > > there as well. Directories really, _really_ should not have more than > > > > > one alias. And what we get is really weird: > > > > > * find (the only) alias > > > > > * if it doesn't exist, create one (OK, no problem) > > > > > * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED, > > > > > move it (also fine, modulo rather useless BUG_ON() in there) > > > > > * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED, > > > > > add a new alias and say nothing. > > > > > > > > > > The last part looks very strange. I'd been staring at the history of this > > > > > function and I _think_ it's a rudiment of period when we used that stuff for > > > > > non-directories as well. It used to be directory-only; then Neil had > > > > > switched it to non-directories as well (in "Fix disconnected dentries on NFS > > > > > exports" back in 2004), adding want_discon argument to __d_find_alias() in > > > > > process and using it instead of open-coded equivalent of __d_find_any_alias(). > > > > > Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias" > > > > > he'd made that code directory-only again, at which point we arrived to the > > > > > current situation. > > > > > > > > > > Neil, is there some reason I'm missing that makes __d_find_alias() right in > > > > > there? And do we still need the second argument in __d_find_alias()? > > > > > > > > > > Anyway, Bruce's patch goes in tonight's push, but I'd love to see that > > > > > mess cleaned up as well or at least explained. > > > > > > Nothing like seeing somebody actually run across a bug here to motivate > > > getting back to this.... > > > > > > So, like Al I'm wondering whether the __d_find_alias in d_splice_alias > > > should be a __d_find_any_alias. > > > > > > Disclaimer: the bug manifested in rhel5, and I haven't looked closely at > > > upstream yet, though it seems like it would have the same problem. > > > > > > In the particular case I'm seeing, the directory already has an alias > > > that is UNHASHED (but not DISCONNECTED). So d_splice_alias adds a > > > second alias. > > > > The reproducer I was using against rhel5 isn't going to work against > > upstream, because it depended on the dentry_unhash() call in vfs_rmdir() > > to create that UNHASHED dentry. > > > > Though I think there's still code that can leave UNHASHED dentries > > around still on the alias list. I'll look some more.... > > I was going ask how you managed to get an 'unhashed' dentry which was not > DISCONNECTED, and belonged to a directory that could be the subject of > d_splice_alias (that implies it has a name). > > The bug sounds like a race between lookup and rmdir, which should be > prevented by i_mutex. The rhel5 race is a bit more complicated than that: a filehandle lookup has to come in to grab the unhashed dentry before rmdir drops the parent i_mutex, and then the lookup comes along later. I suspect that this is actually expected, and that it's not safe for us to assume that directories will lose any UNHASHED dentries before relevant locks are dropped. I'm seeing something similar happening on an upstream kernel, so I'll try to figure out where that's coming from now.... > I think that using __d_find_any_alias would just be papering over the > problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias. Right, I throw away that BUG_ON as well. Tested on rhel5 only. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20120216115133.GA20279-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20120216115133.GA20279-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2012-02-16 16:08 ` J. Bruce Fields 0 siblings, 0 replies; 42+ messages in thread From: J. Bruce Fields @ 2012-02-16 16:08 UTC (permalink / raw) To: NeilBrown Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky On Thu, Feb 16, 2012 at 06:51:33AM -0500, J. Bruce Fields wrote: > I'm seeing something similar happening on an upstream kernel, so I'll > try to figure out where that's coming from now.... After taking another look, actually this was something happening on proc, not on an exported filesystem. Running some tests on a kernel with this extra warning added to __d_instantiate whenever it gives a directory a second alias: Feb 15 14:43:07 pip1 kernel: __d_instantiate adding alias ffff880037bf51c0(8) to dir ffff880035eaf048(4026532190) with existing alias(es) ffff880034037180(88) The numbers after the dentry pointers are dentry flags in hex (8 == DCACHE_OP_DELETE, 88 == DCACHE_RCUACCESS|DCACHE_OP_DELETE), and after the inode pointers is an inode number. I don't know if it's a bug or not, though it seems suspicous. --b. Feb 15 14:43:07 pip1 kernel: ------------[ cut here ]------------ Feb 15 14:43:07 pip1 kernel: WARNING: at fs/dcache.c:1320 __d_instantiate+0x185/0x1d0() Feb 15 14:43:07 pip1 kernel: Hardware name: Bochs Feb 15 14:43:07 pip1 kernel: Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc [last unloaded: scsi_wait_scan] Feb 15 14:43:07 pip1 kernel: Pid: 4810, comm: exportfs Not tainted 3.3.0-rc2-00517-g35ae270 #1070 Feb 15 14:43:07 pip1 kernel: Call Trace: Feb 15 14:43:07 pip1 kernel: [<ffffffff8104040f>] warn_slowpath_common+0x7f/0xc0 Feb 15 14:43:07 pip1 kernel: [<ffffffff8104046a>] warn_slowpath_null+0x1a/0x20 Feb 15 14:43:07 pip1 kernel: [<ffffffff8113f885>] __d_instantiate+0x185/0x1d0 Feb 15 14:43:07 pip1 kernel: [<ffffffff8113face>] d_instantiate+0x4e/0x90 Feb 15 14:43:07 pip1 kernel: [<ffffffff81192d7b>] proc_lookup_de+0xab/0x110 Feb 15 14:43:07 pip1 kernel: [<ffffffff81196a90>] ? proc_sys_poll_notify+0x30/0x30 Feb 15 14:43:07 pip1 kernel: [<ffffffff81196c1f>] proc_tgid_net_lookup+0x3f/0x50 Feb 15 14:43:07 pip1 kernel: [<ffffffff81133f55>] d_alloc_and_lookup+0x45/0x90 Feb 15 14:43:07 pip1 kernel: [<ffffffff8113e455>] ? d_lookup+0x35/0x60 Feb 15 14:43:07 pip1 kernel: [<ffffffff8113453a>] do_lookup+0x28a/0x3b0 Feb 15 14:43:07 pip1 kernel: [<ffffffff814c792c>] ? security_inode_permission+0x1c/0x30 Feb 15 14:43:07 pip1 kernel: [<ffffffff81135337>] link_path_walk+0x157/0x940 Feb 15 14:43:07 pip1 kernel: [<ffffffff8113910c>] path_openat+0xbc/0x440 Feb 15 14:43:07 pip1 kernel: [<ffffffff81106243>] ? might_fault+0x53/0xb0 Feb 15 14:43:07 pip1 kernel: [<ffffffff81106243>] ? might_fault+0x53/0xb0 Feb 15 14:43:07 pip1 kernel: [<ffffffff81145aeb>] ? alloc_fd+0x3b/0x160 Feb 15 14:43:07 pip1 kernel: [<ffffffff811395a9>] do_filp_open+0x49/0xa0 Feb 15 14:43:07 pip1 kernel: [<ffffffff81145b5b>] ? alloc_fd+0xab/0x160 Feb 15 14:43:07 pip1 kernel: [<ffffffff81126328>] do_sys_open+0x108/0x1e0 Feb 15 14:43:07 pip1 kernel: [<ffffffff81126441>] sys_open+0x21/0x30 Feb 15 14:43:07 pip1 kernel: [<ffffffff81973252>] system_call_fastpath+0x16/0x1b Feb 15 14:43:07 pip1 kernel: ---[ end trace 5714bb1463cb9b46 ]--- > > > I think that using __d_find_any_alias would just be papering over the > > problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias. > > Right, I throw away that BUG_ON as well. Tested on rhel5 only. > > --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries 2012-02-16 3:06 ` NeilBrown [not found] ` <20120216140603.08cb4900-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2012-02-16 22:30 ` J. Bruce Fields [not found] ` <20120216223011.GA23997-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 1 sibling, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2012-02-16 22:30 UTC (permalink / raw) To: NeilBrown; +Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs, linux-fsdevel On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote: > I was going ask how you managed to get an 'unhashed' dentry which was not > DISCONNECTED, and belonged to a directory that could be the subject of > d_splice_alias (that implies it has a name). > > The bug sounds like a race between lookup and rmdir, which should be > prevented by i_mutex. > > I think that using __d_find_any_alias would just be papering over the > problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias. Looking through the latest upstream code, I can't come up with another obvious reproducer. But I also can't see the code making any particular effort to ensure that dentries are removed from inode's alias lists at the same time they're unhashed. E.g., trace up through the callers of d_drop/__d_drop and try to convince yourselves that they all end up removing the dentry from the alias list. Can you see any reason why the following would actually create a problem? --b. commit fcfef6b7319c5d19ea5064317528ff994343b011 Author: J. Bruce Fields <bfields@redhat.com> Date: Mon Feb 13 13:38:33 2012 -0500 exports: stop d_splice_alias creating directory aliases A directory should never have more than one dentry pointing to it. But d_splice_alias() does so in the case of a directory with an already-existing non-DISCONNECTED dentry. Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0, this could cause an nfsd deadlock like this: - Somebody attempts to remove a non-empty directory. - The dentry_unhash() in vfs_rmdir() unhashes the dentry pointing to the non-empty directory. - ->rmdir() then fails with -ENOTEMPTY - Before the vfs_rmdir() caller reaches dput(), an nfsd process in rename looks up the directory by filehandle; at the end of that lookup, this dentry is found by d_alloc_anon(), and a reference is taken on it, preventing dput() from removing it. - A regular lookup of the directory calls d_splice_alias(), finds only an unhashed (not a DISCONNECTED) dentry, and insteads adds a new one, so the directory now has two dentries. - The nfsd process in rename, which was previously looking up the source directory of the rename, now looks up the target directory (which is the same), and gets the dentry newly created by the previous lookup. - The rename, seeing two different dentries, assumes this is a cross-directory rename and attempts to take the i_mutex on the directory twice. I don't see as obvious a reproducer now, but I also don't see the existing code taking care to remove dentries from the alias list whenever they're unhashed. It therefore seems safest to allow d_splice_alias to use any dentry it finds. Signed-off-by: J. Bruce Fields <bfields@redhat.com> diff --git a/fs/dcache.c b/fs/dcache.c index f68e193..1fd2256 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1602,9 +1602,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) if (inode && S_ISDIR(inode->i_mode)) { spin_lock(&inode->i_lock); - new = __d_find_alias(inode, 1); + new = __d_find_any_alias(inode); if (new) { - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED)); spin_unlock(&inode->i_lock); security_d_instantiate(new, inode); d_move(new, dentry); ^ permalink raw reply related [flat|nested] 42+ messages in thread
[parent not found: <20120216223011.GA23997-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20120216223011.GA23997-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2012-02-17 16:34 ` Peng Tao 2012-03-13 20:55 ` J. Bruce Fields 2012-02-20 2:55 ` [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries NeilBrown 1 sibling, 1 reply; 42+ messages in thread From: Peng Tao @ 2012-02-17 16:34 UTC (permalink / raw) To: J. Bruce Fields Cc: NeilBrown, Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Fri, Feb 17, 2012 at 6:30 AM, J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote: > On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote: >> I was going ask how you managed to get an 'unhashed' dentry which was not >> DISCONNECTED, and belonged to a directory that could be the subject of >> d_splice_alias (that implies it has a name). >> >> The bug sounds like a race between lookup and rmdir, which should be >> prevented by i_mutex. >> >> I think that using __d_find_any_alias would just be papering over the >> problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias. > > Looking through the latest upstream code, I can't come up with another > obvious reproducer. > > But I also can't see the code making any particular effort to ensure > that dentries are removed from inode's alias lists at the same time > they're unhashed. > > E.g., trace up through the callers of d_drop/__d_drop and try to > convince yourselves that they all end up removing the dentry from the > alias list. > > Can you see any reason why the following would actually create a > problem? > > --b. > > commit fcfef6b7319c5d19ea5064317528ff994343b011 > Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > Date: Mon Feb 13 13:38:33 2012 -0500 > > exports: stop d_splice_alias creating directory aliases > > A directory should never have more than one dentry pointing to it. > > But d_splice_alias() does so in the case of a directory with an > already-existing non-DISCONNECTED dentry. > > Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0, > this could cause an nfsd deadlock like this: > > - Somebody attempts to remove a non-empty directory. > - The dentry_unhash() in vfs_rmdir() unhashes the dentry > pointing to the non-empty directory. > - ->rmdir() then fails with -ENOTEMPTY > - Before the vfs_rmdir() caller reaches dput(), an nfsd process > in rename looks up the directory by filehandle; at the end of > that lookup, this dentry is found by d_alloc_anon(), and a > reference is taken on it, preventing dput() from removing it. > - A regular lookup of the directory calls d_splice_alias(), > finds only an unhashed (not a DISCONNECTED) dentry, and > insteads adds a new one, so the directory now has two > dentries. > - The nfsd process in rename, which was previously looking up > the source directory of the rename, now looks up the target > directory (which is the same), and gets the dentry newly > created by the previous lookup. > - The rename, seeing two different dentries, assumes this is a > cross-directory rename and attempts to take the i_mutex on the > directory twice. > > I don't see as obvious a reproducer now, but I also don't see the > existing code taking care to remove dentries from the alias list > whenever they're unhashed. > > It therefore seems safest to allow d_splice_alias to use any dentry it > finds. > > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > diff --git a/fs/dcache.c b/fs/dcache.c > index f68e193..1fd2256 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -1602,9 +1602,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) > > if (inode && S_ISDIR(inode->i_mode)) { > spin_lock(&inode->i_lock); > - new = __d_find_alias(inode, 1); by replacing this, the want_discon argument is no longer in use. care to remove it as well? -- Thanks, Tao > + new = __d_find_any_alias(inode); > if (new) { > - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED)); > spin_unlock(&inode->i_lock); > security_d_instantiate(new, inode); > d_move(new, dentry); > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries 2012-02-17 16:34 ` Peng Tao @ 2012-03-13 20:55 ` J. Bruce Fields 2012-03-13 20:58 ` [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases J. Bruce Fields 2012-03-13 20:58 ` [PATCH 2/2] vfs: remove unused __d_splice_alias argument J. Bruce Fields 0 siblings, 2 replies; 42+ messages in thread From: J. Bruce Fields @ 2012-03-13 20:55 UTC (permalink / raw) To: Peng Tao Cc: NeilBrown, Al Viro, Nick Piggin, Nick Piggin, linux-nfs, linux-fsdevel On Sat, Feb 18, 2012 at 12:34:14AM +0800, Peng Tao wrote: > > diff --git a/fs/dcache.c b/fs/dcache.c > > index f68e193..1fd2256 100644 > > --- a/fs/dcache.c > > +++ b/fs/dcache.c > > @@ -1602,9 +1602,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) > > > > if (inode && S_ISDIR(inode->i_mode)) { > > spin_lock(&inode->i_lock); > > - new = __d_find_alias(inode, 1); > by replacing this, the want_discon argument is no longer in use. care > to remove it as well? Yep, both patches follow; Al, could you take these? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases 2012-03-13 20:55 ` J. Bruce Fields @ 2012-03-13 20:58 ` J. Bruce Fields 2012-03-13 20:58 ` [PATCH 2/2] vfs: remove unused __d_splice_alias argument J. Bruce Fields 1 sibling, 0 replies; 42+ messages in thread From: J. Bruce Fields @ 2012-03-13 20:58 UTC (permalink / raw) To: Al Viro Cc: Peng Tao, NeilBrown, Al Viro, Nick Piggin, Nick Piggin, linux-nfs, linux-fsdevel, J. Bruce Fields From: "J. Bruce Fields" <bfields@redhat.com> A directory should never have more than one dentry pointing to it. But d_splice_alias() will add one if it finds a directory with an already-existing non-DISCONNECTED dentry. I can't find an obvious reproducer, but I also can't see what prevents d_splice_alias() from encountering such a case. It therefore seems safest to allow d_splice_alias to use any dentry it finds. (Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0, this could cause an nfsd deadlock like this: - Somebody attempts to remove a non-empty directory. - The dentry_unhash() in vfs_rmdir() unhashes the dentry pointing to the non-empty directory. - ->rmdir() then fails with -ENOTEMPTY - Before the vfs_rmdir() caller reaches dput(), an nfsd process in rename looks up the directory by filehandle; at the end of that lookup, this dentry is found by d_alloc_anon(), and a reference is taken on it, preventing dput() from removing it. - A regular lookup of the directory calls d_splice_alias(), finds only an unhashed (not a DISCONNECTED) dentry, and insteads adds a new one, so the directory now has two dentries. - The nfsd process in rename, which was previously looking up the source directory of the rename, now looks up the target directory (which is the same), and gets the dentry newly created by the previous lookup. - The rename, seeing two different dentries, assumes this is a cross-directory rename and attempts to take the i_mutex on the directory twice. That reproducer no longer exists, but I don't think there was anything fundamentally incorrect about the vfs_rmdir() behavior there, so I think the real fault was here in d_splice_alias().) Signed-off-by: J. Bruce Fields <bfields@redhat.com> --- fs/dcache.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 16a53cc..30f4236 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1587,9 +1587,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) if (inode && S_ISDIR(inode->i_mode)) { spin_lock(&inode->i_lock); - new = __d_find_alias(inode, 1); + new = __d_find_any_alias(inode); if (new) { - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED)); spin_unlock(&inode->i_lock); security_d_instantiate(new, inode); d_move(new, dentry); -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 2/2] vfs: remove unused __d_splice_alias argument 2012-03-13 20:55 ` J. Bruce Fields 2012-03-13 20:58 ` [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases J. Bruce Fields @ 2012-03-13 20:58 ` J. Bruce Fields 1 sibling, 0 replies; 42+ messages in thread From: J. Bruce Fields @ 2012-03-13 20:58 UTC (permalink / raw) To: Al Viro Cc: Peng Tao, NeilBrown, Al Viro, Nick Piggin, Nick Piggin, linux-nfs, linux-fsdevel, J. Bruce Fields From: "J. Bruce Fields" <bfields@redhat.com> Nobody sets want_disconn any more. Reported-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> --- fs/dcache.c | 13 +++++-------- 1 files changed, 5 insertions(+), 8 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 30f4236..5eda891 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -606,8 +606,6 @@ EXPORT_SYMBOL(dget_parent); /** * d_find_alias - grab a hashed alias of inode * @inode: inode in question - * @want_discon: flag, used by d_splice_alias, to request - * that only a DISCONNECTED alias be returned. * * If inode has a hashed alias, or is a directory and has any alias, * acquire the reference to alias and return it. Otherwise return NULL. @@ -616,10 +614,9 @@ EXPORT_SYMBOL(dget_parent); * of a filesystem. * * If the inode has an IS_ROOT, DCACHE_DISCONNECTED alias, then prefer - * any other hashed alias over that one unless @want_discon is set, - * in which case only return an IS_ROOT, DCACHE_DISCONNECTED alias. + * any other hashed alias over that. */ -static struct dentry *__d_find_alias(struct inode *inode, int want_discon) +static struct dentry *__d_find_alias(struct inode *inode) { struct dentry *alias, *discon_alias; @@ -631,7 +628,7 @@ again: if (IS_ROOT(alias) && (alias->d_flags & DCACHE_DISCONNECTED)) { discon_alias = alias; - } else if (!want_discon) { + } else { __dget_dlock(alias); spin_unlock(&alias->d_lock); return alias; @@ -662,7 +659,7 @@ struct dentry *d_find_alias(struct inode *inode) if (!list_empty(&inode->i_dentry)) { spin_lock(&inode->i_lock); - de = __d_find_alias(inode, 0); + de = __d_find_alias(inode); spin_unlock(&inode->i_lock); } return de; @@ -2373,7 +2370,7 @@ struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode) struct dentry *alias; /* Does an aliased dentry already exist? */ - alias = __d_find_alias(inode, 0); + alias = __d_find_alias(inode); if (alias) { actual = alias; write_seqlock(&rename_lock); -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20120216223011.GA23997-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2012-02-17 16:34 ` Peng Tao @ 2012-02-20 2:55 ` NeilBrown [not found] ` <20120220135537.3078e20b-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 1 sibling, 1 reply; 42+ messages in thread From: NeilBrown @ 2012-02-20 2:55 UTC (permalink / raw) To: J. Bruce Fields Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 4740 bytes --] On Thu, 16 Feb 2012 17:30:11 -0500 "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> wrote: > On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote: > > I was going ask how you managed to get an 'unhashed' dentry which was not > > DISCONNECTED, and belonged to a directory that could be the subject of > > d_splice_alias (that implies it has a name). > > > > The bug sounds like a race between lookup and rmdir, which should be > > prevented by i_mutex. > > > > I think that using __d_find_any_alias would just be papering over the > > problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias. > > Looking through the latest upstream code, I can't come up with another > obvious reproducer. > > But I also can't see the code making any particular effort to ensure > that dentries are removed from inode's alias lists at the same time > they're unhashed. > > E.g., trace up through the callers of d_drop/__d_drop and try to > convince yourselves that they all end up removing the dentry from the > alias list. > > Can you see any reason why the following would actually create a > problem? No, I don't think that would cause problems, so it is probably a good clean up and as Peng says it means we can remove the want_discon arg as well. However I cannot help thinking that something else must be going wrong before we get to this point. When you rmdir a directory it must be empty and it will never be linked again. So how does 'lookup' find the inode and want to attach a dentry to it? So I still think this is just papering over some other problem. I think that looking at when aliases are removed is missing the point. A directory can only have one name so it can only have one dentry. If that dentry gets unhashed, that is because the directory was deleted. So now it must have zero names. So there is no way that lookup can possibly find it, so there is no way that d_splice_alias can be asked to attach an alias to it. NeilBrown > > --b. > > commit fcfef6b7319c5d19ea5064317528ff994343b011 > Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > Date: Mon Feb 13 13:38:33 2012 -0500 > > exports: stop d_splice_alias creating directory aliases > > A directory should never have more than one dentry pointing to it. > > But d_splice_alias() does so in the case of a directory with an > already-existing non-DISCONNECTED dentry. > > Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0, > this could cause an nfsd deadlock like this: > > - Somebody attempts to remove a non-empty directory. > - The dentry_unhash() in vfs_rmdir() unhashes the dentry > pointing to the non-empty directory. > - ->rmdir() then fails with -ENOTEMPTY > - Before the vfs_rmdir() caller reaches dput(), an nfsd process > in rename looks up the directory by filehandle; at the end of > that lookup, this dentry is found by d_alloc_anon(), and a > reference is taken on it, preventing dput() from removing it. > - A regular lookup of the directory calls d_splice_alias(), > finds only an unhashed (not a DISCONNECTED) dentry, and > insteads adds a new one, so the directory now has two > dentries. > - The nfsd process in rename, which was previously looking up > the source directory of the rename, now looks up the target > directory (which is the same), and gets the dentry newly > created by the previous lookup. > - The rename, seeing two different dentries, assumes this is a > cross-directory rename and attempts to take the i_mutex on the > directory twice. > > I don't see as obvious a reproducer now, but I also don't see the > existing code taking care to remove dentries from the alias list > whenever they're unhashed. > > It therefore seems safest to allow d_splice_alias to use any dentry it > finds. > > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > diff --git a/fs/dcache.c b/fs/dcache.c > index f68e193..1fd2256 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -1602,9 +1602,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) > > if (inode && S_ISDIR(inode->i_mode)) { > spin_lock(&inode->i_lock); > - new = __d_find_alias(inode, 1); > + new = __d_find_any_alias(inode); > if (new) { > - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED)); > spin_unlock(&inode->i_lock); > security_d_instantiate(new, inode); > d_move(new, dentry); [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20120220135537.3078e20b-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20120220135537.3078e20b-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2012-02-29 23:10 ` J. Bruce Fields 0 siblings, 0 replies; 42+ messages in thread From: J. Bruce Fields @ 2012-02-29 23:10 UTC (permalink / raw) To: NeilBrown Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Mon, Feb 20, 2012 at 01:55:37PM +1100, NeilBrown wrote: > On Thu, 16 Feb 2012 17:30:11 -0500 "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> > wrote: > > > On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote: > > > I was going ask how you managed to get an 'unhashed' dentry which was not > > > DISCONNECTED, and belonged to a directory that could be the subject of > > > d_splice_alias (that implies it has a name). > > > > > > The bug sounds like a race between lookup and rmdir, which should be > > > prevented by i_mutex. > > > > > > I think that using __d_find_any_alias would just be papering over the > > > problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias. > > > > Looking through the latest upstream code, I can't come up with another > > obvious reproducer. > > > > But I also can't see the code making any particular effort to ensure > > that dentries are removed from inode's alias lists at the same time > > they're unhashed. > > > > E.g., trace up through the callers of d_drop/__d_drop and try to > > convince yourselves that they all end up removing the dentry from the > > alias list. > > > > Can you see any reason why the following would actually create a > > problem? > > No, I don't think that would cause problems, so it is probably a good clean > up and as Peng says it means we can remove the want_discon arg as well. > > However I cannot help thinking that something else must be going wrong before > we get to this point. > When you rmdir a directory it must be empty and it will never be linked again. > So how does 'lookup' find the inode and want to attach a dentry to it? > > So I still think this is just papering over some other problem. > > I think that looking at when aliases are removed is missing the point. > > A directory can only have one name so it can only have one dentry. > If that dentry gets unhashed, that is because the directory was deleted. Wait, what other reasons could cause them to get unhashed?: Just grepping for callers of d_drop.... On a distributed filesystem, we may unhash an in-use dentry if it no longer seems to be valid. Can something like this happen? - nfsd holds a filehandle for directory a/b - a/b is renamed to c/d by some other host using this filesystem. - filesystem looks up a/b, finds it no longer there, unhashes it. - a lookup for c/d later adds a new dentry. And then we have two dentries? And of course we unhash dentries when they're not in use, just to free memory. I think that case is OK. --b. > So > now it must have zero names. So there is no way that lookup can possibly > find it, so there is no way that d_splice_alias can be asked to attach an > alias to it. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20110311150749.2fa2be66-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2012-02-14 17:03 ` J. Bruce Fields @ 2012-06-28 13:59 ` J. Bruce Fields [not found] ` <20120628135927.GA6406-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 1 sibling, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2012-06-28 13:59 UTC (permalink / raw) To: NeilBrown Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA Coming back to this now, just trying to review the filehandle-lookup/dcache interactions: On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote: > 1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence > was only a hint, its absence was a strong statement. > If the flag is set, the dentry might not be linked to the root. > If it is clear, it definitely is link through to the root. > However I think it was used with stronger intent than that. > > Now it seems to mean a little bit more: If it is set and the dentry > is hashed, then it must be on the sb->s_anon list. The code that makes that assumption is __d_shrink (which does the work of d_drop)--it uses DCACHE_DISCONECTED to decide which hash chain to lock. I can't find any basis for that assumption. The only code that clears DCACHE_DISCONNECTED is in expfs.c, and it isn't done at the same time as hashing. Am I missing something? > This is a significant > which I never noticed (I haven't been watching). Originally a > disconnected dentry would be attached (and hashed) to its parent. Then > that parent would get its own parent and so on until it was attached all > the way to the root. Only then would be start clearing > DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if > that is correct. It looks wrong to me: If we clear DCACHE_DISCONNECTED too early, then we risk a filehandle lookup thinking the dentry is OK to use. That could mean for example trying to rename across directories that don't have any ancestor relationship to each other in the dcache yet. So we need to wait to clear DCACHE_DISCONNECTED until we *know* the dentry's parents go all the way back to the root. As you say, that's what the current code does. But that means DCACHE_DISCONNECTED dentries can be hashed to their parents, and __d_shrink can be handed such dentries and then get the locking wrong. It looks like this bug might originate with Nick Piggin's ceb5bdc2d246 "fs: dcache per-bucket dcache hash locking"? There's no discussion in the changelog, so probably it was just based on an unexamined assumption about DCACHE_DISCONNECTED. I wonder if an IS_ROOT() test could replace the DCACHE_DISCONNECTED test in __d_shrink(), or if we need another flag, or ? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20120628135927.GA6406-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20120628135927.GA6406-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2012-06-29 20:10 ` J. Bruce Fields [not found] ` <20120629201034.GA17103-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2012-06-29 20:10 UTC (permalink / raw) To: NeilBrown Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Thu, Jun 28, 2012 at 09:59:27AM -0400, J. Bruce Fields wrote: > Coming back to this now, just trying to review the > filehandle-lookup/dcache interactions: > > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote: > > 1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence > > was only a hint, its absence was a strong statement. > > If the flag is set, the dentry might not be linked to the root. > > If it is clear, it definitely is link through to the root. > > However I think it was used with stronger intent than that. > > > > Now it seems to mean a little bit more: If it is set and the dentry > > is hashed, then it must be on the sb->s_anon list. > > The code that makes that assumption is __d_shrink (which does the work > of d_drop)--it uses DCACHE_DISCONECTED to decide which hash chain to > lock. > > I can't find any basis for that assumption. The only code that clears > DCACHE_DISCONNECTED is in expfs.c, and it isn't done at the same time as > hashing. Am I missing something? > > > This is a significant > > which I never noticed (I haven't been watching). Originally a > > disconnected dentry would be attached (and hashed) to its parent. Then > > that parent would get its own parent and so on until it was attached all > > the way to the root. Only then would be start clearing > > DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if > > that is correct. > > It looks wrong to me: > > If we clear DCACHE_DISCONNECTED too early, then we risk a filehandle > lookup thinking the dentry is OK to use. That could mean for example > trying to rename across directories that don't have any ancestor > relationship to each other in the dcache yet. > > So we need to wait to clear DCACHE_DISCONNECTED until we *know* the > dentry's parents go all the way back to the root. As you say, that's > what the current code does. > > But that means DCACHE_DISCONNECTED dentries can be hashed to their > parents, and __d_shrink can be handed such dentries and then get the > locking wrong. > > It looks like this bug might originate with Nick Piggin's ceb5bdc2d246 > "fs: dcache per-bucket dcache hash locking"? There's no discussion in > the changelog, so probably it was just based on an unexamined assumption > about DCACHE_DISCONNECTED. > > I wonder if an IS_ROOT() test could replace the DCACHE_DISCONNECTED test > in __d_shrink(), or if we need another flag, or ? Bah, sorry, and I only just noticed that you already said as much later and did the IS_ROOT() thing in your patch. Anyway, here's just that one change with a slightly more painstaking changelog. --b. commit b1fa644355122627424fe2240a9fc60cbef4c349 Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Date: Thu Jun 28 12:10:55 2012 -0400 dcache: use IS_ROOT to decide where dentry is hashed Every hashed dentry is either hashed in the dentry_hashtable, or a superblock's s_anon list. __d_shrink assumes it can determine which is the case by checking DCACHE_DISCONNECTED; this is not true. It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not only hashed on dentry_hashtable, but is fully connected to its parents back to the root. But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path() attempts to connect a directory (found by filehandle lookup) back to root by ascending to parents and performing lookups one at a time. It does not clear DCACHE_DISCONNECTED until its done, and that is not at all an atomic process. In particular, it is possible for DCACHE_DISCONNECTED to be set on a dentry which is hashed on the dentry_hashtable. Instead, use IS_ROOT() to check which hash chain a dentry is on. This *does* work: Dentries are hashed only by: - d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon. - __d_rehash, called by _d_rehash: hashes to the dentry's parent, and all callers of _d_rehash appear to have d_parent set to a "real" parent. - __d_rehash, called by __d_move: rehashes the moved dentry to hash chain determined by target, and assigns target's d_parent to its d_parent, before dropping the dentry's d_lock. Therefore I believe it's safe for a holder of a dentry's d_lock to assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is true. I believe the incorrect assumption about DCACHE_DISCONNECTED was originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash locking". Cc: Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org> Cc: Nick Piggin <npiggin-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> diff --git a/fs/dcache.c b/fs/dcache.c index 87c2da7..b2b382c 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -410,7 +410,7 @@ static void __d_shrink(struct dentry *dentry) { if (!d_unhashed(dentry)) { struct hlist_bl_head *b; - if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED)) + if (unlikely(IS_ROOT(dentry->d_flags))) b = &dentry->d_sb->s_anon; else b = d_hash(dentry->d_parent, dentry->d_name.hash); -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
[parent not found: <20120629201034.GA17103-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries [not found] ` <20120629201034.GA17103-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> @ 2012-06-29 20:29 ` J. Bruce Fields 2012-07-01 23:15 ` NeilBrown 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2012-06-29 20:29 UTC (permalink / raw) To: NeilBrown Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Fri, Jun 29, 2012 at 04:10:34PM -0400, J. Bruce Fields wrote: > On Thu, Jun 28, 2012 at 09:59:27AM -0400, J. Bruce Fields wrote: > > Coming back to this now, just trying to review the > > filehandle-lookup/dcache interactions: > > > > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote: > > > 1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence > > > was only a hint, its absence was a strong statement. > > > If the flag is set, the dentry might not be linked to the root. > > > If it is clear, it definitely is link through to the root. > > > However I think it was used with stronger intent than that. > > > > > > Now it seems to mean a little bit more: If it is set and the dentry > > > is hashed, then it must be on the sb->s_anon list. > > > > The code that makes that assumption is __d_shrink (which does the work > > of d_drop)--it uses DCACHE_DISCONECTED to decide which hash chain to > > lock. > > > > I can't find any basis for that assumption. The only code that clears > > DCACHE_DISCONNECTED is in expfs.c, and it isn't done at the same time as > > hashing. Am I missing something? > > > > > This is a significant > > > which I never noticed (I haven't been watching). Originally a > > > disconnected dentry would be attached (and hashed) to its parent. Then > > > that parent would get its own parent and so on until it was attached all > > > the way to the root. Only then would be start clearing > > > DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if > > > that is correct. > > > > It looks wrong to me: > > > > If we clear DCACHE_DISCONNECTED too early, then we risk a filehandle > > lookup thinking the dentry is OK to use. That could mean for example > > trying to rename across directories that don't have any ancestor > > relationship to each other in the dcache yet. > > > > So we need to wait to clear DCACHE_DISCONNECTED until we *know* the > > dentry's parents go all the way back to the root. As you say, that's > > what the current code does. > > > > But that means DCACHE_DISCONNECTED dentries can be hashed to their > > parents, and __d_shrink can be handed such dentries and then get the > > locking wrong. > > > > It looks like this bug might originate with Nick Piggin's ceb5bdc2d246 > > "fs: dcache per-bucket dcache hash locking"? There's no discussion in > > the changelog, so probably it was just based on an unexamined assumption > > about DCACHE_DISCONNECTED. > > > > I wonder if an IS_ROOT() test could replace the DCACHE_DISCONNECTED test > > in __d_shrink(), or if we need another flag, or ? > > Bah, sorry, and I only just noticed that you already said as much later > and did the IS_ROOT() thing in your patch. > > Anyway, here's just that one change with a slightly more painstaking > changelog. > > --b. > > commit b1fa644355122627424fe2240a9fc60cbef4c349 > Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > Date: Thu Jun 28 12:10:55 2012 -0400 > > dcache: use IS_ROOT to decide where dentry is hashed > > Every hashed dentry is either hashed in the dentry_hashtable, or a > superblock's s_anon list. > > __d_shrink assumes it can determine which is the case by checking > DCACHE_DISCONNECTED; this is not true. > > It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not > only hashed on dentry_hashtable, but is fully connected to its parents > back to the root. > > But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path() > attempts to connect a directory (found by filehandle lookup) back to > root by ascending to parents and performing lookups one at a time. It > does not clear DCACHE_DISCONNECTED until its done, and that is not at > all an atomic process. > > In particular, it is possible for DCACHE_DISCONNECTED to be set on a > dentry which is hashed on the dentry_hashtable. > > Instead, use IS_ROOT() to check which hash chain a dentry is on. This > *does* work: > > Dentries are hashed only by: > > - d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon. > > - __d_rehash, called by _d_rehash: hashes to the dentry's > parent, and all callers of _d_rehash appear to have d_parent > set to a "real" parent. > - __d_rehash, called by __d_move: rehashes the moved dentry to > hash chain determined by target, and assigns target's d_parent > to its d_parent, before dropping the dentry's d_lock. > > Therefore I believe it's safe for a holder of a dentry's d_lock to > assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is > true. > > I believe the incorrect assumption about DCACHE_DISCONNECTED was > originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash > locking". > > Cc: Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org> > Cc: Nick Piggin <npiggin-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > diff --git a/fs/dcache.c b/fs/dcache.c > index 87c2da7..b2b382c 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -410,7 +410,7 @@ static void __d_shrink(struct dentry *dentry) > { > if (!d_unhashed(dentry)) { > struct hlist_bl_head *b; > - if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED)) > + if (unlikely(IS_ROOT(dentry->d_flags))) Um, right--I'll send an actual tested version along with some other stuff later. --b. > b = &dentry->d_sb->s_anon; > else > b = d_hash(dentry->d_parent, dentry->d_name.hash); -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries 2012-06-29 20:29 ` J. Bruce Fields @ 2012-07-01 23:15 ` NeilBrown 0 siblings, 0 replies; 42+ messages in thread From: NeilBrown @ 2012-07-01 23:15 UTC (permalink / raw) To: J. Bruce Fields Cc: Al Viro, Nick Piggin, Nick Piggin, linux-nfs, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 6463 bytes --] On Fri, 29 Jun 2012 16:29:03 -0400 "J. Bruce Fields" <bfields@fieldses.org> wrote: > On Fri, Jun 29, 2012 at 04:10:34PM -0400, J. Bruce Fields wrote: > > On Thu, Jun 28, 2012 at 09:59:27AM -0400, J. Bruce Fields wrote: > > > Coming back to this now, just trying to review the > > > filehandle-lookup/dcache interactions: > > > > > > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote: > > > > 1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence > > > > was only a hint, its absence was a strong statement. > > > > If the flag is set, the dentry might not be linked to the root. > > > > If it is clear, it definitely is link through to the root. > > > > However I think it was used with stronger intent than that. > > > > > > > > Now it seems to mean a little bit more: If it is set and the dentry > > > > is hashed, then it must be on the sb->s_anon list. > > > > > > The code that makes that assumption is __d_shrink (which does the work > > > of d_drop)--it uses DCACHE_DISCONECTED to decide which hash chain to > > > lock. > > > > > > I can't find any basis for that assumption. The only code that clears > > > DCACHE_DISCONNECTED is in expfs.c, and it isn't done at the same time as > > > hashing. Am I missing something? > > > > > > > This is a significant > > > > which I never noticed (I haven't been watching). Originally a > > > > disconnected dentry would be attached (and hashed) to its parent. Then > > > > that parent would get its own parent and so on until it was attached all > > > > the way to the root. Only then would be start clearing > > > > DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if > > > > that is correct. > > > > > > It looks wrong to me: > > > > > > If we clear DCACHE_DISCONNECTED too early, then we risk a filehandle > > > lookup thinking the dentry is OK to use. That could mean for example > > > trying to rename across directories that don't have any ancestor > > > relationship to each other in the dcache yet. > > > > > > So we need to wait to clear DCACHE_DISCONNECTED until we *know* the > > > dentry's parents go all the way back to the root. As you say, that's > > > what the current code does. > > > > > > But that means DCACHE_DISCONNECTED dentries can be hashed to their > > > parents, and __d_shrink can be handed such dentries and then get the > > > locking wrong. > > > > > > It looks like this bug might originate with Nick Piggin's ceb5bdc2d246 > > > "fs: dcache per-bucket dcache hash locking"? There's no discussion in > > > the changelog, so probably it was just based on an unexamined assumption > > > about DCACHE_DISCONNECTED. > > > > > > I wonder if an IS_ROOT() test could replace the DCACHE_DISCONNECTED test > > > in __d_shrink(), or if we need another flag, or ? > > > > Bah, sorry, and I only just noticed that you already said as much later > > and did the IS_ROOT() thing in your patch. > > > > Anyway, here's just that one change with a slightly more painstaking > > changelog. > > > > --b. > > > > commit b1fa644355122627424fe2240a9fc60cbef4c349 > > Author: J. Bruce Fields <bfields@redhat.com> > > Date: Thu Jun 28 12:10:55 2012 -0400 > > > > dcache: use IS_ROOT to decide where dentry is hashed > > > > Every hashed dentry is either hashed in the dentry_hashtable, or a > > superblock's s_anon list. > > > > __d_shrink assumes it can determine which is the case by checking > > DCACHE_DISCONNECTED; this is not true. > > > > It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not > > only hashed on dentry_hashtable, but is fully connected to its parents > > back to the root. > > > > But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path() > > attempts to connect a directory (found by filehandle lookup) back to > > root by ascending to parents and performing lookups one at a time. It > > does not clear DCACHE_DISCONNECTED until its done, and that is not at > > all an atomic process. > > > > In particular, it is possible for DCACHE_DISCONNECTED to be set on a > > dentry which is hashed on the dentry_hashtable. > > > > Instead, use IS_ROOT() to check which hash chain a dentry is on. This > > *does* work: > > > > Dentries are hashed only by: > > > > - d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon. > > > > - __d_rehash, called by _d_rehash: hashes to the dentry's > > parent, and all callers of _d_rehash appear to have d_parent > > set to a "real" parent. > > - __d_rehash, called by __d_move: rehashes the moved dentry to > > hash chain determined by target, and assigns target's d_parent > > to its d_parent, before dropping the dentry's d_lock. > > > > Therefore I believe it's safe for a holder of a dentry's d_lock to > > assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is > > true. > > > > I believe the incorrect assumption about DCACHE_DISCONNECTED was > > originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash > > locking". > > > > Cc: Neil Brown <neilb@suse.de> > > Cc: Nick Piggin <npiggin@kernel.dk> > > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > > > > diff --git a/fs/dcache.c b/fs/dcache.c > > index 87c2da7..b2b382c 100644 > > --- a/fs/dcache.c > > +++ b/fs/dcache.c > > @@ -410,7 +410,7 @@ static void __d_shrink(struct dentry *dentry) > > { > > if (!d_unhashed(dentry)) { > > struct hlist_bl_head *b; > > - if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED)) > > + if (unlikely(IS_ROOT(dentry->d_flags))) > > Um, right--I'll send an actual tested version along with some other > stuff later. :-) If that tested version looks like: if (unlikely(IS_ROOT(dentry))) you can add a Reviewed-by: NeilBrown <neilb@suse.de> Thanks, NeilBrown > > --b. > > > b = &dentry->d_sb->s_anon; > > else > > b = d_hash(dentry->d_parent, dentry->d_name.hash); > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* DCACHE_DISCONNECTED patches @ 2012-05-09 21:18 J. Bruce Fields [not found] ` <1336598286-28636-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2012-05-09 21:18 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA This is a resend of a couple DCACHE_DISCONNECTED patches that I think could go into 3.5. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <1336598286-28636-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases [not found] ` <1336598286-28636-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2012-05-09 21:18 ` J. Bruce Fields 0 siblings, 0 replies; 42+ messages in thread From: J. Bruce Fields @ 2012-05-09 21:18 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA, J. Bruce Fields From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> A directory should never have more than one dentry pointing to it. But d_splice_alias() will add one if it finds a directory with an already-existing non-DISCONNECTED dentry. I can't find an obvious reproducer, but I also can't see what prevents d_splice_alias() from encountering such a case. It therefore seems safest to allow d_splice_alias to use any dentry it finds. (Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0, this could cause an nfsd deadlock like this: - Somebody attempts to remove a non-empty directory. - The dentry_unhash() in vfs_rmdir() unhashes the dentry pointing to the non-empty directory. - ->rmdir() then fails with -ENOTEMPTY - Before the vfs_rmdir() caller reaches dput(), an nfsd process in rename looks up the directory by filehandle; at the end of that lookup, this dentry is found by d_alloc_anon(), and a reference is taken on it, preventing dput() from removing it. - A regular lookup of the directory calls d_splice_alias(), finds only an unhashed (not a DISCONNECTED) dentry, and insteads adds a new one, so the directory now has two dentries. - The nfsd process in rename, which was previously looking up the source directory of the rename, now looks up the target directory (which is the same), and gets the dentry newly created by the previous lookup. - The rename, seeing two different dentries, assumes this is a cross-directory rename and attempts to take the i_mutex on the directory twice. That reproducer no longer exists, but I don't think there was anything fundamentally incorrect about the vfs_rmdir() behavior there, so I think the real fault was here in d_splice_alias().) Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- fs/dcache.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index b60ddc4..2434c1e 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1606,9 +1606,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) if (inode && S_ISDIR(inode->i_mode)) { spin_lock(&inode->i_lock); - new = __d_find_alias(inode, 1); + new = __d_find_any_alias(inode); if (new) { - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED)); spin_unlock(&inode->i_lock); security_d_instantiate(new, inode); d_move(new, dentry); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases @ 2012-05-23 22:05 J. Bruce Fields [not found] ` <1337810746-16240-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2012-05-23 22:05 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-nfs, J. Bruce Fields From: "J. Bruce Fields" <bfields@redhat.com> A directory should never have more than one dentry pointing to it. But d_splice_alias() will add one if it finds a directory with an already-existing non-DISCONNECTED dentry. I can't find an obvious reproducer, but I also can't see what prevents d_splice_alias() from encountering such a case. It therefore seems safest to allow d_splice_alias to use any dentry it finds. (Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0, this could cause an nfsd deadlock like this: - Somebody attempts to remove a non-empty directory. - The dentry_unhash() in vfs_rmdir() unhashes the dentry pointing to the non-empty directory. - ->rmdir() then fails with -ENOTEMPTY - Before the vfs_rmdir() caller reaches dput(), an nfsd process in rename looks up the directory by filehandle; at the end of that lookup, this dentry is found by d_alloc_anon(), and a reference is taken on it, preventing dput() from removing it. - A regular lookup of the directory calls d_splice_alias(), finds only an unhashed (not a DISCONNECTED) dentry, and insteads adds a new one, so the directory now has two dentries. - The nfsd process in rename, which was previously looking up the source directory of the rename, now looks up the target directory (which is the same), and gets the dentry newly created by the previous lookup. - The rename, seeing two different dentries, assumes this is a cross-directory rename and attempts to take the i_mutex on the directory twice. That reproducer no longer exists, but I don't think there was anything fundamentally incorrect about the vfs_rmdir() behavior there, so I think the real fault was here in d_splice_alias().) Signed-off-by: J. Bruce Fields <bfields@redhat.com> --- fs/dcache.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index b60ddc4..2434c1e 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1606,9 +1606,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) if (inode && S_ISDIR(inode->i_mode)) { spin_lock(&inode->i_lock); - new = __d_find_alias(inode, 1); + new = __d_find_any_alias(inode); if (new) { - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED)); spin_unlock(&inode->i_lock); security_d_instantiate(new, inode); d_move(new, dentry); -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 42+ messages in thread
[parent not found: <1337810746-16240-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases [not found] ` <1337810746-16240-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2012-05-23 22:07 ` J. Bruce Fields 0 siblings, 0 replies; 42+ messages in thread From: J. Bruce Fields @ 2012-05-23 22:07 UTC (permalink / raw) To: J. Bruce Fields Cc: Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA These two patches are a resend. I think they're appropriate for 3.5. --b. On Wed, May 23, 2012 at 06:05:45PM -0400, J. Bruce Fields wrote: > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > > A directory should never have more than one dentry pointing to it. > > But d_splice_alias() will add one if it finds a directory with an > already-existing non-DISCONNECTED dentry. > > I can't find an obvious reproducer, but I also can't see what prevents > d_splice_alias() from encountering such a case. > > It therefore seems safest to allow d_splice_alias to use any dentry it > finds. > > (Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0, > this could cause an nfsd deadlock like this: > > - Somebody attempts to remove a non-empty directory. > - The dentry_unhash() in vfs_rmdir() unhashes the dentry > pointing to the non-empty directory. > - ->rmdir() then fails with -ENOTEMPTY > - Before the vfs_rmdir() caller reaches dput(), an nfsd process > in rename looks up the directory by filehandle; at the end of > that lookup, this dentry is found by d_alloc_anon(), and a > reference is taken on it, preventing dput() from removing it. > - A regular lookup of the directory calls d_splice_alias(), > finds only an unhashed (not a DISCONNECTED) dentry, and > insteads adds a new one, so the directory now has two > dentries. > - The nfsd process in rename, which was previously looking up > the source directory of the rename, now looks up the target > directory (which is the same), and gets the dentry newly > created by the previous lookup. > - The rename, seeing two different dentries, assumes this is a > cross-directory rename and attempts to take the i_mutex on the > directory twice. > > That reproducer no longer exists, but I don't think there was anything > fundamentally incorrect about the vfs_rmdir() behavior there, so I think > the real fault was here in d_splice_alias().) > > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > fs/dcache.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/fs/dcache.c b/fs/dcache.c > index b60ddc4..2434c1e 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -1606,9 +1606,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry) > > if (inode && S_ISDIR(inode->i_mode)) { > spin_lock(&inode->i_lock); > - new = __d_find_alias(inode, 1); > + new = __d_find_any_alias(inode); > if (new) { > - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED)); > spin_unlock(&inode->i_lock); > security_d_instantiate(new, inode); > d_move(new, dentry); > -- > 1.7.9.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2012-07-01 23:15 UTC | newest] Thread overview: 42+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-12 18:43 lifetime of DCACHE_DISCONECTED dentries J. Bruce Fields 2010-11-13 11:53 ` Nick Piggin 2010-11-15 17:48 ` J. Bruce Fields [not found] ` <20101115174837.GB10044-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2010-11-16 6:45 ` Nick Piggin 2010-11-29 3:56 ` Nick Piggin 2010-11-29 19:32 ` J. Bruce Fields [not found] ` <20101129193248.GA9897-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2010-11-30 1:00 ` Nick Piggin [not found] ` <AANLkTikwzDJ_q65==uxDsAhp3h8bU7Rkt7U9gVgRAK0D-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2010-11-30 18:39 ` J. Bruce Fields 2010-12-03 22:33 ` [PATCH] nfsd4: allow __d_obtain_alias() to return unhashed dentries J. Bruce Fields 2010-12-13 5:19 ` Nick Piggin 2010-12-14 22:01 ` J. Bruce Fields [not found] ` <20101214220102.GM24828-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2010-12-17 17:53 ` [PATCH] fs/dcache: use standard list macro for d_find_alias J. Bruce Fields 2010-12-17 18:00 ` [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries J. Bruce Fields 2010-12-18 2:01 ` Nick Piggin 2010-12-18 16:16 ` J. Bruce Fields [not found] ` <20101218161609.GA22150-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2010-12-19 14:53 ` Nick Piggin [not found] ` <AANLkTingRv_gtRSctGzMfYrKg02M_sKj97HSQPRm_mA_-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2010-12-27 23:46 ` [PATCH] " J. Bruce Fields [not found] ` <20101227234641.GA22248-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2011-01-18 20:45 ` J. Bruce Fields [not found] ` <20110118204509.GA10903-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2011-01-18 22:02 ` Nick Piggin [not found] ` <AANLkTikL2CDSWQJ1QH_Y4G-j70Vd=VesNMMnYTmMGHC9-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2011-01-18 22:08 ` J. Bruce Fields [not found] ` <20110118220817.GF10903-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2011-03-08 18:13 ` J. Bruce Fields [not found] ` <20110308181320.GA15566-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2011-03-10 10:58 ` Al Viro [not found] ` <20110310105821.GE22723-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2011-03-11 4:07 ` NeilBrown [not found] ` <20110311150749.2fa2be66-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2012-02-14 17:03 ` J. Bruce Fields [not found] ` <20120214170300.GA4309-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2012-02-15 16:56 ` J. Bruce Fields [not found] ` <20120215165633.GE12490-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2012-02-16 3:06 ` NeilBrown [not found] ` <20120216140603.08cb4900-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2012-02-16 11:51 ` J. Bruce Fields [not found] ` <20120216115133.GA20279-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2012-02-16 16:08 ` J. Bruce Fields 2012-02-16 22:30 ` J. Bruce Fields [not found] ` <20120216223011.GA23997-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2012-02-17 16:34 ` Peng Tao 2012-03-13 20:55 ` J. Bruce Fields 2012-03-13 20:58 ` [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases J. Bruce Fields 2012-03-13 20:58 ` [PATCH 2/2] vfs: remove unused __d_splice_alias argument J. Bruce Fields 2012-02-20 2:55 ` [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries NeilBrown [not found] ` <20120220135537.3078e20b-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2012-02-29 23:10 ` J. Bruce Fields 2012-06-28 13:59 ` J. Bruce Fields [not found] ` <20120628135927.GA6406-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2012-06-29 20:10 ` J. Bruce Fields [not found] ` <20120629201034.GA17103-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2012-06-29 20:29 ` J. Bruce Fields 2012-07-01 23:15 ` NeilBrown -- strict thread matches above, loose matches on Subject: below -- 2012-05-09 21:18 DCACHE_DISCONNECTED patches J. Bruce Fields [not found] ` <1336598286-28636-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2012-05-09 21:18 ` [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases J. Bruce Fields 2012-05-23 22:05 J. Bruce Fields [not found] ` <1337810746-16240-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2012-05-23 22:07 ` J. Bruce Fields
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).