* Regression with initramfs and nfsroot (appears to be in the dcache)
@ 2012-11-29 19:16 Patrick McLean
2012-11-29 21:33 ` Al Viro
0 siblings, 1 reply; 17+ messages in thread
From: Patrick McLean @ 2012-11-29 19:16 UTC (permalink / raw)
To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel
With 3.6-rc1 and up, when using a (dracut) initramfs with a read-only
nfs root, all accesses to /proc. /sys and /dev return EBUSY.
Bisecting finds this commit as where this was introduced:
> ee3efa91e240f513898050ef305a49a653c8ed90 is the first bad commit
> commit ee3efa91e240f513898050ef305a49a653c8ed90
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date: Fri Jun 8 15:59:33 2012 -0400
>
> __d_unalias() should refuse to move mountpoints
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> :040000 040000 1d6ecde959d3f8252b33f4adff3c4bf1e67f2b92 992ec34563b90fb349957418f76d4673c1af4ab6 M fs
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-29 19:16 Regression with initramfs and nfsroot (appears to be in the dcache) Patrick McLean @ 2012-11-29 21:33 ` Al Viro 2012-11-29 22:06 ` Patrick McLean 0 siblings, 1 reply; 17+ messages in thread From: Al Viro @ 2012-11-29 21:33 UTC (permalink / raw) To: Patrick McLean; +Cc: linux-fsdevel, linux-kernel On Thu, Nov 29, 2012 at 11:16:59AM -0800, Patrick McLean wrote: > With 3.6-rc1 and up, when using a (dracut) initramfs with a read-only > nfs root, all accesses to /proc. /sys and /dev return EBUSY. See "[PATCH] Revert "__d_unalias() should refuse to move mountpoints" thread. If you have a convenient reproducer, could you check if the fixes the breakage? If so, we'll need to look into false negatives from nfs_same_file() in there... diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index ce8cb92..55436f5 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -450,7 +450,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry) nfs_refresh_inode(dentry->d_inode, entry->fattr); goto out; } else { - d_drop(dentry); + if (d_invalidate(dentry) != 0) { + WARN_ON(1); + goto out; + } dput(dentry); } } ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-29 21:33 ` Al Viro @ 2012-11-29 22:06 ` Patrick McLean 2012-11-29 22:21 ` Al Viro 0 siblings, 1 reply; 17+ messages in thread From: Patrick McLean @ 2012-11-29 22:06 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-kernel On Thu, Nov 29, 2012 at 1:33 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Thu, Nov 29, 2012 at 11:16:59AM -0800, Patrick McLean wrote: >> With 3.6-rc1 and up, when using a (dracut) initramfs with a read-only >> nfs root, all accesses to /proc. /sys and /dev return EBUSY. > > See "[PATCH] Revert "__d_unalias() should refuse to move mountpoints" > thread. If you have a convenient reproducer, could you check if > the fixes the breakage? If so, we'll need to look into false negatives > from nfs_same_file() in there... > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > index ce8cb92..55436f5 100644 > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -450,7 +450,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry) > nfs_refresh_inode(dentry->d_inode, entry->fattr); > goto out; > } else { > - d_drop(dentry); > + if (d_invalidate(dentry) != 0) { > + WARN_ON(1); > + goto out; > + } > dput(dentry); > } > } I have a trivial reproducer and am happy to help debug in any way that I can. That patch seems to fix the problem, and produces these warnings in dmesg: [ 3.306483] dracut: Switching root [ 4.324378] systemd-udevd[552]: starting version 195 [ 9.254972] ------------[ cut here ]------------ [ 9.254981] WARNING: at fs/nfs/dir.c:454 nfs_readdir_page_filler+0x1cc/0x3a2() [ 9.254983] Hardware name: Bochs [ 9.254984] Modules linked in: [ 9.254989] Pid: 676, comm: ls Not tainted 3.7.0-rc7+ #35 [ 9.254990] Call Trace: [ 9.254999] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 9.255002] [<ffffffff8117de91>] ? nfs_readdir_page_filler+0x1cc/0x3a2 [ 9.255005] [<ffffffff8117e683>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 9.255009] [<ffffffff8117e70c>] ? nfs_readdir_filler+0x1c/0x6b [ 9.255014] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 9.255017] [<ffffffff8117e6f0>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 9.255020] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 9.255025] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.255028] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 9.255031] [<ffffffff8117e888>] ? nfs_readdir+0x12d/0x435 [ 9.255036] [<ffffffff8118e653>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 9.255039] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.255042] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.255045] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 9.255049] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 9.255053] [<ffffffff814ac769>] ? system_call_fastpath+0x16/0x1b [ 9.255055] ---[ end trace 5e8b5f37fe752ab1 ]--- [ 9.255062] ------------[ cut here ]------------ [ 9.255065] WARNING: at fs/nfs/dir.c:454 nfs_readdir_page_filler+0x1cc/0x3a2() [ 9.255066] Hardware name: Bochs [ 9.255067] Modules linked in: [ 9.255070] Pid: 676, comm: ls Tainted: G W 3.7.0-rc7+ #35 [ 9.255071] Call Trace: [ 9.255075] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 9.255077] [<ffffffff8117de91>] ? nfs_readdir_page_filler+0x1cc/0x3a2 [ 9.255080] [<ffffffff8117e683>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 9.255083] [<ffffffff8117e70c>] ? nfs_readdir_filler+0x1c/0x6b [ 9.255087] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 9.255089] [<ffffffff8117e6f0>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 9.255093] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 9.255096] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.255099] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 9.255102] [<ffffffff8117e888>] ? nfs_readdir+0x12d/0x435 [ 9.255105] [<ffffffff8118e653>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 9.255109] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.255112] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.255115] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 9.255118] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 9.255121] [<ffffffff814ac769>] ? system_call_fastpath+0x16/0x1b [ 9.255122] ---[ end trace 5e8b5f37fe752ab2 ]--- [ 9.255133] ------------[ cut here ]------------ [ 9.255135] WARNING: at fs/nfs/dir.c:454 nfs_readdir_page_filler+0x1cc/0x3a2() [ 9.255136] Hardware name: Bochs [ 9.255137] Modules linked in: [ 9.255140] Pid: 676, comm: ls Tainted: G W 3.7.0-rc7+ #35 [ 9.255141] Call Trace: [ 9.255144] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 9.255147] [<ffffffff8117de91>] ? nfs_readdir_page_filler+0x1cc/0x3a2 [ 9.255150] [<ffffffff8117e683>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 9.255153] [<ffffffff8117e70c>] ? nfs_readdir_filler+0x1c/0x6b [ 9.255157] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 9.255159] [<ffffffff8117e6f0>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 9.255162] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 9.255166] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.255169] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 9.255171] [<ffffffff8117e888>] ? nfs_readdir+0x12d/0x435 [ 9.255175] [<ffffffff8118e653>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 9.255178] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.255181] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.255184] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 9.255188] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 9.255190] [<ffffffff814ac769>] ? system_call_fastpath+0x16/0x1b [ 9.255192] ---[ end trace 5e8b5f37fe752ab3 ]--- ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-29 22:06 ` Patrick McLean @ 2012-11-29 22:21 ` Al Viro 2012-11-29 22:53 ` Patrick McLean 0 siblings, 1 reply; 17+ messages in thread From: Al Viro @ 2012-11-29 22:21 UTC (permalink / raw) To: Patrick McLean; +Cc: linux-fsdevel, linux-kernel On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote: > I have a trivial reproducer and am happy to help debug in any way that > I can. That patch seems to fix the problem, and produces these > warnings in dmesg: > > [ 3.306483] dracut: Switching root > [ 4.324378] systemd-udevd[552]: starting version 195 > [ 9.254972] ------------[ cut here ]------------ > [ 9.254981] WARNING: at fs/nfs/dir.c:454 > nfs_readdir_page_filler+0x1cc/0x3a2() > [ 9.254983] Hardware name: Bochs > [ 9.254984] Modules linked in: > [ 9.254989] Pid: 676, comm: ls Not tainted 3.7.0-rc7+ #35 > [ 9.254990] Call Trace: > [ 9.254999] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a > [ 9.255002] [<ffffffff8117de91>] ? nfs_readdir_page_filler+0x1cc/0x3a2 > [ 9.255005] [<ffffffff8117e683>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d > [ 9.255009] [<ffffffff8117e70c>] ? nfs_readdir_filler+0x1c/0x6b > [ 9.255014] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 > [ 9.255017] [<ffffffff8117e6f0>] ? nfs_readdir_xdr_to_array+0x22d/0x22d > [ 9.255020] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b > [ 9.255025] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a > [ 9.255028] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 > [ 9.255031] [<ffffffff8117e888>] ? nfs_readdir+0x12d/0x435 > [ 9.255036] [<ffffffff8118e653>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 > [ 9.255039] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a > [ 9.255042] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a > [ 9.255045] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 > [ 9.255049] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc > [ 9.255053] [<ffffffff814ac769>] ? system_call_fastpath+0x16/0x1b > [ 9.255055] ---[ end trace 5e8b5f37fe752ab1 ]--- OK... So we have differing entry->fh and NFS_FH(dentry->d_inode). Something like static void dump_fh(const struct nfs_fh *fh) { int i; printk(KERN_INFO "FH(%d)", fh->size); for (i = 0; i < fh->size; i++) printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]); printk(KERN_CONT "]\n"); } with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to that WARN_ON(1) would probably be interesting. And probably would make sense to print filename->name as well, to see which files it is about. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-29 22:21 ` Al Viro @ 2012-11-29 22:53 ` Patrick McLean 2012-11-29 23:43 ` Al Viro 0 siblings, 1 reply; 17+ messages in thread From: Patrick McLean @ 2012-11-29 22:53 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-kernel On 29/11/12 02:21 PM, Al Viro wrote: > On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote: > >> I have a trivial reproducer and am happy to help debug in any way that >> I can. That patch seems to fix the problem, and produces these >> warnings in dmesg: > > OK... So we have differing entry->fh and NFS_FH(dentry->d_inode). Something > like > static void dump_fh(const struct nfs_fh *fh) > { > int i; > printk(KERN_INFO "FH(%d)", fh->size); > for (i = 0; i < fh->size; i++) > printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]); > printk(KERN_CONT "]\n"); > } > with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to > that WARN_ON(1) would probably be interesting. And probably would make > sense to print filename->name as well, to see which files it is about. > Here is the output of the first of the 3 times it hits the WARN_ON (I can include all 3 if desired), with the filename.name at the end: [ 8.821503] ------------[ cut here ]------------ [ 8.821512] WARNING: at fs/nfs/dir.c:463 nfs_readdir_page_filler+0x1d0/0x3d2() [ 8.821513] Hardware name: Bochs [ 8.821515] Modules linked in: [ 8.821519] Pid: 630, comm: bash Not tainted 3.7.0-rc7+ #36 [ 8.821520] Call Trace: [ 8.821528] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 8.821531] [<ffffffff8117de95>] ? nfs_readdir_page_filler+0x1d0/0x3d2 [ 8.821535] [<ffffffff8117e6b3>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 8.821538] [<ffffffff8117e73c>] ? nfs_readdir_filler+0x1c/0x6b [ 8.821543] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 8.821546] [<ffffffff8117e720>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 8.821549] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 8.821554] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 8.821557] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 8.821560] [<ffffffff8117e8b8>] ? nfs_readdir+0x12d/0x435 [ 8.821564] [<ffffffff8118e683>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 8.821568] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 8.821571] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 8.821574] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 8.821577] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 8.821581] [<ffffffff814ac7e9>] ? system_call_fastpath+0x16/0x1b [ 8.821583] ---[ end trace 89263124889205c1 ]--- [ 8.821584] FH(0)] [ 8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] [ 8.821601] filename: proc ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-29 22:53 ` Patrick McLean @ 2012-11-29 23:43 ` Al Viro 2012-11-30 0:19 ` Patrick McLean 0 siblings, 1 reply; 17+ messages in thread From: Al Viro @ 2012-11-29 23:43 UTC (permalink / raw) To: Patrick McLean; +Cc: linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Thu, Nov 29, 2012 at 02:53:13PM -0800, Patrick McLean wrote: > On 29/11/12 02:21 PM, Al Viro wrote: > > On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote: > > > >> I have a trivial reproducer and am happy to help debug in any way that > >> I can. That patch seems to fix the problem, and produces these > >> warnings in dmesg: > > > > OK... So we have differing entry->fh and NFS_FH(dentry->d_inode). Something > > like > > static void dump_fh(const struct nfs_fh *fh) > > { > > int i; > > printk(KERN_INFO "FH(%d)", fh->size); > > for (i = 0; i < fh->size; i++) > > printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]); > > printk(KERN_CONT "]\n"); > > } > > with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to > > that WARN_ON(1) would probably be interesting. And probably would make > > sense to print filename->name as well, to see which files it is about. > [ 8.821584] FH(0)] > [ 8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] > [ 8.821601] filename: proc *whoa* So we have zero entry->fh->size? No wonder it doesn't match... Which NFS version it is? entry->fh->size is set by nfs[34]_decode_dirent(). NFS folks: any ideas on best way to debug it? The brute-force way would be to capture all NFS traffic with tcpdump and see what's going on, but that would be a lot of work... Looks like we have READDIRPLUS attempted and succeeded, but fhandle was not given. Result: nfs_prime_dcache() is doing blind d_drop() on perfectly valid dentries, no matter how busy. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-29 23:43 ` Al Viro @ 2012-11-30 0:19 ` Patrick McLean 2012-11-30 0:35 ` Al Viro 0 siblings, 1 reply; 17+ messages in thread From: Patrick McLean @ 2012-11-30 0:19 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On 29/11/12 03:43 PM, Al Viro wrote: > On Thu, Nov 29, 2012 at 02:53:13PM -0800, Patrick McLean wrote: >> On 29/11/12 02:21 PM, Al Viro wrote: >>> On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote: >>> >>>> I have a trivial reproducer and am happy to help debug in any way that >>>> I can. That patch seems to fix the problem, and produces these >>>> warnings in dmesg: >>> >>> OK... So we have differing entry->fh and NFS_FH(dentry->d_inode). Something >>> like >>> static void dump_fh(const struct nfs_fh *fh) >>> { >>> int i; >>> printk(KERN_INFO "FH(%d)", fh->size); >>> for (i = 0; i < fh->size; i++) >>> printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]); >>> printk(KERN_CONT "]\n"); >>> } >>> with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to >>> that WARN_ON(1) would probably be interesting. And probably would make >>> sense to print filename->name as well, to see which files it is about. > >> [ 8.821584] FH(0)] >> [ 8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] >> [ 8.821601] filename: proc > > *whoa* > > So we have zero entry->fh->size? No wonder it doesn't match... Which NFS > version it is? entry->fh->size is set by nfs[34]_decode_dirent(). This is nfs v3 over TCP on Linus git at commit e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace. > NFS folks: any ideas on best way to debug it? The brute-force way would be > to capture all NFS traffic with tcpdump and see what's going on, but that > would be a lot of work... > > Looks like we have READDIRPLUS attempted and succeeded, but fhandle was not > given. Result: nfs_prime_dcache() is doing blind d_drop() on perfectly > valid dentries, no matter how busy. > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 0:19 ` Patrick McLean @ 2012-11-30 0:35 ` Al Viro 2012-11-30 0:57 ` Patrick McLean 0 siblings, 1 reply; 17+ messages in thread From: Al Viro @ 2012-11-30 0:35 UTC (permalink / raw) To: Patrick McLean; +Cc: linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Thu, Nov 29, 2012 at 04:19:51PM -0800, Patrick McLean wrote: > >> [ 8.821584] FH(0)] > >> [ 8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] > >> [ 8.821601] filename: proc > > > > *whoa* > > > > So we have zero entry->fh->size? No wonder it doesn't match... Which NFS > > version it is? entry->fh->size is set by nfs[34]_decode_dirent(). > > This is nfs v3 over TCP on Linus git at commit e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace. So we have nfs3_decode_dirent(), stepping into /* In fact, a post_op_fh3: */ p = xdr_inline_decode(xdr, 4); if (unlikely(p == NULL)) goto out_overflow; if (*p != xdr_zero) { error = decode_nfs_fh3(xdr, entry->fh); if (unlikely(error)) { if (error == -E2BIG) goto out_truncated; return error; } } else zero_nfs_fh3(entry->fh); Interesting... Server-side that should've been produced by encode_entryplus_baggage(), which looks like failing compose_entry_fh()... which has explicit if (d_mountpoint(dchild)) goto out; resulting in ENOENT on everything that's overmounted on server. Do you, by any chance, have the server really exporting its own root filesystem? Another thing to check: have nfs_prime_dcache() print filename.name of everything that fails nfs_same_entry() and has zero entry->fh->size, regardless of d_invalidate() results. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 0:35 ` Al Viro @ 2012-11-30 0:57 ` Patrick McLean 2012-11-30 1:36 ` Al Viro 0 siblings, 1 reply; 17+ messages in thread From: Patrick McLean @ 2012-11-30 0:57 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On 29/11/12 04:35 PM, Al Viro wrote: > On Thu, Nov 29, 2012 at 04:19:51PM -0800, Patrick McLean wrote: >>>> [ 8.821584] FH(0)] >>>> [ 8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] >>>> [ 8.821601] filename: proc >>> >>> *whoa* >>> >>> So we have zero entry->fh->size? No wonder it doesn't match... Which NFS >>> version it is? entry->fh->size is set by nfs[34]_decode_dirent(). >> >> This is nfs v3 over TCP on Linus git at commit e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace. > > So we have nfs3_decode_dirent(), stepping into > /* In fact, a post_op_fh3: */ > p = xdr_inline_decode(xdr, 4); > if (unlikely(p == NULL)) > goto out_overflow; > if (*p != xdr_zero) { > error = decode_nfs_fh3(xdr, entry->fh); > if (unlikely(error)) { > if (error == -E2BIG) > goto out_truncated; > return error; > } > } else > zero_nfs_fh3(entry->fh); > Interesting... Server-side that should've been produced by > encode_entryplus_baggage(), which looks like failing compose_entry_fh()... > which has explicit > if (d_mountpoint(dchild)) > goto out; > resulting in ENOENT on everything that's overmounted on server. > > Do you, by any chance, have the server really exporting its own root > filesystem? Another thing to check: have nfs_prime_dcache() print > filename.name of everything that fails nfs_same_entry() and has > zero entry->fh->size, regardless of d_invalidate() results. The server is running 3.6.6 and is just exporting a subdir of an xfs filesystem (which does not happen to be the root filesystem). The client is running as a KVM guest on the machine that is serving the NFS. I am reproducing this by booting the guest via an initramfs, and doing "ls /" at in single user mode. I added a check that prints the filename.name of everything that fails nfs_same_file, and it appears to just be triggered by the same filesystems that are triggering the WARN_ON, the relevant dmesg is below. [ 9.495217] entry->fh->size is 0 on: proc [ 9.495222] ------------[ cut here ]------------ [ 9.495230] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb() [ 9.495232] Hardware name: Bochs [ 9.495233] Modules linked in: [ 9.495237] Pid: 655, comm: ls Not tainted 3.7.0-rc7+ #40 [ 9.495239] Call Trace: [ 9.495247] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 9.495250] [<ffffffff8117deb4>] ? nfs_readdir_page_filler+0x1ef/0x3eb [ 9.495254] [<ffffffff8117e6cc>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 9.495257] [<ffffffff8117e755>] ? nfs_readdir_filler+0x1c/0x6b [ 9.495263] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 9.495266] [<ffffffff8117e739>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 9.495269] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 9.495274] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495277] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 9.495280] [<ffffffff8117e8d1>] ? nfs_readdir+0x12d/0x435 [ 9.495285] [<ffffffff8118e69b>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 9.495288] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495291] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495294] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 9.495298] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 9.495302] [<ffffffff814ac829>] ? system_call_fastpath+0x16/0x1b [ 9.495304] ---[ end trace e502c5d56c594e85 ]--- [ 9.495306] FH(0)] [ 9.495308] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] [ 9.495323] filename: proc [ 9.495330] entry->fh->size is 0 on: dev [ 9.495332] ------------[ cut here ]------------ [ 9.495335] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb() [ 9.495336] Hardware name: Bochs [ 9.495337] Modules linked in: [ 9.495340] Pid: 655, comm: ls Tainted: G W 3.7.0-rc7+ #40 [ 9.495341] Call Trace: [ 9.495345] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 9.495348] [<ffffffff8117deb4>] ? nfs_readdir_page_filler+0x1ef/0x3eb [ 9.495351] [<ffffffff8117e6cc>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 9.495354] [<ffffffff8117e755>] ? nfs_readdir_filler+0x1c/0x6b [ 9.495358] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 9.495361] [<ffffffff8117e739>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 9.495364] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 9.495368] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495371] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 9.495373] [<ffffffff8117e8d1>] ? nfs_readdir+0x12d/0x435 [ 9.495377] [<ffffffff8118e69b>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 9.495380] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495383] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495387] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 9.495390] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 9.495393] [<ffffffff814ac829>] ? system_call_fastpath+0x16/0x1b [ 9.495395] ---[ end trace e502c5d56c594e86 ]--- [ 9.495396] FH(0)] [ 9.495397] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 85 00 00 40 d6 39 e0 7d] [ 9.495412] filename: dev [ 9.495422] entry->fh->size is 0 on: sys [ 9.495423] ------------[ cut here ]------------ [ 9.495426] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb() [ 9.495427] Hardware name: Bochs [ 9.495428] Modules linked in: [ 9.495430] Pid: 655, comm: ls Tainted: G W 3.7.0-rc7+ #40 [ 9.495431] Call Trace: [ 9.495435] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 9.495438] [<ffffffff8117deb4>] ? nfs_readdir_page_filler+0x1ef/0x3eb [ 9.495441] [<ffffffff8117e6cc>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 9.495444] [<ffffffff8117e755>] ? nfs_readdir_filler+0x1c/0x6b [ 9.495448] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 9.495450] [<ffffffff8117e739>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 9.495454] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 9.495457] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495460] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 9.495463] [<ffffffff8117e8d1>] ? nfs_readdir+0x12d/0x435 [ 9.495466] [<ffffffff8118e69b>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 9.495470] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495473] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495476] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 9.495479] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 9.495482] [<ffffffff814ac829>] ? system_call_fastpath+0x16/0x1b [ 9.495484] ---[ end trace e502c5d56c594e87 ]--- [ 9.495485] FH(0)] [ 9.495486] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a3 0e 00 40 42 57 d3 90] [ 9.495511] filename: sys ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 0:57 ` Patrick McLean @ 2012-11-30 1:36 ` Al Viro 2012-11-30 1:54 ` Patrick McLean 0 siblings, 1 reply; 17+ messages in thread From: Al Viro @ 2012-11-30 1:36 UTC (permalink / raw) To: Patrick McLean Cc: Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Thu, Nov 29, 2012 at 04:57:19PM -0800, Patrick McLean wrote: > > Interesting... Server-side that should've been produced by > > encode_entryplus_baggage(), which looks like failing compose_entry_fh()... > > which has explicit > > if (d_mountpoint(dchild)) > > goto out; > > resulting in ENOENT on everything that's overmounted on server. > > > > Do you, by any chance, have the server really exporting its own root > > filesystem? Another thing to check: have nfs_prime_dcache() print > > filename.name of everything that fails nfs_same_entry() and has > > zero entry->fh->size, regardless of d_invalidate() results. > > The server is running 3.6.6 and is just exporting a subdir of an xfs filesystem (which does not happen to be the root filesystem). > > The client is running as a KVM guest on the machine that is serving the NFS. I am reproducing this by booting the guest via an initramfs, and doing > "ls /" at in single user mode. > > I added a check that prints the filename.name of everything that fails nfs_same_file, and it appears to just be triggered by the same filesystems that > are triggering the WARN_ON, the relevant dmesg is below. [the same /dev, /proc and /sys] Very interesting. Do you have anything mounted on the corresponding directories on server? The picture looks like you are getting empty fhandles in readdir+ respons for exactly the same directories that happen to be mountpoints on client. In any case, we shouldn't do that blind d_drop() - empty fhandles can happen. The only remaining question is why do they happen on that set of entries. From my reading of encode_entryplus_baggage() it looks like we have compose_entry_fh() failing for those entries and those entries alone. One possible cause would be d_mountpoint(dchild) being true on server. If it is true, we can declare the case closed; if not, I really wonder what's going on. Note that if the same fs is mounted elsewhere, d_mountpoint() would mean that something is mounted on top of that directory in _some_ instance; not necessary the exported one. Can you slap printks on fs/nfsd/nfs3xdr.c compose_entry_fh() failure exits and see which one triggers server-side? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 1:36 ` Al Viro @ 2012-11-30 1:54 ` Patrick McLean [not found] ` <50B811BA.6070503-NrXZpuFoTVF8GC8d84axZg@public.gmane.org> 0 siblings, 1 reply; 17+ messages in thread From: Patrick McLean @ 2012-11-30 1:54 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On 29/11/12 05:36 PM, Al Viro wrote: > On Thu, Nov 29, 2012 at 04:57:19PM -0800, Patrick McLean wrote: >>> Interesting... Server-side that should've been produced by >>> encode_entryplus_baggage(), which looks like failing compose_entry_fh()... >>> which has explicit >>> if (d_mountpoint(dchild)) >>> goto out; >>> resulting in ENOENT on everything that's overmounted on server. >>> >>> Do you, by any chance, have the server really exporting its own root >>> filesystem? Another thing to check: have nfs_prime_dcache() print >>> filename.name of everything that fails nfs_same_entry() and has >>> zero entry->fh->size, regardless of d_invalidate() results. >> >> The server is running 3.6.6 and is just exporting a subdir of an xfs filesystem (which does not happen to be the root filesystem). >> >> The client is running as a KVM guest on the machine that is serving the NFS. I am reproducing this by booting the guest via an initramfs, and doing >> "ls /" at in single user mode. >> >> I added a check that prints the filename.name of everything that fails nfs_same_file, and it appears to just be triggered by the same filesystems that >> are triggering the WARN_ON, the relevant dmesg is below. > > [the same /dev, /proc and /sys] > > Very interesting. Do you have anything mounted on the corresponding > directories on server? The picture looks like you are getting empty > fhandles in readdir+ respons for exactly the same directories that happen > to be mountpoints on client. In any case, we shouldn't do that blind > d_drop() - empty fhandles can happen. The only remaining question is > why do they happen on that set of entries. From my reading of > encode_entryplus_baggage() it looks like we have compose_entry_fh() > failing for those entries and those entries alone. One possible cause > would be d_mountpoint(dchild) being true on server. If it is true, we > can declare the case closed; if not, I really wonder what's going on. Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace. Unmounting those directories on the server does appear to stop the WARN_ON from triggering. > Note that if the same fs is mounted elsewhere, d_mountpoint() would mean > that something is mounted on top of that directory in _some_ instance; > not necessary the exported one. Can you slap printks on fs/nfsd/nfs3xdr.c > compose_entry_fh() failure exits and see which one triggers server-side? ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <50B811BA.6070503-NrXZpuFoTVF8GC8d84axZg@public.gmane.org>]
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) [not found] ` <50B811BA.6070503-NrXZpuFoTVF8GC8d84axZg@public.gmane.org> @ 2012-11-30 2:00 ` Al Viro [not found] ` <20121130020047.GA4939-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2012-12-01 2:18 ` Simon Kirby 0 siblings, 2 replies; 17+ messages in thread From: Al Viro @ 2012-11-30 2:00 UTC (permalink / raw) To: Patrick McLean Cc: Patrick McLean, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Trond Myklebust, linux-nfs-u79uwXL29TY76Z2rM5mHXA On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote: > > Very interesting. Do you have anything mounted on the corresponding > > directories on server? The picture looks like you are getting empty > > fhandles in readdir+ respons for exactly the same directories that happen > > to be mountpoints on client. In any case, we shouldn't do that blind > > d_drop() - empty fhandles can happen. The only remaining question is > > why do they happen on that set of entries. From my reading of > > encode_entryplus_baggage() it looks like we have compose_entry_fh() > > failing for those entries and those entries alone. One possible cause > > would be d_mountpoint(dchild) being true on server. If it is true, we > > can declare the case closed; if not, I really wonder what's going on. > > Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace. > > Unmounting those directories on the server does appear to stop the WARN_ON from triggering. OK, that settles it. WARN_ON() and printks in the area can be dropped; the right fix is below. However, there's a similar place in cifs that also needs to be dealt with and I really, really wonder why the hell do we do d_drop() in nfs_revalidate_lookup(). It's not relevant in this bug, but I would like to understand what's wrong with simply returning 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care of unhashing, etc. itself. Would make have_submounts() in there pointless as well - we could just return 0 and let d_invalidate() take care of the checks... Trond? diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry) nfs_refresh_inode(dentry->d_inode, entry->fattr); goto out; } else { - d_drop(dentry); + if (d_invalidate(dentry) != 0) + goto out; dput(dentry); } } -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <20121130020047.GA4939-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>]
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) [not found] ` <20121130020047.GA4939-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> @ 2012-11-30 2:33 ` Patrick McLean [not found] ` <50B81B11.7050704-aTCcZBytkHbQT0dZR+AlfA@public.gmane.org> 2012-11-30 13:58 ` Myklebust, Trond 1 sibling, 1 reply; 17+ messages in thread From: Patrick McLean @ 2012-11-30 2:33 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Trond Myklebust, linux-nfs-u79uwXL29TY76Z2rM5mHXA On 29/11/12 06:00 PM, Al Viro wrote: > On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote: >>> Very interesting. Do you have anything mounted on the corresponding >>> directories on server? The picture looks like you are getting empty >>> fhandles in readdir+ respons for exactly the same directories that happen >>> to be mountpoints on client. In any case, we shouldn't do that blind >>> d_drop() - empty fhandles can happen. The only remaining question is >>> why do they happen on that set of entries. From my reading of >>> encode_entryplus_baggage() it looks like we have compose_entry_fh() >>> failing for those entries and those entries alone. One possible cause >>> would be d_mountpoint(dchild) being true on server. If it is true, we >>> can declare the case closed; if not, I really wonder what's going on. >> >> Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace. >> >> Unmounting those directories on the server does appear to stop the WARN_ON from triggering. > > OK, that settles it. WARN_ON() and printks in the area can be dropped; > the right fix is below. However, there's a similar place in cifs that > also needs to be dealt with and I really, really wonder why the hell do > we do d_drop() in nfs_revalidate_lookup(). It's not relevant in this > bug, but I would like to understand what's wrong with simply returning > 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care > of unhashing, etc. itself. Would make have_submounts() in there pointless > as well - we could just return 0 and let d_invalidate() take care of the > checks... Trond? > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry) > nfs_refresh_inode(dentry->d_inode, entry->fattr); > goto out; > } else { > - d_drop(dentry); > + if (d_invalidate(dentry) != 0) > + goto out; > dput(dentry); > } > } Excellent, thanks. Is there any chance this will make it to 3.7? Also we might want to cc stable@ on this as well since it is a regression in 3.6. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <50B81B11.7050704-aTCcZBytkHbQT0dZR+AlfA@public.gmane.org>]
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) [not found] ` <50B81B11.7050704-aTCcZBytkHbQT0dZR+AlfA@public.gmane.org> @ 2012-11-30 4:11 ` Al Viro 0 siblings, 0 replies; 17+ messages in thread From: Al Viro @ 2012-11-30 4:11 UTC (permalink / raw) To: Patrick McLean Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Trond Myklebust, linux-nfs-u79uwXL29TY76Z2rM5mHXA On Thu, Nov 29, 2012 at 06:33:53PM -0800, Patrick McLean wrote: > Excellent, thanks. Is there any chance this will make it to 3.7? Also we might want to cc stable@ on this as well since it is a regression in 3.6. Definitely. I've dropped that into vfs.git#for-linus and vfs.git#for-next and tomorrow to Linus it goes... -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) [not found] ` <20121130020047.GA4939-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2012-11-30 2:33 ` Patrick McLean @ 2012-11-30 13:58 ` Myklebust, Trond 2012-12-01 21:40 ` Al Viro 1 sibling, 1 reply; 17+ messages in thread From: Myklebust, Trond @ 2012-11-30 13:58 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, Patrick McLean, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Fri, 2012-11-30 at 02:00 +0000, Al Viro wrote: > On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote: > > > Very interesting. Do you have anything mounted on the corresponding > > > directories on server? The picture looks like you are getting empty > > > fhandles in readdir+ respons for exactly the same directories that happen > > > to be mountpoints on client. In any case, we shouldn't do that blind > > > d_drop() - empty fhandles can happen. The only remaining question is > > > why do they happen on that set of entries. From my reading of > > > encode_entryplus_baggage() it looks like we have compose_entry_fh() > > > failing for those entries and those entries alone. One possible cause > > > would be d_mountpoint(dchild) being true on server. If it is true, we > > > can declare the case closed; if not, I really wonder what's going on. > > > > Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace. > > > > Unmounting those directories on the server does appear to stop the WARN_ON from triggering. > > OK, that settles it. WARN_ON() and printks in the area can be dropped; > the right fix is below. However, there's a similar place in cifs that > also needs to be dealt with and I really, really wonder why the hell do > we do d_drop() in nfs_revalidate_lookup(). It's not relevant in this > bug, but I would like to understand what's wrong with simply returning > 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care > of unhashing, etc. itself. Would make have_submounts() in there pointless > as well - we could just return 0 and let d_invalidate() take care of the > checks... Trond? The reason for the choice of d_drop over d_invalidate() is the d_count checks. It really doesn't matter whether or not the client thinks it has users for a directory if the server is telling you that it is ESTALE. So we force a d_drop to prevent further lookups from finding it. IOW: It is there in order to fix the case where the user does 'rmdir("foo"); mkdir("foo")' on the server. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 13:58 ` Myklebust, Trond @ 2012-12-01 21:40 ` Al Viro 0 siblings, 0 replies; 17+ messages in thread From: Al Viro @ 2012-12-01 21:40 UTC (permalink / raw) To: Myklebust, Trond Cc: Patrick McLean, Patrick McLean, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org On Fri, Nov 30, 2012 at 01:58:18PM +0000, Myklebust, Trond wrote: > The reason for the choice of d_drop over d_invalidate() is the d_count > checks. It really doesn't matter whether or not the client thinks it has > users for a directory if the server is telling you that it is ESTALE. So > we force a d_drop to prevent further lookups from finding it. > > IOW: It is there in order to fix the case where the user does > 'rmdir("foo"); mkdir("foo")' on the server. You do realize that your have_submounts() check in there is inherently racy, right? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 2:00 ` Al Viro [not found] ` <20121130020047.GA4939-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> @ 2012-12-01 2:18 ` Simon Kirby 1 sibling, 0 replies; 17+ messages in thread From: Simon Kirby @ 2012-12-01 2:18 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Fri, Nov 30, 2012 at 02:00:48AM +0000, Al Viro wrote: > OK, that settles it. WARN_ON() and printks in the area can be dropped; > the right fix is below. However, there's a similar place in cifs that > also needs to be dealt with and I really, really wonder why the hell do > we do d_drop() in nfs_revalidate_lookup(). It's not relevant in this > bug, but I would like to understand what's wrong with simply returning > 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care > of unhashing, etc. itself. Would make have_submounts() in there pointless > as well - we could just return 0 and let d_invalidate() take care of the > checks... Trond? > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry) > nfs_refresh_inode(dentry->d_inode, entry->fattr); > goto out; > } else { > - d_drop(dentry); > + if (d_invalidate(dentry) != 0) > + goto out; > dput(dentry); > } > } Hello, With your previous patch (with the WARN_ON), I hit the WARN_ON() in the test case described here: https://patchwork.kernel.org/patch/1446851/ . The __d_move()ing mountpoint case no longer hits, and there is no longer an EBUSY, so this seems to work for me (in 3.6, where it broke). Simon- ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2012-12-01 21:40 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-11-29 19:16 Regression with initramfs and nfsroot (appears to be in the dcache) Patrick McLean 2012-11-29 21:33 ` Al Viro 2012-11-29 22:06 ` Patrick McLean 2012-11-29 22:21 ` Al Viro 2012-11-29 22:53 ` Patrick McLean 2012-11-29 23:43 ` Al Viro 2012-11-30 0:19 ` Patrick McLean 2012-11-30 0:35 ` Al Viro 2012-11-30 0:57 ` Patrick McLean 2012-11-30 1:36 ` Al Viro 2012-11-30 1:54 ` Patrick McLean [not found] ` <50B811BA.6070503-NrXZpuFoTVF8GC8d84axZg@public.gmane.org> 2012-11-30 2:00 ` Al Viro [not found] ` <20121130020047.GA4939-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> 2012-11-30 2:33 ` Patrick McLean [not found] ` <50B81B11.7050704-aTCcZBytkHbQT0dZR+AlfA@public.gmane.org> 2012-11-30 4:11 ` Al Viro 2012-11-30 13:58 ` Myklebust, Trond 2012-12-01 21:40 ` Al Viro 2012-12-01 2:18 ` Simon Kirby
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).