* Re: Regression with initramfs and nfsroot (appears to be in the dcache) [not found] ` <50B7E759.9070007@gaikai.com> @ 2012-11-29 23:43 ` Al Viro 2012-11-30 0:19 ` Patrick McLean 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2012-11-29 23:43 UTC (permalink / raw) To: Patrick McLean; +Cc: linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Thu, Nov 29, 2012 at 02:53:13PM -0800, Patrick McLean wrote: > On 29/11/12 02:21 PM, Al Viro wrote: > > On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote: > > > >> I have a trivial reproducer and am happy to help debug in any way that > >> I can. That patch seems to fix the problem, and produces these > >> warnings in dmesg: > > > > OK... So we have differing entry->fh and NFS_FH(dentry->d_inode). Something > > like > > static void dump_fh(const struct nfs_fh *fh) > > { > > int i; > > printk(KERN_INFO "FH(%d)", fh->size); > > for (i = 0; i < fh->size; i++) > > printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]); > > printk(KERN_CONT "]\n"); > > } > > with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to > > that WARN_ON(1) would probably be interesting. And probably would make > > sense to print filename->name as well, to see which files it is about. > [ 8.821584] FH(0)] > [ 8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] > [ 8.821601] filename: proc *whoa* So we have zero entry->fh->size? No wonder it doesn't match... Which NFS version it is? entry->fh->size is set by nfs[34]_decode_dirent(). NFS folks: any ideas on best way to debug it? The brute-force way would be to capture all NFS traffic with tcpdump and see what's going on, but that would be a lot of work... Looks like we have READDIRPLUS attempted and succeeded, but fhandle was not given. Result: nfs_prime_dcache() is doing blind d_drop() on perfectly valid dentries, no matter how busy. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-29 23:43 ` Regression with initramfs and nfsroot (appears to be in the dcache) Al Viro @ 2012-11-30 0:19 ` Patrick McLean 2012-11-30 0:35 ` Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Patrick McLean @ 2012-11-30 0:19 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On 29/11/12 03:43 PM, Al Viro wrote: > On Thu, Nov 29, 2012 at 02:53:13PM -0800, Patrick McLean wrote: >> On 29/11/12 02:21 PM, Al Viro wrote: >>> On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote: >>> >>>> I have a trivial reproducer and am happy to help debug in any way that >>>> I can. That patch seems to fix the problem, and produces these >>>> warnings in dmesg: >>> >>> OK... So we have differing entry->fh and NFS_FH(dentry->d_inode). Something >>> like >>> static void dump_fh(const struct nfs_fh *fh) >>> { >>> int i; >>> printk(KERN_INFO "FH(%d)", fh->size); >>> for (i = 0; i < fh->size; i++) >>> printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]); >>> printk(KERN_CONT "]\n"); >>> } >>> with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to >>> that WARN_ON(1) would probably be interesting. And probably would make >>> sense to print filename->name as well, to see which files it is about. > >> [ 8.821584] FH(0)] >> [ 8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] >> [ 8.821601] filename: proc > > *whoa* > > So we have zero entry->fh->size? No wonder it doesn't match... Which NFS > version it is? entry->fh->size is set by nfs[34]_decode_dirent(). This is nfs v3 over TCP on Linus git at commit e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace. > NFS folks: any ideas on best way to debug it? The brute-force way would be > to capture all NFS traffic with tcpdump and see what's going on, but that > would be a lot of work... > > Looks like we have READDIRPLUS attempted and succeeded, but fhandle was not > given. Result: nfs_prime_dcache() is doing blind d_drop() on perfectly > valid dentries, no matter how busy. > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 0:19 ` Patrick McLean @ 2012-11-30 0:35 ` Al Viro 2012-11-30 0:57 ` Patrick McLean 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2012-11-30 0:35 UTC (permalink / raw) To: Patrick McLean; +Cc: linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Thu, Nov 29, 2012 at 04:19:51PM -0800, Patrick McLean wrote: > >> [ 8.821584] FH(0)] > >> [ 8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] > >> [ 8.821601] filename: proc > > > > *whoa* > > > > So we have zero entry->fh->size? No wonder it doesn't match... Which NFS > > version it is? entry->fh->size is set by nfs[34]_decode_dirent(). > > This is nfs v3 over TCP on Linus git at commit e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace. So we have nfs3_decode_dirent(), stepping into /* In fact, a post_op_fh3: */ p = xdr_inline_decode(xdr, 4); if (unlikely(p == NULL)) goto out_overflow; if (*p != xdr_zero) { error = decode_nfs_fh3(xdr, entry->fh); if (unlikely(error)) { if (error == -E2BIG) goto out_truncated; return error; } } else zero_nfs_fh3(entry->fh); Interesting... Server-side that should've been produced by encode_entryplus_baggage(), which looks like failing compose_entry_fh()... which has explicit if (d_mountpoint(dchild)) goto out; resulting in ENOENT on everything that's overmounted on server. Do you, by any chance, have the server really exporting its own root filesystem? Another thing to check: have nfs_prime_dcache() print filename.name of everything that fails nfs_same_entry() and has zero entry->fh->size, regardless of d_invalidate() results. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 0:35 ` Al Viro @ 2012-11-30 0:57 ` Patrick McLean 2012-11-30 1:36 ` Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Patrick McLean @ 2012-11-30 0:57 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On 29/11/12 04:35 PM, Al Viro wrote: > On Thu, Nov 29, 2012 at 04:19:51PM -0800, Patrick McLean wrote: >>>> [ 8.821584] FH(0)] >>>> [ 8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] >>>> [ 8.821601] filename: proc >>> >>> *whoa* >>> >>> So we have zero entry->fh->size? No wonder it doesn't match... Which NFS >>> version it is? entry->fh->size is set by nfs[34]_decode_dirent(). >> >> This is nfs v3 over TCP on Linus git at commit e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace. > > So we have nfs3_decode_dirent(), stepping into > /* In fact, a post_op_fh3: */ > p = xdr_inline_decode(xdr, 4); > if (unlikely(p == NULL)) > goto out_overflow; > if (*p != xdr_zero) { > error = decode_nfs_fh3(xdr, entry->fh); > if (unlikely(error)) { > if (error == -E2BIG) > goto out_truncated; > return error; > } > } else > zero_nfs_fh3(entry->fh); > Interesting... Server-side that should've been produced by > encode_entryplus_baggage(), which looks like failing compose_entry_fh()... > which has explicit > if (d_mountpoint(dchild)) > goto out; > resulting in ENOENT on everything that's overmounted on server. > > Do you, by any chance, have the server really exporting its own root > filesystem? Another thing to check: have nfs_prime_dcache() print > filename.name of everything that fails nfs_same_entry() and has > zero entry->fh->size, regardless of d_invalidate() results. The server is running 3.6.6 and is just exporting a subdir of an xfs filesystem (which does not happen to be the root filesystem). The client is running as a KVM guest on the machine that is serving the NFS. I am reproducing this by booting the guest via an initramfs, and doing "ls /" at in single user mode. I added a check that prints the filename.name of everything that fails nfs_same_file, and it appears to just be triggered by the same filesystems that are triggering the WARN_ON, the relevant dmesg is below. [ 9.495217] entry->fh->size is 0 on: proc [ 9.495222] ------------[ cut here ]------------ [ 9.495230] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb() [ 9.495232] Hardware name: Bochs [ 9.495233] Modules linked in: [ 9.495237] Pid: 655, comm: ls Not tainted 3.7.0-rc7+ #40 [ 9.495239] Call Trace: [ 9.495247] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 9.495250] [<ffffffff8117deb4>] ? nfs_readdir_page_filler+0x1ef/0x3eb [ 9.495254] [<ffffffff8117e6cc>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 9.495257] [<ffffffff8117e755>] ? nfs_readdir_filler+0x1c/0x6b [ 9.495263] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 9.495266] [<ffffffff8117e739>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 9.495269] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 9.495274] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495277] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 9.495280] [<ffffffff8117e8d1>] ? nfs_readdir+0x12d/0x435 [ 9.495285] [<ffffffff8118e69b>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 9.495288] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495291] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495294] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 9.495298] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 9.495302] [<ffffffff814ac829>] ? system_call_fastpath+0x16/0x1b [ 9.495304] ---[ end trace e502c5d56c594e85 ]--- [ 9.495306] FH(0)] [ 9.495308] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62] [ 9.495323] filename: proc [ 9.495330] entry->fh->size is 0 on: dev [ 9.495332] ------------[ cut here ]------------ [ 9.495335] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb() [ 9.495336] Hardware name: Bochs [ 9.495337] Modules linked in: [ 9.495340] Pid: 655, comm: ls Tainted: G W 3.7.0-rc7+ #40 [ 9.495341] Call Trace: [ 9.495345] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 9.495348] [<ffffffff8117deb4>] ? nfs_readdir_page_filler+0x1ef/0x3eb [ 9.495351] [<ffffffff8117e6cc>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 9.495354] [<ffffffff8117e755>] ? nfs_readdir_filler+0x1c/0x6b [ 9.495358] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 9.495361] [<ffffffff8117e739>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 9.495364] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 9.495368] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495371] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 9.495373] [<ffffffff8117e8d1>] ? nfs_readdir+0x12d/0x435 [ 9.495377] [<ffffffff8118e69b>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 9.495380] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495383] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495387] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 9.495390] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 9.495393] [<ffffffff814ac829>] ? system_call_fastpath+0x16/0x1b [ 9.495395] ---[ end trace e502c5d56c594e86 ]--- [ 9.495396] FH(0)] [ 9.495397] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 85 00 00 40 d6 39 e0 7d] [ 9.495412] filename: dev [ 9.495422] entry->fh->size is 0 on: sys [ 9.495423] ------------[ cut here ]------------ [ 9.495426] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb() [ 9.495427] Hardware name: Bochs [ 9.495428] Modules linked in: [ 9.495430] Pid: 655, comm: ls Tainted: G W 3.7.0-rc7+ #40 [ 9.495431] Call Trace: [ 9.495435] [<ffffffff8108534c>] ? warn_slowpath_common+0x76/0x8a [ 9.495438] [<ffffffff8117deb4>] ? nfs_readdir_page_filler+0x1ef/0x3eb [ 9.495441] [<ffffffff8117e6cc>] ? nfs_readdir_xdr_to_array+0x1c0/0x22d [ 9.495444] [<ffffffff8117e755>] ? nfs_readdir_filler+0x1c/0x6b [ 9.495448] [<ffffffff810dca9a>] ? add_to_page_cache_lru+0x2c/0x36 [ 9.495450] [<ffffffff8117e739>] ? nfs_readdir_xdr_to_array+0x22d/0x22d [ 9.495454] [<ffffffff810dcbe3>] ? do_read_cache_page+0x7d/0x12b [ 9.495457] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495460] [<ffffffff810dccc6>] ? read_cache_page+0x7/0x10 [ 9.495463] [<ffffffff8117e8d1>] ? nfs_readdir+0x12d/0x435 [ 9.495466] [<ffffffff8118e69b>] ? nfs3_xdr_dec_create3res+0xc5/0xc5 [ 9.495470] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495473] [<ffffffff811274f8>] ? sys_ioctl+0x7a/0x7a [ 9.495476] [<ffffffff811277b3>] ? vfs_readdir+0x6c/0xa7 [ 9.495479] [<ffffffff811278da>] ? sys_getdents+0x7e/0xdc [ 9.495482] [<ffffffff814ac829>] ? system_call_fastpath+0x16/0x1b [ 9.495484] ---[ end trace e502c5d56c594e87 ]--- [ 9.495485] FH(0)] [ 9.495486] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a3 0e 00 40 42 57 d3 90] [ 9.495511] filename: sys ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 0:57 ` Patrick McLean @ 2012-11-30 1:36 ` Al Viro 2012-11-30 1:54 ` Patrick McLean 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2012-11-30 1:36 UTC (permalink / raw) To: Patrick McLean Cc: Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Thu, Nov 29, 2012 at 04:57:19PM -0800, Patrick McLean wrote: > > Interesting... Server-side that should've been produced by > > encode_entryplus_baggage(), which looks like failing compose_entry_fh()... > > which has explicit > > if (d_mountpoint(dchild)) > > goto out; > > resulting in ENOENT on everything that's overmounted on server. > > > > Do you, by any chance, have the server really exporting its own root > > filesystem? Another thing to check: have nfs_prime_dcache() print > > filename.name of everything that fails nfs_same_entry() and has > > zero entry->fh->size, regardless of d_invalidate() results. > > The server is running 3.6.6 and is just exporting a subdir of an xfs filesystem (which does not happen to be the root filesystem). > > The client is running as a KVM guest on the machine that is serving the NFS. I am reproducing this by booting the guest via an initramfs, and doing > "ls /" at in single user mode. > > I added a check that prints the filename.name of everything that fails nfs_same_file, and it appears to just be triggered by the same filesystems that > are triggering the WARN_ON, the relevant dmesg is below. [the same /dev, /proc and /sys] Very interesting. Do you have anything mounted on the corresponding directories on server? The picture looks like you are getting empty fhandles in readdir+ respons for exactly the same directories that happen to be mountpoints on client. In any case, we shouldn't do that blind d_drop() - empty fhandles can happen. The only remaining question is why do they happen on that set of entries. From my reading of encode_entryplus_baggage() it looks like we have compose_entry_fh() failing for those entries and those entries alone. One possible cause would be d_mountpoint(dchild) being true on server. If it is true, we can declare the case closed; if not, I really wonder what's going on. Note that if the same fs is mounted elsewhere, d_mountpoint() would mean that something is mounted on top of that directory in _some_ instance; not necessary the exported one. Can you slap printks on fs/nfsd/nfs3xdr.c compose_entry_fh() failure exits and see which one triggers server-side? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 1:36 ` Al Viro @ 2012-11-30 1:54 ` Patrick McLean 2012-11-30 2:00 ` Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Patrick McLean @ 2012-11-30 1:54 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On 29/11/12 05:36 PM, Al Viro wrote: > On Thu, Nov 29, 2012 at 04:57:19PM -0800, Patrick McLean wrote: >>> Interesting... Server-side that should've been produced by >>> encode_entryplus_baggage(), which looks like failing compose_entry_fh()... >>> which has explicit >>> if (d_mountpoint(dchild)) >>> goto out; >>> resulting in ENOENT on everything that's overmounted on server. >>> >>> Do you, by any chance, have the server really exporting its own root >>> filesystem? Another thing to check: have nfs_prime_dcache() print >>> filename.name of everything that fails nfs_same_entry() and has >>> zero entry->fh->size, regardless of d_invalidate() results. >> >> The server is running 3.6.6 and is just exporting a subdir of an xfs filesystem (which does not happen to be the root filesystem). >> >> The client is running as a KVM guest on the machine that is serving the NFS. I am reproducing this by booting the guest via an initramfs, and doing >> "ls /" at in single user mode. >> >> I added a check that prints the filename.name of everything that fails nfs_same_file, and it appears to just be triggered by the same filesystems that >> are triggering the WARN_ON, the relevant dmesg is below. > > [the same /dev, /proc and /sys] > > Very interesting. Do you have anything mounted on the corresponding > directories on server? The picture looks like you are getting empty > fhandles in readdir+ respons for exactly the same directories that happen > to be mountpoints on client. In any case, we shouldn't do that blind > d_drop() - empty fhandles can happen. The only remaining question is > why do they happen on that set of entries. From my reading of > encode_entryplus_baggage() it looks like we have compose_entry_fh() > failing for those entries and those entries alone. One possible cause > would be d_mountpoint(dchild) being true on server. If it is true, we > can declare the case closed; if not, I really wonder what's going on. Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace. Unmounting those directories on the server does appear to stop the WARN_ON from triggering. > Note that if the same fs is mounted elsewhere, d_mountpoint() would mean > that something is mounted on top of that directory in _some_ instance; > not necessary the exported one. Can you slap printks on fs/nfsd/nfs3xdr.c > compose_entry_fh() failure exits and see which one triggers server-side? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 1:54 ` Patrick McLean @ 2012-11-30 2:00 ` Al Viro 2012-11-30 2:33 ` Patrick McLean ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Al Viro @ 2012-11-30 2:00 UTC (permalink / raw) To: Patrick McLean Cc: Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote: > > Very interesting. Do you have anything mounted on the corresponding > > directories on server? The picture looks like you are getting empty > > fhandles in readdir+ respons for exactly the same directories that happen > > to be mountpoints on client. In any case, we shouldn't do that blind > > d_drop() - empty fhandles can happen. The only remaining question is > > why do they happen on that set of entries. From my reading of > > encode_entryplus_baggage() it looks like we have compose_entry_fh() > > failing for those entries and those entries alone. One possible cause > > would be d_mountpoint(dchild) being true on server. If it is true, we > > can declare the case closed; if not, I really wonder what's going on. > > Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace. > > Unmounting those directories on the server does appear to stop the WARN_ON from triggering. OK, that settles it. WARN_ON() and printks in the area can be dropped; the right fix is below. However, there's a similar place in cifs that also needs to be dealt with and I really, really wonder why the hell do we do d_drop() in nfs_revalidate_lookup(). It's not relevant in this bug, but I would like to understand what's wrong with simply returning 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care of unhashing, etc. itself. Would make have_submounts() in there pointless as well - we could just return 0 and let d_invalidate() take care of the checks... Trond? diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry) nfs_refresh_inode(dentry->d_inode, entry->fattr); goto out; } else { - d_drop(dentry); + if (d_invalidate(dentry) != 0) + goto out; dput(dentry); } } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 2:00 ` Al Viro @ 2012-11-30 2:33 ` Patrick McLean 2012-11-30 4:11 ` Al Viro 2012-11-30 13:58 ` Myklebust, Trond 2012-12-01 2:18 ` Simon Kirby 2 siblings, 1 reply; 12+ messages in thread From: Patrick McLean @ 2012-11-30 2:33 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On 29/11/12 06:00 PM, Al Viro wrote: > On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote: >>> Very interesting. Do you have anything mounted on the corresponding >>> directories on server? The picture looks like you are getting empty >>> fhandles in readdir+ respons for exactly the same directories that happen >>> to be mountpoints on client. In any case, we shouldn't do that blind >>> d_drop() - empty fhandles can happen. The only remaining question is >>> why do they happen on that set of entries. From my reading of >>> encode_entryplus_baggage() it looks like we have compose_entry_fh() >>> failing for those entries and those entries alone. One possible cause >>> would be d_mountpoint(dchild) being true on server. If it is true, we >>> can declare the case closed; if not, I really wonder what's going on. >> >> Those directories do have the server's own copies of the said directories bind mounted at the moment in a separate mount namespace. >> >> Unmounting those directories on the server does appear to stop the WARN_ON from triggering. > > OK, that settles it. WARN_ON() and printks in the area can be dropped; > the right fix is below. However, there's a similar place in cifs that > also needs to be dealt with and I really, really wonder why the hell do > we do d_drop() in nfs_revalidate_lookup(). It's not relevant in this > bug, but I would like to understand what's wrong with simply returning > 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care > of unhashing, etc. itself. Would make have_submounts() in there pointless > as well - we could just return 0 and let d_invalidate() take care of the > checks... Trond? > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry) > nfs_refresh_inode(dentry->d_inode, entry->fattr); > goto out; > } else { > - d_drop(dentry); > + if (d_invalidate(dentry) != 0) > + goto out; > dput(dentry); > } > } Excellent, thanks. Is there any chance this will make it to 3.7? Also we might want to cc stable@ on this as well since it is a regression in 3.6. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 2:33 ` Patrick McLean @ 2012-11-30 4:11 ` Al Viro 0 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2012-11-30 4:11 UTC (permalink / raw) To: Patrick McLean; +Cc: linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Thu, Nov 29, 2012 at 06:33:53PM -0800, Patrick McLean wrote: > Excellent, thanks. Is there any chance this will make it to 3.7? Also we might want to cc stable@ on this as well since it is a regression in 3.6. Definitely. I've dropped that into vfs.git#for-linus and vfs.git#for-next and tomorrow to Linus it goes... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 2:00 ` Al Viro 2012-11-30 2:33 ` Patrick McLean @ 2012-11-30 13:58 ` Myklebust, Trond 2012-12-01 21:40 ` Al Viro 2012-12-01 2:18 ` Simon Kirby 2 siblings, 1 reply; 12+ messages in thread From: Myklebust, Trond @ 2012-11-30 13:58 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, Patrick McLean, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org T24gRnJpLCAyMDEyLTExLTMwIGF0IDAyOjAwICswMDAwLCBBbCBWaXJvIHdyb3RlOg0KPiBPbiBU aHUsIE5vdiAyOSwgMjAxMiBhdCAwNTo1NDowMlBNIC0wODAwLCBQYXRyaWNrIE1jTGVhbiB3cm90 ZToNCj4gPiA+IAlWZXJ5IGludGVyZXN0aW5nLiAgRG8geW91IGhhdmUgYW55dGhpbmcgbW91bnRl ZCBvbiB0aGUgY29ycmVzcG9uZGluZw0KPiA+ID4gZGlyZWN0b3JpZXMgb24gc2VydmVyPyAgVGhl IHBpY3R1cmUgbG9va3MgbGlrZSB5b3UgYXJlIGdldHRpbmcgZW1wdHkNCj4gPiA+IGZoYW5kbGVz IGluIHJlYWRkaXIrIHJlc3BvbnMgZm9yIGV4YWN0bHkgdGhlIHNhbWUgZGlyZWN0b3JpZXMgdGhh dCBoYXBwZW4NCj4gPiA+IHRvIGJlIG1vdW50cG9pbnRzIG9uIGNsaWVudC4gIEluIGFueSBjYXNl LCB3ZSBzaG91bGRuJ3QgZG8gdGhhdCBibGluZA0KPiA+ID4gZF9kcm9wKCkgLSBlbXB0eSBmaGFu ZGxlcyBjYW4gaGFwcGVuLiAgVGhlIG9ubHkgcmVtYWluaW5nIHF1ZXN0aW9uIGlzDQo+ID4gPiB3 aHkgZG8gdGhleSBoYXBwZW4gb24gdGhhdCBzZXQgb2YgZW50cmllcy4gIEZyb20gbXkgcmVhZGlu ZyBvZg0KPiA+ID4gZW5jb2RlX2VudHJ5cGx1c19iYWdnYWdlKCkgaXQgbG9va3MgbGlrZSB3ZSBo YXZlIGNvbXBvc2VfZW50cnlfZmgoKQ0KPiA+ID4gZmFpbGluZyBmb3IgdGhvc2UgZW50cmllcyBh bmQgdGhvc2UgZW50cmllcyBhbG9uZS4gIE9uZSBwb3NzaWJsZSBjYXVzZQ0KPiA+ID4gd291bGQg YmUgZF9tb3VudHBvaW50KGRjaGlsZCkgYmVpbmcgdHJ1ZSBvbiBzZXJ2ZXIuICBJZiBpdCBpcyB0 cnVlLCB3ZQ0KPiA+ID4gY2FuIGRlY2xhcmUgdGhlIGNhc2UgY2xvc2VkOyBpZiBub3QsIEkgcmVh bGx5IHdvbmRlciB3aGF0J3MgZ29pbmcgb24uDQo+ID4gDQo+ID4gVGhvc2UgZGlyZWN0b3JpZXMg ZG8gaGF2ZSB0aGUgc2VydmVyJ3Mgb3duIGNvcGllcyBvZiB0aGUgc2FpZCBkaXJlY3RvcmllcyBi aW5kIG1vdW50ZWQgYXQgdGhlIG1vbWVudCBpbiBhIHNlcGFyYXRlIG1vdW50IG5hbWVzcGFjZS4N Cj4gPiANCj4gPiBVbm1vdW50aW5nIHRob3NlIGRpcmVjdG9yaWVzIG9uIHRoZSBzZXJ2ZXIgZG9l cyBhcHBlYXIgdG8gc3RvcCB0aGUgV0FSTl9PTiBmcm9tIHRyaWdnZXJpbmcuDQo+IA0KPiBPSywg dGhhdCBzZXR0bGVzIGl0LiAgV0FSTl9PTigpIGFuZCBwcmludGtzIGluIHRoZSBhcmVhIGNhbiBi ZSBkcm9wcGVkOw0KPiB0aGUgcmlnaHQgZml4IGlzIGJlbG93LiAgSG93ZXZlciwgdGhlcmUncyBh IHNpbWlsYXIgcGxhY2UgaW4gY2lmcyB0aGF0DQo+IGFsc28gbmVlZHMgdG8gYmUgZGVhbHQgd2l0 aCBhbmQgSSByZWFsbHksIHJlYWxseSB3b25kZXIgd2h5IHRoZSBoZWxsIGRvDQo+IHdlIGRvIGRf ZHJvcCgpIGluIG5mc19yZXZhbGlkYXRlX2xvb2t1cCgpLiAgSXQncyBub3QgcmVsZXZhbnQgaW4g dGhpcw0KPiBidWcsIGJ1dCBJIHdvdWxkIGxpa2UgdG8gdW5kZXJzdGFuZCB3aGF0J3Mgd3Jvbmcg d2l0aCBzaW1wbHkgcmV0dXJuaW5nDQo+IDAgZnJvbSAtPmRfcmV2YWxpZGF0ZSgpIGFuZCBsZXR0 aW5nIHRoZSBjYWxsZXIgKGluIGZzL25hbWVpLmMpIHRha2UgY2FyZQ0KPiBvZiB1bmhhc2hpbmcs IGV0Yy4gaXRzZWxmLiAgV291bGQgbWFrZSBoYXZlX3N1Ym1vdW50cygpIGluIHRoZXJlIHBvaW50 bGVzcw0KPiBhcyB3ZWxsIC0gd2UgY291bGQganVzdCByZXR1cm4gMCBhbmQgbGV0IGRfaW52YWxp ZGF0ZSgpIHRha2UgY2FyZSBvZiB0aGUNCj4gY2hlY2tzLi4uICBUcm9uZD8NCg0KVGhlIHJlYXNv biBmb3IgdGhlIGNob2ljZSBvZiBkX2Ryb3Agb3ZlciBkX2ludmFsaWRhdGUoKSBpcyB0aGUgZF9j b3VudA0KY2hlY2tzLiBJdCByZWFsbHkgZG9lc24ndCBtYXR0ZXIgd2hldGhlciBvciBub3QgdGhl IGNsaWVudCB0aGlua3MgaXQgaGFzDQp1c2VycyBmb3IgYSBkaXJlY3RvcnkgaWYgdGhlIHNlcnZl ciBpcyB0ZWxsaW5nIHlvdSB0aGF0IGl0IGlzIEVTVEFMRS4gU28NCndlIGZvcmNlIGEgZF9kcm9w IHRvIHByZXZlbnQgZnVydGhlciBsb29rdXBzIGZyb20gZmluZGluZyBpdC4NCg0KSU9XOiBJdCBp cyB0aGVyZSBpbiBvcmRlciB0byBmaXggdGhlIGNhc2Ugd2hlcmUgdGhlIHVzZXIgZG9lcw0KJ3Jt ZGlyKCJmb28iKTsgbWtkaXIoImZvbyIpJyBvbiB0aGUgc2VydmVyLg0KDQoNCi0tIA0KVHJvbmQg TXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9uZC5N eWtsZWJ1c3RAbmV0YXBwLmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg== ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 13:58 ` Myklebust, Trond @ 2012-12-01 21:40 ` Al Viro 0 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2012-12-01 21:40 UTC (permalink / raw) To: Myklebust, Trond Cc: Patrick McLean, Patrick McLean, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org On Fri, Nov 30, 2012 at 01:58:18PM +0000, Myklebust, Trond wrote: > The reason for the choice of d_drop over d_invalidate() is the d_count > checks. It really doesn't matter whether or not the client thinks it has > users for a directory if the server is telling you that it is ESTALE. So > we force a d_drop to prevent further lookups from finding it. > > IOW: It is there in order to fix the case where the user does > 'rmdir("foo"); mkdir("foo")' on the server. You do realize that your have_submounts() check in there is inherently racy, right? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Regression with initramfs and nfsroot (appears to be in the dcache) 2012-11-30 2:00 ` Al Viro 2012-11-30 2:33 ` Patrick McLean 2012-11-30 13:58 ` Myklebust, Trond @ 2012-12-01 2:18 ` Simon Kirby 2 siblings, 0 replies; 12+ messages in thread From: Simon Kirby @ 2012-12-01 2:18 UTC (permalink / raw) To: Al Viro Cc: Patrick McLean, Patrick McLean, linux-fsdevel, linux-kernel, Trond Myklebust, linux-nfs On Fri, Nov 30, 2012 at 02:00:48AM +0000, Al Viro wrote: > OK, that settles it. WARN_ON() and printks in the area can be dropped; > the right fix is below. However, there's a similar place in cifs that > also needs to be dealt with and I really, really wonder why the hell do > we do d_drop() in nfs_revalidate_lookup(). It's not relevant in this > bug, but I would like to understand what's wrong with simply returning > 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care > of unhashing, etc. itself. Would make have_submounts() in there pointless > as well - we could just return 0 and let d_invalidate() take care of the > checks... Trond? > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry) > nfs_refresh_inode(dentry->d_inode, entry->fattr); > goto out; > } else { > - d_drop(dentry); > + if (d_invalidate(dentry) != 0) > + goto out; > dput(dentry); > } > } Hello, With your previous patch (with the WARN_ON), I hit the WARN_ON() in the test case described here: https://patchwork.kernel.org/patch/1446851/ . The __d_move()ing mountpoint case no longer hits, and there is no longer an EBUSY, so this seems to work for me (in 3.6, where it broke). Simon- ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-12-01 21:40 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CAE2NFgU+4QDcLaVE5dKDvZCxey7qEf7FAbYNjNmcjVU=HLhF=A@mail.gmail.com>
[not found] ` <20121129213316.GU4939@ZenIV.linux.org.uk>
[not found] ` <CAE2NFgWCnmgSCsw7MzRkqBLTHgRDG-FvKKcbTHN6UqePK6Ef+Q@mail.gmail.com>
[not found] ` <20121129222109.GW4939@ZenIV.linux.org.uk>
[not found] ` <50B7E759.9070007@gaikai.com>
2012-11-29 23:43 ` Regression with initramfs and nfsroot (appears to be in the dcache) Al Viro
2012-11-30 0:19 ` Patrick McLean
2012-11-30 0:35 ` Al Viro
2012-11-30 0:57 ` Patrick McLean
2012-11-30 1:36 ` Al Viro
2012-11-30 1:54 ` Patrick McLean
2012-11-30 2:00 ` Al Viro
2012-11-30 2:33 ` Patrick McLean
2012-11-30 4:11 ` Al Viro
2012-11-30 13:58 ` Myklebust, Trond
2012-12-01 21:40 ` Al Viro
2012-12-01 2:18 ` Simon Kirby
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox