* [REGRESSION] git commit d1bacf9e "NFS: add readdir cache array" is bad
@ 2010-11-26 12:05 Guennadi Liakhovetski
2010-11-26 18:05 ` Trond Myklebust
0 siblings, 1 reply; 8+ messages in thread
From: Guennadi Liakhovetski @ 2010-11-26 12:05 UTC (permalink / raw)
To: linux-nfs; +Cc: J. Bruce Fields, Neil Brown, Bryan Schumaker, rees, sim
Hi all
I've bisected the problem, reported several times before:
http://www.spinics.net/lists/linux-nfs/msg17208.html
http://www.spinics.net/lists/linux-nfs/msg17298.html
(authors cc'ed) and also reproducibly causing problems on my sh7724 SuperH
and sh7372 ARM Debian systems. Commit
commit d1bacf9eb2fd0e7ef870acf84b9e3b157dcfa7dc
Author: Bryan Schumaker <bjschuma@netapp.com>
Date: Fri Sep 24 14:48:42 2010 -0400
NFS: add readdir cache array
can be verified to be the culprit. Would be nice, if the other two
reporters could also verify this commit. Or is there already a fix
available?
Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [REGRESSION] git commit d1bacf9e "NFS: add readdir cache array" is bad
2010-11-26 12:05 [REGRESSION] git commit d1bacf9e "NFS: add readdir cache array" is bad Guennadi Liakhovetski
@ 2010-11-26 18:05 ` Trond Myklebust
2010-11-26 18:34 ` Guennadi Liakhovetski
2010-11-27 0:25 ` Simon Kirby
0 siblings, 2 replies; 8+ messages in thread
From: Trond Myklebust @ 2010-11-26 18:05 UTC (permalink / raw)
To: Guennadi Liakhovetski
Cc: linux-nfs, J. Bruce Fields, Neil Brown, Bryan Schumaker, rees, sim

On Fri, 2010-11-26 at 13:05 +0100, Guennadi Liakhovetski wrote:
> Hi all
>
> I've bisected the problem, reported several times before:
>
> http://www.spinics.net/lists/linux-nfs/msg17208.html
> http://www.spinics.net/lists/linux-nfs/msg17298.html
>
> (authors cc'ed) and also reproducibly causing problems on my sh7724 SuperH
> and sh7372 ARM Debian systems. Commit
>
> commit d1bacf9eb2fd0e7ef870acf84b9e3b157dcfa7dc
> Author: Bryan Schumaker <bjschuma@netapp.com>
> Date: Fri Sep 24 14:48:42 2010 -0400
>
> NFS: add readdir cache array
>
> can be verified to be the culprit. Would be nice, if the other two
> reporters could also verify this commit. Or is there already a fix
> available?
>

That patch removes readdirplus, and cannot therefore be responsible for
the fileid changed error that is reported in the emails below (which
does not occur when mounting with -onordirplus). It introduces a bunch
of other bugs (most of which have been fixed), but not that one.

I've asked Simon for info about which NFS versions he is seeing this
with. He has not replied so far, but if you are seeing the same bug,
then I'd appreciate the same info.
Does the fileid bug occur with NFSv3 and NFSv4 or is it limited to one
or the other?

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [REGRESSION] git commit d1bacf9e "NFS: add readdir cache array" is bad
2010-11-26 18:05 ` Trond Myklebust
@ 2010-11-26 18:34 ` Guennadi Liakhovetski
2010-11-27 1:41 ` Simon Kirby
2010-11-27 0:25 ` Simon Kirby
1 sibling, 1 reply; 8+ messages in thread
From: Guennadi Liakhovetski @ 2010-11-26 18:34 UTC (permalink / raw)
To: Trond Myklebust
Cc: linux-nfs, J. Bruce Fields, Neil Brown, Bryan Schumaker, rees, sim

Hi Trond

On Fri, 26 Nov 2010, Trond Myklebust wrote:
>
> On Fri, 2010-11-26 at 13:05 +0100, Guennadi Liakhovetski wrote:
> > Hi all
> >
> > I've bisected the problem, reported several times before:
> >
> > http://www.spinics.net/lists/linux-nfs/msg17208.html
> > http://www.spinics.net/lists/linux-nfs/msg17298.html
> >
> > (authors cc'ed) and also reproducibly causing problems on my sh7724 SuperH
> > and sh7372 ARM Debian systems. Commit
> >
> > commit d1bacf9eb2fd0e7ef870acf84b9e3b157dcfa7dc
> > Author: Bryan Schumaker <bjschuma@netapp.com>
> > Date: Fri Sep 24 14:48:42 2010 -0400
> >
> > NFS: add readdir cache array
> >
> > can be verified to be the culprit. Would be nice, if the other two
> > reporters could also verify this commit. Or is there already a fix
> > available?
> >
>
> That patch removes readdirplus, and cannot therefore be responsible for
> the fileid changed error that is reported in the emails below (which
> does not occur when mounting with -onordirplus). It introduces a bunch
> of other bugs (most of which have been fixed), but not that one.
>
> I've asked Simon for info about which NFS versions he is seeing this
> with. He has not replied so far, but if you are seeing the same bug,
> then I'd appreciate the same info.
> Does the fileid bug occur with NFSv3 and NFSv4 or is it limited to one
> or the other?

v3 here. As for errors - as I bisected, I didn't specifically watch out
for the "fileid" bug. There are a couple of warnings appearing, of which
"fileid" is just one. It is quite possible that, although I've found this
commit, the actual bug(s) that it introduces are different ones. For me it
is just "NFS works before this commit" and "NFS stops working reliably
after it." Symptoms vary indeed. Apart from "fileid" I'm also getting
warnings like

nfs_update_inode: inode 297450 mode changed, 0100005 to 0120777

Sometimes there are no warnings at all: the action currently in progress
(like apt-get or ldconfig) just hangs forever, consuming CPU and thrashing
the network.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [REGRESSION] git commit d1bacf9e "NFS: add readdir cache array" is bad
2010-11-26 18:34 ` Guennadi Liakhovetski
@ 2010-11-27 1:41 ` Simon Kirby
0 siblings, 0 replies; 8+ messages in thread
From: Simon Kirby @ 2010-11-27 1:41 UTC (permalink / raw)
To: Guennadi Liakhovetski
Cc: Trond Myklebust, linux-nfs, J. Bruce Fields, Neil Brown, Bryan Schumaker, rees

On Fri, Nov 26, 2010 at 07:34:10PM +0100, Guennadi Liakhovetski wrote:
> On Fri, 26 Nov 2010, Trond Myklebust wrote:
>
> > On Fri, 2010-11-26 at 13:05 +0100, Guennadi Liakhovetski wrote:
> > > Hi all
> > >
> > > I've bisected the problem, reported several times before:
> > >
> > > http://www.spinics.net/lists/linux-nfs/msg17208.html
> > > http://www.spinics.net/lists/linux-nfs/msg17298.html
> > >
> > > (authors cc'ed) and also reproducibly causing problems on my sh7724 SuperH
> > > and sh7372 ARM Debian systems. Commit
> > >
> > > commit d1bacf9eb2fd0e7ef870acf84b9e3b157dcfa7dc
> > > Author: Bryan Schumaker <bjschuma@netapp.com>
> > > Date: Fri Sep 24 14:48:42 2010 -0400
> > >
> > > NFS: add readdir cache array
> > >
> > > can be verified to be the culprit. Would be nice, if the other two
> > > reporters could also verify this commit. Or is there already a fix
> > > available?
> > >
> >
> > That patch removes readdirplus, and cannot therefore be responsible for
> > the fileid changed error that is reported in the emails below (which
> > does not occur when mounting with -onordirplus). It introduces a bunch
> > of other bugs (most of which have been fixed), but not that one.
> >
> > I've asked Simon for info about which NFS versions he is seeing this
> > with. He has not replied so far, but if you are seeing the same bug,
> > then I'd appreciate the same info.
> > Does the fileid bug occur with NFSv3 and NFSv4 or is it limited to one
> > or the other?
>
> v3 here. As for errors - as I bisected, I didn't specifically watch out
> for the "fileid" bug. There are a couple of warnings appearing, of which
> "fileid" is just one. It is quite possible that, although I've found this
> commit, the actual bug(s) that it introduces are different ones. For me it
> is just "NFS works before this commit" and "NFS stops working reliably
> after it." Symptoms vary indeed. Apart from "fileid" I'm also getting
> warnings like
>
> nfs_update_inode: inode 297450 mode changed, 0100005 to 0120777
>
> Sometimes there are no warnings at all: the action currently in progress
> (like apt-get or ldconfig) just hangs forever, consuming CPU and thrashing
> the network.

My report was the first one, but I'm actually seeing what seems to be an
improvement on 2.6.37-rc3 for that issue. That post was complaining about
NFS getting stuck on 2.6.35 and 2.6.36. However, I did report problems
similar to the second post ("fileid changed"), and "nordirplus" made them
stop.

I am not sure if these two problems are the same thing, since I've also
seen an issue on 2.6.36 that might be related to the hang issue ("flush"
processes take all of the CPU, NFS stuck, eventually recovers).

Do you see problems at all on 2.6.37 with the nordirplus mount option?

It sounds like you are using root on NFS, and some apt-get
upgrade/ldconfig type command is reproducing the issue? Any guesses on
how to reproduce it with a simple testcase? :)

Simon-
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [REGRESSION] git commit d1bacf9e "NFS: add readdir cache array" is bad
2010-11-26 18:05 ` Trond Myklebust
2010-11-26 18:34 ` Guennadi Liakhovetski
@ 2010-11-27 0:25 ` Simon Kirby
2010-11-27 10:27 ` Simon Kirby
1 sibling, 1 reply; 8+ messages in thread
From: Simon Kirby @ 2010-11-27 0:25 UTC (permalink / raw)
To: Trond Myklebust
Cc: Guennadi Liakhovetski, linux-nfs, J. Bruce Fields, Neil Brown, Bryan Schumaker, rees

On Fri, Nov 26, 2010 at 01:05:26PM -0500, Trond Myklebust wrote:
> On Fri, 2010-11-26 at 13:05 +0100, Guennadi Liakhovetski wrote:
> > Hi all
> >
> > I've bisected the problem, reported several times before:
> >
> > http://www.spinics.net/lists/linux-nfs/msg17208.html
> > http://www.spinics.net/lists/linux-nfs/msg17298.html
> >
> > (authors cc'ed) and also reproducibly causing problems on my sh7724 SuperH
> > and sh7372 ARM Debian systems. Commit
> >
> > commit d1bacf9eb2fd0e7ef870acf84b9e3b157dcfa7dc
> > Author: Bryan Schumaker <bjschuma@netapp.com>
> > Date: Fri Sep 24 14:48:42 2010 -0400
> >
> > NFS: add readdir cache array
> >
> > can be verified to be the culprit. Would be nice, if the other two
> > reporters could also verify this commit. Or is there already a fix
> > available?
> >
>
> That patch removes readdirplus, and cannot therefore be responsible for
> the fileid changed error that is reported in the emails below (which
> does not occur when mounting with -onordirplus). It introduces a bunch
> of other bugs (most of which have been fixed), but not that one.
>
> I've asked Simon for info about which NFS versions he is seeing this
> with. He has not replied so far, but if you are seeing the same bug,
> then I'd appreciate the same info.
> Does the fileid bug occur with NFSv3 and NFSv4 or is it limited to one
> or the other?

Sorry, it's NFSv3. We still need to fix the ID mapper's ability to work
with libnss-mysql-bg before we can try NFSv4. I went trying to track
down the inodes on the server, but didn't get very far. Would this still
be helpful?

Some of the file handles do seem to have recurred in the errors:

# zfgrep -i 'expected fileid' kern.log kern.log.0 kern.log.?.gz | cut -f6- -d' ' | tail
[62767.492630] fsid 0:51: expected fileid 0x8dbd93c3, got 0x8dbd93aa
[62767.492777] fsid 0:51: expected fileid 0x8dbd93c4, got 0x8dbee995
[62767.492925] fsid 0:51: expected fileid 0x8dbd93c5, got 0x8db992b0
[62767.493074] fsid 0:51: expected fileid 0x8dbd93c6, got 0x8db992b3
[62767.493221] fsid 0:51: expected fileid 0x8dbd93c7, got 0x8db992c2
[62767.493370] fsid 0:51: expected fileid 0x8dbd93c8, got 0x8dbee99e
[62767.493518] fsid 0:51: expected fileid 0x8dbd93c9, got 0x8db992aa
[62767.493666] fsid 0:51: expected fileid 0x8dbd93ca, got 0x8db992ae
[62767.493818] fsid 0:51: expected fileid 0x8dbd93cb, got 0x8dbee996
[62768.125674] fsid 0:51: expected fileid 0x4e387db6, got 0x5dc463aa
# zfgrep -i 'expected fileid' kern.log kern.log.0 kern.log.?.gz | wc -l
1387
# zfgrep -i 'expected fileid' kern.log kern.log.0 kern.log.?.gz | cut -f1 -d, | cut -f3 -dx | sort | uniq -d | wc -l
222
# zfgrep -i 'expected fileid' kern.log kern.log.0 kern.log.?.gz | cut -f1 -d, | cut -f3 -dx | sort | uniq -d | tail
c27f7de0
c27f7de1
c2b3f216
c49cdbd9
c4dfde81
c7f6da82
c7f6da84
c7f6da85
c7f6da86
c7f6da87
# zfgrep -i 'expected fileid' kern.log kern.log.0 kern.log.?.gz | cut -f1 -d, | cut -f3 -dx | sort | uniq -c | sort -nr | head
     34 4d388eb6
     18 c49cdbd9
     17 c7f6da82
     13 c7f6da84
     13 80bf4a5e
     12 4f670322
     12 4e3a515b
     11 4d100339
     10 4dcd298a
     10 4dbfffa4

It looks like maybe a directory that is growing or shrinking or
something, and corruption is happening on a boundary somewhere. It
definitely goes away with "nordirplus".

XFS is hosting the FSes on the server side, and all the ones I see here
are just under 1 TB and thus not using the XFS inode64 option. I'll try
to dig up those inodes.

Simon-
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [REGRESSION] git commit d1bacf9e "NFS: add readdir cache array" is bad
2010-11-27 0:25 ` Simon Kirby
@ 2010-11-27 10:27 ` Simon Kirby
2010-11-27 18:24 ` Trond Myklebust
0 siblings, 1 reply; 8+ messages in thread
From: Simon Kirby @ 2010-11-27 10:27 UTC (permalink / raw)
To: Trond Myklebust
Cc: Guennadi Liakhovetski, linux-nfs, J. Bruce Fields, Neil Brown, Bryan Schumaker, rees

On Fri, Nov 26, 2010 at 04:25:48PM -0800, Simon Kirby wrote:
> On Fri, Nov 26, 2010 at 01:05:26PM -0500, Trond Myklebust wrote:
>
> > On Fri, 2010-11-26 at 13:05 +0100, Guennadi Liakhovetski wrote:
> > > Hi all
> > >
> > > I've bisected the problem, reported several times before:
> > >
> > > http://www.spinics.net/lists/linux-nfs/msg17208.html
> > > http://www.spinics.net/lists/linux-nfs/msg17298.html
> > >
> > > (authors cc'ed) and also reproducibly causing problems on my sh7724 SuperH
> > > and sh7372 ARM Debian systems. Commit
> > >
> > > commit d1bacf9eb2fd0e7ef870acf84b9e3b157dcfa7dc
> > > Author: Bryan Schumaker <bjschuma@netapp.com>
> > > Date: Fri Sep 24 14:48:42 2010 -0400
> > >
> > > NFS: add readdir cache array
> > >
> > > can be verified to be the culprit. Would be nice, if the other two
> > > reporters could also verify this commit. Or is there already a fix
> > > available?
> > >
> >
> > That patch removes readdirplus, and cannot therefore be responsible for
> > the fileid changed error that is reported in the emails below (which
> > does not occur when mounting with -onordirplus). It introduces a bunch
> > of other bugs (most of which have been fixed), but not that one.
> >
> > I've asked Simon for info about which NFS versions he is seeing this
> > with. He has not replied so far, but if you are seeing the same bug,
> > then I'd appreciate the same info.
> > Does the fileid bug occur with NFSv3 and NFSv4 or is it limited to one
> > or the other?
>
> Sorry, it's NFSv3. We still need to fix the ID mapper's ability to work
> with libnss-mysql-bg before we can try NFSv4. I went trying to track
> down the inodes on the server, but didn't get very far. Would this still
> be helpful?

Ok, so I tracked them down, and they didn't seem to be particularly
unusual, so I tried a not-particularly-unusual thing that I figured might
work, and reproduced it:

server: echo test > a
client: ls -l
server: echo test > b ; mv b a
client: ls -l

That's it. The kernel (2.6.37-rc3), on the final "ls -l", says:

[12814.611197] NFS: server 10.10.52.228 error: fileid changed
[12814.611200] fsid 0:3f: expected fileid 0x122efbf1, got 0x122efc15

"ls -li" shows the inode updated, so maybe this isn't even a bug?

Simon-
^ permalink raw reply [flat|nested] 8+ messages in thread
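[Editorial note] The reproduction above relies on "mv b a" leaving the name "a" pointing at a different inode on the server, which is why the client later sees a different fileid under the same name. The following is a minimal local sketch of that effect in C, assuming a POSIX system and that both files live on the same filesystem; it does not involve NFS at all, and the function names (demo_write_file, demo_inode_of, demo_replace_by_rename) are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: write a short string to path, truncating any
 * existing file. Returns 0 on success, -1 on failure. */
int demo_write_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(text, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: return the inode number of path, or 0 on error. */
ino_t demo_inode_of(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    return st.st_ino;
}

/* Replays the server-side steps from the message above on a local
 * filesystem: create a, create b, then rename b over a. The inode
 * number behind the name a is recorded before and after the rename.
 * Returns 0 on success, -1 on failure. */
int demo_replace_by_rename(const char *a, const char *b,
                           ino_t *before, ino_t *after)
{
    assert(before != NULL && after != NULL);
    if (demo_write_file(a, "test\n") != 0)
        return -1;
    *before = demo_inode_of(a);
    if (demo_write_file(b, "test\n") != 0)
        return -1;
    if (rename(b, a) != 0)      /* same effect as "mv b a" */
        return -1;
    *after = demo_inode_of(a);
    return (*before != 0 && *after != 0) ? 0 : -1;
}
```

Calling demo_replace_by_rename("a", "b", &before, &after) in a scratch directory should leave before and after different, because b is created while a still exists and so cannot share its inode. A client that cached the old entry for "a" has to notice the change and revalidate it rather than treat it as the same file.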
* Re: [REGRESSION] git commit d1bacf9e "NFS: add readdir cache array" is bad
2010-11-27 10:27 ` Simon Kirby
@ 2010-11-27 18:24 ` Trond Myklebust
2010-11-30 8:30 ` Simon Kirby
0 siblings, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2010-11-27 18:24 UTC (permalink / raw)
To: Simon Kirby
Cc: Guennadi Liakhovetski, linux-nfs, J. Bruce Fields, Neil Brown, Bryan Schumaker, rees

On Sat, 2010-11-27 at 02:27 -0800, Simon Kirby wrote:
> Ok, so I tracked them down, and they didn't seem to be particularly
> unusual, so I tried a not-particularly-unusual thing that I figured might
> work, and reproduced it:
>
> server: echo test > a
> client: ls -l
> server: echo test > b ; mv b a
> client: ls -l
>
> That's it. The kernel (2.6.37-rc3), on the final "ls -l", says:
>
> [12814.611197] NFS: server 10.10.52.228 error: fileid changed
> [12814.611200] fsid 0:3f: expected fileid 0x122efbf1, got 0x122efc15
>
> "ls -li" shows the inode updated, so maybe this isn't even a bug?

Ah! I think I see it now, and yes it is a bug...

Does the following patch fix it?

Cheers
Trond
-------------------------------------------------------------------------------------
NFS: Fix a readdirplus bug

From: Trond Myklebust <Trond.Myklebust@netapp.com>

When comparing filehandles in the helper nfs_same_file(), we should not be
using 'strncmp()': filehandles are not null terminated strings.

Instead, we should just use the existing helper nfs_compare_fh().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/dir.c | 6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)


diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 8ea4a41..f0a384e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -395,13 +395,9 @@ int xdr_decode(nfs_readdir_descriptor_t *desc, struct nfs_entry *entry, struct x
 static
 int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry)
 {
-	struct nfs_inode *node;
 	if (dentry->d_inode == NULL)
 		goto different;
-	node = NFS_I(dentry->d_inode);
-	if (node->fh.size != entry->fh->size)
-		goto different;
-	if (strncmp(node->fh.data, entry->fh->data, node->fh.size) != 0)
+	if (nfs_compare_fh(entry->fh, NFS_FH(dentry->d_inode)) != 0)
 		goto different;
 	return 1;
 different:

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply related [flat|nested] 8+ messages in thread
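[Editorial note] The patch description says that strncmp() is the wrong tool because filehandles are not null-terminated strings. The following user-space sketch in C shows the failure mode under stated assumptions: struct demo_fh is a hypothetical stand-in for the kernel's filehandle structure (a size plus opaque bytes), and memcmp() is used here as an example of a byte-wise comparison. It is not the kernel's nfs_compare_fh() implementation, which this thread does not show.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an NFS filehandle: a length plus opaque
 * bytes. The 128-byte capacity is chosen only for this illustration. */
#define DEMO_FH_MAXSIZE 128

struct demo_fh {
    unsigned short size;
    unsigned char data[DEMO_FH_MAXSIZE];
};

/* Build a demo handle from raw bytes. */
struct demo_fh demo_fh_make(const void *bytes, unsigned short size)
{
    struct demo_fh fh;

    assert(size <= DEMO_FH_MAXSIZE);
    memset(&fh, 0, sizeof(fh));
    fh.size = size;
    memcpy(fh.data, bytes, size);
    return fh;
}

/* Mirrors the shape of the removed code: strncmp() stops at the first
 * NUL byte, so any bytes after it are never compared.
 * Returns 1 if the handles are judged the same, 0 otherwise. */
int demo_same_fh_strncmp(const struct demo_fh *a, const struct demo_fh *b)
{
    if (a->size != b->size)
        return 0;
    return strncmp((const char *)a->data, (const char *)b->data,
                   a->size) == 0;
}

/* Byte-wise comparison: every one of the size bytes is compared,
 * embedded NUL bytes included.
 * Returns 1 if the handles are the same, 0 otherwise. */
int demo_same_fh_memcmp(const struct demo_fh *a, const struct demo_fh *b)
{
    if (a->size != b->size)
        return 0;
    return memcmp(a->data, b->data, a->size) == 0;
}
```

With two four-byte handles {0x01, 0x00, 0xaa, 0xbb} and {0x01, 0x00, 0xcc, 0xdd}, demo_same_fh_strncmp() reports them as the same, because the comparison ends at the NUL in the second byte, while demo_same_fh_memcmp() reports them as different. That kind of false match would be consistent with the client treating a replaced file as the old one and then complaining that its fileid changed.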
* Re: [REGRESSION] git commit d1bacf9e "NFS: add readdir cache array" is bad
2010-11-27 18:24 ` Trond Myklebust
@ 2010-11-30 8:30 ` Simon Kirby
0 siblings, 0 replies; 8+ messages in thread
From: Simon Kirby @ 2010-11-30 8:30 UTC (permalink / raw)
To: Trond Myklebust
Cc: Guennadi Liakhovetski, linux-nfs, J. Bruce Fields, Neil Brown, Bryan Schumaker, rees

On Sat, Nov 27, 2010 at 01:24:00PM -0500, Trond Myklebust wrote:
> On Sat, 2010-11-27 at 02:27 -0800, Simon Kirby wrote:
> > Ok, so I tracked them down, and they didn't seem to be particularly
> > unusual, so I tried a not-particularly-unusual thing that I figured might
> > work, and reproduced it:
> >
> > server: echo test > a
> > client: ls -l
> > server: echo test > b ; mv b a
> > client: ls -l
> >
> > That's it. The kernel (2.6.37-rc3), on the final "ls -l", says:
> >
> > [12814.611197] NFS: server 10.10.52.228 error: fileid changed
> > [12814.611200] fsid 0:3f: expected fileid 0x122efbf1, got 0x122efc15
> >
> > "ls -li" shows the inode updated, so maybe this isn't even a bug?
>
> Ah! I think I see it now, and yes it is a bug...
>
> Does the following patch fix it?

Indeed, it does! I verified that I can reproduce it on 2.6.37-rc3
without this patch, and I can no longer reproduce it with this patch.

Thanks muchly!

Tested-by: Simon Kirby <sim@hostway.ca>

Simon-

> Cheers
> Trond
> -------------------------------------------------------------------------------------
> NFS: Fix a readdirplus bug
>
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> When comparing filehandles in the helper nfs_same_file(), we should not be
> using 'strncmp()': filehandles are not null terminated strings.
>
> Instead, we should just use the existing helper nfs_compare_fh().
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
>  fs/nfs/dir.c | 6 +-----
>  1 files changed, 1 insertions(+), 5 deletions(-)
>
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 8ea4a41..f0a384e 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -395,13 +395,9 @@ int xdr_decode(nfs_readdir_descriptor_t *desc, struct nfs_entry *entry, struct x
>  static
>  int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry)
>  {
> -	struct nfs_inode *node;
>  	if (dentry->d_inode == NULL)
>  		goto different;
> -	node = NFS_I(dentry->d_inode);
> -	if (node->fh.size != entry->fh->size)
> -		goto different;
> -	if (strncmp(node->fh.data, entry->fh->data, node->fh.size) != 0)
> +	if (nfs_compare_fh(entry->fh, NFS_FH(dentry->d_inode)) != 0)
>  		goto different;
>  	return 1;
>  different:
>
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread