* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop [not found] <alpine.DEB.2.02.1107270952270.1451@p34.internal.lan> @ 2011-07-27 16:07 ` J. Bruce Fields 2011-07-27 16:28 ` Justin Piszcz 0 siblings, 1 reply; 32+ messages in thread From: J. Bruce Fields @ 2011-07-27 16:07 UTC (permalink / raw) To: Justin Piszcz; +Cc: linux-kernel, linux-nfs On Wed, Jul 27, 2011 at 09:54:09AM -0400, Justin Piszcz wrote: > Hi, > > Kernel 2.6.30 on client. > Kernel 2.6.28 on server. > > p34 kernel: [92223.918892] NFS: directory motion/cam2 contains a > readdir loop. Please contact your server vendor. Offending cookie: 10272 What filesystem on the server are you exporting? > In the past I used NFS to push -> imagery -> NFS server. > Now I've flipped it so I am storing the images locally and viewing > them remotely, what causes this? Sorry, I don't understand what you mean. --b. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 16:07 ` 2.6.xx: NFS: directory motion/cam2 contains a readdir loop J. Bruce Fields @ 2011-07-27 16:28 ` Justin Piszcz 2011-07-27 16:40 ` Bryan Schumaker 2011-07-27 18:11 ` Christoph Hellwig 0 siblings, 2 replies; 32+ messages in thread From: Justin Piszcz @ 2011-07-27 16:28 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-kernel, linux-nfs, xfs On Wed, 27 Jul 2011, J. Bruce Fields wrote: > On Wed, Jul 27, 2011 at 09:54:09AM -0400, Justin Piszcz wrote: >> Hi, >> >> Kernel 2.6.30 on client. >> Kernel 2.6.28 on server. >> >> p34 kernel: [92223.918892] NFS: directory motion/cam2 contains a >> readdir loop. Please contact your server vendor. Offending cookie: 10272 > > What filesystem on the server are you exporting? Hi, xfs. /dev/sda1 on / type xfs (rw,noatime) Nothing special, thoughts? Justin. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 16:28 ` Justin Piszcz @ 2011-07-27 16:40 ` Bryan Schumaker 2011-07-27 17:00 ` Ruediger Meier 2011-07-27 17:15 ` Justin Piszcz 2011-07-27 18:11 ` Christoph Hellwig 1 sibling, 2 replies; 32+ messages in thread From: Bryan Schumaker @ 2011-07-27 16:40 UTC (permalink / raw) To: Justin Piszcz; +Cc: J. Bruce Fields, linux-kernel, linux-nfs, xfs On 07/27/2011 12:28 PM, Justin Piszcz wrote: > > > On Wed, 27 Jul 2011, J. Bruce Fields wrote: > >> On Wed, Jul 27, 2011 at 09:54:09AM -0400, Justin Piszcz wrote: >>> Hi, >>> >>> Kernel 2.6.30 on client. >>> Kernel 2.6.28 on server. >>> >>> p34 kernel: [92223.918892] NFS: directory motion/cam2 contains a >>> readdir loop. Please contact your server vendor. Offending cookie: 10272 >> >> What filesystem on the server are you exporting? > > Hi, > > xfs. > /dev/sda1 on / type xfs (rw,noatime) > > Nothing special, thoughts? Are there a lot of files in the directory you're exporting? It looks like cookie 10272 is mapped to multiple files. When the client tries to resume reading from this cookie, xfs will reply from the first matching file and cause the client to enter a loop. - Bryan > > Justin. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
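The failure mode Bryan describes can be sketched with a small toy model. This is not kernel or XFS code: `resume_after` and `listing_terminates` are hypothetical names, and the "server" is simply an array of cookies that resumes after the first entry matching the cookie the client sends back. It assumes a chunk size of at least 1.

```c
#include <stddef.h>

/*
 * Toy model (not kernel code): the server keeps one cookie per directory
 * entry and resumes a listing just after the FIRST entry whose cookie
 * matches the one the client sends back.
 */
static size_t resume_after(const unsigned long long *cookies, size_t n,
                           unsigned long long cookie)
{
    for (size_t i = 0; i < n; i++)
        if (cookies[i] == cookie)
            return i + 1;   /* continue with the entry after the match */
    return n;               /* unknown cookie: nothing left to return */
}

/*
 * Read the directory in chunks of `chunk` entries (chunk >= 1), resuming
 * each time from the cookie of the last entry received. Returns 1 if the
 * whole directory was read, 0 if the client made no forward progress,
 * which is the situation reported as a readdir loop.
 */
static int listing_terminates(const unsigned long long *cookies, size_t n,
                              size_t chunk)
{
    size_t pos = 0;         /* index of the next entry the server sends */

    while (pos < n) {
        size_t last = pos + chunk < n ? pos + chunk - 1 : n - 1;
        size_t next = resume_after(cookies, n, cookies[last]);

        if (next <= pos)
            return 0;       /* resumed at or before the current position */
        pos = next;
    }
    return 1;
}
```

With unique cookies the listing finishes. If two entries share a cookie and a chunk happens to end on the later one, the server resumes after the earlier one and the client sees the same entries again.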
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 16:40 ` Bryan Schumaker @ 2011-07-27 17:00 ` Ruediger Meier 2011-07-27 17:09 ` Bryan Schumaker ` (2 more replies) 2011-07-27 17:15 ` Justin Piszcz 1 sibling, 3 replies; 32+ messages in thread From: Ruediger Meier @ 2011-07-27 17:00 UTC (permalink / raw) To: Bryan Schumaker Cc: Justin Piszcz, J. Bruce Fields, linux-kernel, linux-nfs, xfs On Wednesday 27 July 2011, Bryan Schumaker wrote: > On 07/27/2011 12:28 PM, Justin Piszcz wrote: > > On Wed, 27 Jul 2011, J. Bruce Fields wrote: > >> > >> What filesystem on the server are you exporting? > > > > xfs. > > /dev/sda1 on / type xfs (rw,noatime) > > > > Nothing special, thoughts? > > Are there a lot of files in the directory you're exporting? It looks > like cookie 10272 is mapped to multiple files. I thought xfs is immune to readdir loops!? Is your export directory really located directly within / on /dev/sda1? cu, Rudi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 17:00 ` Ruediger Meier @ 2011-07-27 17:09 ` Bryan Schumaker 2011-07-27 17:17 ` Justin Piszcz 2011-07-27 18:28 ` Bryan Schumaker 2 siblings, 0 replies; 32+ messages in thread From: Bryan Schumaker @ 2011-07-27 17:09 UTC (permalink / raw) To: Ruediger Meier Cc: Justin Piszcz, J. Bruce Fields, linux-kernel, linux-nfs, xfs On 07/27/2011 01:00 PM, Ruediger Meier wrote: > On Wednesday 27 July 2011, Bryan Schumaker wrote: >> On 07/27/2011 12:28 PM, Justin Piszcz wrote: >>> On Wed, 27 Jul 2011, J. Bruce Fields wrote: >>>> >>>> What filesystem on the server are you exporting? >>> >>> xfs. >>> /dev/sda1 on / type xfs (rw,noatime) >>> >>> Nothing special, thoughts? >> >> Are there a lot of files in the directory you're exporting? It looks >> like cookie 10272 is mapped to multiple files. > > I thought xfs is immune to readdir loops!? I guess that depends how it generates the cookie... I want to try out the ext4 patches that were posted earlier today. I'll double check xfs while I'm at it. - Bryan > Is your export directory really located directly within / on /dev/sda1? > > cu, > Rudi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 17:00 ` Ruediger Meier 2011-07-27 17:09 ` Bryan Schumaker @ 2011-07-27 17:17 ` Justin Piszcz 2011-07-27 17:45 ` J. Bruce Fields 2011-07-27 18:28 ` Bryan Schumaker 2 siblings, 1 reply; 32+ messages in thread From: Justin Piszcz @ 2011-07-27 17:17 UTC (permalink / raw) To: Ruediger Meier Cc: Bryan Schumaker, J. Bruce Fields, linux-kernel, linux-nfs, xfs On Wed, 27 Jul 2011, Ruediger Meier wrote: > On Wednesday 27 July 2011, Bryan Schumaker wrote: >> On 07/27/2011 12:28 PM, Justin Piszcz wrote: >>> On Wed, 27 Jul 2011, J. Bruce Fields wrote: >>>> >>>> What filesystem on the server are you exporting? >>> >>> xfs. >>> /dev/sda1 on / type xfs (rw,noatime) >>> >>> Nothing special, thoughts? >> >> Are there a lot of files in the directory you're exporting? It looks >> like cookie 10272 is mapped to multiple files. > > I thought xfs is immune to readdir loops!? > Is your export directory really located directly within / on /dev/sda1? Hi, I was sharing out a directory on the NFS server: /d1 192.168.0.0/24(async,rw,no_root_squash,no_subtree_check,fsid=1) Should I share out / instead? Is this a known problem? $ df -h Filesystem Size Used Avail Use% Mounted on /dev/sda1 30G 13G 18G 43% / tmpfs 2.0G 8.0K 2.0G 1% /lib/init/rw udev 10M 192K 9.9M 2% /dev tmpfs 2.0G 0 2.0G 0% /dev/shm $ Justin. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 17:17 ` Justin Piszcz @ 2011-07-27 17:45 ` J. Bruce Fields 0 siblings, 0 replies; 32+ messages in thread From: J. Bruce Fields @ 2011-07-27 17:45 UTC (permalink / raw) To: Justin Piszcz Cc: Ruediger Meier, Bryan Schumaker, linux-kernel, linux-nfs, xfs On Wed, Jul 27, 2011 at 01:17:35PM -0400, Justin Piszcz wrote: > > > On Wed, 27 Jul 2011, Ruediger Meier wrote: > > >On Wednesday 27 July 2011, Bryan Schumaker wrote: > >>On 07/27/2011 12:28 PM, Justin Piszcz wrote: > >>>On Wed, 27 Jul 2011, J. Bruce Fields wrote: > >>>> > >>>>What filesystem on the server are you exporting? > >>> > >>>xfs. > >>>/dev/sda1 on / type xfs (rw,noatime) > >>> > >>>Nothing special, thoughts? > >> > >>Are there a lot of files in the directory you're exporting? It looks > >>like cookie 10272 is mapped to multiple files. > > > >I thought xfs is immune to readdir loops!? > >Is your export directory really located directly within / on /dev/sda1? > > Hi, > > I was sharing out a directory on the NFS server: > /d1 192.168.0.0/24(async,rw,no_root_squash,no_subtree_check,fsid=1) > > Should I share out / instead? You can do that if you want, but note that anyone malicious on that network can get access to / by guessing filehandles. (Safer would be to mount a separate partition at /d1.) But in any case that's got nothing to do with readdir cookie problems. --b. > Is this a known problem? > > $ df -h > Filesystem Size Used Avail Use% Mounted on > /dev/sda1 30G 13G 18G 43% / > tmpfs 2.0G 8.0K 2.0G 1% /lib/init/rw > udev 10M 192K 9.9M 2% /dev > tmpfs 2.0G 0 2.0G 0% /dev/shm > $ > > Justin. > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 17:00 ` Ruediger Meier 2011-07-27 17:09 ` Bryan Schumaker 2011-07-27 17:17 ` Justin Piszcz @ 2011-07-27 18:28 ` Bryan Schumaker 2 siblings, 0 replies; 32+ messages in thread From: Bryan Schumaker @ 2011-07-27 18:28 UTC (permalink / raw) To: Ruediger Meier Cc: Justin Piszcz, J. Bruce Fields, linux-kernel, linux-nfs, xfs On 07/27/2011 01:00 PM, Ruediger Meier wrote: > On Wednesday 27 July 2011, Bryan Schumaker wrote: >> On 07/27/2011 12:28 PM, Justin Piszcz wrote: >>> On Wed, 27 Jul 2011, J. Bruce Fields wrote: >>>> >>>> What filesystem on the server are you exporting? >>> >>> xfs. >>> /dev/sda1 on / type xfs (rw,noatime) >>> >>> Nothing special, thoughts? >> >> Are there a lot of files in the directory you're exporting? It looks >> like cookie 10272 is mapped to multiple files. > > I thought xfs is immune to readdir loops!? I can ls a directory with 500,000 files over nfs4. That's usually enough to cause the readdir loop in ext4, so I guess this is a different problem. > Is your export directory really located directly within / on /dev/sda1? > > cu, > Rudi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 16:40 ` Bryan Schumaker 2011-07-27 17:00 ` Ruediger Meier @ 2011-07-27 17:15 ` Justin Piszcz 1 sibling, 0 replies; 32+ messages in thread From: Justin Piszcz @ 2011-07-27 17:15 UTC (permalink / raw) To: Bryan Schumaker; +Cc: J. Bruce Fields, linux-kernel, linux-nfs, xfs On Wed, 27 Jul 2011, Bryan Schumaker wrote: > On 07/27/2011 12:28 PM, Justin Piszcz wrote: >> >> >> On Wed, 27 Jul 2011, J. Bruce Fields wrote: >> >>> On Wed, Jul 27, 2011 at 09:54:09AM -0400, Justin Piszcz wrote: >>>> Hi, >>>> >>>> Kernel 2.6.30 on client. >>>> Kernel 2.6.28 on server. >>>> >>>> p34 kernel: [92223.918892] NFS: directory motion/cam2 contains a >>>> readdir loop. Please contact your server vendor. Offending cookie: 10272 >>> >>> What filesystem on the server are you exporting? >> >> Hi, >> >> xfs. >> /dev/sda1 on / type xfs (rw,noatime) >> >> Nothing special, thoughts? > > Are there a lot of files in the directory you're exporting? It looks like cookie 10272 is mapped to multiple files. When the client tries to resume reading from this cookie, xfs will reply from the first matching file and cause the client to enter a loop. Should I be using a different filesystem? user@atom:/d1$ cd /d1/motion/cam1 user@atom:/d1/motion/cam1$ ls|wc 5198 5198 140346 user@atom:/d1/motion/cam1$ Justin. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 16:28 ` Justin Piszcz 2011-07-27 16:40 ` Bryan Schumaker @ 2011-07-27 18:11 ` Christoph Hellwig 2011-07-27 19:35 ` Justin Piszcz 1 sibling, 1 reply; 32+ messages in thread From: Christoph Hellwig @ 2011-07-27 18:11 UTC (permalink / raw) To: Justin Piszcz; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs

[-- Attachment #1: Type: text/plain, Size: 222 bytes --]

Justin,

can you please run the attached test program on the affected directory
on the server, and see if you see duplicates in the d_off column. Unless
you have privacy concerns I would also love to see the full output.

[-- Attachment #2: getdents.c --]
[-- Type: text/plain, Size: 1150 bytes --]

#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define handle_error(msg) \
	do { perror(msg); exit(EXIT_FAILURE); } while (0)

/* Record layout returned by the getdents64 system call. */
struct linux_dirent64 {
	unsigned long long	d_ino;
	long long		d_off;
	unsigned short		d_reclen;
	unsigned char		d_type;
	char			d_name[];
};

#define BUF_SIZE 131072

int main(int argc, char *argv[])
{
	int fd, nread;
	char buf[BUF_SIZE];
	struct linux_dirent64 *d;
	int bpos;

	/* Read the directory given as argv[1], or "." by default. */
	fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
	if (fd == -1)
		handle_error("open");

	for (;;) {
		nread = syscall(SYS_getdents64, fd, buf, BUF_SIZE);
		if (nread == -1)
			handle_error("getdents");
		if (nread == 0)
			break;	/* end of directory */

		printf("--------------- nread=%d ---------------\n", nread);
		/* Header lists exactly the columns printed below. */
		printf("i-node# d_reclen d_off d_name\n");
		for (bpos = 0; bpos < nread;) {
			d = (struct linux_dirent64 *)(buf + bpos);
			printf("%16llu ", d->d_ino);	/* d_ino is unsigned */
			printf("%4d %10lld %s\n", d->d_reclen, d->d_off, d->d_name);
			bpos += d->d_reclen;	/* records are variable length */
		}
	}

	exit(EXIT_SUCCESS);
}

^ permalink raw reply [flat|nested] 32+ messages in thread
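Checking the d_off column for duplicates by eye is tedious for a directory with several thousand entries. A minimal sketch of an automated check follows, assuming the d_off values have already been collected into an array; `cmp_off` and `count_duplicate_offsets` are hypothetical helper names and are not part of the attached program.

```c
#include <stdlib.h>

/* Three-way comparison of two d_off values for qsort(). */
static int cmp_off(const void *a, const void *b)
{
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;

    return (x > y) - (x < y);
}

/*
 * Sort the collected d_off values in place and return how many entries
 * repeat a value that already appeared. 0 means every offset is unique.
 */
static size_t count_duplicate_offsets(long long *offs, size_t n)
{
    size_t dups = 0;

    qsort(offs, n, sizeof(*offs), cmp_off);
    for (size_t i = 1; i < n; i++)
        if (offs[i] == offs[i - 1])
            dups++;
    return dups;
}
```

A nonzero result on a quiescent directory would point at the filesystem. A result of zero, as Justin later reports, suggests the duplicates only appear while the directory is being modified.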
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 18:11 ` Christoph Hellwig @ 2011-07-27 19:35 ` Justin Piszcz 2011-07-27 19:39 ` Christoph Hellwig 0 siblings, 1 reply; 32+ messages in thread From: Justin Piszcz @ 2011-07-27 19:35 UTC (permalink / raw) To: Christoph Hellwig; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wed, 27 Jul 2011, Christoph Hellwig wrote: > Justin, > > can you please run the attached test program on the affected directory > on the server, and see if you see duplicates in the d_off colum. Unless > you have privacy concerns I would also love to see the full output. > > Hi, Done: atom:/d1/motion/cam1# /root/getdents > /tmp/cam1-out.txt atom:/d1/motion/cam1# cd ../cam2 atom:/d1/motion/cam2# /root/getdents > /tmp/cam2-out.txt atom:/d1/motion/cam2# cd ../cam3 atom:/d1/motion/cam3# /root/getdents > /tmp/cam3-out.txt atom:/d1/motion/cam3# Files: http://home.comcast.net/~jpiszcz/20110727/cam1-out.txt http://home.comcast.net/~jpiszcz/20110727/cam2-out.txt http://home.comcast.net/~jpiszcz/20110727/cam3-out.txt Currently I do not see any dupes, however I have a script that moves images out of the directory once an hour: 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1 I'll disable that for now and see if this recurs, if it does, I'll gather additional output and send it out, thanks. Justin. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 19:35 ` Justin Piszcz @ 2011-07-27 19:39 ` Christoph Hellwig 2011-07-27 19:44 ` Justin Piszcz 0 siblings, 1 reply; 32+ messages in thread From: Christoph Hellwig @ 2011-07-27 19:39 UTC (permalink / raw) To: Justin Piszcz Cc: Christoph Hellwig, J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote: > Currently I do not see any dupes, however I have a script that moves > images out of the directory once an hour: > 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1 Do you keep adding files to the directory while you move files out? What's the rate of additions/removals to the directory? If we add files to the directory while removing others we could easily re-use the same offset for a different file. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 19:39 ` Christoph Hellwig @ 2011-07-27 19:44 ` Justin Piszcz 2011-07-27 19:47 ` Christoph Hellwig 0 siblings, 1 reply; 32+ messages in thread From: Justin Piszcz @ 2011-07-27 19:44 UTC (permalink / raw) To: Christoph Hellwig; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wed, 27 Jul 2011, Christoph Hellwig wrote: > On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote: >> Currently I do not see any dupes, however I have a script that moves >> images out of the directory once an hour: >> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1 > > Do you keep adding files to the directory while you move files out? Yes, otherwise there are too many files in the directory and viewers, e.g., each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep it around 5,000 pictures or less. > What's the rate of additions/removals to the directory? Additions it depends, around 5,000 over a 12hr period, 416/hr, current: atom:/d1/motion# find cam1|wc 5215 5215 166853 atom:/d1/motion# find cam2|wc 5069 5069 162181 atom:/d1/motion# find cam3|wc 5594 5594 178981 atom:/d1/motion# > > If we add files to the directory while removing others we could easily > re-use the same offset for a different file. > Justin. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 19:44 ` Justin Piszcz @ 2011-07-27 19:47 ` Christoph Hellwig 2011-07-27 19:54 ` Bryan Schumaker ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Christoph Hellwig @ 2011-07-27 19:47 UTC (permalink / raw) To: Justin Piszcz Cc: Christoph Hellwig, J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote: > > > On Wed, 27 Jul 2011, Christoph Hellwig wrote: > > >On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote: > >>Currently I do not see any dupes, however I have a script that moves > >>images out of the directory once an hour: > >>0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1 > > > >Do you keep adding files to the directory while you move files out? > Yes, otherwise there are too many files in the directory and viewers, e.g., > each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep > it around 5,000 pictures or less. > > >What's the rate of additions/removals to the directory? > Additions it depends, around 5,000 over a 12hr period, 416/hr, current: > > atom:/d1/motion# find cam1|wc > 5215 5215 166853 > atom:/d1/motion# find cam2|wc > 5069 5069 162181 > atom:/d1/motion# find cam3|wc > 5594 5594 178981 > atom:/d1/motion# This sounds a lot like xfs simply filling up the directory index slots of files that you just moved out with new files, and nfs falsely claiming that this is a problem. Any chance to figure out if the file you hit the printk with was one that got either recently added or moved when you hit it? (I can't follow the nfs code enough to check if it prints the first or second hit of the same cookie) ^ permalink raw reply [flat|nested] 32+ messages in thread
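Christoph's theory (a freed directory slot being handed to a new file) can be illustrated with a toy directory in which the slot index stands in for d_off. This is only a sketch: the lowest-free-slot policy, `struct toy_dir`, `toy_add` and `toy_remove` are assumptions made for illustration and do not describe the real XFS directory layout. Inode numbers are assumed to be nonzero, because 0 marks a free slot.

```c
#include <stddef.h>

#define NSLOTS 8

/* Toy directory: the slot index plays the role of d_off; 0 = free slot. */
struct toy_dir {
    unsigned long long ino[NSLOTS];
};

/* Store inode `ino` in the lowest free slot; return the slot, -1 if full. */
static int toy_add(struct toy_dir *d, unsigned long long ino)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (d->ino[i] == 0) {
            d->ino[i] = ino;
            return i;
        }
    }
    return -1;
}

/* Free the slot holding `ino`; return the slot, -1 if it is not present. */
static int toy_remove(struct toy_dir *d, unsigned long long ino)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (d->ino[i] == ino) {
            d->ino[i] = 0;
            return i;
        }
    }
    return -1;
}
```

Adding inodes 101, 102 and 103, removing 102, and then adding 104 places 104 in slot 1. A client that cached "offset 1 is inode 102" now sees the same offset attached to a different file, even though the directory is consistent at every instant.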
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 19:47 ` Christoph Hellwig @ 2011-07-27 19:54 ` Bryan Schumaker 2011-07-27 20:02 ` Christoph Hellwig 2011-07-27 19:57 ` Justin Piszcz 2011-07-27 20:37 ` Trond Myklebust 2 siblings, 1 reply; 32+ messages in thread From: Bryan Schumaker @ 2011-07-27 19:54 UTC (permalink / raw) To: Christoph Hellwig Cc: Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs On 07/27/2011 03:47 PM, Christoph Hellwig wrote: > On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote: >> >> >> On Wed, 27 Jul 2011, Christoph Hellwig wrote: >> >>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote: >>>> Currently I do not see any dupes, however I have a script that moves >>>> images out of the directory once an hour: >>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1 >>> >>> Do you keep adding files to the directory while you move files out? >> Yes, otherwise there are too many files in the directory and viewers, e.g., >> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep >> it around 5,000 pictures or less. >> >>> What's the rate of additions/removals to the directory? >> Additions it depends, around 5,000 over a 12hr period, 416/hr, current: >> >> atom:/d1/motion# find cam1|wc >> 5215 5215 166853 >> atom:/d1/motion# find cam2|wc >> 5069 5069 162181 >> atom:/d1/motion# find cam3|wc >> 5594 5594 178981 >> atom:/d1/motion# > > This sounds a lot like xfs simply filling up the directory index slots > of files that you just moved out with new files, and nfs falsely > claiming that this is a problem. > > Any chance to figure out if the file you hit the printk with was one > that got either recently added or moved when you hit it? (I can't > follow the nfs code enough to check if it prints the first or second hit > of the same cookie) It should be printing on the second hit of a cookie. 
- Bryan ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 19:54 ` Bryan Schumaker @ 2011-07-27 20:02 ` Christoph Hellwig 2011-07-27 20:05 ` Christoph Hellwig 2011-07-27 20:26 ` Rüdiger Meier 0 siblings, 2 replies; 32+ messages in thread From: Christoph Hellwig @ 2011-07-27 20:02 UTC (permalink / raw) To: Bryan Schumaker Cc: Christoph Hellwig, Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wed, Jul 27, 2011 at 03:54:49PM -0400, Bryan Schumaker wrote: > > Any chance to figure out if the file you hit the printk with was one > > that got either recently added or moved when you hit it? (I can't > > follow the nfs code enough to check if it prints the first or second hit > > of the same cookie) > > It should be printing on the second hit of a cookie. But looking closer at it it only prints the directory name and not that of any of the matching cookies, making it pretty useless to debug any problem. (and it makes my previous question to Justin look stupid..). But so far I still stick to my previous theory that this sounds like a directory offset getting reused. How is cache invalidation for the array supposed to work? And maybe more importantly, given that he can only reproduce it with a .38 client did any bugs get fixed in that code recently that might lead to issues with the cache invalidation? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 20:02 ` Christoph Hellwig @ 2011-07-27 20:05 ` Christoph Hellwig 2011-07-27 20:26 ` Rüdiger Meier 1 sibling, 0 replies; 32+ messages in thread From: Christoph Hellwig @ 2011-07-27 20:05 UTC (permalink / raw) To: Bryan Schumaker Cc: Christoph Hellwig, Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wed, Jul 27, 2011 at 04:02:40PM -0400, Christoph Hellwig wrote: > But looking closer at it it only prints the directory name and not that > of any of the matching cookies, making it pretty useless to debug any > problem. (and it makes my previous question to Justin look stupid..). > > > But so far I still stick to my previous theory that this sounds like > a directory offset getting reused. How is cache invalidation for > the array supposed to work? And maybe more importantly, given that he > can only reproduce it with a .38 client did any bugs get fixed in that > code recently that might lead to issues with the cache invalidation? Actually we won't even need cache invalidation bugs, see nfsd_buffered_readdir() - we might do multiple vfs_readdir calls to fill a single nfs reply, and between these two directory contents might have been completely replaced, in the worst (pathological case) you might get a second readdir having exactly the same offsets, but pointing to completely different inodes. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 20:02 ` Christoph Hellwig 2011-07-27 20:05 ` Christoph Hellwig @ 2011-07-27 20:26 ` Rüdiger Meier 2011-07-27 20:47 ` Christoph Hellwig 1 sibling, 1 reply; 32+ messages in thread From: Rüdiger Meier @ 2011-07-27 20:26 UTC (permalink / raw) To: Christoph Hellwig Cc: Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wednesday 27 July 2011, Christoph Hellwig wrote: > On Wed, Jul 27, 2011 at 03:54:49PM -0400, Bryan Schumaker wrote: > > It should be printing on the second hit of a cookie. > > But looking closer at it it only prints the directory name and not > that of any of the matching cookies, making it pretty useless to > debug any problem. (and it makes my previous question to Justin look > stupid..). > > > But so far I still stick to my previous theory that this sounds like > a directory offset getting reused. How is cache invalidation for > the array supposed to work? And maybe more importantly, given that > he can only reproduce it with a .38 client did any bugs get fixed in > that code recently that might lead to issues with the cache > invalidation? At the time I've started this thread http://comments.gmane.org/gmane.linux.nfs/40863 I had the feeling that the readdir cache changings in 2.6.37 have something to do with these loop problems. After that thread I've accepted that's a general problem with ext4/dirindex and nfs but seeing it again on xfs with just 5000 files I'm in doubt again. cu, Rudi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 20:26 ` Rüdiger Meier @ 2011-07-27 20:47 ` Christoph Hellwig 2011-07-27 21:21 ` Rüdiger Meier 0 siblings, 1 reply; 32+ messages in thread From: Christoph Hellwig @ 2011-07-27 20:47 UTC (permalink / raw) To: Rüdiger Meier Cc: Christoph Hellwig, Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wed, Jul 27, 2011 at 10:26:55PM +0200, Rüdiger Meier wrote: > At the time I've started this thread > http://comments.gmane.org/gmane.linux.nfs/40863 > I had the feeling that the readdir cache changings in 2.6.37 have > something to do with these loop problems. > > After that thread I've accepted that's a general problem with > ext4/dirindex and nfs but seeing it again on xfs with just 5000 files > I'm in doubt again. Two separate issues. For one thing the nfs code simply doesn't seem to handle changing directories very well, and one and a half the Linux NFS server might even send incoherent readdir output in a single protocol reply. Issue two is that the ext3/4 hashed directory format is too simple (not to say dumb) to provide a proper 32-bit linear value for the dirent d_off field. It's not a complex task, and the first relatively simple generation of xfs btree directories couldn't handle it either. The v2 directory format handles it fine, but at the cost of a much more complex codebase. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 20:47 ` Christoph Hellwig @ 2011-07-27 21:21 ` Rüdiger Meier 0 siblings, 0 replies; 32+ messages in thread From: Rüdiger Meier @ 2011-07-27 21:21 UTC (permalink / raw) To: Christoph Hellwig Cc: Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wednesday 27 July 2011, Christoph Hellwig wrote: > On Wed, Jul 27, 2011 at 10:26:55PM +0200, Rüdiger Meier wrote: > > At the time I've started this thread > > http://comments.gmane.org/gmane.linux.nfs/40863 > > I had the feeling that the readdir cache changings in 2.6.37 have > > something to do with these loop problems. > > > > After that thread I've accepted that's a general problem with > > ext4/dirindex and nfs but seeing it again on xfs with just 5000 > > files I'm in doubt again. > > Two separate issues. [...] Yup, I didn't want to say that I'm in doubt about the general ext4/dirindex problem, but I'm still in doubt about the complete innocence of the readdir cache. I guess I ran into both issues at that time. I remember that I couldn't easily create such a "broken" dir from scratch, but my users managed to have dozens of them, often with just about 30000 files. Somehow it seemed to be important that the dirs were growing in a natural way. However, no problems since then with xfs and with ext4 without dirindex. But I still have the feeling that upgrading to 2.6.37 was also part of the problem. cu, Rudi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 19:47 ` Christoph Hellwig 2011-07-27 19:54 ` Bryan Schumaker @ 2011-07-27 19:57 ` Justin Piszcz 2011-07-27 20:37 ` Trond Myklebust 2 siblings, 0 replies; 32+ messages in thread From: Justin Piszcz @ 2011-07-27 19:57 UTC (permalink / raw) To: Christoph Hellwig; +Cc: J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wed, 27 Jul 2011, Christoph Hellwig wrote: > On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote: >> >> >> On Wed, 27 Jul 2011, Christoph Hellwig wrote: >> >>> On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote: >>>> Currently I do not see any dupes, however I have a script that moves >>>> images out of the directory once an hour: >>>> 0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1 >>> >>> Do you keep adding files to the directory while you move files out? >> Yes, otherwise there are too many files in the directory and viewers, e.g., >> each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep >> it around 5,000 pictures or less. >> >>> What's the rate of additions/removals to the directory? >> Additions it depends, around 5,000 over a 12hr period, 416/hr, current: >> >> atom:/d1/motion# find cam1|wc >> 5215 5215 166853 >> atom:/d1/motion# find cam2|wc >> 5069 5069 162181 >> atom:/d1/motion# find cam3|wc >> 5594 5594 178981 >> atom:/d1/motion# > > This sounds a lot like xfs simply filling up the directory index slots > of files that you just moved out with new files, and nfs falsely > claiming that this is a problem. > > Any chance to figure out if the file you hit the printk with was one > that got either recently added or moved when you hit it? (I can't > follow the nfs code enough to check if it prints the first or second hit > of the same cookie) > It seems to happen across all directories, these are from the past 24 hours. [41901.041923] NFS: directory motion/cam2 contains a readdir loop. Please contact your server vendor. 
Offending cookie: 14368
[41901.275284] NFS: directory motion/cam3 contains a readdir loop. Please contact your server vendor. Offending cookie: 17435
[45497.265250] NFS: directory motion/cam1 contains a readdir loop. Please contact your server vendor. Offending cookie: 14488
[45498.832696] NFS: directory motion/cam1 contains a readdir loop. Please contact your server vendor. Offending cookie: 16416
[45507.812712] NFS: directory motion/cam2 contains a readdir loop. Please contact your server vendor. Offending cookie: 14778
[45508.458785] NFS: directory motion/cam2 contains a readdir loop. Please contact your server vendor. Offending cookie: 14778
[92223.918892] NFS: directory motion/cam2 contains a readdir loop. Please contact your server vendor. Offending cookie: 10272
[99413.259688] NFS: directory motion/cam1 contains a readdir loop. Please contact your server vendor. Offending cookie: 10272
[113791.004006] NFS: directory motion/cam1 contains a readdir loop. Please contact your server vendor. Offending cookie: 6848

Interestingly, I have two machines that perform this function, both XFS and it only affects the client running 2.6.38:

$ df -h

2.6.38 - Has a kernel driver that was removed in 2.6.39 (rt2870sta) which works really well.
atomw:/d1 30G 13G 18G 43% /nfs/atomw/d1

2.6.39:
d630w:/d1 75G 2.6G 72G 4% /nfs/d630w/d1

However, to rule out any kernel issues I'll try 3.0 and see if the problem recurs with a newer version as it is _NOT_ happening with 2.6.39 (similar setup) on both; however:

d630 => 32bit installation (core2duo t7500)
atomw => 64-bit atom

Justin. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-27 19:47 ` Christoph Hellwig 2011-07-27 19:54 ` Bryan Schumaker 2011-07-27 19:57 ` Justin Piszcz @ 2011-07-27 20:37 ` Trond Myklebust 2011-07-27 20:54 ` Trond Myklebust 2 siblings, 1 reply; 32+ messages in thread From: Trond Myklebust @ 2011-07-27 20:37 UTC (permalink / raw) To: Christoph Hellwig, Bryan Schumaker Cc: Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote: > On Wed, Jul 27, 2011 at 03:44:20PM -0400, Justin Piszcz wrote: > > > > > > On Wed, 27 Jul 2011, Christoph Hellwig wrote: > > > > >On Wed, Jul 27, 2011 at 03:35:01PM -0400, Justin Piszcz wrote: > > >>Currently I do not see any dupes, however I have a script that moves > > >>images out of the directory once an hour: > > >>0 * * * * /usr/local/bin/move_to_old2.sh > /dev/null 2>&1 > > > > > >Do you keep adding files to the directory while you move files out? > > Yes, otherwise there are too many files in the directory and viewers, e.g., > > each geeqie (picture viewer) will use > 4-6GB of memory, so I try to keep > > it around 5,000 pictures or less. > > > > >What's the rate of additions/removals to the directory? > > Additions it depends, around 5,000 over a 12hr period, 416/hr, current: > > > > atom:/d1/motion# find cam1|wc > > 5215 5215 166853 > > atom:/d1/motion# find cam2|wc > > 5069 5069 162181 > > atom:/d1/motion# find cam3|wc > > 5594 5594 178981 > > atom:/d1/motion# > > This sounds a lot like xfs simply filling up the directory index slots > of files that you just moved out with new files, and nfs falsely > claiming that this is a problem. Yep. 
There is an existing bugzilla report for this bug at https://bugzilla.kernel.org/show_bug.cgi?id=38572 I have a preliminary patch there that attempts to turn off the loop detection when the directory is seen to change, however that patch still appears to have a bug in it, and I haven't had time to figure out what is wrong yet. Can you perhaps take a look, Bryan? Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 32+ messages in thread
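To make the false positive described above concrete, here is a minimal user-space sketch. It is a toy model under stated assumptions, not the kernel code: a directory listing is just an ordered array of cookies, a "generation" counter stands in for whatever change indicator the client has, and every name in it (find_cookie, naive_is_loop, change_aware_is_loop, resume_looks_like_loop) is hypothetical. It does not model XFS internals. It only shows why a resume position that moves backwards is not by itself proof of a loop once entries have been moved out of the directory.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: a directory listing is an ordered array of
 * cookies, and "generation" stands in for a change counter on the directory.
 * All names here are hypothetical.
 */

/* Return the index of `cookie` in the listing, or -1 if it is absent. */
long find_cookie(const unsigned long long *cookies, size_t count,
		 unsigned long long cookie)
{
	size_t i;

	assert(cookies != NULL);
	for (i = 0; i < count; i++)
		if (cookies[i] == cookie)
			return (long)i;
	return -1;
}

/* Naive check: any backwards move of the resume position counts as a loop. */
int naive_is_loop(long new_pos, long f_pos)
{
	return new_pos < f_pos;
}

/* Change-aware check: a backwards move only counts if nothing changed. */
int change_aware_is_loop(long new_pos, long f_pos,
			 unsigned long seen_gen, unsigned long cur_gen)
{
	if (seen_gen != cur_gen)
		return 0;
	return new_pos < f_pos;
}

/*
 * Scenario: the reader has consumed 3 entries (f_pos == 3) and will resume
 * at cookie 40. Meanwhile entries 10 and 20 were moved out and 60 was added,
 * so cookie 40 is now found at index 1 of the refreshed listing.
 */
int resume_looks_like_loop(int change_aware)
{
	static const unsigned long long after[] = { 30, 40, 50, 60 };
	const long f_pos = 3;
	const unsigned long gen_before = 1, gen_after = 2;
	long new_pos = find_cookie(after, sizeof(after) / sizeof(after[0]), 40);

	if (change_aware)
		return change_aware_is_loop(new_pos, f_pos, gen_before, gen_after);
	return naive_is_loop(new_pos, f_pos);
}
```

In this model resume_looks_like_loop(0) flags the directory while resume_looks_like_loop(1) does not, which is roughly the behaviour the preliminary patch is aiming for.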
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 20:37 ` Trond Myklebust
@ 2011-07-27 20:54 ` Trond Myklebust
  2011-07-27 20:56 ` Trond Myklebust
  0 siblings, 1 reply; 32+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Wed, 2011-07-27 at 16:37 -0400, Trond Myklebust wrote:
> On Wed, 2011-07-27 at 15:47 -0400, Christoph Hellwig wrote:
> > This sounds a lot like xfs simply filling up the directory index slots
> > of files that you just moved out with new files, and nfs falsely
> > claiming that this is a problem.
>
> Yep. There is an existing bugzilla report for this bug at
>
> https://bugzilla.kernel.org/show_bug.cgi?id=38572
>
> I have a preliminary patch there that attempts to turn off the loop
> detection when the directory is seen to change, however that patch still
> appears to have a bug in it, and I haven't had time to figure out what
> is wrong yet.
>
> Can you perhaps take a look, Bryan?

Actually, Justin, can you test the following slight variant on the patch
in the bugzilla?

8<---------------------------------------------------------

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 20:54 ` Trond Myklebust
@ 2011-07-27 20:56 ` Trond Myklebust
  [not found] ` <1311800195.25645.45.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
  0 siblings, 1 reply; 32+ messages in thread
From: Trond Myklebust @ 2011-07-27 20:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bryan Schumaker, Justin Piszcz, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
> Actually, Justin, can you test the following slight variant on the patch
> in the bugzilla?

Doh! This one will actually compile....

> 8<---------------------------------------------------------

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  [not found] ` <1311800195.25645.45.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
@ 2011-07-27 21:24 ` Justin Piszcz
  [not found] ` <alpine.DEB.2.02.1107271723500.25432-0qmrozcXWo8bm2hyYBkBBg@public.gmane.org>
  0 siblings, 1 reply; 32+ messages in thread
From: Justin Piszcz @ 2011-07-27 21:24 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christoph Hellwig, Bryan Schumaker, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Wed, 27 Jul 2011, Trond Myklebust wrote:

> On Wed, 2011-07-27 at 16:54 -0400, Trond Myklebust wrote:
>> Actually, Justin, can you test the following slight variant on the patch
>> in the bugzilla?
>
> Doh! This one will actually compile....

Hi,

Should I try 3.0 first or retry 2.6.38 w/ this patch?

Justin.

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  [not found] ` <alpine.DEB.2.02.1107271723500.25432-0qmrozcXWo8bm2hyYBkBBg@public.gmane.org>
@ 2011-07-27 22:44 ` Justin Piszcz
  2011-07-28 20:48 ` Trond Myklebust
  0 siblings, 1 reply; 32+ messages in thread
From: Justin Piszcz @ 2011-07-27 22:44 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christoph Hellwig, Bryan Schumaker, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Wed, 27 Jul 2011, Justin Piszcz wrote:

> Should I try 3.0 first or retry 2.6.38 w/ this patch?

I'll give 3.0 a go first.

Justin.

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-27 22:44 ` Justin Piszcz
@ 2011-07-28 20:48 ` Trond Myklebust
  2011-07-29 20:52 ` Bryan Schumaker
  0 siblings, 1 reply; 32+ messages in thread
From: Trond Myklebust @ 2011-07-28 20:48 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Christoph Hellwig, Bryan Schumaker, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Wed, 2011-07-27 at 18:44 -0400, Justin Piszcz wrote:
> I'll give 3.0 a go first.

I had Bryan do some more tests, which revealed a couple more issues. The
attached patch should fix those, and has resisted everything we've
thrown at it so far. It should apply to 2.6.39 and newer.

Cheers
  Trond
8<-----------------------------------------------------------------------

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-28 20:48 ` Trond Myklebust
@ 2011-07-29 20:52 ` Bryan Schumaker
  2011-07-29 20:59 ` Justin Piszcz
  0 siblings, 1 reply; 32+ messages in thread
From: Bryan Schumaker @ 2011-07-29 20:52 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Justin Piszcz, Christoph Hellwig, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On 07/28/2011 04:48 PM, Trond Myklebust wrote:
> I had Bryan do some more tests, which revealed a couple more issues. The
> attached patch should fix those, and has resisted everything we've
> thrown at it so far. It should apply to 2.6.39 and newer.

This patch still looks good (after testing it a bit more today).

How does this look for printing out more information when a cookie loop is detected? Is there anything else that should be printed out? My patch applies on top of Trond's from yesterday.
- Bryan

8<-----------------------------------------------------------------------
From 4d74863dc2bcd4e603a873b3725f0a05afd21f1f Mon Sep 17 00:00:00 2001
From: Bryan Schumaker <bjschuma@netapp.com>
Date: Fri, 29 Jul 2011 11:49:06 -0400
Subject: [PATCH] Additional readdir cookie loop information

Print out the name of the file that triggers the cookie loop message to
make it slightly easier to track down the cause.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
---
 fs/nfs/dir.c | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d23108b..b238d95 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -365,9 +365,10 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
 					if (printk_ratelimit()) {
 						pr_notice("NFS: directory %s/%s contains a readdir loop."
 								"Please contact your server vendor. "
-								"Offending cookie: %llu\n",
+								"The file: %s has duplicate cookie %llu\n",
 								desc->file->f_dentry->d_parent->d_name.name,
 								desc->file->f_dentry->d_name.name,
+								array->array[i].string.name,
 								*desc->dir_cookie);
 					}
 					status = -ELOOP;
-- 
1.7.6

>
> Cheers
>   Trond
> 8<-----------------------------------------------------------------------
> From 75c0387540737a6663338d4ec0538bd6fb724173 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Thu, 28 Jul 2011 16:34:33 -0400
> Subject: [PATCH v3] NFS: Fix spurious readdir cookie loop messages
>
> If the directory contents change, then we have to accept that the
> file->f_pos value may shrink if we do a 'search-by-cookie'. In that
> case, we should turn off the loop detection and let the NFS client
> try to recover.
>
> The patch also fixes a second loop detection bug by ensuring
> that after turning on the ctx->duped flag, we read at least one new
> cookie into ctx->dir_cookie before attempting to match with
> ctx->dup_cookie.
>
> Reported-by: Petr Vandrovec <petr@vandrovec.name>
> Cc: stable@kernel.org [2.6.39+]
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>  fs/nfs/dir.c           | 56 ++++++++++++++++++++++++++++-------------------
>  include/linux/nfs_fs.h | 3 +-
>  2 files changed, 35 insertions(+), 24 deletions(-)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 57f578e..d23108b 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -134,18 +134,19 @@ const struct inode_operations nfs4_dir_inode_operations = {
>
>  #endif /* CONFIG_NFS_V4 */
>
> -static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct rpc_cred *cred)
> +static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir, struct rpc_cred *cred)
>  {
>  	struct nfs_open_dir_context *ctx;
>  	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>  	if (ctx != NULL) {
>  		ctx->duped = 0;
> +		ctx->attr_gencount = NFS_I(dir)->attr_gencount;
>  		ctx->dir_cookie = 0;
>  		ctx->dup_cookie = 0;
>  		ctx->cred = get_rpccred(cred);
> -	} else
> -		ctx = ERR_PTR(-ENOMEM);
> -	return ctx;
> +		return ctx;
> +	}
> +	return ERR_PTR(-ENOMEM);
>  }
>
>  static void put_nfs_open_dir_context(struct nfs_open_dir_context *ctx)
> @@ -173,7 +174,7 @@ nfs_opendir(struct inode *inode, struct file *filp)
>  	cred = rpc_lookup_cred();
>  	if (IS_ERR(cred))
>  		return PTR_ERR(cred);
> -	ctx = alloc_nfs_open_dir_context(cred);
> +	ctx = alloc_nfs_open_dir_context(inode, cred);
>  	if (IS_ERR(ctx)) {
>  		res = PTR_ERR(ctx);
>  		goto out;
> @@ -323,7 +324,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
>  {
>  	loff_t diff = desc->file->f_pos - desc->current_index;
>  	unsigned int index;
> -	struct nfs_open_dir_context *ctx = desc->file->private_data;
>
>  	if (diff < 0)
>  		goto out_eof;
> @@ -336,7 +336,6 @@ int nfs_readdir_search_for_pos(struct nfs_cache_array *array, nfs_readdir_descri
>  	index = (unsigned int)diff;
>  	*desc->dir_cookie = array->array[index].cookie;
>  	desc->cache_entry_index = index;
> -	ctx->duped = 0;
>  	return 0;
>  out_eof:
>  	desc->eof = 1;
> @@ -349,14 +348,33 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
>  	int i;
>  	loff_t new_pos;
>  	int status = -EAGAIN;
> -	struct nfs_open_dir_context *ctx = desc->file->private_data;
>
>  	for (i = 0; i < array->size; i++) {
>  		if (array->array[i].cookie == *desc->dir_cookie) {
> +			struct nfs_inode *nfsi = NFS_I(desc->file->f_path.dentry->d_inode);
> +			struct nfs_open_dir_context *ctx = desc->file->private_data;
> +
>  			new_pos = desc->current_index + i;
> -			if (new_pos < desc->file->f_pos) {
> +			if (ctx->attr_gencount != nfsi->attr_gencount
> +			    || (nfsi->cache_validity & (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA))) {
> +				ctx->duped = 0;
> +				ctx->attr_gencount = nfsi->attr_gencount;
> +			} else if (new_pos < desc->file->f_pos) {
> +				if (ctx->duped > 0
> +				    && ctx->dup_cookie == *desc->dir_cookie) {
> +					if (printk_ratelimit()) {
> +						pr_notice("NFS: directory %s/%s contains a readdir loop."
> +								"Please contact your server vendor. "
> +								"Offending cookie: %llu\n",
> +								desc->file->f_dentry->d_parent->d_name.name,
> +								desc->file->f_dentry->d_name.name,
> +								*desc->dir_cookie);
> +					}
> +					status = -ELOOP;
> +					goto out;
> +				}
>  				ctx->dup_cookie = *desc->dir_cookie;
> -				ctx->duped = 1;
> +				ctx->duped = -1;
>  			}
>  			desc->file->f_pos = new_pos;
>  			desc->cache_entry_index = i;
> @@ -368,6 +386,7 @@ int nfs_readdir_search_for_cookie(struct nfs_cache_array *array, nfs_readdir_des
>  		if (*desc->dir_cookie == array->last_cookie)
>  			desc->eof = 1;
>  	}
> +out:
>  	return status;
>  }
>
> @@ -740,19 +759,6 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
>  	struct nfs_cache_array *array = NULL;
>  	struct nfs_open_dir_context *ctx = file->private_data;
>
> -	if (ctx->duped != 0 && ctx->dup_cookie == *desc->dir_cookie) {
> -		if (printk_ratelimit()) {
> -			pr_notice("NFS: directory %s/%s contains a readdir loop. "
> -				"Please contact your server vendor. "
> -				"Offending cookie: %llu\n",
> -				file->f_dentry->d_parent->d_name.name,
> -				file->f_dentry->d_name.name,
> -				*desc->dir_cookie);
> -		}
> -		res = -ELOOP;
> -		goto out;
> -	}
> -
>  	array = nfs_readdir_get_array(desc->page);
>  	if (IS_ERR(array)) {
>  		res = PTR_ERR(array);
> @@ -774,6 +780,8 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc, void *dirent,
>  			*desc->dir_cookie = array->array[i+1].cookie;
>  		else
>  			*desc->dir_cookie = array->last_cookie;
> +		if (ctx->duped != 0)
> +			ctx->duped = 1;
>  	}
>  	if (array->eof_index >= 0)
>  		desc->eof = 1;
> @@ -805,6 +813,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
>  	struct page *page = NULL;
>  	int status;
>  	struct inode *inode = desc->file->f_path.dentry->d_inode;
> +	struct nfs_open_dir_context *ctx = desc->file->private_data;
>
>  	dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n",
>  			(unsigned long long)*desc->dir_cookie);
> @@ -818,6 +827,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc, void *dirent,
>  	desc->page_index = 0;
>  	desc->last_cookie = *desc->dir_cookie;
>  	desc->page = page;
> +	ctx->duped = 0;
>
>  	status = nfs_readdir_xdr_to_array(desc, page, inode);
>  	if (status < 0)
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 8b579be..b96fb99 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -99,9 +99,10 @@ struct nfs_open_context {
>
>  struct nfs_open_dir_context {
>  	struct rpc_cred *cred;
> +	unsigned long attr_gencount;
>  	__u64 dir_cookie;
>  	__u64 dup_cookie;
> -	int duped;
> +	signed char duped;
>  };
>
>  /*

^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-29 20:52 ` Bryan Schumaker
@ 2011-07-29 20:59 ` Justin Piszcz
  2011-07-29 22:03 ` Trond Myklebust
  0 siblings, 1 reply; 32+ messages in thread
From: Justin Piszcz @ 2011-07-29 20:59 UTC (permalink / raw)
  To: Bryan Schumaker
  Cc: Trond Myklebust, Christoph Hellwig, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Fri, 29 Jul 2011, Bryan Schumaker wrote:

> How does this look for printing out more information when a cookie loop is detected? Is there anything else that should be printed out? My patch applies on top of Trond's from yesterday.

Hi,

This fails against 2.6.38:

patching file fs/nfs/dir.c
Hunk #1 FAILED at 134.
Hunk #2 FAILED at 173.
Hunk #3 FAILED at 323.
Hunk #4 FAILED at 336.
Hunk #5 FAILED at 349.
Hunk #6 succeeded at 320 (offset -48 lines).
Hunk #7 FAILED at 741.
Hunk #8 succeeded at 716 (offset -59 lines).
Hunk #9 succeeded at 749 (offset -59 lines).
Hunk #10 succeeded at 763 (offset -59 lines).
6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej
patching file include/linux/nfs_fs.h
Hunk #1 FAILED at 99.
1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej
atom:/usr/src/linux#

And the 3.0 kernel is broken for my wireless adapter:
http://www.gossamer-threads.com/lists/linux/kernel/1411576

If you can make a combined patch for 2.6.38 I can try it, 2.6.39+ have a
horrible driver (rt2800usb) and 1 person emailed me as well stating the
same thing off-list (they stick with the manufacturer's driver or the *sta
one).

Justin.

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-29 20:59 ` Justin Piszcz
@ 2011-07-29 22:03 ` Trond Myklebust
  2011-07-29 22:23 ` Justin Piszcz
  0 siblings, 1 reply; 32+ messages in thread
From: Trond Myklebust @ 2011-07-29 22:03 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Bryan Schumaker, Christoph Hellwig, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Fri, 2011-07-29 at 16:59 -0400, Justin Piszcz wrote:
> This fails against 2.6.38:
>
> If you can make a combined patch for 2.6.38 I can try it, 2.6.39+ have a
> horrible driver (rt2800usb) and 1 person emailed me as well stating the
> same thing off-list (they stick with the manufacturer's driver or the *sta
> one).

I don't understand. The readdir loop detection code was first merged
upstream in 2.6.39. 2.6.38 doesn't report any loops...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 32+ messages in thread
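For readers who want to follow the loop detection outside the kernel, the tri-state ctx->duped logic from the v3 patch quoted earlier in this thread can be modelled roughly as below. This is a sketch under assumptions: struct dir_ctx, on_cookie_found() and on_entry_emitted() are hypothetical stand-ins invented for illustration (they are not kernel interfaces), the cache_invalid flag stands in for the NFS_INO_INVALID_ATTR/NFS_INO_INVALID_DATA test, and the page cache, RPC and locking are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the per-open directory context. */
struct dir_ctx {
	unsigned long attr_gencount;	/* directory generation last seen */
	unsigned long long dir_cookie;	/* cookie we will resume from */
	unsigned long long dup_cookie;	/* cookie that last moved backwards */
	signed char duped;		/* 0: off, -1: armed, 1: armed and advanced */
};

/*
 * Called when dir_cookie has been found at index new_pos in the cached
 * listing. Returns 0, or -ELOOP when a loop is reported.
 */
int on_cookie_found(struct dir_ctx *ctx, unsigned long dir_gencount,
		    int cache_invalid, long new_pos, long *f_pos)
{
	assert(ctx != NULL && f_pos != NULL);

	if (ctx->attr_gencount != dir_gencount || cache_invalid) {
		/* Directory changed: a shrinking position is expected. */
		ctx->duped = 0;
		ctx->attr_gencount = dir_gencount;
	} else if (new_pos < *f_pos) {
		/* Only report once a new cookie was read since arming. */
		if (ctx->duped > 0 && ctx->dup_cookie == ctx->dir_cookie)
			return -ELOOP;
		ctx->dup_cookie = ctx->dir_cookie;
		ctx->duped = -1;
	}
	*f_pos = new_pos;
	return 0;
}

/* Called after an entry has been handed out and the next cookie loaded. */
void on_entry_emitted(struct dir_ctx *ctx, unsigned long long next_cookie)
{
	assert(ctx != NULL);
	ctx->dir_cookie = next_cookie;
	if (ctx->duped != 0)
		ctx->duped = 1;
}
```

The -1 state is what the commit message calls reading "at least one new cookie" before matching: a cookie that has just been recorded as dup_cookie cannot trigger -ELOOP until on_entry_emitted() has run at least once, and any change in the generation counter switches detection off again.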
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop
  2011-07-29 22:03 ` Trond Myklebust
@ 2011-07-29 22:23 ` Justin Piszcz
  2011-07-30 9:58 ` Justin Piszcz
  0 siblings, 1 reply; 32+ messages in thread
From: Justin Piszcz @ 2011-07-29 22:23 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Bryan Schumaker, Christoph Hellwig, J. Bruce Fields, linux-nfs, linux-kernel, xfs

On Fri, 29 Jul 2011, Trond Myklebust wrote:

> I don't understand. The readdir loop detection code was first merged
> upstream in 2.6.39. 2.6.38 doesn't report any loops...

Hi,

Sorry--(my error) this is meant for the client, patched & will e-mail when
it happens again.

# patch -p1 < /home/jpiszcz/patch1
patching file fs/nfs/dir.c
patching file include/linux/nfs_fs.h

# patch -p1 < /home/jpiszcz/patch2
patching file fs/nfs/dir.c

(recompile->reboot->waiting for next error)

Justin.

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: 2.6.xx: NFS: directory motion/cam2 contains a readdir loop 2011-07-29 22:23 ` Justin Piszcz @ 2011-07-30 9:58 ` Justin Piszcz 0 siblings, 0 replies; 32+ messages in thread From: Justin Piszcz @ 2011-07-30 9:58 UTC (permalink / raw) To: Trond Myklebust Cc: Bryan Schumaker, Christoph Hellwig, J. Bruce Fields, linux-nfs, linux-kernel, xfs On Fri, 29 Jul 2011, Justin Piszcz wrote: > > > On Fri, 29 Jul 2011, Trond Myklebust wrote: > > > On Fri, 2011-07-29 at 16:59 -0400, Justin Piszcz wrote: > >> On Fri, 29 Jul 2011, Bryan Schumaker wrote: > >> > >>> How does this look for printing out more information when a cookie loop is detected? Is there anything else that should be printed out? My patch applies on top of Trond's from yesterday. > >> > >> > >> Hi, > >> > >> This fails against 2.6.38: > >> > >> patching file fs/nfs/dir.c > >> Hunk #1 FAILED at 134. > >> Hunk #2 FAILED at 173. > >> Hunk #3 FAILED at 323. > >> Hunk #4 FAILED at 336. > >> Hunk #5 FAILED at 349. > >> Hunk #6 succeeded at 320 (offset -48 lines). > >> Hunk #7 FAILED at 741. > >> Hunk #8 succeeded at 716 (offset -59 lines). > >> Hunk #9 succeeded at 749 (offset -59 lines). > >> Hunk #10 succeeded at 763 (offset -59 lines). > >> 6 out of 10 hunks FAILED -- saving rejects to file fs/nfs/dir.c.rej > >> patching file include/linux/nfs_fs.h > >> Hunk #1 FAILED at 99. > >> 1 out of 1 hunk FAILED -- saving rejects to file include/linux/nfs_fs.h.rej > >> atom:/usr/src/linux# > >> > >> And the 3.0 kernel is broken for my wireless adapter: > >> http://www.gossamer-threads.com/lists/linux/kernel/1411576 > >> > >> If you can make a combined patch for 2.6.38 I can try it, 2.6.39+ have a > >> horrible driver (rt2800usb) and 1 person emailed me as well stating the > >> same thing off-list (they stick with the manufacturer's driver or the *sta > >> one). > > > > I don't understand. The readdir loop detection code was first merged > > upstream in 2.6.39. 2.6.38 doesn't report any loops... 
>
> Hi,
>
> Sorry--(my error) this is meant for the client, patched & will e-mail when
> it happens again.
>
> # patch -p1 < /home/jpiszcz/patch1
> patching file fs/nfs/dir.c
> patching file include/linux/nfs_fs.h
>
> # patch -p1 < /home/jpiszcz/patch2
> patching file fs/nfs/dir.c
>
> (recompile->reboot->waiting for next error)
>
> Justin.

I have been running Linux 2.6.37-(.. 3.0 recently) on these new hosts since
January of this year and had never had so much as a kernel oops. With these
patches there were several kernel lockups/problems, but the NFS readdir loop
did not show up.

I've gone back to the previous (non-patched) kernel. Is there a less invasive
patch?

http://home.comcast.net/~jpiszcz/20110730/kernel-error.txt

Justin.

^ permalink raw reply [flat|nested] 32+ messages in thread
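For readers following the thread, the loop condition under discussion (one cookie mapped to more than one directory entry, with the server resuming from the first match) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not the logic in fs/nfs/dir.c and not either of the patches mentioned above. The names `server_readdir` and `client_listdir`, the batch size, and the cookie values are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical model: each entry is (file name, cookie marking the position after it).
Entry = Tuple[str, int]


def server_readdir(entries: List[Entry], cookie: int, count: int) -> List[Entry]:
    """Toy server: resume after the FIRST entry whose cookie matches."""
    start = 0
    if cookie != 0:
        for i, (_, c) in enumerate(entries):
            if c == cookie:
                start = i + 1
                break
        else:
            return []  # unknown cookie: nothing to return in this toy model
    return entries[start:start + count]


def client_listdir(entries: List[Entry], count: int = 2) -> List[str]:
    """Toy client: read in batches and fail if a cookie is ever seen twice."""
    names: List[str] = []
    seen = set()
    cookie = 0  # 0 means "start of directory" in this sketch
    while True:
        batch = server_readdir(entries, cookie, count)
        if not batch:
            return names
        for name, c in batch:
            if c in seen:
                # Without this check the client would keep re-reading the
                # same range of the directory forever.
                raise RuntimeError(f"readdir loop detected, offending cookie: {c}")
            seen.add(c)
            names.append(name)
        cookie = batch[-1][1]  # resume after the last entry received


# A well-formed directory lists normally.
print(client_listdir([("a", 1), ("b", 2), ("c", 3)]))

# Cookie 2 is assigned to both "b" and "d": resuming from it lands back after "b".
try:
    client_listdir([("a", 1), ("b", 2), ("c", 3), ("d", 2), ("e", 5)])
except RuntimeError as err:
    print(err)
```

In this model the check fires as soon as any cookie repeats, which is stricter than detecting an actual loop; it is meant only to show why a duplicated cookie prevents the client from ever reaching the end of the directory.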
end of thread, other threads:[~2011-07-30 9:58 UTC | newest]
Thread overview: 32+ messages
[not found] <alpine.DEB.2.02.1107270952270.1451@p34.internal.lan>
2011-07-27 16:07 ` 2.6.xx: NFS: directory motion/cam2 contains a readdir loop J. Bruce Fields
2011-07-27 16:28 ` Justin Piszcz
2011-07-27 16:40 ` Bryan Schumaker
2011-07-27 17:00 ` Ruediger Meier
2011-07-27 17:09 ` Bryan Schumaker
2011-07-27 17:17 ` Justin Piszcz
2011-07-27 17:45 ` J. Bruce Fields
2011-07-27 18:28 ` Bryan Schumaker
2011-07-27 17:15 ` Justin Piszcz
2011-07-27 18:11 ` Christoph Hellwig
2011-07-27 19:35 ` Justin Piszcz
2011-07-27 19:39 ` Christoph Hellwig
2011-07-27 19:44 ` Justin Piszcz
2011-07-27 19:47 ` Christoph Hellwig
2011-07-27 19:54 ` Bryan Schumaker
2011-07-27 20:02 ` Christoph Hellwig
2011-07-27 20:05 ` Christoph Hellwig
2011-07-27 20:26 ` Rüdiger Meier
2011-07-27 20:47 ` Christoph Hellwig
2011-07-27 21:21 ` Rüdiger Meier
2011-07-27 19:57 ` Justin Piszcz
2011-07-27 20:37 ` Trond Myklebust
2011-07-27 20:54 ` Trond Myklebust
2011-07-27 20:56 ` Trond Myklebust
[not found] ` <1311800195.25645.45.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>
2011-07-27 21:24 ` Justin Piszcz
[not found] ` <alpine.DEB.2.02.1107271723500.25432-0qmrozcXWo8bm2hyYBkBBg@public.gmane.org>
2011-07-27 22:44 ` Justin Piszcz
2011-07-28 20:48 ` Trond Myklebust
2011-07-29 20:52 ` Bryan Schumaker
2011-07-29 20:59 ` Justin Piszcz
2011-07-29 22:03 ` Trond Myklebust
2011-07-29 22:23 ` Justin Piszcz
2011-07-30 9:58 ` Justin Piszcz