* 2.5: NFS troubles
From: Felipe Alfaro Solana @ 2003-04-06 12:06 UTC
To: LKML

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

Hello,

I'm testing 2.5.66-bk11 on my NFS server running RH9. When I run the
"find" command on the NFS share from my client computer, it hangs
forever after a while, and it always hangs *exactly* at the same place.
However, if I boot the NFS server with RH9's standard kernel (2.4.20),
the "find" command works as expected and completes without any hangs or
delays.

My NFS server (hostname glass) has a whole partition, mounted under
/data, formatted as ext3.

/etc/exports is
/data 192.168.0.100(rw,no_root_squash)

/etc/fstab is
/dev/hda3 /data ext3 defaults,noatime 1 2

I have this NFS share listed in my client computer's /etc/fstab as
glass:/data /net/glass_data nfs noauto,users,soft 0 0

The client computer is running 2.5.66-mm1.

I have attached the following files, compressed with bzip2 as they are
really big:

local-find.bz2 is the result of running the find command locally on the
NFS server (not using NFS), so you can see the whole list of files that
should show up.

nfs-find.bz2 is the actual list of files that show up when running the
find command over the NFS share (on my client computer) before the
"find" command hangs.

nfs-tcpdump.bz2 is a partial tcpdump output generated while the "find"
command is running over the NFS share.

nfs-strace.bz2 is an "strace -p <pid_of_find_command>" on the find
command that I run on my client computer.

Any ideas on what's happening here?
Thank you very much!
________________________________________________________________________
Linux Registered User #287198

[-- Attachment #2: local-find.bz2 --]
[-- Type: application/x-bzip, Size: 15321 bytes --]

[-- Attachment #3: nfs-find.bz2 --]
[-- Type: application/x-bzip, Size: 546 bytes --]

[-- Attachment #4: nfs-strace.bz2 --]
[-- Type: application/x-bzip, Size: 22290 bytes --]

[-- Attachment #5: nfs-tcpdump.bz2 --]
[-- Type: application/x-bzip, Size: 736 bytes --]

* Re: 2.5: NFS troubles
From: Trond Myklebust @ 2003-04-06 12:58 UTC
To: Felipe Alfaro Solana; +Cc: LKML

>>>>> " " == Felipe Alfaro Solana <felipe_alfaro@linuxmail.org> writes:

     > Hello, I'm testing 2.5.66-bk11 on my NFS server running
     > RH9. When I run the "find" command on the NFS share from my
     > client computer, it hangs forever after a while, but it always
     > hangs *exactly* at the same place every time. However, if I
     > boot into NFS with RH9's standard kernel (2.4.20), the "find"
     > command works as expected and is able to complete without any
     > hangs or delays.

     > My NFS server (hostname glass) has a whole ext3 partition -
     > mounted under /data - formatted as ext3.

The 2.5.66 ext3 code still has some issues with respect to NFS readdir
cookies.

Cheers,
  Trond
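
For readers unfamiliar with readdir cookies: a cookie is an opaque
position that lets a reader resume a directory listing where it left
off. The closest user-space analogue is the value returned by POSIX
telldir(), which seekdir() accepts later. The sketch below is only an
illustration of that resume step on a local directory, not NFS code.
The function names count_remaining() and resume_is_consistent() are
made up for this example, and it assumes the directory is not modified
while it runs. If the position handed back by the filesystem does not
lead to the same place on the second pass, a client walking the
directory can loop or stall, which would fit a "find" that always
stops at the same point.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Count the entries left in an open directory stream. */
static long count_remaining(DIR *dir)
{
	long n = 0;

	while (readdir(dir) != NULL)
		n++;
	return n;
}

/*
 * Hypothetical helper: read up to `skip` entries, save the position
 * (the "cookie"), read to the end, then seek back to the cookie and
 * read to the end again.
 *
 * Returns 1 if both passes saw the same number of entries, 0 if they
 * differ, and -1 if the directory could not be opened or positioned.
 */
int resume_is_consistent(const char *path, long skip)
{
	DIR *dir;
	long cookie, first, second;

	assert(path != NULL);

	dir = opendir(path);
	if (dir == NULL)
		return -1;

	for (long i = 0; i < skip; i++)
		if (readdir(dir) == NULL)
			break;

	cookie = telldir(dir);
	if (cookie < 0) {
		closedir(dir);
		return -1;
	}

	first = count_remaining(dir);
	seekdir(dir, cookie);	/* resume from the saved position */
	second = count_remaining(dir);

	closedir(dir);
	return first == second ? 1 : 0;
}
```

Calling resume_is_consistent("/data", 100) on the server would check
that a listing resumed after 100 entries sees the same tail both times.
Note that a telldir() value is only valid for the stream that produced
it, whereas an NFS cookie has to stay meaningful across separate
requests, so this is a weaker test than what an NFS server needs.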

* Re: 2.5: NFS troubles
From: Andrew Morton @ 2003-04-07 0:18 UTC
To: Trond Myklebust; +Cc: felipe_alfaro, linux-kernel

Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>
> >>>>> " " == Felipe Alfaro Solana <felipe_alfaro@linuxmail.org> writes:
>
>      > Hello, I'm testing 2.5.66-bk11 on my NFS server running
>      > RH9. When I run the "find" command on the NFS share from my
>      > client computer, it hangs forever after a while, but it always
>      > hangs *exactly* at the same place every time. However, if I
>      > boot into NFS with RH9's standard kernel (2.4.20), the "find"
>      > command works as expected and is able to complete without any
>      > hangs or delays.
>
>      > My NFS server (hostname glass) has a whole ext3 partition -
>      > mounted under /data - formatted as ext3.
>
> The 2.5.66 ext3 code still has some issues with respect to NFS readdir
> cookies.

It might do. I have Ted's htree/NFS fixes in there though.

Felipe, please do

	dumpe2fs /dev/hdXX | grep features

if it shows dir_index then it might be an ext3 problem. If not then it is
probably an NFS problem.

If it does have dir_index set then please run

	tune2fs -O ^dir_index /dev/hdXX

and reboot and retest.

* Re: 2.5: NFS troubles
From: Robert Love @ 2003-04-07 0:27 UTC
To: Andrew Morton; +Cc: Trond Myklebust, felipe_alfaro, linux-kernel

On Sun, 2003-04-06 at 20:18, Andrew Morton wrote:

> if it shows dir_index then it might be an ext3 problem. If not then it is
> probably an NFS problem.

Nah, it's not an ext3 problem (at least not with htree).

I am seeing this same problem, starting recently, with a 2.5 client and
a 2.4 server. Both are ext3 but neither has htree, and the problem is
new.

I have not yet figured out whether it's the 2.5 kernel on the client or
the newly-upgraded Red Hat 9 on the server... but I suspect the 2.5
kernel on the client.

	Robert Love

* Re: 2.5: NFS troubles
From: Felipe Alfaro Solana @ 2003-04-07 9:01 UTC
To: Robert Love; +Cc: Andrew Morton, Trond Myklebust, LKML

On Mon, 2003-04-07 at 02:27, Robert Love wrote:
> On Sun, 2003-04-06 at 20:18, Andrew Morton wrote:
>
> > if it shows dir_index then it might be an ext3 problem. If not then it is
> > probably an NFS problem.
>
> Nah, it's not an ext3 problem (at least not with htree).

But it could be an interaction problem between NFS and ext3. I did what
Andrew suggested (disabling dir_index) and it solved my problems.

I don't think it's a client problem, since I can't reproduce it with
2.4+ext3, 2.5.66+ext2 or 2.5.66+ext3-dir_index, but it is reproducible
with 2.5.66+ext3+dir_index.

________________________________________________________________________
Linux Registered User #287198

* Re: 2.5: NFS troubles
From: Andrew Morton @ 2003-04-07 9:13 UTC
To: Felipe Alfaro Solana; +Cc: rml, trond.myklebust, linux-kernel

Felipe Alfaro Solana <felipe_alfaro@linuxmail.org> wrote:
>
> On Mon, 2003-04-07 at 02:27, Robert Love wrote:
> > On Sun, 2003-04-06 at 20:18, Andrew Morton wrote:
> >
> > > if it shows dir_index then it might be an ext3 problem. If not then it is
> > > probably an NFS problem.
> >
> > Nah, it's not an ext3 problem (at least not with htree).
>
> But it could be an interaction problem between NFS and ext3. I did what
> Andrew suggested (disabling dir_index) and it solved my problems.
>
> I don't think it's a client problem, since I can't reproduce it with
> 2.4+ext3, 2.5.66+ext2 or 2.5.66+ext3-dir_index, but it is reproducible
> with 2.5.66+ext3+dir_index.
>

Well it could still be an NFS problem. Turning off htree on the server could
cause filenames to be returned in a different order (or is that illegal?) or
changed timing or such.

If Robert is seeing it on non-htree servers then we'd need to see that fixed
up before deciding if there is also an(other) htree bug.

First thing we need to do is to debug it. Trond would have a better idea of
how to set about that than I. Possibly a tcpdump of the traffic?

* Re: 2.5: NFS troubles
From: Felipe Alfaro Solana @ 2003-04-07 9:24 UTC
To: Andrew Morton; +Cc: rml, trond.myklebust, LKML

On Mon, 2003-04-07 at 11:13, Andrew Morton wrote:
> Felipe Alfaro Solana <felipe_alfaro@linuxmail.org> wrote:
> >
> > I don't think it's a client problem, since I can't reproduce it with
> > 2.4+ext3, 2.5.66+ext2 or 2.5.66+ext3-dir_index, but it is reproducible
> > with 2.5.66+ext3+dir_index.
> >
>
> Well it could still be an NFS problem. Turning off htree on the server could
> cause filenames to be returned in a different order (or is that illegal?) or
> changed timing or such.
>
> If Robert is seeing it on non-htree servers then we'd need to see that fixed
> up before deciding if there is also an(other) htree bug.
>
> First thing we need to do is to debug it. Trond would have a better idea of
> how to set about that than I. Possibly a tcpdump of the traffic?

I sent a tcpdump and strace as attachments in my original message ;-)
Do you have it handy? Should I send it again?...

________________________________________________________________________
Linux Registered User #287198

* Re: 2.5: NFS troubles
From: Trond Myklebust @ 2003-04-07 21:58 UTC
To: Felipe Alfaro Solana; +Cc: Andrew Morton, rml, LKML

>>>>> " " == Felipe Alfaro Solana <felipe_alfaro@linuxmail.org> writes:

    >> If Robert is seeing it on non-htree servers then we'd need to
    >> see that fixed up before deciding if there is also an(other)
    >> htree bug.

     > I sent a tcpdump and strace as attachments in my original
     > message ;-) Do you have it handy? Should I send it again?...

Robert was seeing a very different read-related bug (not related to
the htree bug). I posted a fix for it earlier this afternoon, and he
has already confirmed that it fixed his problem.

Cheers,
  Trond

* Re: 2.5: NFS troubles
From: Trond Myklebust @ 2003-04-07 9:39 UTC
To: Robert Love; +Cc: Andrew Morton, Trond Myklebust, felipe_alfaro, linux-kernel

>>>>> " " == Robert Love <rml@tech9.net> writes:

     > I have not yet figured out whether it's the 2.5 kernel on the
     > client or the newly-upgraded Red Hat 9 on the server... but I
     > suspect the 2.5 kernel on the client.

There is a problem with generic reads under 2.5. Now that I think I've
got the resource management under control, I've started to look into
it.

Cheers,
  Trond

* Re: 2.5: NFS troubles
From: Trond Myklebust @ 2003-04-07 13:23 UTC
To: Robert Love, Siim Vahtre; +Cc: linux-kernel, NFS maillist

OK. I've managed to squash the NFS read corruption problems that I had
on my 2.5.x client setup with the following patch.
Since the two of you reported what appears to be the same problem,
would you mind trying it out?

The fix basically tightens up consistency checks in the process of
reading the skb (which is done in the sk->data_ready() callback).

Cheers,
  Trond

diff -u --recursive --new-file linux-2.5.66-10-nr_dirty/net/sunrpc/xprt.c linux-2.5.66-11-fix_read/net/sunrpc/xprt.c
--- linux-2.5.66-10-nr_dirty/net/sunrpc/xprt.c	2003-03-27 18:34:08.000000000 +0100
+++ linux-2.5.66-11-fix_read/net/sunrpc/xprt.c	2003-04-07 15:15:29.000000000 +0200
@@ -625,7 +625,8 @@
 {
 	if (len > desc->count)
 		len = desc->count;
-	skb_copy_bits(desc->skb, desc->offset, to, len);
+	if (skb_copy_bits(desc->skb, desc->offset, to, len))
+		return 0;
 	desc->count -= len;
 	desc->offset += len;
 	return len;
@@ -669,11 +670,15 @@
 		csum2 = skb_checksum(skb, desc.offset, skb->len - desc.offset, 0);
 		desc.csum = csum_block_add(desc.csum, csum2, desc.offset);
 	}
+	if (desc.count)
+		return -1;
 	if ((unsigned short)csum_fold(desc.csum))
 		return -1;
 	return 0;
 no_checksum:
 	xdr_partial_copy_from_skb(xdr, 0, &desc, skb_read_bits);
+	if (desc.count)
+		return -1;
 	return 0;
 }
 
@@ -750,7 +755,8 @@
 {
 	if (len > desc->count)
 		len = desc->count;
-	skb_copy_bits(desc->skb, desc->offset, p, len);
+	if (skb_copy_bits(desc->skb, desc->offset, p, len))
+		return 0;
 	desc->offset += len;
 	desc->count -= len;
 	return len;
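
The two checks in the patch (do not advance if the copy failed, and
treat leftover expected bytes as an error) can be tried out in user
space. What follows is a minimal sketch under stated assumptions, not
the kernel code: struct reader_desc, copy_bits(), read_bits() and
read_all() are hypothetical stand-ins for the skb reader descriptor,
skb_copy_bits(), skb_read_bits() and the receive path that calls them,
and a plain byte buffer stands in for the skb.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the skb reader descriptor. */
struct reader_desc {
	const unsigned char *buf;	/* source data (stands in for the skb) */
	size_t buf_len;			/* bytes actually present in buf */
	size_t offset;			/* current read position */
	size_t count;			/* bytes the caller still expects */
};

/* Stand-in for skb_copy_bits(): 0 on success, -1 if the range is not there. */
static int copy_bits(const struct reader_desc *desc, void *to, size_t len)
{
	if (desc->offset > desc->buf_len || len > desc->buf_len - desc->offset)
		return -1;
	memcpy(to, desc->buf + desc->offset, len);
	return 0;
}

/* Mirrors the patched reader: only advance when the copy succeeded. */
static size_t read_bits(struct reader_desc *desc, void *to, size_t len)
{
	assert(desc != NULL);

	if (len > desc->count)
		len = desc->count;
	if (copy_bits(desc, to, len))
		return 0;
	desc->count -= len;
	desc->offset += len;
	return len;
}

/*
 * Copy `want` bytes from src into out. Returns 0 only if every expected
 * byte arrived; a non-zero desc.count afterwards means a short read.
 */
int read_all(const unsigned char *src, size_t src_len,
	     unsigned char *out, size_t want)
{
	struct reader_desc desc = { src, src_len, 0, want };
	size_t done = 0;

	while (desc.count > 0) {
		size_t n = read_bits(&desc, out + done, desc.count);

		if (n == 0)
			break;		/* copy failed, stop instead of advancing */
		done += n;
	}
	return desc.count ? -1 : 0;	/* the check the patch adds */
}
```

With a 4-byte source, read_all(src, 4, out, 4) succeeds, while
read_all(src, 4, out, 8) reports an error because four expected bytes
never arrived. Without the two checks, the second call would count the
failed copy as progress and report success over a buffer that was never
filled, which is one way a reader could end up handing corrupted data
to its caller.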

* Re: 2.5: NFS troubles
From: Robert Love @ 2003-04-07 15:17 UTC
To: trond.myklebust; +Cc: Siim Vahtre, linux-kernel, NFS maillist

On Mon, 2003-04-07 at 09:23, Trond Myklebust wrote:

> OK. I've managed to squash the NFS read corruption problems that I had
> on my 2.5.x client setup with the following patch.
> Since the two of you reported what appears to be the same problem,
> would you mind trying it out?

This fixes it for me. No errors, no corruption.

I did a verify of the md5sums of all of the Red Hat 9 RPM packages over
NFS. I had random failures (in different packages each time) before. I
just did it twice to be sure -- it works.

Thank you, Trond.

	Robert Love

* Re: 2.5: NFS troubles
From: Felipe Alfaro Solana @ 2003-04-07 8:58 UTC
To: Andrew Morton; +Cc: Trond Myklebust, LKML

On Mon, 2003-04-07 at 02:18, Andrew Morton wrote:
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > The 2.5.66 ext3 code still has some issues with respect to NFS readdir
> > cookies.
>
> It might do. I have Ted's htree/NFS fixes in there though.
>
> Felipe, please do
>
> 	dumpe2fs /dev/hdXX | grep features
>
> if it shows dir_index then it might be an ext3 problem. If not then it is
> probably an NFS problem.
>
> If it does have dir_index set then please run
>
> 	tune2fs -O ^dir_index /dev/hdXX
>
> and reboot and retest.

Wonderful, Andrew... You were right. Disabling H/Tree indexes solved the
problem! Anything else?

PS: I previously solved the problem by mounting the filesystem as ext2
instead, but now it seems to be working pretty well with ext3 (at least,
I can't reproduce the hang I described in my previous message).

Thanks!

________________________________________________________________________
Linux Registered User #287198