* NFS client bug in 2.6.8-2.6.11
From: Bernardo Innocenti @ 2005-03-08 4:53 UTC
To: lkml; +Cc: Neil Conway, nfs

Hello,

This problem was previously described by Neil Conway. All relevant information is here:

  http://lkml.org/lkml/2005/2/10/97

I still see this very same problem on 2.6.11 vanilla and in Fedora/RawHide kernels. It has haunted me for a couple of months on several Fedora clients. Strangely, a Gentoo client isn't affected, but I couldn't investigate further.

When the current directory becomes inaccessible, it remains so until I cd somewhere else and then cd back to it. Sometimes I must wait a few seconds before cd succeeds.

Here's a sample session:

  [executing a find / in another shell to trigger the bug]
  beetle:/pub/linux/distro/fedora-devel# ll
  ls: .: No such file or directory
  beetle:/pub/linux/distro/fedora-devel# cd -
  /
  beetle:/# cd -
  bash: cd: /pub/linux/distro/fedora-devel: No such file or directory
  beetle:/# [...a few seconds later...]
  beetle:/# cd -
  /pub/linux/distro/fedora-devel

Appears to be a client bug. The problem only happens when there's heavy filesystem activity on other filesystems (local or NFS).

NFS mount options:

  rw,_netdev,rsize=32768,wsize=32768,hard,intr,proto=udp,addr=10.3.3.1

--
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
* Re: NFS client bug in 2.6.8-2.6.11
From: Trond Myklebust @ 2005-03-08 5:30 UTC
To: Bernardo Innocenti; +Cc: lkml, Neil Conway, nfs

On Tue, 08.03.2005 at 05:53 (+0100), Bernardo Innocenti wrote:
> Appears to be a client bug.

Why?

--
Trond Myklebust <trond.myklebust@fys.uio.no>
* Re: NFS client bug in 2.6.8-2.6.11
From: Bernardo Innocenti @ 2005-03-08 6:38 UTC
To: Trond Myklebust; +Cc: lkml, Neil Conway, nfs

Trond Myklebust wrote:
> On Tue, 08.03.2005 at 05:53 (+0100), Bernardo Innocenti wrote:
>
>>Appears to be a client bug.
>
> Why?

Two clients started showing the problem after being upgraded from FC2 to FC3, while the server remained unchanged.

I also can't reproduce the problem on an older client running 2.4.21. I'll test with 2.6.7 as soon as I can reboot the client I'm using right now.

--
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
* Re: NFS client bug in 2.6.8-2.6.11
From: Trond Myklebust @ 2005-03-08 6:47 UTC
To: Bernardo Innocenti; +Cc: lkml, Neil Conway, nfs

On Tue, 08.03.2005 at 07:38 (+0100), Bernardo Innocenti wrote:
> Trond Myklebust wrote:
> > On Tue, 08.03.2005 at 05:53 (+0100), Bernardo Innocenti wrote:
> >
> >>Appears to be a client bug.
> >
> > Why?
>
> Two clients started showing the problem after
> being upgraded from FC2 to FC3, while the server
> remained unchanged.

Can you produce tcpdumps to back that up?

Neil's problem appeared rather to be server-related. Neither of us could reproduce his problem when the server was exporting an XFS partition.

The other thing to try is to turn off subtree checking on the server.

Cheers,
  Trond

--
Trond Myklebust <trond.myklebust@fys.uio.no>
* Re: NFS client bug in 2.6.8-2.6.11
From: Bernardo Innocenti @ 2005-03-08 9:26 UTC
To: Trond Myklebust; +Cc: lkml, Neil Conway, nfs

Trond Myklebust wrote:
> On Tue, 08.03.2005 at 07:38 (+0100), Bernardo Innocenti wrote:
>>
>>Two clients started showing the problem after
>>being upgraded from FC2 to FC3, while the server
>>remained unchanged.
>
> Can you produce tcpdumps to back that up?
>
> Neil's problem appeared rather to be server-related. Neither of us could
> reproduce his problem when the server was exporting an XFS partition.

Actually, I was mistaken: running a background "find / >/dev/null" triggers the problem even on the old RedHat (2.4.26) and Gentoo (2.6.11) clients.

> The other thing to try is to turn off subtree checking on the server.

It's already turned off on all shares. For the record, these are the contents of my /etc/exports:

  /home gss/krb5(rw,no_root_squash,no_subtree_check,async) beetle(rw,no_root_squash,no_subtree_check,async) deimos(rw,async,no_subtree_check,anonuid=134,anongid=100) haring(rw,async,no_subtree_check,anonuid=127,anongid=100) murphy(rw,async,no_subtree_check,anonuid=158,anongid=100) daneel(rw,async,no_subtree_check,anonuid=100,anongid=100) 10.0.0.0/8(rw,no_subtree_check,async)
  /arc 10.0.0.0/8(rw,no_root_squash,no_subtree_check,async,anonuid=14,anongid=113)
  #
  # NFSv4
  #
  /export beetle(rw,fsid=0,no_root_squash,insecure,no_subtree_check,async)
  /export 10.0.0.0/8(rw,fsid=0,insecure,no_subtree_check,async)
  /export gss/krb5(rw,fsid=0,insecure,no_subtree_check,async)
  /export/home beetle(rw,nohide,no_root_squash,insecure,no_subtree_check,async)
  /export/home 10.0.0.0/8(rw,nohide,insecure,no_subtree_check,async)
  /export/home gss/krb5(rw,nohide,no_root_squash,insecure,no_subtree_check,async)
  /export/arc 10.0.0.0/8(rw,nohide,no_root_squash,insecure,no_subtree_check,async,anonuid=14,anongid=113)

--
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
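Editor's aside: the claim that subtree checking is off on every share can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the simple exports(5) layout "path client(opt,...) client(opt,...)" with no line continuations or quoted paths. The helper name `clients_missing_option` and the `/export/tmp` entry in the sample are hypothetical and do not come from the thread.

```python
import re

# Hypothetical helper (not part of nfs-utils): list export entries that
# lack a given option. Assumes one export per line, no continuations.
CLIENT_RE = re.compile(r"(\S+?)\(([^)]*)\)")

def clients_missing_option(exports_text, option="no_subtree_check"):
    missing = []
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        parts = line.split(None, 1)           # path, then the client list
        if len(parts) < 2:
            continue
        path, rest = parts
        for client, opts in CLIENT_RE.findall(rest):
            if option not in opts.split(","):
                missing.append((path, client))
    return missing

# Two entries copied from the message above, plus one made-up entry
# (/export/tmp) that deliberately lacks the option.
SAMPLE = """\
/arc 10.0.0.0/8(rw,no_root_squash,no_subtree_check,async,anonuid=14,anongid=113)
#
# NFSv4
#
/export beetle(rw,fsid=0,no_root_squash,insecure,no_subtree_check,async)
/export/tmp beetle(rw,async)
"""

print(clients_missing_option(SAMPLE))
```

Run against the real file, an empty list would support the statement that no_subtree_check is set on all shares.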
* Re: NFS client bug in 2.6.8-2.6.11
From: Bernardo Innocenti @ 2005-03-08 7:03 UTC
To: Bernardo Innocenti; +Cc: Trond Myklebust, lkml, Neil Conway, nfs

Bernardo Innocenti wrote:
> Trond Myklebust wrote:
>
> I also can't reproduce the problem on an older
> client running 2.4.21.

Well, actually I tried harder with the 2.4.21 client and I obtained a similar effect:

  naraku:/pub/linux/distro/fedora-devel# ll
  ls: .: Stale NFS file handle
  naraku:/pub/linux/distro/fedora-devel# cd -
  /arc/linux
  naraku:/arc/linux# cd -
  /pub/linux/distro/fedora-devel
  naraku:/pub/linux/distro/fedora-devel# ll
  ... (lots of files)

So, instead of ENOENT I get ESTALE on 2.4.21.

May well be a server bug then. The server is running 2.6.10-1.766_FC3. Do you think I should try installing a vanilla kernel on the server?

--
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
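Editor's aside: the ENOENT versus ESTALE distinction above can be logged by a small probe instead of being read off `ls` output. This is only a sketch: `probe_directory` is a hypothetical name, and the demonstration uses a local temporary directory, so it can show ENOENT but cannot reproduce ESTALE, which needs a real NFS mount. On an affected client one would presumably call it in a loop with "." from a process whose working directory is inside the mount.

```python
import errno
import os
import tempfile

def probe_directory(path):
    """Try to list path; return 'ok' or the symbolic errno name on failure."""
    try:
        os.listdir(path)
    except OSError as exc:
        # errno.errorcode maps numeric codes to names such as 'ENOENT'
        # or 'ESTALE'; fall back to the raw number if the code is unknown.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"

# Local demonstration only: an existing directory, then a removed one.
demo_dir = tempfile.mkdtemp()
print(probe_directory(demo_dir))
os.rmdir(demo_dir)
print(probe_directory(demo_dir))
```

Logging the returned name with a timestamp would show how long the directory stays inaccessible and which error each client kernel reports.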
* Re: NFS client bug in 2.6.8-2.6.11
From: Anders Saaby @ 2005-03-08 8:56 UTC
To: Bernardo Innocenti; +Cc: Trond Myklebust, lkml, Neil Conway, nfs

On Tuesday 08 March 2005 08:03, Bernardo Innocenti wrote:
> Bernardo Innocenti wrote:
> > Trond Myklebust wrote:
> >
> > I also can't reproduce the problem on an older
> > client running 2.4.21.
>
> Well, actually I tried harder with the 2.4.21
> client and I obtained a similar effect:
>
> So, instead of ENOENT I get ESTALE on 2.4.21.
>
> May well be a server bug then. The server is running
> 2.6.10-1.766_FC3. Do you think I should try installing
> a vanilla kernel on the server?

We have seen lots of ESTALEs/ENOENTs when the server is running 2.6.10 (vanilla). I don't know if this was supposed to be fixed in the 2.6.10-FC kernels, but vanilla 2.6.11 doesn't seem to have this bug at all.

You mention a lot of kernel versions including 2.6.11, and I can't really figure out whether you are talking about the clients or the server. Anyway, if your server has only run with 2.6.10, try 2.6.11.

Apologies if I missed something obvious.

--
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: as@cohaesio.com - http://www.cohaesio.com
------------------------------------------------
* Re: NFS client bug in 2.6.8-2.6.11
From: Bernardo Innocenti @ 2005-03-08 22:25 UTC
To: Anders Saaby; +Cc: Trond Myklebust, lkml, Neil Conway, nfs

Anders Saaby wrote:
> On Tuesday 08 March 2005 08:03, Bernardo Innocenti wrote:
>
>>Bernardo Innocenti wrote:
>>
>>>Trond Myklebust wrote:
>>>
>>>I also can't reproduce the problem on an older
>>>client running 2.4.21.
>>
>>Well, actually I tried harder with the 2.4.21
>>client and I obtained a similar effect:
>>
>>So, instead of ENOENT I get ESTALE on 2.4.21.
>>
>>May well be a server bug then. The server is running
>>2.6.10-1.766_FC3. Do you think I should try installing
>>a vanilla kernel on the server?
>
>
> We have seen lots of ESTALEs/ENOENTs when the server is running 2.6.10
> (vanilla). I don't know if this was supposed to be fixed in the 2.6.10-FC
> kernels, but vanilla 2.6.11 doesn't seem to have this bug at all.
>
> You mention a lot of kernel versions including 2.6.11, and I can't really
> figure out whether you are talking about the clients or the server.
> Anyway, if your server has only run with 2.6.10, try 2.6.11.

Thank you, I've finally nailed it down by upgrading the *server* kernel from 2.6.10-1.770_FC3 to 2.6.10-1.770_FC3. The latter is basically 2.6.10-ac12 plus a bunch of vendor-specific patches.

> Apologies if I missed something obvious.

No, *I* did. All the clues I had led me to the client side, while the problem was in the server instead.

--
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
* Re: NFS client bug in 2.6.8-2.6.11
From: Neil Conway @ 2005-03-15 23:44 UTC
To: Bernardo Innocenti, Anders Saaby; +Cc: Trond Myklebust, lkml, nfs

Hi Bernardo (et al).

Apologies - I've not been reading my account for a wee while. Then again, I probably don't have much useful to add to the debate right now ;-)

--- Bernardo Innocenti <bernie@develer.com> wrote:
> Anders Saaby wrote:
> > Anyways if your server has only run with 2.6.10 - try 2.6.11.
>
> Thank you, I've finally nailed it down by upgrading the
> *server* kernel from 2.6.10-1.770_FC3 to 2.6.10-1.770_FC3.

Hmm, I will infer from a previous email you sent that you mean 766_FC3 for the "from" kernel.

> The latter is basically 2.6.10-ac12 plus a bunch of vendor
> specific patches.

766 -> 770 sounds like a "small" (ish) number of patches to check, if we're lucky. Did you wade through 'em all yet? Any smoking guns?

Regards,
Neil

PS: oh bugger, just remembered that I also reproduced my bug with a 2.6.8 kernel on the server; admittedly though it was an FC2 kernel so who knows what extra patches it had.
* Re: NFS client bug in 2.6.8-2.6.11
From: Bernardo Innocenti @ 2005-03-16 2:49 UTC
To: Neil Conway; +Cc: Anders Saaby, Trond Myklebust, lkml, nfs

Neil Conway wrote:
> 766 -> 770 sounds like a "small" (ish) number of patches to check, if
> we're lucky. Did you wade through 'em all yet? Any smoking guns?

The RPM changelog doesn't contain anything relevant between 766 and 770:

---CUT---
* Thu Feb 24 2005 Dave Jones <davej@redhat.com>
- Use old scheme first when probing USB. (#145273)

* Wed Feb 23 2005 Dave Jones <davej@redhat.com>
- Try as you may, there's no escape from crap SCSI hardware. (#149402)

* Mon Feb 21 2005 Dave Jones <davej@redhat.com>
- Disable some experimental USB EHCI features.

* Tue Feb 15 2005 Dave Jones <davej@redhat.com>
- Fix bio leak in md layer.
---CUT---

Perhaps the changelog is incomplete. I don't have the two SRPMs at hand to make a comparison.

By the way, it seems upgrading to 2.6.10-1.770_FC3 just made the bug much harder to trigger: I've definitely seen it once again when I had left a shell sitting in an NFS directory overnight. I couldn't reproduce it a second time.

> PS: oh bugger, just remembered that I also reproduced my bug with a
> 2.6.8 kernel on the server; admittedly though it was an FC2 kernel so
> who knows what extra patches it had.

You can easily find out by downloading the SRPM. Now that Fedora provides a public CVS, perhaps it could be used to make such investigations directly with the cvsweb interface without downloading and unpacking a 40MB file.

--
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/
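Editor's aside: scanning an RPM changelog excerpt for NFS-related entries, as done by hand above, can be sketched in a few lines. The function names and the keyword list are assumptions made for illustration. The sample text is the excerpt quoted in the message, and the parser handles only the two line shapes that appear in it ("* date author <email>" headers and "- item" lines).

```python
import re

# Header lines look like: "* Thu Feb 24 2005 Dave Jones <davej@redhat.com>"
HEADER_RE = re.compile(r"^\* (\w{3} \w{3} +\d{1,2} \d{4}) (.+?) <([^>]+)>")

def parse_changelog(text):
    """Group '- item' lines under the most recent '* date author' header."""
    entries = []
    for line in text.splitlines():
        m = HEADER_RE.match(line)
        if m:
            entries.append({"date": m.group(1), "author": m.group(2), "items": []})
        elif line.startswith("- ") and entries:
            entries[-1]["items"].append(line[2:].strip())
    return entries

def matching_items(entries, keywords=("nfs", "sunrpc", "lockd")):
    """Return (date, item) pairs whose text mentions any keyword (assumed list)."""
    return [(e["date"], item)
            for e in entries
            for item in e["items"]
            if any(k in item.lower() for k in keywords)]

CHANGELOG = """\
* Thu Feb 24 2005 Dave Jones <davej@redhat.com>
- Use old scheme first when probing USB. (#145273)

* Wed Feb 23 2005 Dave Jones <davej@redhat.com>
- Try as you may, there's no escape from crap SCSI hardware. (#149402)

* Mon Feb 21 2005 Dave Jones <davej@redhat.com>
- Disable some experimental USB EHCI features.

* Tue Feb 15 2005 Dave Jones <davej@redhat.com>
- Fix bio leak in md layer.
"""

entries = parse_changelog(CHANGELOG)
print(len(entries), matching_items(entries))
```

An empty match list agrees with the observation that nothing in the quoted excerpt looks NFS-related. It says nothing about patches that were never recorded in the changelog.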