public inbox for linux-kernel@vger.kernel.org
* NFS client bug in 2.6.8-2.6.11
@ 2005-03-08  4:53 Bernardo Innocenti
  2005-03-08  5:30 ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Bernardo Innocenti @ 2005-03-08  4:53 UTC (permalink / raw)
  To: lkml; +Cc: Neil Conway, nfs

Hello,

This problem was previously described by Neil Conway.
All relevant information here:

  http://lkml.org/lkml/2005/2/10/97


I still see this very same problem on 2.6.11 vanilla and in
Fedora/RawHide kernels.  It has haunted me for a couple of
months on several Fedora clients.  Strangely, a Gentoo
client isn't affected, but I couldn't investigate further.

When the current directory becomes inaccessible, it remains
so until I cd somewhere else and then cd back to it.
Sometimes I must wait a few seconds before cd succeeds.

Here's a sample session:

 [executing a find / in another shell to trigger the bug]
 beetle:/pub/linux/distro/fedora-devel# ll
 ls: .: No such file or directory
 beetle:/pub/linux/distro/fedora-devel# cd -
 /
 beetle:/# cd -
 bash: cd: /pub/linux/distro/fedora-devel: No such file or directory
 beetle:/#
 [...a few seconds later...]
 beetle:/# cd -
 /pub/linux/distro/fedora-devel
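
A minimal sketch of how the symptom in the session above could be polled
for from a script rather than by hand, assuming Python 3 on the client.
The helper name probe_dir is hypothetical; it only reports which errno
a directory listing fails with.

```python
import errno
import os


def probe_dir(path="."):
    """Return None if the directory can be listed, else the errno name.

    ENOENT and ESTALE are the two errors reported in this thread; any
    other OSError is reported by its symbolic name as well.
    """
    try:
        os.listdir(path)
        return None
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. 2 -> "ENOENT".
        return errno.errorcode.get(exc.errno, str(exc.errno))


# Probe the current directory once; None means the listing succeeded.
print(probe_dir("."))
```

Run from a shell sitting in the NFS directory while the "find /" load
runs elsewhere, a non-None result would mark the moment the directory
became inaccessible.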


Appears to be a client bug.  The problem only happens
when there's heavy filesystem activity on other
filesystems (local or NFS).

NFS mount options: rw,_netdev,rsize=32768,wsize=32768,hard,intr,proto=udp,addr=10.3.3.1
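
When comparing clients, it can help to diff the mount options field by
field.  A small sketch, assuming Python 3; parse_mount_options is a
hypothetical helper, and the option string is the one quoted above.

```python
def parse_mount_options(opts):
    """Split a comma-separated mount option string into a dict.

    Options of the form key=value keep their value as a string;
    bare flags such as "hard" map to True.
    """
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


opts = parse_mount_options(
    "rw,_netdev,rsize=32768,wsize=32768,hard,intr,proto=udp,addr=10.3.3.1"
)
print(opts["proto"], opts["rsize"], opts["hard"])  # prints: udp 32768 True
```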

-- 
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/



* Re: NFS client bug in 2.6.8-2.6.11
  2005-03-08  4:53 Bernardo Innocenti
@ 2005-03-08  5:30 ` Trond Myklebust
  2005-03-08  6:38   ` Bernardo Innocenti
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2005-03-08  5:30 UTC (permalink / raw)
  To: Bernardo Innocenti; +Cc: lkml, Neil Conway, nfs

On Tue, 08.03.2005 at 05:53 (+0100), Bernardo Innocenti wrote:

> Appears to be a client bug.

Why?

-- 
Trond Myklebust <trond.myklebust@fys.uio.no>



* Re: NFS client bug in 2.6.8-2.6.11
  2005-03-08  5:30 ` Trond Myklebust
@ 2005-03-08  6:38   ` Bernardo Innocenti
  2005-03-08  6:47     ` Trond Myklebust
  2005-03-08  7:03     ` Bernardo Innocenti
  0 siblings, 2 replies; 10+ messages in thread
From: Bernardo Innocenti @ 2005-03-08  6:38 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: lkml, Neil Conway, nfs

Trond Myklebust wrote:
> On Tue, 08.03.2005 at 05:53 (+0100), Bernardo Innocenti wrote:
> 
>>Appears to be a client bug.
> 
> Why?

Two clients started showing the problem after
being upgraded from FC2 to FC3, while the server
remained unchanged.

I also can't reproduce the problem on an older
client running 2.4.21.

I'll test with 2.6.7 as soon as I can reboot the
client I'm using right now.

-- 
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/



* Re: NFS client bug in 2.6.8-2.6.11
  2005-03-08  6:38   ` Bernardo Innocenti
@ 2005-03-08  6:47     ` Trond Myklebust
  2005-03-08  9:26       ` Bernardo Innocenti
  2005-03-08  7:03     ` Bernardo Innocenti
  1 sibling, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2005-03-08  6:47 UTC (permalink / raw)
  To: Bernardo Innocenti; +Cc: lkml, Neil Conway, nfs

On Tue, 08.03.2005 at 07:38 (+0100), Bernardo Innocenti wrote:
> Trond Myklebust wrote:
> > On Tue, 08.03.2005 at 05:53 (+0100), Bernardo Innocenti wrote:
> > 
> >>Appears to be a client bug.
> > 
> > Why?
> 
> Two clients started showing the problem after
> being upgraded from FC2 to FC3, while the server
> remained unchanged.

Can you produce tcpdumps to back that up?
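
A capture of the kind requested here might be assembled as in the
sketch below.  The interface name eth0 and the output file nfs.pcap are
assumptions; the server address is the addr= value from the mount
options in the original report, and 2049 is the standard NFS port.  The
helper only builds the command line, it does not run tcpdump.

```python
def nfs_capture_command(server, iface="eth0", outfile="nfs.pcap"):
    """Build a tcpdump argv that records full NFS packets to a file."""
    return [
        "tcpdump",
        "-i", iface,     # interface to listen on (assumed name)
        "-s", "0",       # snaplen 0: keep whole packets, not just headers
        "-w", outfile,   # write raw packets to a file for later analysis
        "host", server, "and", "port", "2049",  # traffic to/from the NFS server
    ]


print(" ".join(nfs_capture_command("10.3.3.1")))
```

Starting the capture on the client before triggering the failure would
show whether the server itself answers the failing request with an
error.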

Neil's problem appeared rather to be server-related. Neither of us could
reproduce his problem when the server was exporting an XFS partition.

The other thing to try is to turn off subtree checking on the server.

Cheers,
  Trond

-- 
Trond Myklebust <trond.myklebust@fys.uio.no>



* Re: NFS client bug in 2.6.8-2.6.11
  2005-03-08  6:38   ` Bernardo Innocenti
  2005-03-08  6:47     ` Trond Myklebust
@ 2005-03-08  7:03     ` Bernardo Innocenti
  2005-03-08  8:56       ` Anders Saaby
  1 sibling, 1 reply; 10+ messages in thread
From: Bernardo Innocenti @ 2005-03-08  7:03 UTC (permalink / raw)
  To: Bernardo Innocenti; +Cc: Trond Myklebust, lkml, Neil Conway, nfs

Bernardo Innocenti wrote:
> Trond Myklebust wrote:
>
> I also can't reproduce the problem on an older
> client running 2.4.21.

Well, actually I tried harder with the 2.4.21
client and I obtained a similar effect:

 naraku:/pub/linux/distro/fedora-devel# ll
 ls: .: Stale NFS file handle
 naraku:/pub/linux/distro/fedora-devel# cd -
 /arc/linux
 naraku:/arc/linux# cd -
 /pub/linux/distro/fedora-devel
 naraku:/pub/linux/distro/fedora-devel# ll
 ... (lots of files)


So, instead of ENOENT I get ESTALE on 2.4.21.

May well be a server bug then.  The server is running
2.6.10-1.766_FC3.  Do you think I should try installing
a vanilla kernel on the server?

-- 
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/



* Re: NFS client bug in 2.6.8-2.6.11
  2005-03-08  7:03     ` Bernardo Innocenti
@ 2005-03-08  8:56       ` Anders Saaby
  2005-03-08 22:25         ` Bernardo Innocenti
  0 siblings, 1 reply; 10+ messages in thread
From: Anders Saaby @ 2005-03-08  8:56 UTC (permalink / raw)
  To: Bernardo Innocenti; +Cc: Trond Myklebust, lkml, Neil Conway, nfs

On Tuesday 08 March 2005 08:03, Bernardo Innocenti wrote:
> Bernardo Innocenti wrote:
> > Trond Myklebust wrote:
> >
> > I also can't reproduce the problem on an older
> > client running 2.4.21.
>
> Well, actually I tried harder with the 2.4.21
> client and I obtained a similar effect:
>
> So, instead of ENOENT I get ESTALE on 2.4.21.
>
> May well be a server bug then.  The server is running
> 2.6.10-1.766_FC3.  Do you think I should try installing
> a vanilla kernel on the server?

We have seen lots of ESTALE and ENOENT errors when the server is running
vanilla 2.6.10. I don't know whether this was supposed to be fixed in the
2.6.10-FC kernels, but vanilla 2.6.11 doesn't seem to have this bug at all.

You mention a lot of kernel versions, including 2.6.11, and I can't really
figure out whether you are talking about the clients or the server.
Anyway, if your server has only run 2.6.10, try 2.6.11.

- Apologies if I missed something obvious.

-- 
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: as@cohaesio.com - http://www.cohaesio.com
------------------------------------------------


* Re: NFS client bug in 2.6.8-2.6.11
  2005-03-08  6:47     ` Trond Myklebust
@ 2005-03-08  9:26       ` Bernardo Innocenti
  0 siblings, 0 replies; 10+ messages in thread
From: Bernardo Innocenti @ 2005-03-08  9:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: lkml, Neil Conway, nfs

Trond Myklebust wrote:
> On Tue, 08.03.2005 at 07:38 (+0100), Bernardo Innocenti wrote:
>>
>>Two clients started showing the problem after
>>being upgraded from FC2 to FC3, while the server
>>remained unchanged.
> 
> Can you produce tcpdumps to back that up?
> 
> Neil's problem appeared rather to be server-related. Neither of us could
> reproduce his problem when the server was exporting an XFS partition.

Actually, I was mistaken: running a background "find / >/dev/null"
triggers the problem even on the old RedHat (2.4.26) and
Gentoo (2.6.11) clients.


> The other thing to try is to turn off subtree checking on the server.

It's already turned off on all shares.  For the record, these are the
contents of my /etc/exports:

/home                   gss/krb5(rw,no_root_squash,no_subtree_check,async) beetle(rw,no_root_squash,no_subtree_check,async) deimos(rw,async,no_subtree_check,anonuid=134,anongid=100) haring(rw,async,no_subtree_check,anonuid=127,anongid=100) murphy(rw,async,no_subtree_check,anonuid=158,anongid=100) daneel(rw,async,no_subtree_check,anonuid=100,anongid=100) 10.0.0.0/8(rw,no_subtree_check,async)
/arc                    10.0.0.0/8(rw,no_root_squash,no_subtree_check,async,anonuid=14,anongid=113)

#
# NFSv4
#
/export                 beetle(rw,fsid=0,no_root_squash,insecure,no_subtree_check,async)
/export                 10.0.0.0/8(rw,fsid=0,insecure,no_subtree_check,async)
/export                 gss/krb5(rw,fsid=0,insecure,no_subtree_check,async)
/export/home            beetle(rw,nohide,no_root_squash,insecure,no_subtree_check,async)
/export/home            10.0.0.0/8(rw,nohide,insecure,no_subtree_check,async)
/export/home            gss/krb5(rw,nohide,no_root_squash,insecure,no_subtree_check,async)
/export/arc             10.0.0.0/8(rw,nohide,no_root_squash,insecure,no_subtree_check,async,anonuid=14,anongid=113)
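
The claim that subtree checking is off on every share can be verified
mechanically.  A rough sketch, assuming Python 3 and the simple
"path client(options) ..." layout used above; exports_missing_option is
a hypothetical helper, and it ignores clients listed without a
parenthesised option list.

```python
import re

# One "client(opt1,opt2,...)" entry on an exports line.
CLIENT_RE = re.compile(r"(\S+?)\(([^)]*)\)")


def exports_missing_option(text, option="no_subtree_check"):
    """Return (path, client) pairs whose option list lacks `option`."""
    missing = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        parts = line.split(None, 1)           # export path, then client list
        if len(parts) < 2:
            continue
        path, clients = parts
        for client, opts in CLIENT_RE.findall(clients):
            if option not in opts.split(","):
                missing.append((path, client))
    return missing


sample = "/home beetle(rw,no_subtree_check,async) 10.0.0.0/8(rw,async)\n"
print(exports_missing_option(sample))
```

An empty list for the real file would confirm that no export still has
subtree checking enabled.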

-- 
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/



* Re: NFS client bug in 2.6.8-2.6.11
  2005-03-08  8:56       ` Anders Saaby
@ 2005-03-08 22:25         ` Bernardo Innocenti
  0 siblings, 0 replies; 10+ messages in thread
From: Bernardo Innocenti @ 2005-03-08 22:25 UTC (permalink / raw)
  To: Anders Saaby; +Cc: Trond Myklebust, lkml, Neil Conway, nfs

Anders Saaby wrote:
> On Tuesday 08 March 2005 08:03, Bernardo Innocenti wrote:
> 
>>Bernardo Innocenti wrote:
>>
>>>Trond Myklebust wrote:
>>>
>>>I also can't reproduce the problem on an older
>>>client running 2.4.21.
>>
>>Well, actually I tried harder with the 2.4.21
>>client and I obtained a similar effect:
>>
>>So, instead of ENOENT I get ESTALE on 2.4.21.
>>
>>May well be a server bug then.  The server is running
>>2.6.10-1.766_FC3.  Do you think I should try installing
>>a vanilla kernel on the server?
> 
> 
> We have seen lots of ESTALE and ENOENT errors when the server is running
> vanilla 2.6.10. I don't know whether this was supposed to be fixed in the
> 2.6.10-FC kernels, but vanilla 2.6.11 doesn't seem to have this bug at all.
> 
> You mention a lot of kernel versions, including 2.6.11, and I can't really
> figure out whether you are talking about the clients or the server.
> Anyway, if your server has only run 2.6.10, try 2.6.11.

Thank you, I've finally nailed it down by upgrading the
*server* kernel from 2.6.10-1.770_FC3 to 2.6.10-1.770_FC3.

The latter is basically 2.6.10-ac12 plus a bunch of vendor
specific patches.


> - Apologies if I missed something obvious.

No, *I* did.  All the clues I had led me to the client
side, while the problem was in the server instead.

-- 
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/



* Re: NFS client bug in 2.6.8-2.6.11
@ 2005-03-15 23:44 Neil Conway
  2005-03-16  2:49 ` Bernardo Innocenti
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Conway @ 2005-03-15 23:44 UTC (permalink / raw)
  To: Bernardo Innocenti, Anders Saaby; +Cc: Trond Myklebust, lkml, nfs

Hi Bernardo (et al).  Apologies - I've not been reading my account for
a wee while.  Then again, I probably don't have much useful to add to
the debate right now ;-)

--- Bernardo Innocenti <bernie@develer.com> wrote:
> Anders Saaby wrote:
> > Anyways if your server has only run with 2.6.10 - try 2.6.11.
> 
> Thank you, I've finally nailed it down by upgrading the
> *server* kernel from 2.6.10-1.770_FC3 to 2.6.10-1.770_FC3.

Hmm, I will infer from a previous email you sent that you mean 766_FC3
for the "from" kernel.

> The latter is basically 2.6.10-ac12 plus a bunch of vendor
> specific patches.

766 -> 770 sounds like a "small" (ish) number of patches to check, if
we're lucky.  Did you wade through 'em all yet?  Any smoking guns?

Regards,
Neil
PS: oh bugger, just remembered that I also reproduced my bug with a
2.6.8 kernel on the server; admittedly though it was an FC2 kernel so
who knows what extra patches it had.



		


* Re: NFS client bug in 2.6.8-2.6.11
  2005-03-15 23:44 NFS client bug in 2.6.8-2.6.11 Neil Conway
@ 2005-03-16  2:49 ` Bernardo Innocenti
  0 siblings, 0 replies; 10+ messages in thread
From: Bernardo Innocenti @ 2005-03-16  2:49 UTC (permalink / raw)
  To: Neil Conway; +Cc: Anders Saaby, Trond Myklebust, lkml, nfs

Neil Conway wrote:

> 766 -> 770 sounds like a "small" (ish) number of patches to check, if
> we're lucky.  Did you wade through 'em all yet?  Any smoking guns?

The RPM changelog doesn't contain anything relevant
between 766 and 770:

---CUT---
* Thu Feb 24 2005 Dave Jones <davej@redhat.com>

- Use old scheme first when probing USB. (#145273)

* Wed Feb 23 2005 Dave Jones <davej@redhat.com>

- Try as you may, there's no escape from crap SCSI hardware. (#149402)

* Mon Feb 21 2005 Dave Jones <davej@redhat.com>

- Disable some experimental USB EHCI features.

* Tue Feb 15 2005 Dave Jones <davej@redhat.com>

- Fix bio leak in md layer.
---CUT---

Perhaps the changelog is incomplete.  I don't have the
two SRPMs at hand to make a comparison.
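
Scanning a longer changelog by eye is error-prone, so a filter along
the lines below could help.  This is only a sketch, assuming Python 3
and the "* date author" / "- item" layout shown above; the keyword list
is a guess at what might count as NFS-related, and relevant_entries is
a hypothetical helper.

```python
import re

# Keywords assumed to indicate an NFS-related change (a guess, not a rule).
KEYWORDS = re.compile(r"nfs|sunrpc|lockd|exportfs", re.IGNORECASE)


def relevant_entries(changelog, pattern=KEYWORDS):
    """Return changelog entries whose item lines match `pattern`."""
    entries, current = [], None
    for line in changelog.splitlines():
        if line.startswith("* "):
            # A "* date author" header starts a new entry.
            current = [line]
            entries.append(current)
        elif current is not None and line.strip():
            current.append(line)
    # Search only the item lines, not the header with date and author.
    return ["\n".join(e) for e in entries
            if any(pattern.search(item) for item in e[1:])]
```

Applied to the four entries quoted above, it returns an empty list,
which matches the manual reading.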

By the way, it seems upgrading to 2.6.10-1.770_FC3 just made
the bug much harder to trigger: I've definitely seen it once
again when I had left a shell sitting in an NFS directory
overnight.  I couldn't reproduce it a second time.


> PS: oh bugger, just remembered that I also reproduced my bug with a
> 2.6.8 kernel on the server; admittedly though it was an FC2 kernel so
> who knows what extra patches it had.

You can easily find out by downloading the SRPM.  Now that
Fedora provides a public CVS, perhaps it could be used to
make such investigations directly with the cvsweb interface
without downloading and unpacking a 40MB file.

-- 
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/

