* ls hangs on NFS share from Apple Xserve
@ 2004-09-08  8:42 bnies
  2004-09-08 16:52 ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: bnies @ 2004-09-08  8:42 UTC (permalink / raw)
  To: nfs

Hi,

A couple of months ago I reported a bug that occurs with a Linux NFS client
and a Mac OS X NFS server. Here's the bug report:

http://sourceforge.net/tracker/index.php?func=detail&aid=964204&group_id=14&atid=100014

And here are other docs related to this problem:

http://discussions.info.apple.com/webx?14@5.gRhIaxY5v2O.0@.689495bc
http://discussions.info.apple.com/webx?13@140.HfAHaE8Ew3J.148064@.6897b10d/2
http://groups.google.ch/groups?hl=de&lr=&ie=UTF-8&selm=9c87e8e6.0405280036.16f3c991%40posting.google.com
http://algesten.blogspot.com/2004/07/mac-os-x-server-nfs-unique-cookie-bug.html
http://sources.redhat.com/bugzilla/show_bug.cgi?id=353

Martin Algesten reported that this might be caused by non-unique NFS cookies
in the same READDIRPLUS reply from the Apple NFS server but in our environment
with MacOS X 10.3.5 and SuSE Linux 9.0 (Kernel 2.4.21-243-smp4G, glibc-2.3.2-88)
I cannot confirm this. The NFS cookies are unique in the same READDIRPLUS
reply but not unique during the same NFS session.

We don't have problems accessing the MacOS X 10.3.5 NFS server from our Solaris
clients and also don't have problems with Solaris, Linux and NetApp NFS servers
from Linux clients. 

I assume this is a bug in the Linux kernel NFS implementation or in the glibc
library, whose readdir() issues the getdents64 call that loops. Apple says
it's a Linux bug.

According to SuSE the patch linux-2.4.21-15-seekdir.dif from http://client.linux-nfs.org/Linux-2.4.x/2.4.21/
is already included in their kernel.

Could someone who is familiar with NFS and the Linux NFS or glibc code analyze
and solve this problem? I can provide network traffic and strace logs if
you don't have the equipment to reproduce this problem.

Thanks in advance.

Regards,
Bernd



-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=5047&alloc_id=10808&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: ls hangs on NFS share from Apple Xserve
  2004-09-08  8:42 ls hangs on NFS share from Apple Xserve bnies
@ 2004-09-08 16:52 ` Trond Myklebust
  2004-09-08 18:36   ` Bernd Nies
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2004-09-08 16:52 UTC (permalink / raw)
  To: bnies; +Cc: nfs

On Wed, 08/09/2004 at 04:42, bnies@bluewin.ch wrote:

> Martin Algesten reported that this might be caused by non-unique NFS cookies
> in the same READDIRPLUS reply from the Apple NFS server but in our environment
> with MacOS X 10.3.5 and SuSE Linux 9.0 (Kernel 2.4.21-243-smp4G, glibc-2.3.2-88)
> I cannot confirm this. The NFS cookies are unique in the same READDIRPLUS
> reply but not unique during the same NFS session.
>

What does this mean? That 2 different READDIRPLUS entries for the same
directory may return the same cookie? That would be a server bug...

Cheers,
  Trond




* Re: ls hangs on NFS share from Apple Xserve
  2004-09-08 16:52 ` Trond Myklebust
@ 2004-09-08 18:36   ` Bernd Nies
  2004-09-08 19:53     ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Bernd Nies @ 2004-09-08 18:36 UTC (permalink / raw)
  To: nfs

Hi Trond,

>>Martin Algesten reported that this might be caused by non-unique NFS cookies
>>in the same READDIRPLUS reply from the Apple NFS server but in our environment
>>with MacOS X 10.3.5 and SuSE Linux 9.0 (Kernel 2.4.21-243-smp4G, glibc-2.3.2-88)
>>I cannot confirm this. The NFS cookies are unique in the same READDIRPLUS
>>reply but not unique during the same NFS session.
>>
> 
> 
> What does this mean? That 2 different READDIRPLUS entries for the same
> directory may return the same cookie? That would be a server bug...

The problem reported by Martin Algesten was: His MacOS X 10.3.4 NFS 
server sent a duplicate Cookie within the _same_ READDIRPLUS reply. See:

http://algesten.blogspot.com/index.html#109103599929132337

I can't observe that on our MacOS X 10.3.5 server. I only see that while 
traversing an NFS share with ls -lR on a Linux client, the same cookie 
appears in READDIRPLUS replies for different directories. 
Because our Solaris NFS servers do the same and the Linux NFS client works 
fine with them, I guess this is allowed.

So I think the endless looping ls -lR from a Linux NFS client on a Mac 
OS X 10.3.5 NFS share I reported is not related to that bug reported by 
Martin Algesten.

Here is the ethereal snoop of such an incident:

http://www.nies.ch/download/apple-linux-nfs-loop.tar.gz

I did a ls -lR on a MacOS X NFS share and in a certain directory it 
hung. A strace on the ls process shows that it loops over these system 
calls:

getdents64(5, /* 3 entries */, 4096)    = 128
lstat64("/share/dir/file1.txt", {st_mode=S_IFREG|0444, st_size=8550, ...}) = 0
getxattr("/share/dir/file1.txt", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/share/dir/file2.txt", {st_mode=S_IFREG|0444, st_size=6570, ...}) = 0
getxattr("/share/dir/file2.txt", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("/share/dir/file3.txt", {st_mode=S_IFREG|0444, st_size=23411, ...}) = 0
getxattr("/share/dir/file3.txt", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)

This does not only happen with ls. It happens with all commands that 
want to read a certain directory (e.g. tar). A quick workaround is to 
create a new file in the directory where it loops.
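
For anyone who has to script around such a directory, a guarded listing
can at least turn the endless loop into an error. This is only a minimal
Python sketch and not part of the original report: list_dir_guarded is a
hypothetical helper name, and it assumes that a looping directory stream
shows up as an entry name being returned twice.

```python
import os


def list_dir_guarded(path):
    """Return the entry names in `path`, failing fast on a repeated name.

    Assumption: a directory stream that loops hands back a name that was
    already seen, so a repeat is treated as a sign of the readdir loop.
    """
    seen = set()
    names = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.name in seen:
                raise RuntimeError(
                    f"directory stream repeated {entry.name!r}; "
                    "possible readdir loop"
                )
            seen.add(entry.name)
            names.append(entry.name)
    return names
```

On a healthy directory this behaves like os.listdir(); it does not fix
the underlying problem, it only keeps a script from spinning forever.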

Regards,
Bernd




* Re: ls hangs on NFS share from Apple Xserve
  2004-09-08 18:36   ` Bernd Nies
@ 2004-09-08 19:53     ` Trond Myklebust
  2004-09-08 20:07       ` Bernd Nies
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2004-09-08 19:53 UTC (permalink / raw)
  To: Bernd Nies; +Cc: nfs

As far as I can see from your traces, 2 consecutive READDIRs on the same
directory are giving totally different results...

In your TCP file, the first time the Linux client reads through the
directory which begins with "Autorun.inf", the entry "install.sh" (for
instance) has cookie 292. The second time round, it has cookie 252.

Our client will not work with such a server.

Cheers,
  Trond




* Re: ls hangs on NFS share from Apple Xserve
  2004-09-08 19:53     ` Trond Myklebust
@ 2004-09-08 20:07       ` Bernd Nies
  2004-09-08 20:46         ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Bernd Nies @ 2004-09-08 20:07 UTC (permalink / raw)
  To: nfs



Trond Myklebust wrote:

> As far as I can see from your traces, 2 consecutive READDIRs on the same
> directory are giving totally different results...
> 
> In your TCP file, the first time the Linux client reads through the
> directory which begins with "Autorun.inf", the entry "install.sh" (for
> instance) has cookie 292. The second time round, it has cookie 252.
> 
> Our client will not work with such a server.

Are you sure it reads the same directory twice? An ls -lR shouldn't do 
this, and I ran it only once per traffic capture. Between the UDP and NFS 
traffic captures I unmounted the directory. The share contains directories 
whose contents are sometimes similar.

The server is working fine. The client loops on only about one in a 
thousand directories, and if client A loops over a given directory, 
client B does not.

Regards,
Bernd






* Re: ls hangs on NFS share from Apple Xserve
  2004-09-08 20:07       ` Bernd Nies
@ 2004-09-08 20:46         ` Trond Myklebust
  2004-09-08 21:51           ` Bernd Nies
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2004-09-08 20:46 UTC (permalink / raw)
  To: Bernd Nies; +Cc: nfs

On Wed, 08/09/2004 at 16:07, Bernd Nies wrote:

> Are you sure it reads the same directory twice? A ls -lR shouldn't do

Yes. Check the tcpdumps.

> this and I ran it only once per traffic capture. Between the UDP and NFS
> traffics I unmounted the directory. The share contains directories which
> contents are sometimes similar.

Huh? Of course the client may end up reading in the directory multiple
times.

Unless you have allocated a large buffer for it, readdir() will end up
calling multiple times down into the kernel, so the read of an entire
directory is NOT going to be atomic.
If the mtime on the directory has changed in the meantime, the cached
version may be flushed out, and the directory read in again. If the
server has screwed up the cookies, then that will confuse the client.
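
The effect described above can be reproduced with a toy model. What
follows is a hedged Python sketch, not the Linux client code: readdir,
list_directory, STABLE_VIEW and RENUMBERED_VIEW are hypothetical names,
the cookie values are made up, and the "server" is just a list of
(name, cookie) pairs. The client remembers only the cookie of the last
entry it received and asks for whatever follows that cookie.

```python
def readdir(entries, cookie, count):
    """Toy READDIR: return up to `count` (name, cookie) pairs after `cookie`.

    `entries` is the server's current ordered listing; cookie 0 means
    "start of directory". Raises ValueError if the cookie is unknown.
    """
    start = 0
    if cookie != 0:
        cookies = [c for _, c in entries]
        start = cookies.index(cookie) + 1
    return entries[start:start + count]


def list_directory(server_view, count=2, max_calls=20):
    """Read a directory in chunks, tracking position by cookie alone.

    `server_view(call)` returns the listing the server uses for that call.
    Returns (names, finished); finished is False if max_calls ran out.
    """
    names, cookie = [], 0
    for call in range(max_calls):
        chunk = readdir(server_view(call), cookie, count)
        if not chunk:
            return names, True      # empty reply: end of directory
        names.extend(name for name, _ in chunk)
        cookie = chunk[-1][1]       # remember only the last cookie
    return names, False             # never saw end of directory


# Same four files, two different cookie assignments (made-up values).
STABLE_VIEW = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
RENUMBERED_VIEW = [("a", 2), ("b", 4), ("c", 3), ("d", 1)]
```

If every call sees STABLE_VIEW, the listing ends after three calls. If
the server alternates between the two views (for example
lambda call: STABLE_VIEW if call % 2 == 0 else RENUMBERED_VIEW), the
client keeps receiving entries it has already seen, and only the
max_calls guard stops it.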

Cheers,
 Trond




* Re: ls hangs on NFS share from Apple Xserve
  2004-09-08 20:46         ` Trond Myklebust
@ 2004-09-08 21:51           ` Bernd Nies
  2004-09-08 22:30             ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Bernd Nies @ 2004-09-08 21:51 UTC (permalink / raw)
  To: nfs


> Unless you have allocated a large buffer for it, readdir() will end up
> calling multiple times down into the kernel, so the read of an entire
> directory is NOT going to be atomic.
> If the mtime on the directory has changed in the meantime, the cached
> version may be flushed out, and the directory read in again. If the
> server has screwed up the cookies, then that will confuse the client.

OK, right. I tracked down one incident captured here:

http://www.nies.ch/download/apple-nfs-loop-1dir.dump.gz

A READDIRPLUS reports a directory entry "." with cookie 276 (Frame 157) 
and the next READDIRPLUS reports a file "package-use.html" with same 
cookie 276 (Frame 161). This is WRONG by the server, right?
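
A check like this can be automated once the entries have been pulled
out of a capture. The following is a rough Python sketch under stated
assumptions: find_cookie_collisions is a hypothetical helper, and the
input is assumed to be (frame, name, cookie) tuples that all belong to
the same directory, extracted by hand or by some other tool. Only the
two entries mentioned above are used as sample data.

```python
from collections import defaultdict


def find_cookie_collisions(entries):
    """Report cookies that were handed out for more than one file name.

    `entries` is an iterable of (frame, name, cookie) tuples taken from
    READDIRPLUS replies for ONE directory. Returns a dict mapping each
    colliding cookie to a sorted list of (frame, name) pairs.
    """
    by_cookie = defaultdict(set)
    for frame, name, cookie in entries:
        by_cookie[cookie].add((frame, name))
    return {
        cookie: sorted(seen)
        for cookie, seen in by_cookie.items()
        if len({name for _, name in seen}) > 1   # same cookie, different names
    }


# The two entries quoted in this message.
observed = [
    (157, ".", 276),
    (161, "package-use.html", 276),
]
collisions = find_cookie_collisions(observed)
```

The same name appearing again with the same cookie is not reported,
since a client re-reading the directory would legitimately see that.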

But why is it working then without problems on a Solaris NFS client?

Well, seems that I have to call Apple support and listen to one hour 
iTunes again ...

Thanks a lot.

Regards,
Bernd



* Re: ls hangs on NFS share from Apple Xserve
  2004-09-08 21:51           ` Bernd Nies
@ 2004-09-08 22:30             ` Trond Myklebust
  2004-09-09  8:08               ` bnies
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2004-09-08 22:30 UTC (permalink / raw)
  To: Bernd Nies; +Cc: nfs

On Wed, 08/09/2004 at 17:51, Bernd Nies wrote:
> OK, right. I tracked down one incident captured here:
>
> http://www.nies.ch/download/apple-nfs-loop-1dir.dump.gz
>
> A READDIRPLUS reports a directory entry "." with cookie 276 (Frame 157)
> and the next READDIRPLUS reports a file "package-use.html" with same
> cookie 276 (Frame 161). This is WRONG by the server, right?
>
> But why is it working then without problems on a Solaris NFS client?

They probably do not rely exclusively on the cookie to track where they
are in the READDIR stream: for instance it is possible to cache the
filename too, and to use that as a backup solution if the server is
being cranky.
We could possibly add this sort of information to Linux too.

Note though, this scheme fails for the case where the user wants to use
telldir()/seekdir(), so if you have an older glibc that uses the
horrible heuristic algorithm, you will still see odd behaviour.
More obviously: it also breaks down if the filename you cached was one
of the entries that got deleted...
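
To make the idea concrete, here is a minimal Python sketch of such a
fallback. It is an illustration only: it is not how Solaris or Linux
actually implement this, resume_index is a hypothetical name, and the
server listing is modelled as a plain list of (name, cookie) pairs.

```python
def resume_index(listing, cookie, last_name):
    """Find where to continue reading after a cache flush.

    `listing` is the server's current ordered list of (name, cookie)
    pairs; `cookie` and `last_name` describe the last entry the client
    consumed. Returns the index of the next unread entry, or None if
    the position cannot be recovered.
    """
    # Normal case: the cookie still points at the entry we remember.
    for i, (name, c) in enumerate(listing):
        if c == cookie and name == last_name:
            return i + 1
    # Cookie no longer matches: fall back to the cached file name.
    for i, (name, _) in enumerate(listing):
        if name == last_name:
            return i + 1
    # The cached entry was deleted, so neither key helps.
    return None
```

The last branch is the deleted-entry case mentioned above: with no
cookie match and no name match there is nothing left to resume from.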

Cheers,
  Trond




* Re: ls hangs on NFS share from Apple Xserve
  2004-09-08 22:30             ` Trond Myklebust
@ 2004-09-09  8:08               ` bnies
  2004-09-09 15:00                 ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: bnies @ 2004-09-09  8:08 UTC (permalink / raw)
  To: nfs

Hi Trond,

>They probably do not rely exclusively on the cookie to track where they
>are in the READDIR stream: for instance it is possible to cache the
>filename too, and to use that as a backup solution if the server is
>being cranky.
>We could possibly add this sort of information to Linux too.

That's probably a good idea. At least the network file system connecting
the various flavours of Unix should be consistent and reliable. Maybe this
bug also exists in other BSD derivatives. Sun developed NFS. Is the Solaris
client and server code available?

I filed a bug in OpenDarwin's Bugzilla:

http://www.opendarwin.org/bugzilla/show_bug.cgi?id=2201

It seems that the Mac OS X NFS implementation is rather buggy, because I
found other problems with it:

http://discussions.info.apple.com/webx?14@246.XlfoaVdSw0s.2@.6899e34d
http://discussions.info.apple.com/webx?14@246.XlfoaVdSw0s.2@.6899e2eb
http://discussions.info.apple.com/webx?14@246.XlfoaVdSw0s.0@.6899e444

Regards,
Bernd




* Re: ls hangs on NFS share from Apple Xserve
  2004-09-09  8:08               ` bnies
@ 2004-09-09 15:00                 ` Trond Myklebust
  0 siblings, 0 replies; 10+ messages in thread
From: Trond Myklebust @ 2004-09-09 15:00 UTC (permalink / raw)
  To: bnies; +Cc: nfs

On Thu, 09/09/2004 at 04:08, bnies@bluewin.ch wrote:

> Sun developed NFS. Is the Solaris client and server code available?

You'll have to wait until they GPL Solaris 8-)

Cheers,
  Trond



