All of lore.kernel.org
 help / color / mirror / Atom feed
* 2.6.8.1 kernel NFS client connectathon failure
@ 2004-09-15 20:54 Andrew Schretter
  2004-09-15 22:23 ` vfs bug. was: Re: [NFS] " Frank van Maarseveen
  2004-09-16  3:11 ` Trond Myklebust
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Schretter @ 2004-09-15 20:54 UTC (permalink / raw)
  To: NFS

[-- Attachment #1: Type: text/plain, Size: 917 bytes --]

The attached code is a blatant rip from the latest connectathon code that
was stripped down to show exactly where it fails on my 2.6.8.1 NFS client.
Note that this code works on the local disk and on a 2.4.22 client (both to
a 2.6.8.1 server and a 2.4.22 server) but fail from a 2.6.8.1 client to
both 2.4.22 and 2.6.8.1 servers.

The problem has something to do with hard links and using rename(2) to
overwrite hard linked files.

Compile stright with gcc and run on a local disk partition and then on a
remote partition.  It creates a directory TEMP, creates a file FOO, links
it to TEMP/BAR, renames TEMP/BAR to FOO, then removes FOO and does an 
ls on FOO and TEMP/BAR. 

On local disk runs, TEMP/BAR still exists (I don't know why though).  On NFS
runs, it does not exists.  This breaks connectathon's test.

Any ideas?

Andrew Schretter
Systems Programmer, Duke University
Dept. of Mathematics (919) 660-2866



[-- Attachment #2: lnk.c --]
[-- Type: text/plain, Size: 1052 bytes --]

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/stat.h>

#ifndef MAXPATHLEN
#define MAXPATHLEN	128
#endif

/* maximum number of chars for the message string */
#define STRCHARS	100

main(int ac, char *av[]) {
	int count, fd, slen, lerr, slerr;
	char *fn;
	struct stat sb;
        char str[255];

        if (mkdir("TEMP", 0755)) {
        	perror("mkdir TEMP");
        	exit(0);
        }
	fd = open("FOO", O_RDWR|O_CREAT, 0666);
	if (fd < 0) {
		perror("creat");
		exit(0);
	}
        sprintf(str, "...");
        slen = strlen(str);
	if (write(fd, str, slen) != slen) {
		perror("write");
		(void) close(fd);
		exit(0);
	}
	if (close(fd)) {
		perror("close");
		exit(0);
	}
	if (lerr = link("FOO", "TEMP/BAR")) {
		if (errno != EOPNOTSUPP) {
			perror("link");
			exit(0);
		}
	} else if (rename("TEMP/BAR", "FOO")) {
		perror("rerename");
		exit(0);
	}
	if (unlink("FOO")) {
		perror("unlink 1");
		exit(0);
	}
        system("ls -l FOO TEMP/BAR");
	exit(errno);
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* vfs bug. was: Re: [NFS] 2.6.8.1 kernel NFS client connectathon failure
  2004-09-15 20:54 2.6.8.1 kernel NFS client connectathon failure Andrew Schretter
@ 2004-09-15 22:23 ` Frank van Maarseveen
  2004-09-15 23:02   ` viro
  2004-09-16  3:11 ` Trond Myklebust
  1 sibling, 1 reply; 9+ messages in thread
From: Frank van Maarseveen @ 2004-09-15 22:23 UTC (permalink / raw)
  To: Andrew Schretter; +Cc: NFS, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 928 bytes --]

On Wed, Sep 15, 2004 at 04:54:29PM -0400, Andrew Schretter wrote:
> 
> On local disk runs, TEMP/BAR still exists (I don't know why though).  On NFS
> runs, it does not exists.  This breaks connectathon's test.

Client: 2.6.8.1, server: 2.4.27 (serving ext3)
/proc/mounts: rw,nosuid,nodev,v3,rsize=8192,wsize=8192,hard,udp,lock

Behavior I see is identical to that on a local (ext3) fs on 2.6.8.1 and
also on local ext3 on 2.4.27, namely that TEMP/BAR still exists. This
looks like a long standing kernel bug since the rename(2) seems to
succeed:

(snippet from strace):
write(3, "...", 3)                      = 3
close(3)                                = 0
link("FOO", "TEMP/BAR")                 = 0
rename("TEMP/BAR", "FOO")               = 0
unlink("FOO")                           = 0

output:
ls: FOO: No such file or directory
-rw-r--r--    1 fvm      sec             3 Sep 16 00:21 TEMP/BAR


(CC'ed to lkml)

-- 
Frank

[-- Attachment #2: lnk.c --]
[-- Type: text/plain, Size: 1052 bytes --]

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/stat.h>

#ifndef MAXPATHLEN
#define MAXPATHLEN	128
#endif

/* maximum number of chars for the message string */
#define STRCHARS	100

main(int ac, char *av[]) {
	int count, fd, slen, lerr, slerr;
	char *fn;
	struct stat sb;
        char str[255];

        if (mkdir("TEMP", 0755)) {
        	perror("mkdir TEMP");
        	exit(0);
        }
	fd = open("FOO", O_RDWR|O_CREAT, 0666);
	if (fd < 0) {
		perror("creat");
		exit(0);
	}
        sprintf(str, "...");
        slen = strlen(str);
	if (write(fd, str, slen) != slen) {
		perror("write");
		(void) close(fd);
		exit(0);
	}
	if (close(fd)) {
		perror("close");
		exit(0);
	}
	if (lerr = link("FOO", "TEMP/BAR")) {
		if (errno != EOPNOTSUPP) {
			perror("link");
			exit(0);
		}
	} else if (rename("TEMP/BAR", "FOO")) {
		perror("rerename");
		exit(0);
	}
	if (unlink("FOO")) {
		perror("unlink 1");
		exit(0);
	}
        system("ls -l FOO TEMP/BAR");
	exit(errno);
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: vfs bug. was: Re: [NFS] 2.6.8.1 kernel NFS client connectathon failure
  2004-09-15 22:23 ` vfs bug. was: Re: [NFS] " Frank van Maarseveen
@ 2004-09-15 23:02   ` viro
  0 siblings, 0 replies; 9+ messages in thread
From: viro @ 2004-09-15 23:02 UTC (permalink / raw)
  To: Frank van Maarseveen; +Cc: Andrew Schretter, NFS, linux-kernel

On Thu, Sep 16, 2004 at 12:23:58AM +0200, Frank van Maarseveen wrote:
> Behavior I see is identical to that on a local (ext3) fs on 2.6.8.1 and
> also on local ext3 on 2.4.27, namely that TEMP/BAR still exists. This
> looks like a long standing kernel bug since the rename(2) seems to
> succeed:
> 
> (snippet from strace):
> write(3, "...", 3)                      = 3
> close(3)                                = 0
> link("FOO", "TEMP/BAR")                 = 0
> rename("TEMP/BAR", "FOO")               = 0
> unlink("FOO")                           = 0

Behaviour is REQUIRED by POSIX and SuS.

<quote>
     If the old argument and the new argument both refer to, and both
     link to the same existing file, rename() returns successfully and
     performs no other action.
</quote>

So what you get is
	* after link(2) success - FOO and TEMP/BAR being links to the same
file.
	* after rename(2) (required) success - same as before
	* after unlink(2) - FOO is removed, TEMP/BAR remains.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.8.1 kernel NFS client connectathon failure
  2004-09-15 20:54 2.6.8.1 kernel NFS client connectathon failure Andrew Schretter
  2004-09-15 22:23 ` vfs bug. was: Re: [NFS] " Frank van Maarseveen
@ 2004-09-16  3:11 ` Trond Myklebust
  2004-09-17  1:41   ` Neil Brown
  1 sibling, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2004-09-16  3:11 UTC (permalink / raw)
  To: Andrew Schretter; +Cc: NFS

P=E5 on , 15/09/2004 klokka 16:54, skreiv Andrew Schretter:
> The attached code is a blatant rip from the latest connectathon code that
> was stripped down to show exactly where it fails on my 2.6.8.1 NFS client=
.
> Note that this code works on the local disk and on a 2.4.22 client (both =
to
> a 2.6.8.1 server and a 2.4.22 server) but fail from a 2.6.8.1 client to
> both 2.4.22 and 2.6.8.1 servers.
>=20


This question keeps recurring on this list. The answer is the same now
as it was before: Fix the bloody servers!!!!

   subtree_check is INCOMPATIBLE with correct rename behaviour, and it
is simply ridiculous that this is still a default option.

Grrr,
  Trond



-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.8.1 kernel NFS client connectathon failure
  2004-09-16  3:11 ` Trond Myklebust
@ 2004-09-17  1:41   ` Neil Brown
  2004-09-17  1:54     ` Neil Brown
  2004-09-17  3:14     ` Trond Myklebust
  0 siblings, 2 replies; 9+ messages in thread
From: Neil Brown @ 2004-09-17  1:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Andrew Schretter, NFS

On Wednesday September 15, trond.myklebust@fys.uio.no wrote:
> P=E5 on , 15/09/2004 klokka 16:54, skreiv Andrew Schretter:
> > The attached code is a blatant rip from the latest connectathon cod=
e that
> > was stripped down to show exactly where it fails on my 2.6.8.1 NFS =
client.
> > Note that this code works on the local disk and on a 2.4.22 client =
(both to
> > a 2.6.8.1 server and a 2.4.22 server) but fail from a 2.6.8.1 clien=
t to
> > both 2.4.22 and 2.6.8.1 servers.
> >=20
>=20
>=20
> This question keeps recurring on this list. The answer is the same no=
w
> as it was before: Fix the bloody servers!!!!
>=20
>    subtree_check is INCOMPATIBLE with correct rename behaviour, and i=
t
> is simply ridiculous that this is still a default option.

I find it hard to see how it is the server that is at fault.

The effect of "subtree_check" in this case is that the same file (FOO
and TEMP/BAR) has a different file-handle for each name. =20

When the client tries to perform the "rename" request it checks the
two names to see if they exist and what they are.
In the no_subtree_check case it finds that they both have the same
filehandle and so must be the same file, and as the rename request is
defined to be a no-op in that case the client optimises it away.  (This=
 is
based on reading the actual NFS traffic, not the NFS client code).

In the subtree_check case, the two have different filehandles, though
both are clearly files.
In this case, one would think that the client would send the RENAME
request, and the server would then notice that the two files are
actually the same and would do nothing.  This is what appears to
happen from a 2.4 client.

However with 2.6.9-rc1 on the client, something else happens.
Before the rename of BAR to FOO there is a rename
of FOO to .nfs000111a300000002.
It appears to be doing a "silly rename" even though the file is not
open.
Immediately after the rename of BAR to FOO - which ofcourse works as
FOO doesn't exist now, the .nfs000111a300000002 file is removed.

To me, this looks like the client doing the wrong thing - certainly an
unnecessary thing, and in this case an unhelpful thing.

If the file *were* open during the rename, then I can understand the
need for a silly-rename.  However in that case it might be safer to do
a silly-link instead (though I haven't tried to think through all the
implications of this).


Having said all that, I agree that subtree_check should be avoided
where possible.  Maybe we should do what we did with "async" and print
noisy warning messages....

NeilBrown


-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.8.1 kernel NFS client connectathon failure
  2004-09-17  1:41   ` Neil Brown
@ 2004-09-17  1:54     ` Neil Brown
  2004-09-17  3:14     ` Trond Myklebust
  1 sibling, 0 replies; 9+ messages in thread
From: Neil Brown @ 2004-09-17  1:54 UTC (permalink / raw)
  To: Trond Myklebust, Andrew Schretter, NFS

On Friday September 17, neilb@cse.unsw.edu.au wrote:
> However with 2.6.9-rc1 on the client, something else happens.
> Before the rename of BAR to FOO there is a rename
> of FOO to .nfs000111a300000002.
> It appears to be doing a "silly rename" even though the file is not
> open.
> Immediately after the rename of BAR to FOO - which ofcourse works as
> FOO doesn't exist now, the .nfs000111a300000002 file is removed.
> 
> To me, this looks like the client doing the wrong thing - certainly an
> unnecessary thing, and in this case an unhelpful thing.

I just realised why it is definitely *Wrong*.
"rename" is meant to guarantee that the target name always exists.
Thus implementing
   rename BAR FOO
as
   rename FOO .nfsxxx
   rename BAR FOO
is clearly wrong as there is a moment when FOO doesn't exist.
   link FOO .nfsxxx
   rename BAR FOO
would be OK.

NeilBrown


-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.8.1 kernel NFS client connectathon failure
  2004-09-17  1:41   ` Neil Brown
  2004-09-17  1:54     ` Neil Brown
@ 2004-09-17  3:14     ` Trond Myklebust
       [not found]       ` <20040917035515.GR23987@parcelfarce.linux.theplanet.co.uk>
  1 sibling, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2004-09-17  3:14 UTC (permalink / raw)
  To: Neil Brown, Alexander Viro; +Cc: Andrew Schretter, NFS

P=E5 to , 16/09/2004 klokka 21:41, skreiv Neil Brown:

> I find it hard to see how it is the server that is at fault.
>=20
> The effect of "subtree_check" in this case is that the same file (FOO
> and TEMP/BAR) has a different file-handle for each name. =20
>=20
> When the client tries to perform the "rename" request it checks the
> two names to see if they exist and what they are.
> In the no_subtree_check case it finds that they both have the same
> filehandle and so must be the same file, and as the rename request is
> defined to be a no-op in that case the client optimises it away.  (This i=
s
> based on reading the actual NFS traffic, not the NFS client code).
>=20
> In the subtree_check case, the two have different filehandles, though
> both are clearly files.
> In this case, one would think that the client would send the RENAME
> request, and the server would then notice that the two files are
> actually the same and would do nothing.  This is what appears to
> happen from a 2.4 client.


> However with 2.6.9-rc1 on the client, something else happens.
> Before the rename of BAR to FOO there is a rename
> of FOO to .nfs000111a300000002.
> It appears to be doing a "silly rename" even though the file is not
> open.
> Immediately after the rename of BAR to FOO - which ofcourse works as
> FOO doesn't exist now, the .nfs000111a300000002 file is removed.
>=20
> To me, this looks like the client doing the wrong thing - certainly an
> unnecessary thing, and in this case an unhelpful thing.

Yes. There does appear to be a bug in the vfs_rename_other() code. Al,
can you explain WTF we're doing an extra dget()/dput() on the target
dentry there? It appears to be totally arbitrary, and induces a totally
unnecessary sillyrename on all renames where an target file that is not
in use exists.

That said, the sillyrename issue is precisely WHY the client needs to be
able to do the check, and not the server. If the file *IS* open, we do
not want to be sillyrenaming it in a situation which should otherwise be
a no-op.

> If the file *were* open during the rename, then I can understand the
> need for a silly-rename.  However in that case it might be safer to do
> a silly-link instead (though I haven't tried to think through all the
> implications of this).
>=20
>=20
> Having said all that, I agree that subtree_check should be avoided
> where possible.  Maybe we should do what we did with "async" and print
> noisy warning messages....

At the very least, we should change the default.

Cheers,
  Trond



-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.8.1 kernel NFS client connectathon failure
       [not found]       ` <20040917035515.GR23987@parcelfarce.linux.theplanet.co.uk>
@ 2004-09-17 16:34         ` Trond Myklebust
       [not found]           ` <20040918001817.GX23987@parcelfarce.linux.theplanet.co.uk>
  0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2004-09-17 16:34 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Neil Brown, Andrew Schretter, NFS

P=E5 to , 16/09/2004 klokka 23:55, skreiv
viro@parcelfarce.linux.theplanet.co.uk:
> On Thu, Sep 16, 2004 at 11:14:34PM -0400, Trond Myklebust wrote:
> > Yes. There does appear to be a bug in the vfs_rename_other() code. Al,
> > can you explain WTF we're doing an extra dget()/dput() on the target
> > dentry there? It appears to be totally arbitrary, and induces a totally
> > unnecessary sillyrename on all renames where an target file that is not
> > in use exists.
>=20
> We need to make sure that victim inode won't disappear on us.  Note that
> we are holding ->i_sem on it.

So dget() will block d_delete(), but is that really necessary? Is there
any reason why we couldn't just limit ourselves to doing an
igrab()/iput() instead?

Also, what is the target->i_sem protecting us against there?

> FWIW, I would love to take sillyrename logics to fs/namei.c.  First of
> all, that would make life _much_ simpler wrt keeping the locking logics
> in one place.  And NFS is not the only filesystem that doesn't handle
> opened-but-unliked stuff - NCPFS et.al. simply break on it and IIRC
> HFS+ tries to do a homegrown version and fails.
>=20
> Would something like
> 	sillyrename_target =3D dir->i_op->sillyrename(dir, victim);
> creating a suitable negative dentry encapsulate all NFS-specific
> logics in there?  ISTR that it would be enough, but I haven't looked
> at the details since 2.3.late...

Yes, that would clearly suffice as far as NFS is concerned. I've no idea
about what semantics the other filesystems want.

Cheers,
  Trond



-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.8.1 kernel NFS client connectathon failure
       [not found]           ` <20040918001817.GX23987@parcelfarce.linux.theplanet.co.uk>
@ 2004-09-18  1:23             ` Trond Myklebust
  0 siblings, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2004-09-18  1:23 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Neil Brown, Andrew Schretter, NFS

P=E5 fr , 17/09/2004 klokka 20:18, skreiv
viro@parcelfarce.linux.theplanet.co.uk:
> >=20
> > Also, what is the target->i_sem protecting us against there?
>=20
> target->i_sem protects us against a lot of fun stuff - e.g. attempts to
> create a link to victim in the middle of removal, etc.

Ah, OK... We've been unhashing the dentry since day 1 in order to
achieve the same thing. That has the added advantage that it prevents
someone from coming along and actually opening the target: that's an
extra requirement in those cases where you have sillyrename semantics.


> FWIW, could we change the check in nfs_rename() and don't do that d_delet=
e()
> there?  The only potential trouble I can see is if target is itself
> sillyrenamed and we probably want to prevent that anyway...  Comments?

Do you have to unconditionally unhash new_dentry then?
=20
> Care to give a braindump on your async unlink logics?  The problem with
> doing that for local filesystems is that we need to protect the parent
> directory here...

It's fairly simple:

If the dentry->d_flags is marked with the sillyrename flag, then our
->d_delete() is rigged to cause dput() to kill the dentry.

The ->d_iput() method (i.e. dentry_iput()) is then responsible for
actually triggering the RPC call.

We do not actually try to lock down the directory while the unlink is on
the wire. That means we implicitly assume that nobody in their right
mind will look up or try to open a sillydeleted file and expect strict
POSIX behaviour.
The only thing that cares about the lookup behaviour of sillyrenamed
files is nfs_silly_rename() itself, and it is really only concerned
about avoiding naming collisions (and note that sillyrename is always
done while holding the dir->i_sem).

Cheers,
  Trond



-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2004-09-18  1:23 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-09-15 20:54 2.6.8.1 kernel NFS client connectathon failure Andrew Schretter
2004-09-15 22:23 ` vfs bug. was: Re: [NFS] " Frank van Maarseveen
2004-09-15 23:02   ` viro
2004-09-16  3:11 ` Trond Myklebust
2004-09-17  1:41   ` Neil Brown
2004-09-17  1:54     ` Neil Brown
2004-09-17  3:14     ` Trond Myklebust
     [not found]       ` <20040917035515.GR23987@parcelfarce.linux.theplanet.co.uk>
2004-09-17 16:34         ` Trond Myklebust
     [not found]           ` <20040918001817.GX23987@parcelfarce.linux.theplanet.co.uk>
2004-09-18  1:23             ` Trond Myklebust

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.