* [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
@ 2002-07-09 13:49 Trond Myklebust
0 siblings, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2002-07-09 13:49 UTC (permalink / raw)
To: nfs, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 361 bytes --]
Hi,
There was a bug reported on the 'exim' user list a couple of months ago:
the Linux NFS client reports -EINVAL if you try to fsync() a directory.
The correct response would be to return a dummy '0' for success, since all
NFS operations that change the directory are supposed to be performed
synchronously on the server anyway...
Cheers,
Trond
[-- Attachment #2: linux-2.4.19-fsync_dir.dif --]
[-- Type: text/plain, Size: 1071 bytes --]
diff -u --recursive --new-file linux-2.4.19-rc1/fs/nfs/dir.c linux-2.4.19-fsync_dir/fs/nfs/dir.c
--- linux-2.4.19-rc1/fs/nfs/dir.c Tue Mar 12 16:35:02 2002
+++ linux-2.4.19-fsync_dir/fs/nfs/dir.c Tue Jul 9 15:41:29 2002
@@ -45,12 +45,14 @@
static int nfs_mknod(struct inode *, struct dentry *, int, int);
static int nfs_rename(struct inode *, struct dentry *,
struct inode *, struct dentry *);
+static int nfs_fsync_dir(struct file *, struct dentry *, int);
struct file_operations nfs_dir_operations = {
read: generic_read_dir,
readdir: nfs_readdir,
open: nfs_open,
release: nfs_release,
+ fsync: nfs_fsync_dir
};
struct inode_operations nfs_dir_inode_operations = {
@@ -401,6 +403,15 @@
return 0;
}
+/*
+ * All directory operations under NFS are synchronous, so fsync()
+ * is a dummy operation.
+ */
+int nfs_fsync_dir(struct file *filp, struct dentry *dentry, int datasync)
+{
+ return 0;
+}
+
/*
* A check for whether or not the parent directory has changed.
* In the case it has, we assume that the dentries are untrustworthy
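From user space, the call that failed in the exim report is an fsync() on a directory descriptor. Below is a minimal sketch of that usage pattern, assuming a mail-spool-style sequence: create a file, fsync() it, then fsync() its directory so the new entry is durable. The helper name `create_durably` is hypothetical, and this is not exim's actual code. Per Trond's report, the final directory fsync() is the call an unpatched 2.4 NFS client would fail with EINVAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create dir/name with the given contents and make
 * both the file data and its new directory entry durable.
 * Returns 0 on success or -errno on failure. */
int create_durably(const char *dir, const char *name,
                   const void *data, size_t len)
{
    int dfd = open(dir, O_RDONLY);            /* directories open read-only */
    if (dfd < 0)
        return -errno;

    int err = 0;
    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        err = errno;
    } else {
        ssize_t n = write(fd, data, len);
        if (n < 0)
            err = errno;
        else if ((size_t)n != len)
            err = EIO;                        /* short write: treat as failure */
        else if (fsync(fd) < 0)               /* the file's data and inode */
            err = errno;
        close(fd);

        if (err == 0 && fsync(dfd) < 0)       /* the new directory entry */
            err = errno;
    }

    close(dfd);
    return -err;
}
```

With the patch applied, a call such as `create_durably("/var/spool/x", "msg1", buf, n)` on an NFS mount should no longer fail at the directory fsync() step. The spool path here is only an example.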
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
[not found] <200207091549.15913.trond.myklebust@fys.uio.no>
@ 2002-07-09 14:06 ` Richard B. Johnson
[not found] ` <Pine.LNX.3.95.1020709095544.27285A-100000@chaos.analogic.com>
2002-07-11 10:52 ` Matthias Andree
2 siblings, 0 replies; 19+ messages in thread
From: Richard B. Johnson @ 2002-07-09 14:06 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs, linux-kernel
On Tue, 9 Jul 2002, Trond Myklebust wrote:
> Hi,
>
> There was a bug reported on the 'exim' user list a couple of months ago:
> the Linux NFS client reports -EINVAL if you try to fsync() a directory.
>
> The correct response would be to return a dummy '0' for success, since all
> NFS operations that change the directory are supposed to be performed
> synchronously on the server anyway...
>
> Cheers,
> Trond
>
>
Isn't it supposed to return EINVAL if "fd is bound to a file which
doesn't support synchronization..."? That's what POSIX.4 says.
Errors:
EBADF fildes is not a valid file descriptor.
EINVAL The file descriptor is valid, but the system doesn't support
fsync on this particular file.
I think code that opens a directory as a file is broken. We have
opendir() for that and it returns a DIR pointer, not a file descriptor.
If the directory was properly opened, one would never attempt to
fsync() it.
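For reference, a directory opened with opendir() still has a file descriptor behind its DIR pointer. POSIX.1-2008 `dirfd()`, long available in glibc and the BSDs, returns it, so a directory opened this way can be passed to fsync() without open(2) tricks. The sketch below assumes a POSIX.1-2008 environment, and the helper name `fsync_dir_stream` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: fsync() a directory opened with opendir(), using
 * dirfd() to reach the descriptor underlying the DIR stream.
 * Returns 0 on success or -errno on failure. */
int fsync_dir_stream(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -errno;

    int fd = dirfd(d);            /* owned by the DIR stream */
    int ret = 0;
    if (fd < 0 || fsync(fd) < 0)
        ret = -errno;

    closedir(d);                  /* also closes fd; don't close() it separately */
    return ret;
}
```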
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Stuff, things, and much much more.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
[not found] ` <Pine.LNX.3.95.1020709095544.27285A-100000@chaos.analogic.com>
@ 2002-07-09 14:08 ` Trond Myklebust
0 siblings, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2002-07-09 14:08 UTC (permalink / raw)
To: root; +Cc: nfs, linux-kernel
>>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:
> I think code that opens a directory as a file is broken. We
> have opendir() for that and it returns a DIR pointer, not a
> file descriptor. If the directory was properly opened, one
> would never attempt to fsync() it.
fsync() is supported on directories on local filesystems as a way of
ensuring that changes (due to file creation etc) are committed to
disk. Where is the POSIX violation in that?
There is no reason why NFS, which ensures this anyway, should
not adhere to this convention.
Cheers,
Trond
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
[not found] <15658.61035.450205.832652@charged.uio.no>
@ 2002-07-09 15:06 ` Richard B. Johnson
0 siblings, 0 replies; 19+ messages in thread
From: Richard B. Johnson @ 2002-07-09 15:06 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs, linux-kernel
On Tue, 9 Jul 2002, Trond Myklebust wrote:
> >>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes:
>
> > I think code that opens a directory as a file is broken. We
> > have opendir() for that and it returns a DIR pointer, not a
> > file descriptor. If the directory was properly opened, one
> > would never attempt to fsync() it.
>
> fsync() is supported on directories on local filesystems as a way of
> ensuring that changes (due to file creation etc) are committed to
> disk. Where is the POSIX violation in that?
>
> There is no reason why NFS, which ensures this anyway, should
> not adhere to this convention.
>
> Cheers,
> Trond
> -
Well, no. It's not supported. You can't get a valid file-descriptor...
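#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    /* Try to open the root directory for writing. */
    int fd = open("/", O_RDWR);
    if (fd < 0) {
        perror("open(\"/\", O_RDWR)");   /* fails with EISDIR, see trace below */
        return 1;
    }
    fsync(fd);
    close(fd);
    return 0;
}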
execve("./xxx", ["xxx"], [/* 32 vars */]) = 0
brk(0) = 0x804966c
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY) = 3
old_mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0) = 0x4000c000
munmap(0x4000c000, 4096) = 0
old_mmap(NULL, 644232, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4000c000
mprotect(0x40097000, 74888, PROT_NONE) = 0
old_mmap(0x40097000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x8b000) = 0x40097000
old_mmap(0x4009d000, 50312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4009d000
close(3) = 0
mprotect(0x4000c000, 569344, PROT_READ|PROT_WRITE) = 0
mprotect(0x4000c000, 569344, PROT_READ|PROT_EXEC) = 0
personality(PER_LINUX) = 0
getpid() = 27544
open("/", O_RDWR) = -1 EISDIR (Is a directory)
There are ways to 'cheat' and obtain a file descriptor that references
a directory, but cheating is also against POSIX rules.
You can open it read-only. But read-only means that you can't
update it, so fsync() means nothing; it will return 0 because the
directory is already "whatever it was", since you can't modify it...
getpid() = 27568
open("/", O_RDONLY) = 3
fsync(3) = 0
_exit(0) = ?
My reading is that you need to fsync() every file within a directory
to fsync() a directory. Playing tricks with a directory inode doesn't
do it.
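Below is a sketch of the per-file approach described above. It fsync()s each regular file directly inside one directory and does not recurse. The helper name `fsync_dir_files` is hypothetical, and the code assumes POSIX.1-2008 (`dirfd`, `fstatat`, `openat`). It deliberately does not fsync() the directory itself. Per Alan's and Sean's replies later in the thread, that means it does not by itself make new or renamed directory entries durable.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: fsync() every regular file directly inside `path`
 * (not recursive, and not the directory itself).  Entries that vanish or
 * cannot be opened for reading are skipped.  Returns the number of files
 * synced, or -errno if opening the directory or an fsync() fails. */
int fsync_dir_files(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -errno;

    int dfd = dirfd(d);
    if (dfd < 0) {
        int err = errno;
        closedir(d);
        return -err;
    }

    int count = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        /* Only regular files; don't follow symlinks ("." and ".." are
         * directories, so they are skipped here too). */
        struct stat st;
        if (fstatat(dfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) < 0 ||
            !S_ISREG(st.st_mode))
            continue;

        int fd = openat(dfd, de->d_name, O_RDONLY);
        if (fd < 0)
            continue;                 /* e.g. no read permission */

        if (fsync(fd) < 0) {
            int err = errno;
            close(fd);
            closedir(d);
            return -err;
        }
        close(fd);
        count++;
    }

    closedir(d);
    return count;
}
```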
Regardless, POSIX.4 declines to state exactly what "successfully
transferred" means when it states that fsync() doesn't return until
all data has been successfully transferred to the disk or underlying
hardware. This is a real problem for a network file system, where
data that will eventually get to a file server in the Congo may be
en route for several minutes.
If an application insists, it is up to the application to determine,
probably once upon startup, just what kind of file synchronization
is supported.
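A startup check of that kind could look like the following sketch. The helper name is hypothetical. The check is coarse: it treats any fsync() failure on a directory descriptor as "not supported", including the EINVAL from the NFS client described in Trond's report, rather than telling EINVAL apart from real I/O errors.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical startup probe: return true if fsync() succeeds on a
 * descriptor for directory `dir`.  A filesystem with no directory fsync
 * method makes fsync() fail (EINVAL), and this returns false; the caller
 * can then fall back to fsync()ing files only.  Also returns false if
 * `dir` cannot be opened. */
bool dir_fsync_supported(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return false;

    bool ok = (fsync(fd) == 0);
    close(fd);
    return ok;
}
```

An application would call this once per spool or data directory at startup and remember the result, rather than probing before every write.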
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
[not found] <Pine.LNX.3.95.1020709104427.27442B-100000@chaos.analogic.com>
@ 2002-07-09 16:56 ` Alan Cox
2002-07-09 17:22 ` Richard B. Johnson
0 siblings, 1 reply; 19+ messages in thread
From: Alan Cox @ 2002-07-09 16:56 UTC (permalink / raw)
To: root; +Cc: Trond Myklebust, nfs, linux-kernel
> > not adhere to this convention.
>
> Well, no. It's not supported. You can't get a valid file-descriptor...
Wrong (as usual)
> If an application insists, it is up to the application to determine,
> probably once upon startup, just what kind of file synchronization
> is supported.
Linux defines fsync for directories
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
2002-07-09 16:56 ` Alan Cox
@ 2002-07-09 17:22 ` Richard B. Johnson
2002-07-09 19:11 ` Alan Cox
0 siblings, 1 reply; 19+ messages in thread
From: Richard B. Johnson @ 2002-07-09 17:22 UTC (permalink / raw)
To: Alan Cox; +Cc: Trond Myklebust, nfs, linux-kernel
On Tue, 9 Jul 2002, Alan Cox wrote:
> > > not adhere to this convention.
> >
> > Well, no. It's not supported. You can't get a valid file-descriptor...
>
> Wrong (as usual)
Really? Then what is the meaning of fsync() on a read-only file-
descriptor? You can't update the information you can't change.
This is (as usual) just an example of your helpful responses.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
2002-07-09 17:22 ` Richard B. Johnson
@ 2002-07-09 19:11 ` Alan Cox
2002-07-09 19:13 ` Richard B. Johnson
0 siblings, 1 reply; 19+ messages in thread
From: Alan Cox @ 2002-07-09 19:11 UTC (permalink / raw)
To: root; +Cc: Alan Cox, Trond Myklebust, nfs, linux-kernel
> Really? Then what is the meaning of fsync() on a read-only file-
> descriptor? You can't update the information you can't change.
fsync ensures the data for that inode/file content is on stable storage - note
_the_ _data_ not only random things written by this specific file handle.
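To illustrate that point, the sketch below writes through one descriptor and then calls fsync() through a separate, read-only one. On Linux, fsync() flushes the file's modified data no matter which descriptor dirtied it. The helper name is hypothetical and short writes are not retried. A zero return only shows that the call succeeded on a read-only descriptor; it does not prove the data reached the disk.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `msg` to `path` through one descriptor, then
 * fsync() the same file through a second, read-only descriptor.
 * Returns 0 on success or -errno on failure. */
int write_then_fsync_readonly(const char *path, const char *msg)
{
    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (wfd < 0)
        return -errno;

    ssize_t n = write(wfd, msg, strlen(msg));   /* short writes ignored */
    int err = (n < 0) ? errno : 0;
    close(wfd);                   /* data may still be only in the page cache */
    if (err)
        return -err;

    int rfd = open(path, O_RDONLY);
    if (rfd < 0)
        return -errno;

    int ret = (fsync(rfd) < 0) ? -errno : 0;    /* flushes the file, not the fd */
    close(rfd);
    return ret;
}
```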
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
2002-07-09 19:11 ` Alan Cox
@ 2002-07-09 19:13 ` Richard B. Johnson
0 siblings, 0 replies; 19+ messages in thread
From: Richard B. Johnson @ 2002-07-09 19:13 UTC (permalink / raw)
To: Alan Cox; +Cc: Trond Myklebust, nfs, linux-kernel
On Tue, 9 Jul 2002, Alan Cox wrote:
> > Really? Then what is the meaning of fsync() on a read-only file-
> > descriptor? You can't update the information you can't change.
>
> fsync ensures the data for that inode/file content is on stable storage - note
> _the_ _data_ not only random things written by this specific file handle.
>
That is what it's supposed to do with files. The attached code clearly
shows that it doesn't work with directories. The fsync() instantly
returns, even though there is buffered data still to be written.
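#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

#define NR_WRITES 0x1000

int main(void)
{
    static char foo[0x10000];       /* 64 KiB, written NR_WRITES times (256 MiB) */
    int dirfd, outfd;
    int flags, i;

    memset(foo, 0, sizeof(foo));
    outfd = open("/foo", O_WRONLY|O_TRUNC|O_CREAT, 0644);
    dirfd = open("/", O_RDONLY);
    if (outfd < 0 || dirfd < 0) {
        perror("open");
        return 1;
    }

    /* Try to turn the directory fd into O_RDWR.  Note that on Linux
     * F_SETFL ignores the access-mode bits, so dirfd stays read-only. */
    flags = fcntl(dirfd, F_GETFL);
    flags &= ~O_ACCMODE;
    flags |= O_RDWR;
    fcntl(dirfd, F_SETFL, flags);

    fprintf(stderr, "Write %zu bytes\n", sizeof(foo) * NR_WRITES);
    for (i = 0; i < NR_WRITES; i++)
        write(outfd, foo, sizeof(foo));
    fprintf(stderr, "Write complete\n");

    /* fsync() the directory, not /foo itself. */
    fprintf(stderr, "Sync the directory\n");
    fsync(dirfd);
    fprintf(stderr, "Done, returns immediately!\n");
    close(outfd);
    close(dirfd);
    fprintf(stderr, "Now execute sync and see if your disk is active!\n");
    /* unlink("/foo"); */
    return 0;
}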
Again, to ensure that file data is written to storage, one must
execute fsync on files, not directories. The dummy return of 0
that Linux provides is a database bug waiting to happen.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
2002-07-09 19:59 ` Alan Cox
@ 2002-07-09 19:50 ` Richard B. Johnson
[not found] ` <Pine.LNX.3.95.1020709154108.14801B-100000@chaos.analogic.com>
1 sibling, 0 replies; 19+ messages in thread
From: Richard B. Johnson @ 2002-07-09 19:50 UTC (permalink / raw)
To: Alan Cox; +Cc: Trond Myklebust, nfs, linux-kernel
On Tue, 9 Jul 2002, Alan Cox wrote:
> > That is what it's supposed to do with files. The attached code clearly
> > shows that it doesn't work with directories. The fsync() instantly
> > returns, even though there is buffered data still to be written.
>
> Your understanding or code is wrong. It's hard to tell which.
>
> fsync on the directory syncs the directory metadata, not the file metadata
>
Well, the original complaint was that Linux NFS didn't allow a directory to
be fsync()ed. I showed that POSIX.4 doesn't provide for fsync()ing
directories, only files, and that you have to fsync() individual files, not
the directories that contain them. Others said that fsync()ing individual
files was not necessary, that you only have to fsync() the directory. I
explained that you have to cheat to even get an fd that can be used
to fsync() a directory. Then I showed that fsync()ing a directory in this
manner doesn't work, so we are actually in violent agreement.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
[not found] <Pine.LNX.3.95.1020709150615.14559A-100000@chaos.analogic.com>
@ 2002-07-09 19:59 ` Alan Cox
2002-07-09 19:50 ` Richard B. Johnson
[not found] ` <Pine.LNX.3.95.1020709154108.14801B-100000@chaos.analogic.com>
0 siblings, 2 replies; 19+ messages in thread
From: Alan Cox @ 2002-07-09 19:59 UTC (permalink / raw)
To: root; +Cc: Alan Cox, Trond Myklebust, nfs, linux-kernel
> That is what it's supposed to do with files. The attached code clearly
> shows that it doesn't work with directories. The fsync() instantly
> returns, even though there is buffered data still to be written.
Your understanding or code is wrong. It's hard to tell which.
fsync on the directory syncs the directory metadata, not the file metadata.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
[not found] <200207091549.15913.trond.myklebust@fys.uio.no>
2002-07-09 14:06 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts Richard B. Johnson
[not found] ` <Pine.LNX.3.95.1020709095544.27285A-100000@chaos.analogic.com>
@ 2002-07-11 10:52 ` Matthias Andree
2002-07-11 11:26 ` Trond Myklebust
2 siblings, 1 reply; 19+ messages in thread
From: Matthias Andree @ 2002-07-11 10:52 UTC (permalink / raw)
To: nfs
Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> Hi,
>
> There was a bug reported on the 'exim' user list a couple of months ago:
> the Linux NFS client reports -EINVAL if you try to fsync() a directory.
>
> The correct response would be to return a dummy '0' for success, since all
> NFS operations that change the directory are supposed to be performed
> synchronously on the server anyway...
> +/*
> + * All directory operations under NFS are synchronous, so fsync()
> + * is a dummy operation.
> + */
> +int nfs_fsync_dir(struct file *filp, struct dentry *dentry, int datasync)
> +{
> + return 0;
> +}
> +
What if the NFS stuff is not mounted with sync or explicitly mounted
async? Will an nfsd still do synchronous writes to directories?
--
Matthias Andree
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
2002-07-11 10:52 ` Matthias Andree
@ 2002-07-11 11:26 ` Trond Myklebust
0 siblings, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2002-07-11 11:26 UTC (permalink / raw)
To: Matthias Andree; +Cc: nfs
>>>>> " " == Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:
> What if the NFS stuff is not mounted with sync or explicitly
> mounted async? Will an nfsd still do synchronous writes to
> directories?
Of course. It is not possible for the NFS client to switch this
behaviour off, since it is part of the server side specifications in
the protocol.
The only thing that can screw you up is if you use the 'async' option
in /etc/exports on the server.
Cheers,
Trond
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: redhat 2.4.18-5 client huge slowdown
[not found] <E17SjDh-00067R-00@usw-sf-list2.sourceforge.net>
@ 2002-07-11 19:08 ` Rex Dieter
2002-07-11 19:14 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts Rex Dieter
2002-07-11 19:29 ` redhat 2.4.18-5 client huge slowdown Rex Dieter
2 siblings, 0 replies; 19+ messages in thread
From: Rex Dieter @ 2002-07-11 19:08 UTC (permalink / raw)
To: nfs
On Thursday 11 July 2002 2:02 pm, Trond Myklebust wrote:
> >>>>> " " == Rex Dieter <rdieter@math.unl.edu> writes:
> > Yes, I've also experienced a huge nfs client slowdown (~50k/sec
> > writes) on 'sync' nfs mounts with the 2.4.18-5 kernel... See my
> > bugzilla report:
> > http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=67199 and a
> > similar report as well:
> > http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=67461
>
> > The only way I found to get any sort of respectable performance
> > out of 2.4.18-5 was to mount 'async'. *ack*
>
> For 2.4.18-5 RedHat appears to have decided to set the default r/wsize
> to 4k.
For me, the r/wsize made no significant difference as long as the 'sync'
option was used. Write speeds consistently stayed at ~50k/sec.
-- Rex
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
[not found] <E17SjDh-00067R-00@usw-sf-list2.sourceforge.net>
2002-07-11 19:08 ` redhat 2.4.18-5 client huge slowdown Rex Dieter
@ 2002-07-11 19:14 ` Rex Dieter
2002-07-11 20:05 ` Tom McNeal
2002-07-11 19:29 ` redhat 2.4.18-5 client huge slowdown Rex Dieter
2 siblings, 1 reply; 19+ messages in thread
From: Rex Dieter @ 2002-07-11 19:14 UTC (permalink / raw)
To: nfs
Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> >>>>> " " == Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:
> > What if the NFS stuff is not mounted with sync or explicitly
> > mounted async? Will an nfsd still do synchronous writes to
> > directories?
> Of course. It is not possible for the NFS client to switch this
> behaviour off, since it is part of the server side specifications in
> the protocol.
> The only thing that can screw you up is if you use the 'async' option
> in /etc/exports on the server.
How is specifying 'async' in /etc/exports on the NFS server different than an
NFS client explicitly mounting async?
-- Rex
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: redhat 2.4.18-5 client huge slowdown
[not found] <E17SjDh-00067R-00@usw-sf-list2.sourceforge.net>
2002-07-11 19:08 ` redhat 2.4.18-5 client huge slowdown Rex Dieter
2002-07-11 19:14 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts Rex Dieter
@ 2002-07-11 19:29 ` Rex Dieter
2002-07-12 12:07 ` Trond Myklebust
2 siblings, 1 reply; 19+ messages in thread
From: Rex Dieter @ 2002-07-11 19:29 UTC (permalink / raw)
To: nfs
Rex Dieter <rdieter@math.unl.edu> wrote:
> Raphael Clifford wrote:
> > I was wondering if everyone else is experiencing huge nfs write
> > slowdowns using redhat kernel 2.4.18-5 (as opposed to 2.4.18-4). I am
>
> Yes, I've also experienced a huge nfs client slowdown (~50k/sec writes) on
> 'sync' nfs mounts with the 2.4.18-5 kernel... See my bugzilla report:
> http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=67199
...
Here's some interesting data (on switched fast ethernet) to chew on (using
Redhat's 2.4.18-5 kernel on both server and client):

Server(1)  Client(2)  Speed
async      async      ~8.5 MB/sec
async      sync       ~6.5 MB/sec
sync       async      ~350 KB/sec
sync       sync       ~50 KB/sec

(1) using the 'async/sync' option in /etc/exports
(2) using the 'async/sync' option when mounting, i.e., mount -t nfs -o sync ...
Is this really the kind of performance I should expect to get? (I hope not)
-- Rex
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
2002-07-11 19:14 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts Rex Dieter
@ 2002-07-11 20:05 ` Tom McNeal
0 siblings, 0 replies; 19+ messages in thread
From: Tom McNeal @ 2002-07-11 20:05 UTC (permalink / raw)
To: Rex Dieter, NFS maillist
Rex Dieter wrote:
>
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>
> > >>>>> " " == Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:
> > > What if the NFS stuff is not mounted with sync or explicitly
> > > mounted async? Will an nfsd still do synchronous writes to
> > > directories?
>
> > Of course. It is not possible for the NFS client to switch this
> > behaviour off, since it is part of the server side specifications in
> > the protocol.
> > The only thing that can screw you up is if you use the 'async' option
> > in /etc/exports on the server.
>
> How is specifying 'async' in /etc/exports on the NFS server different than a
> NFS client explicitly mounting async?
>
> -- Rex
The /etc/exports async option allows the server to tell the client
that it has indeed written data/metadata to the disk, regardless
of what has actually happened.
So, the client sync/async affects *when* the client writes to the
server, and the server sync/async affects *how* (OK, and *when*) the
server replies to the client.
Regards -
Tom
--
------------------------------------------------------------
Tom McNeal trmcneal@attbi.com (650)906-0761 (cell)
------------------------------------------------------------
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: redhat 2.4.18-5 client huge slowdown
2002-07-11 19:29 ` redhat 2.4.18-5 client huge slowdown Rex Dieter
@ 2002-07-12 12:07 ` Trond Myklebust
0 siblings, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2002-07-12 12:07 UTC (permalink / raw)
To: Rex Dieter; +Cc: nfs
>>>>> " " == Rex Dieter <rdieter@math.unl.edu> writes:
> Here's some interesting data (on switched fast ethernet) to
> chew on (using Redhat's 2.4.18-5 kernel on both server and
> client):
>
> Server(1) Client(2) Speed
> async     async     ~8.5MB/sec
> async     sync      ~6.5MB/sec
> sync      async     ~350K/sec
> sync      sync      ~50k/sec
If you switch off write caching using the '-o sync' mount option then
of course you will see a dramatic slowdown. Where's the beef?
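A rough local analogue can be built under one assumption: that opening a single file with `O_SYNC` behaves much like '-o sync' does for every file on the mount, so each write() waits until the data reaches stable storage (on NFS, the server). The sketch times the same writes with and without `O_SYNC`. The helper name `timed_write` is hypothetical. The timing deliberately excludes close(), which on an NFS mount also flushes cached writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write `count` blocks of `bs` bytes to `path` and
 * return the elapsed wall-clock time in seconds, or -1.0 on error.
 * With use_osync set, the file is opened O_SYNC, so each write() returns
 * only after the data has reached stable storage. */
double timed_write(const char *path, int use_osync, size_t bs, int count)
{
    static char buf[65536];
    if (bs > sizeof(buf))
        return -1.0;
    memset(buf, 'x', bs);

    int flags = O_WRONLY | O_CREAT | O_TRUNC | (use_osync ? O_SYNC : 0);
    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        if (write(fd, buf, bs) != (ssize_t)bs) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);                      /* not timed */

    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Running it on an NFS-mounted path once with `use_osync` 0 and once with 1 should show the same kind of gap as the client-side async/sync rows in Rex's table. No particular numbers are implied here.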
Cheers,
Trond
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
[not found] ` <Pine.LNX.3.95.1020709154108.14801B-100000@chaos.analogic.com>
@ 2002-07-15 7:52 ` Sean Hunter
2002-07-15 12:45 ` Richard B. Johnson
0 siblings, 1 reply; 19+ messages in thread
From: Sean Hunter @ 2002-07-15 7:52 UTC (permalink / raw)
To: Richard B. Johnson; +Cc: Alan Cox, Trond Myklebust, nfs, linux-kernel
On Tue, Jul 09, 2002 at 03:50:17PM -0400, Richard B. Johnson wrote:
> On Tue, 9 Jul 2002, Alan Cox wrote:
>
> > > That is what it's supposed to do with files. The attached code clearly
> > > shows that it doesn't work with directories. The fsync() instantly
> > > returns, even though there is buffered data still to be written.
> >
> > Your understanding or code is wrong. It's hard to tell which.
> >
> > fsync on the directory syncs the directory metadata, not the file metadata
> >
>
> Well, the original complaint was that Linux NFS didn't allow a directory to
> be fsync()ed. I showed that POSIX.4 doesn't provide for fsync()ing
> directories, only files, and that you have to fsync() individual files, not
> the directories that contain them. Others said that fsync()ing individual
> files was not necessary, that you only have to fsync() the directory. I
> explained that you have to cheat to even get an fd that can be used
> to fsync() a directory. Then I showed that fsync()ing a directory in this
> manner doesn't work, so we are actually in violent agreement.
I'm not sure whether or not you've got the gist with all the flamage and
shrapnel flying about; however, as I understand it, fsync on a directory fd
ensures that all directory ops such as rename(), unlink(), link() etc. are
committed, not that all data pending to all files in that dir is flushed.
To get all changes you need to fsync the dirfd and all the fds of the files as
well.
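Below is a sketch of that combination, applied to the common "write a temporary file, then rename() it over the target" update. It assumes a filesystem that honors fsync() on both files and directories. The helper name `replace_durably` and the `.tmp` suffix are assumptions for illustration. On failure the temporary file may be left behind.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace dir/name with `len` bytes of `data`:
 *   1. write dir/name.tmp and fsync() it (file data and inode);
 *   2. renameat() it over dir/name (a directory operation);
 *   3. fsync() the directory (the directory entry change).
 * Returns 0 on success or -errno on failure. */
int replace_durably(const char *dir, const char *name,
                    const void *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", name);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -ENAMETOOLONG;

    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -errno;

    int err = 0;
    int fd = openat(dfd, tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        err = errno;
    } else {
        ssize_t w = write(fd, data, len);
        if (w < 0)
            err = errno;
        else if ((size_t)w != len)
            err = EIO;                                      /* short write */
        else if (fsync(fd) < 0)                             /* step 1 */
            err = errno;
        close(fd);

        if (err == 0 && renameat(dfd, tmp, dfd, name) < 0)  /* step 2 */
            err = errno;
        if (err == 0 && fsync(dfd) < 0)                     /* step 3 */
            err = errno;
    }

    close(dfd);
    return -err;
}
```

On an NFS mount the rename is already synchronous on the server, so step 3 is exactly the directory fsync() that the patch turns into a no-op returning 0.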
Because directory changes (such as renames, unlinks etc.) are synchronous on NFS
anyway, fsync() on a dir fd on an NFS mount can simply return. There will
never be any outstanding dir ops to flush. Ergo: no bug.
Hope that's clear.
Sean
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
2002-07-15 7:52 ` Sean Hunter
@ 2002-07-15 12:45 ` Richard B. Johnson
0 siblings, 0 replies; 19+ messages in thread
From: Richard B. Johnson @ 2002-07-15 12:45 UTC (permalink / raw)
To: Sean Hunter; +Cc: Alan Cox, Trond Myklebust, nfs, linux-kernel
On Mon, 15 Jul 2002, Sean Hunter wrote:
> On Tue, Jul 09, 2002 at 03:50:17PM -0400, Richard B. Johnson wrote:
> > On Tue, 9 Jul 2002, Alan Cox wrote:
> >
> > > > That is what it's supposed to do with files. The attached code clearly
> > > > shows that it doesn't work with directories. The fsync() instantly
> > > > returns, even though there is buffered data still to be written.
> > >
> > > Your understanding or code is wrong. It's hard to tell which.
> > >
> > > fsync on the directory syncs the directory metadata, not the file metadata
> > >
> >
> > Well, the original complaint was that Linux NFS didn't allow a directory to
> > be fsync()ed. I showed that POSIX.4 doesn't provide for fsync()ing
> > directories, only files, and that you have to fsync() individual files, not
> > the directories that contain them. Others said that fsync()ing individual
> > files was not necessary, that you only have to fsync() the directory. I
> > explained that you have to cheat to even get an fd that can be used
> > to fsync() a directory. Then I showed that fsync()ing a directory in this
> > manner doesn't work, so we are actually in violent agreement.
>
> I'm not sure whether or not you've got the gist with all the flamage and
> shrapnel flying about; however, as I understand it, fsync on a directory fd
> ensures that all directory ops such as rename(), unlink(), link() etc. are
> committed, not that all data pending to all files in that dir is flushed.
>
> To get all changes you need to fsync the dirfd and all the fds of the files as
> well.
>
> Because directory changes (such as renames, unlinks etc.) are synchronous on NFS
> anyway, fsync() on a dir fd on an NFS mount can simply return. There will
> never be any outstanding dir ops to flush. Ergo: no bug.
>
> Hope that's clear.
>
> Sean
>
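A minimal sketch of the pattern Sean describes, assuming a POSIX system: fsync() the new file's descriptor, then fsync() a read-only descriptor on its parent directory so the new name is committed too. write_durably() is a hypothetical helper, not part of any library or of Trond's patch; the second fsync() is the step that failed with EINVAL on the Linux NFS client before the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, write `len` bytes from `buf`,
 * and make both the file data and the new directory entry durable.
 * Returns 0 on success, -1 on error. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry the write */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Step 1: flush the file's own data and metadata. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    /* Step 2: flush the parent directory, which holds the new name.
     * dirname() may modify its argument, so work on a copy. */
    char *copy = strdup(path);
    if (copy == NULL)
        return -1;
    int dfd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
    free(copy);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

Doing the file first and the directory second means that, once both calls succeed, neither the data nor the name is still only in memory.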
NFS has characteristics that seem to make it 'special'.
For instance, you have a server that performs local actions
on behalf of a remote client. As long as the local server
doesn't crash, everything it did for the remote client is
safe even if the remote client crashes and burns. From
the perspective of the remote client, it really doesn't make
much difference if it ever calls fsync() on anything as long
as the server doesn't crash. Therefore, for this discussion, I
will ignore NFS and other client/server file access systems.
But just because they are special doesn't mean that they
should be treated specially.
Given the following:
/1/2/3/4/5/6/7/8/9/file
... I suggest that it MUST be sufficient to fsync() 'file' to
ensure that file data can be recovered. That's what POSIX.4 states.
If the implementation doesn't guarantee this, i.e., if 'file' can end
up in 'lost+found', then there is a problem that should be addressed.
This is because a program using a local file may not know the entire
directory tree, for example, in a chrooted environment. Also,
the task has no way of knowing which, if any, of these directory
entries have already been flushed to disk. A directory tree could,
in principle, be nested as deeply as a path of _POSIX_PATH_MAX
characters allows.
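To make the cost of the alternative concrete: if fsync() on 'file' alone were not enough, a careful program would have to walk up the path and fsync() every ancestor directory, roughly as in the sketch below. This is an illustration assuming a POSIX system; fsync_ancestors() and fsync_dir_path() are hypothetical helpers. Treating EINVAL and EROFS as "nothing to do" follows the fsync(2) man page, which lists them for descriptors that do not support synchronization. Because dirname() only looks at the path string, the walk stops at the visible "/", so inside a chroot it cannot reach the real ancestors, which is the objection raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* fsync() one directory by name. EINVAL/EROFS mean "this descriptor
 * cannot be synchronized" per fsync(2); treat that as nothing to do.
 * Returns 0 on success, -1 on other errors. */
static int fsync_dir_path(const char *dir)
{
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    int err = errno;                  /* save before close() can change it */
    close(fd);
    if (rc < 0 && (err == EINVAL || err == EROFS))
        rc = 0;
    return rc;
}

/* Hypothetical helper: fsync() every directory from dirname(path) up
 * to "/" (or "." for a relative path). Returns the number of
 * directories synced, or -1 on error. */
int fsync_ancestors(const char *path)
{
    char *cur = strdup(path);
    int count = 0;

    while (cur != NULL) {
        char *tmp = strdup(cur);      /* dirname() may modify its input */
        free(cur);
        if (tmp == NULL)
            return -1;
        char *parent = strdup(dirname(tmp));
        free(tmp);
        if (parent == NULL)
            return -1;

        if (fsync_dir_path(parent) < 0) {
            free(parent);
            return -1;
        }
        count++;

        if (strcmp(parent, "/") == 0 || strcmp(parent, ".") == 0) {
            free(parent);
            return count;
        }
        cur = parent;                 /* continue one level up */
    }
    return -1;                        /* initial strdup() failed */
}
```

For /1/2/3/4/5/6/7/8/9/file this issues ten directory fsync() calls, whether or not any of those entries were actually dirty.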
In the beginning, when God created Unix, files and directories
were all the same. I could fix a bad directory entry with an
editor. Over the years, certain rules were established to prevent
users from accessing directories as files. They still are files,
but the Operating System(s) try their best to make sure you don't
muck with directories as files.
So now you have to read a directory with getdents(); actually, that's
not even POSIX, so you need to use readdir(). Also, a directory cannot
be opened for anything other than reading. These are all artificial
constraints, imposed to make sure you follow the rules.
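Both constraints are easy to observe from user space. A minimal sketch, assuming Linux/POSIX behaviour: open(2) is documented to fail with EISDIR when write access to a directory is requested, and opendir()/readdir() is the portable listing interface (getdents() is Linux-specific). The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open a directory read-write.
 * Returns the resulting errno (EISDIR expected), or 0 if it opened. */
int open_dir_rw_errno(const char *dir)
{
    int fd = open(dir, O_RDWR);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno;
}

/* Hypothetical helper: count directory entries (including "." and
 * "..") using the POSIX opendir()/readdir() interface.
 * Returns -1 if the directory cannot be opened or read. */
long count_entries(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long n = 0;
    errno = 0;                  /* readdir() returns NULL on end and on error */
    while (readdir(d) != NULL)
        n++;
    int err = errno;            /* nonzero only if readdir() failed */
    closedir(d);
    return err != 0 ? -1 : n;
}
```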
So, you get a read-only file-descriptor and fsync() it! What does
that mean? Obviously, the file must have existed previously to open
it read-only. Since I opened it read-only, I could not have altered
its contents, so fsync() has nothing to do.
So, let's say two tasks open the same file. One opens it read-only
and the other read-write. The read-write task is happily writing
to the file. The read-only task executes fsync(). Does this cause
the writer to wait until the file has been flushed to disk? I don't
know, but if it does, we have a very broken system where an
unprivileged reader can severely affect the performance of a
file-server with a denial-of-service attack. So, I suggest that
a read-only file-descriptor CANNOT cause the contents of a file
to be written. If it does, it's broken. Given this, fsync() on
a directory, accessed through a read-only file-descriptor, can't
do anything.
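Whether a read-only descriptor is accepted by fsync() at all can be checked on a given system with a small probe. This is a sketch assuming Linux and a POSIX C library; fsync_readonly() is a hypothetical name. It only reports whether the call succeeds (Linux does not reject fsync() merely because the descriptor lacks write access); it does not measure whether a concurrent writer is made to wait, which is the open question above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical probe: open `path` read-only (file or directory) and
 * fsync() the descriptor. Returns 0 if fsync() succeeded, otherwise
 * the errno from open() or fsync() (e.g. EINVAL where directory
 * fsync is unsupported, as on the pre-patch NFS client). */
int fsync_readonly(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    int rc = fsync(fd);
    int err = (rc < 0) ? errno : 0;   /* capture before close() */
    close(fd);
    return err;
}
```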
These are things that should be addressed rather than flamed
away. I think that the intent of fsync() on a file is to make
certain that it is on the physical media in a state from which
it can be accessed after a crash. If this is the intent, then
playing games with individual directories is not useful, and
fsync() on the read/write file-descriptor that actually updated
the file should be sufficient.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
Thread overview: 19+ messages
[not found] <E17SjDh-00067R-00@usw-sf-list2.sourceforge.net>
2002-07-11 19:08 ` redhat 2.4.18-5 client huge slowdown Rex Dieter
2002-07-11 19:14 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts Rex Dieter
2002-07-11 20:05 ` Tom McNeal
2002-07-11 19:29 ` redhat 2.4.18-5 client huge slowdown Rex Dieter
2002-07-12 12:07 ` Trond Myklebust
[not found] <200207091549.15913.trond.myklebust@fys.uio.no>
2002-07-09 14:06 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts Richard B. Johnson
[not found] ` <Pine.LNX.3.95.1020709095544.27285A-100000@chaos.analogic.com>
2002-07-09 14:08 ` Trond Myklebust
2002-07-11 10:52 ` Matthias Andree
2002-07-11 11:26 ` Trond Myklebust
[not found] <Pine.LNX.3.95.1020709150615.14559A-100000@chaos.analogic.com>
2002-07-09 19:59 ` Alan Cox
2002-07-09 19:50 ` Richard B. Johnson
[not found] ` <Pine.LNX.3.95.1020709154108.14801B-100000@chaos.analogic.com>
2002-07-15 7:52 ` Sean Hunter
2002-07-15 12:45 ` Richard B. Johnson
[not found] <Pine.LNX.3.95.1020709104427.27442B-100000@chaos.analogic.com>
2002-07-09 16:56 ` Alan Cox
2002-07-09 17:22 ` Richard B. Johnson
2002-07-09 19:11 ` Alan Cox
2002-07-09 19:13 ` Richard B. Johnson
[not found] <15658.61035.450205.832652@charged.uio.no>
2002-07-09 15:06 ` Richard B. Johnson
2002-07-09 13:49 Trond Myklebust