* [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts
@ 2002-07-09 13:49 Trond Myklebust
2002-07-09 14:06 ` Richard B. Johnson
0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2002-07-09 13:49 UTC (permalink / raw)
To: nfs, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 361 bytes --]
Hi,
There was a bug reported on the 'exim' user list a couple of months ago:
the Linux NFS client reports -EINVAL if you try to fsync() a directory.
The correct response would be to return a dummy '0' for success, since all
NFS operations that change the directory are supposed to be performed
synchronously on the server anyway...
Cheers,
Trond
[-- Attachment #2: linux-2.4.19-fsync_dir.dif --]
[-- Type: text/plain, Size: 1071 bytes --]
diff -u --recursive --new-file linux-2.4.19-rc1/fs/nfs/dir.c linux-2.4.19-fsync_dir/fs/nfs/dir.c
--- linux-2.4.19-rc1/fs/nfs/dir.c Tue Mar 12 16:35:02 2002
+++ linux-2.4.19-fsync_dir/fs/nfs/dir.c Tue Jul 9 15:41:29 2002
@@ -45,12 +45,14 @@
static int nfs_mknod(struct inode *, struct dentry *, int, int);
static int nfs_rename(struct inode *, struct dentry *,
struct inode *, struct dentry *);
+static int nfs_fsync_dir(struct file *, struct dentry *, int);
struct file_operations nfs_dir_operations = {
read: generic_read_dir,
readdir: nfs_readdir,
open: nfs_open,
release: nfs_release,
+ fsync: nfs_fsync_dir
};
struct inode_operations nfs_dir_inode_operations = {
@@ -401,6 +403,15 @@
return 0;
}
+/*
+ * All directory operations under NFS are synchronous, so fsync()
+ * is a dummy operation.
+ */
+int nfs_fsync_dir(struct file *filp, struct dentry *dentry, int datasync)
+{
+ return 0;
+}
+
/*
* A check for whether or not the parent directory has changed.
* In the case it has, we assume that the dentries are untrustworthy
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 13:49 [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts Trond Myklebust @ 2002-07-09 14:06 ` Richard B. Johnson 2002-07-09 14:08 ` Trond Myklebust 2002-07-10 6:33 ` Alex Riesen 0 siblings, 2 replies; 14+ messages in thread From: Richard B. Johnson @ 2002-07-09 14:06 UTC (permalink / raw) To: Trond Myklebust; +Cc: nfs, linux-kernel On Tue, 9 Jul 2002, Trond Myklebust wrote: > Hi, > > There was a bug reported on the 'exim' user list a couple of months ago: > the Linux NFS client reports -EINVAL if you try to fsync() a directory. > > The correct response would be to return a dummy '0' for success, since all > NFS operations that change the directory are supposed to be performed > synchronously on the server anyway... > > Cheers, > Trond > > Isn't it supposed to return EINVAL if "fd is bound to a file which doesn't support synchronization..." That's what POSIX 4 says. Errors: EBADF fildes is not a valid file descriptor. EINVAL The file descriptor is valid, but the system doesn't support fsync on this particular file. I think code that opens a directory as a file is broken. We have opendir() for that and it returns a DIR pointer, not a file descriptor. If the directory was properly opened, one would never attempt to fsync() it. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Windows-2000/Professional isn't. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 14:06 ` Richard B. Johnson @ 2002-07-09 14:08 ` Trond Myklebust 2002-07-09 15:06 ` Richard B. Johnson 2002-07-10 6:33 ` Alex Riesen 1 sibling, 1 reply; 14+ messages in thread From: Trond Myklebust @ 2002-07-09 14:08 UTC (permalink / raw) To: root; +Cc: nfs, linux-kernel >>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes: > I think code that opens a directory as a file is broken. We > have opendir() for that and it returns a DIR pointer, not a > file descriptor. If the directory was properly opened, one > would never attempt to fsync() it. fsync() is supported on directories on local filesystems as a way of ensuring that changes (due to file creation etc) are committed to disk. Where is the POSIX violation in that? There is no reason why NFS, which ensures this anyway, should not adhere to this convention. Cheers, Trond ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 14:08 ` Trond Myklebust @ 2002-07-09 15:06 ` Richard B. Johnson 2002-07-09 16:56 ` Alan Cox 0 siblings, 1 reply; 14+ messages in thread From: Richard B. Johnson @ 2002-07-09 15:06 UTC (permalink / raw) To: Trond Myklebust; +Cc: nfs, linux-kernel On Tue, 9 Jul 2002, Trond Myklebust wrote: > >>>>> " " == Richard B Johnson <root@chaos.analogic.com> writes: > > > I think code that opens a directory as a file is broken. We > > have opendir() for that and it returns a DIR pointer, not a > > file descriptor. If the directory was properly opened, one > > would never attempt to fsync() it. > > fsync() is supported on directories on local filesystems as a way of > ensuring that changes (due to file creation etc) are committed to > disk. Where is the POSIX violation in that? > > There is no reason why NFS, which ensures this anyway, should > not adhere to this convention. > > Cheers, > Trond > - Well, no. It's not supported. You can't get a valid file-descriptor... #include <stdio.h> #include <unistd.h> #include <fcntl.h> int main() { int fd; fd = open("/", O_RDWR, 0); fsync(fd); } execve("./xxx", ["xxx"], [/* 32 vars */]) = 0 brk(0) = 0x804966c open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory) open("/lib/libc.so.6", O_RDONLY) = 3 old_mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0) = 0x4000c000 munmap(0x4000c000, 4096) = 0 old_mmap(NULL, 644232, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4000c000 mprotect(0x40097000, 74888, PROT_NONE) = 0 old_mmap(0x40097000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x8b000) = 0x40097000 old_mmap(0x4009d000, 50312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4009d000 close(3) = 0 mprotect(0x4000c000, 569344, PROT_READ|PROT_WRITE) = 0 mprotect(0x4000c000, 569344, PROT_READ|PROT_EXEC) = 0 personality(PER_LINUX) = 0 getpid() = 27544 open("/", O_RDWR) = -1 EISDIR (Is a directory) There are ways to 'cheat' and obtain a file-descriptor that references a directory, but cheating is against POSIX rules, also. You can open it read-only. But, Read-Only means that you can't update it, so fsync means nothing, will return 0 because it is already "whatever it was" since you can't modify it... getpid() = 27568 open("/", O_RDONLY) = 3 fsync(3) = 0 _exit(0) = ? My reading is that you need to fsync() every file within a directory to fsync() a directory. Playing tricks with a directory inode doesn't do it. Regardless, POSIX.4 declines to state exactly what "successfully transferred" means when it states that fsync() doesn't return until all data has been successfully transferred to the disk or underlying hardware. This is a real problem for a network file-system where data that will eventually get to a file-server in the Congo may be en-route for several minutes. If an application insists, it is up to the application to determine, probably once upon startup, just what kind of file synchronization is supported. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Windows-2000/Professional isn't. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 15:06 ` Richard B. Johnson @ 2002-07-09 16:56 ` Alan Cox 2002-07-09 17:22 ` Richard B. Johnson 0 siblings, 1 reply; 14+ messages in thread From: Alan Cox @ 2002-07-09 16:56 UTC (permalink / raw) To: root; +Cc: Trond Myklebust, nfs, linux-kernel > > not adhere to this convention. > > Well, no. It's not supported. You can't get a valid file-descriptor... Wrong (as usual) > If an application insists, it is up to the application to determine, > probably once upon startup, just what kind of file synchronization > is supported. Linux defines fsync for directories ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 16:56 ` Alan Cox @ 2002-07-09 17:22 ` Richard B. Johnson 2002-07-09 18:58 ` [NFS] " Bill Rugolsky Jr. 2002-07-09 19:11 ` Alan Cox 0 siblings, 2 replies; 14+ messages in thread From: Richard B. Johnson @ 2002-07-09 17:22 UTC (permalink / raw) To: Alan Cox; +Cc: Trond Myklebust, nfs, linux-kernel On Tue, 9 Jul 2002, Alan Cox wrote: > > > not adhere to this convention. > > > > Well, no. It's not supported. You can't get a valid file-descriptor... > > Wrong (as usual) Really? Then what is the meaning of fsync() on a read-only file- descriptor? You can't update the information you can't change. This is (as usual) just an example of your helpful responses. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Windows-2000/Professional isn't. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [NFS] Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 17:22 ` Richard B. Johnson @ 2002-07-09 18:58 ` Bill Rugolsky Jr. 2002-07-09 19:11 ` Alan Cox 1 sibling, 0 replies; 14+ messages in thread From: Bill Rugolsky Jr. @ 2002-07-09 18:58 UTC (permalink / raw) To: Richard B. Johnson; +Cc: Alan Cox, Trond Myklebust, nfs, linux-kernel On Tue, Jul 09, 2002 at 01:22:29PM -0400, Richard B. Johnson wrote: > Really? Then what is the meaning of fsync() on a read-only file- > descriptor? You can't update the information you can't change. Eh? I do an fchmod() on a readonly descriptor, then I call fsync() on that descriptor. The inode gets sync'd to disk (with updated mode and c_time). So no, I don't need a writable descriptor to call fsync(). The only question is *what* gets sync'd when I call fsync() on an O_RDONLY file-descriptor. SUSv3 (http://www.opengroup.org/onlinepubs/007908799/xsh/fsync.html) says "The fsync() function forces all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronised I/O completion state." It appears from this wording that the file-descriptor is *merely* a handle referring to the inode, and that *all* outstanding I/O on the inode [within the "system"] is performed. In other words, if I had several different file handles referring to the same inode (but different kernel "struct file" objects), all inode data and meta-data updates prior to the fsync() call would be synchronized. It doesn't say that explicitly, but given the usual visibility rules regarding writes, etc., that is the "natural" interpretation. [Caveat: mmap()] To state it succinctly: if other (data or meta-data) writes are visible to the process doing the fsync(), they need to be sync'd too. In the case of directories, there is no file handle "doing the writing" -- the kernel does that, so absent the ability to call fsync() on a readonly handle to a directory, i.e. fsync(dirfd(dir)), there is no convenient way to sync the directory contents. Calling fsync() on every file in a directory does not necessitate syncing the directory! Regards, Bill Rugolsky ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 17:22 ` Richard B. Johnson 2002-07-09 18:58 ` [NFS] " Bill Rugolsky Jr. @ 2002-07-09 19:11 ` Alan Cox 2002-07-09 19:13 ` Richard B. Johnson 1 sibling, 1 reply; 14+ messages in thread From: Alan Cox @ 2002-07-09 19:11 UTC (permalink / raw) To: root; +Cc: Alan Cox, Trond Myklebust, nfs, linux-kernel > Really? Then what is the meaning of fsync() on a read-only file- > descriptor? You can't update the information you can't change. fsync ensures the data for that inode/file content is on stable storage - note _the_ _data_ not only random things written by this specific file handle. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 19:11 ` Alan Cox @ 2002-07-09 19:13 ` Richard B. Johnson 2002-07-09 19:39 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine fordirectories " David Dillow 2002-07-09 19:59 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories " Alan Cox 0 siblings, 2 replies; 14+ messages in thread From: Richard B. Johnson @ 2002-07-09 19:13 UTC (permalink / raw) To: Alan Cox; +Cc: Trond Myklebust, nfs, linux-kernel On Tue, 9 Jul 2002, Alan Cox wrote: > > Really? Then what is the meaning of fsync() on a read-only file- > > descriptor? You can't update the information you can't change. > > fsync ensures the data for that inode/file content is on stable storage - note > _the_ _data_ not only random things written by this specific file handle. > That is what it's supposed to do with files. The attached code clearly shows that it doesn't work with directories. The fsync() instantly returns, even though there is buffered data still to be written. #include <stdio.h> #include <unistd.h> #include <fcntl.h> #define NR_WRITES 0x1000 int main() { char foo[0x10000]; int dirfd, outfd; int flags, i; outfd = open("/foo", O_WRONLY|O_TRUNC|O_CREAT, 0644); dirfd = open("/", O_RDONLY, 0); flags = fcntl(dirfd, F_GETFL); flags &= ~O_RDONLY; flags |= O_RDWR; fcntl(dirfd, F_SETFL, flags); fprintf(stderr, "Write %d bytes\n", sizeof(foo) * NR_WRITES); for(i=0; i< NR_WRITES; i++) write(outfd, foo, sizeof(foo)); fprintf(stderr, "Write complete\n"); fprintf(stderr, "Sync the directory\n"); fsync(dirfd); fprintf(stderr, "Done, returns immediately!\n"); close(outfd); fprintf(stderr, "Now execute sync and see if your disk is active!\n"); // unlink("/foo"); } Again, to assure that file-data is written to storage, one must execute fsync on files, not directories. The dummy return of 0, that Linux provides is a database bug waiting to happen. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Windows-2000/Professional isn't. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine fordirectories on NFS mounts 2002-07-09 19:13 ` Richard B. Johnson @ 2002-07-09 19:39 ` David Dillow 2002-07-09 19:59 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories " Alan Cox 1 sibling, 0 replies; 14+ messages in thread From: David Dillow @ 2002-07-09 19:39 UTC (permalink / raw) To: root; +Cc: linux-kernel "Richard B. Johnson" wrote: > > On Tue, 9 Jul 2002, Alan Cox wrote: > > > > Really? Then what is the meaning of fsync() on a read-only file- > > > descriptor? You can't update the information you can't change. > > > > fsync ensures the data for that inode/file content is on stable storage - note > > _the_ _data_ not only random things written by this specific file handle. > > > flags = fcntl(dirfd, F_GETFL); > flags &= ~O_RDONLY; > flags |= O_RDWR; > fcntl(dirfd, F_SETFL, flags); Ehh? Not sure what you're doing here... > fprintf(stderr, "Write %d bytes\n", sizeof(foo) * NR_WRITES); > for(i=0; i< NR_WRITES; i++) > write(outfd, foo, sizeof(foo)); > fprintf(stderr, "Write complete\n"); > fprintf(stderr, "Sync the directory\n"); > fsync(dirfd); > fprintf(stderr, "Done, returns immediately!\n"); > close(outfd); > fprintf(stderr, "Now execute sync and see if your disk is active!\n"); > // unlink("/foo"); > } > > Again, to assure that file-data is written to storage, one must > execute fsync on files, not directories. You are correct, but re-read what Alan said -- "fsync ensures the data for that inode/file content is on stable storage". So your fsync(dirfd) only makes sure data written to the directory is on disk, i.e. the name of the new file "foo". sync still causes mucho activity because you did not fsync the outfd. > The dummy return of 0, > that Linux provides is a database bug waiting to happen. No, the dummy return of 0 from NFS is not a problem, because directory updates, and indeed all writes, are supposed to be syncronous by default. There really is no need to fsync on an NFS directory, as the name should already be on stable storage by the time open() returns. At least this is my understanding, D ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 19:13 ` Richard B. Johnson 2002-07-09 19:39 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine fordirectories " David Dillow @ 2002-07-09 19:59 ` Alan Cox 2002-07-09 19:50 ` Richard B. Johnson 1 sibling, 1 reply; 14+ messages in thread From: Alan Cox @ 2002-07-09 19:59 UTC (permalink / raw) To: root; +Cc: Alan Cox, Trond Myklebust, nfs, linux-kernel > That is what it's supposed to do with files. The attached code clearly > shows that it doesn't work with directories. The fsync() instantly > returns, even though there is buffered data still to be written. Your understanding or code is wrong. Its hard to tell which. fsync on the directory syncs the directory metadata not the file metadata ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 19:59 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories " Alan Cox @ 2002-07-09 19:50 ` Richard B. Johnson 0 siblings, 0 replies; 14+ messages in thread From: Richard B. Johnson @ 2002-07-09 19:50 UTC (permalink / raw) To: Alan Cox; +Cc: Trond Myklebust, nfs, linux-kernel On Tue, 9 Jul 2002, Alan Cox wrote: > > That is what it's supposed to do with files. The attached code clearly > > shows that it doesn't work with directories. The fsync() instantly > > returns, even though there is buffered data still to be written. > > Your understanding or code is wrong. Its hard to tell which. > > fsync on the directory syncs the directory metadata not the file metadata > Well the original complaint was that Linux NFS didn't allow a directory to be fsync()ed. I showed that POSIX.4 doesn't provide for fsync()ing directories, only files, that you have to fsync() individual files, not the directories that contain them. Others said that fsync()ing individual files was not necessary, that you only have to fsync() the directory. I explained that you have to cheat to even get a fd that can be used to fsync() a directory. Then I showed that fsync()ing a directory in this manner doesn't work so, we are actually in violent agreement. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Windows-2000/Professional isn't. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-09 14:06 ` Richard B. Johnson 2002-07-09 14:08 ` Trond Myklebust @ 2002-07-10 6:33 ` Alex Riesen 2002-07-10 11:20 ` Richard B. Johnson 1 sibling, 1 reply; 14+ messages in thread From: Alex Riesen @ 2002-07-10 6:33 UTC (permalink / raw) To: Richard B. Johnson; +Cc: linux-kernel On Tue, Jul 09, 2002 at 10:06:45AM -0400, Richard B. Johnson wrote: > I think code that opens a directory as a file is broken. We have > opendir() for that and it returns a DIR pointer, not a file descriptor. > If the directory was properly opened, one would never attempt to > fsync() it. It's the libc which defines it. Theere no syscall "opendir". How you think you can return what sus defines as "DIR*" from the kernel? offtopic: on aix you can do this: "cat ." ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts 2002-07-10 6:33 ` Alex Riesen @ 2002-07-10 11:20 ` Richard B. Johnson 0 siblings, 0 replies; 14+ messages in thread From: Richard B. Johnson @ 2002-07-10 11:20 UTC (permalink / raw) To: Alex Riesen; +Cc: linux-kernel On Wed, 10 Jul 2002, Alex Riesen wrote: > On Tue, Jul 09, 2002 at 10:06:45AM -0400, Richard B. Johnson wrote: > > I think code that opens a directory as a file is broken. We have > > opendir() for that and it returns a DIR pointer, not a file descriptor. > > If the directory was properly opened, one would never attempt to > > fsync() it. > > It's the libc which defines it. Theere no syscall "opendir". How you think > you can return what sus defines as "DIR*" from the kernel? > > offtopic: on aix you can do this: "cat ." > Any attempt to open a directory as a file and read it on Linux up to version 2.4.18 (at least), or on Sun (up to) SunOS 5.5.1, returns -1 with errno set to ISDIR (21). As mentioned several times, there are ways to 'cheat', but I was (and have been) talking about POSIX conformance. Script started on Wed Jul 10 07:15:46 2002 # od . od: .: Is a directory 0000000 # cat . cat: .: Is a directory # exit exit Script done on Wed Jul 10 07:15:58 2002 Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Windows-2000/Professional isn't. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2002-07-10 11:16 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-07-09 13:49 [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories on NFS mounts Trond Myklebust 2002-07-09 14:06 ` Richard B. Johnson 2002-07-09 14:08 ` Trond Myklebust 2002-07-09 15:06 ` Richard B. Johnson 2002-07-09 16:56 ` Alan Cox 2002-07-09 17:22 ` Richard B. Johnson 2002-07-09 18:58 ` [NFS] " Bill Rugolsky Jr. 2002-07-09 19:11 ` Alan Cox 2002-07-09 19:13 ` Richard B. Johnson 2002-07-09 19:39 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine fordirectories " David Dillow 2002-07-09 19:59 ` [PATCH] 2.4.19-rc1/2.5.25 provide dummy fsync() routine for directories " Alan Cox 2002-07-09 19:50 ` Richard B. Johnson 2002-07-10 6:33 ` Alex Riesen 2002-07-10 11:20 ` Richard B. Johnson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox