* synchronous mounts
@ 2001-11-15 8:03 Andrew Morton
2001-11-15 21:45 ` Stephen C. Tweedie
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2001-11-15 8:03 UTC (permalink / raw)
To: lkml; +Cc: Neil Brown
Linux is not syncing write() data for files on synchronously mounted
filesystems, and it isn't syncing write() data for ext2/3 files which
are operating under `chattr +S'.
Is this deliberate, or a bug?
This patch makes write()s data-synchronous. There's a fix-for-the-fix
here for ext3. Other filesystems which are second-guessing the generic
layer's behaviour may also end up doing a double sync with this patch.
Really, generic_osync_inode() needs to be front-ended by
an inode_operations method.
--- linux-2.4.15-pre4/mm/filemap.c Mon Nov 12 11:16:12 2001
+++ linux-akpm/mm/filemap.c Wed Nov 14 22:50:25 2001
@@ -2916,8 +2916,10 @@ unlock:
/* For now, when the user asks for O_SYNC, we'll actually
* provide O_DSYNC. */
- if ((status >= 0) && (file->f_flags & O_SYNC))
- status = generic_osync_inode(inode, OSYNC_METADATA|OSYNC_DATA);
+ if (status >= 0) {
+ if ((file->f_flags & O_SYNC) || IS_SYNC(inode))
+ status = generic_osync_inode(inode, OSYNC_METADATA|OSYNC_DATA);
+ }
err = written ? written : status;
out:
--- linux-2.4.15-pre4/fs/ext3/file.c Mon Nov 12 11:16:12 2001
+++ linux-akpm/fs/ext3/file.c Wed Nov 14 23:52:08 2001
@@ -61,22 +61,19 @@ static int ext3_open_file (struct inode
static ssize_t
ext3_file_write(struct file *file, const char *buf, size_t count, loff_t *ppos)
{
- int ret;
struct inode *inode = file->f_dentry->d_inode;
- ret = generic_file_write(file, buf, count, ppos);
- if ((ret >= 0) && IS_SYNC(inode)) {
- if (file->f_flags & O_SYNC) {
- /*
- * generic_osync_inode() has already done the sync
- */
- } else {
- int ret2 = ext3_force_commit(inode->i_sb);
- if (ret2)
- ret = ret2;
- }
- }
- return ret;
+ /*
+ * Nasty: if the file is subject to synchronous writes then we need
+ * to force generic_osync_inode() to call ext3_write_inode().
+ * We do that by marking the inode dirty. This adds much more
+ * computational expense than we need, but we're going to sync
+ * anyway.
+ */
+ if (IS_SYNC(inode) || (file->f_flags & O_SYNC))
+ mark_inode_dirty(inode);
+
+ return generic_file_write(file, buf, count, ppos);
}
struct file_operations ext3_file_operations = {
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: synchronous mounts 2001-11-15 8:03 synchronous mounts Andrew Morton @ 2001-11-15 21:45 ` Stephen C. Tweedie 2001-11-15 22:02 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Stephen C. Tweedie @ 2001-11-15 21:45 UTC (permalink / raw) To: Andrew Morton; +Cc: lkml, Neil Brown, Stephen Tweedie Hi, On Thu, Nov 15, 2001 at 12:03:57AM -0800, Andrew Morton wrote: > Linux is not syncing write() data for files on synchronously mounted > filesystems, and it isn't syncing write() data for ext2/3 files which > are operating under `chattr +S'. In the past, chattr +S and mount -o sync always resulted in sync metadata with no guarantees about data. I'm not sure this makes much sense, but it's what has always happened. For directories, the behaviour is fine, in particular as it gives us the same directory sync consistency semantics as synchronous BSD UFS. It's not clear to me that chattr/mount sync options make _any_ sense for regular file metadata. Rather than tightening up the semantics, I'd actually prefer to restrict them so that they only apply to directories. Users who set the sync bits are usually doing so for applications like MTAs where it's directory syncing which is what matters: the apps typically fsync the files themselves, anyway. > Really, generic_osync_inode() needs to be front-ended by > an inode_operations method. Yes. Cheers, Stephen ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-15 21:45 ` Stephen C. Tweedie @ 2001-11-15 22:02 ` Andrew Morton 2001-11-15 23:05 ` Stephen C. Tweedie 2001-11-16 0:19 ` Jeff Garzik 2001-11-16 3:07 ` Neil Brown 2 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2001-11-15 22:02 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: lkml, Neil Brown "Stephen C. Tweedie" wrote: > > Hi, > > On Thu, Nov 15, 2001 at 12:03:57AM -0800, Andrew Morton wrote: > > Linux is not syncing write() data for files on synchronously mounted > > filesystems, and it isn't syncing write() data for ext2/3 files which > > are operating under `chattr +S'. > > In the past, chattr +S and mount -o sync always resulted in sync > metadata with no guarantees about data. > > I'm not sure this makes much sense, but it's what has always happened. > For directories, the behaviour is fine, in particular as it gives us > the same directory sync consistency semantics as synchronous BSD UFS. > > It's not clear to me that chattr/mount sync options make _any_ sense > for regular file metadata. Rather than tightening up the semantics, > I'd actually prefer to restrict them so that they only apply to > directories. Users who set the sync bits are usually doing so for > applications like MTAs where it's directory syncing which is > what matters: the apps typically fsync the files themselves, anyway. > OK, that makes sense. Thanks. The `mount' and `chattr' manpages need updating... So shall we try to nail this down? Synchronous mounts and chattr +S provide synchronous semantics for directory contents, diretory metadata and directory inodes only. And fsync() will write out a file's data, metadata and inode? If this is correct then there are a few places where ext2 is syncing stuff unnecessarily - file indirect blocks, etc. Not very important at this stage I guess. - ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-15 22:02 ` Andrew Morton @ 2001-11-15 23:05 ` Stephen C. Tweedie 0 siblings, 0 replies; 13+ messages in thread From: Stephen C. Tweedie @ 2001-11-15 23:05 UTC (permalink / raw) To: Andrew Morton; +Cc: Stephen C. Tweedie, lkml, Neil Brown Hi, On Thu, Nov 15, 2001 at 02:02:54PM -0800, Andrew Morton wrote: > > So shall we try to nail this down? Synchronous mounts and chattr +S > provide synchronous semantics for directory contents, diretory metadata > and directory inodes only. And fsync() will write out a file's data, > metadata and inode? Sounds sane. UFS sync mounts also update the inode bitmaps synchronously, and in order, so that they can tell exactly when an inode is valid. I guess that we don't need that, at least on ext2, since the i_dtime gives us the necessary information automatically. > If this is correct then there are a few places where ext2 is > syncing stuff unnecessarily - file indirect blocks, etc. Not > very important at this stage I guess. It's important for people running MTAs which expect dir sync behaviour: admins have to pay the performance cost of all file updates being sync as well as the directory updates if they enable chattr +S on the dir. --Stephen ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-15 21:45 ` Stephen C. Tweedie 2001-11-15 22:02 ` Andrew Morton @ 2001-11-16 0:19 ` Jeff Garzik 2001-11-16 8:33 ` Andrew Morton 2001-11-16 12:28 ` Stephen C. Tweedie 2001-11-16 3:07 ` Neil Brown 2 siblings, 2 replies; 13+ messages in thread From: Jeff Garzik @ 2001-11-16 0:19 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Andrew Morton, lkml, Neil Brown "Stephen C. Tweedie" wrote: > It's not clear to me that chattr/mount sync options make _any_ sense > for regular file metadata. Rather than tightening up the semantics, > I'd actually prefer to restrict them so that they only apply to > directories. Users who set the sync bits are usually doing so for > applications like MTAs where it's directory syncing which is > what matters: the apps typically fsync the files themselves, anyway. I don't think it is correct to make that assumption. When working on something likely to crash, I always remount my filesystems 'sync' with the intention to have the kernel immediately sync to disk anything and everything it is coded to do. Since the kernel is responsible to flushing data to disk, it makes perfect sense to have an option to sync not only metadata but data to disk immediately, if the user desires such. Further, expecting all apps to fsync(2) files under the right circumstances is not reasonable. There are "normal" circumstances where someone expects non-syncing behavior of "cat foo bar > foobar", and then there are extentuating circumstances where another expects the shell to sync that command immediately. Should we rewrite cat/bash/apps to all fsync, depending on an option? Should we expect people to modify all their shell scripts to include "/bin/sync" for those times when they want data-sync? Such is not scalable at all. It would make more sense to me to implement 'sync' as sync-everything, and add 'dirsync' or 'metasync' for syncing only directories or only metadata. As it stands, it seems like redefining 'sync' to sync less data than is currently done is not only changing current behavior, but providing less to users overall. Jeff -- Jeff Garzik | Only so many songs can be sung Building 1024 | with two lips, two lungs, and one tongue. MandrakeSoft | - nomeansno ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-16 0:19 ` Jeff Garzik @ 2001-11-16 8:33 ` Andrew Morton 2001-11-16 10:47 ` Matthias Andree 2001-11-16 12:28 ` Stephen C. Tweedie 1 sibling, 1 reply; 13+ messages in thread From: Andrew Morton @ 2001-11-16 8:33 UTC (permalink / raw) To: Jeff Garzik; +Cc: Stephen C. Tweedie, lkml, Neil Brown Jeff Garzik wrote: > > As it stands, it seems like redefining 'sync' to sync less data than is > currently done is not only changing current behavior, but providing less > to users overall. > Persuasively argued. You appear to have your wish, as this patch was merged in -pre5. A `dirsync' option does make sense though, for the reasons which Stephen outlined. The whole handling of synchronous operations needs a rip-up-and-rewrite anyway. We're currently holding onto a stack of locks while waiting for the disk to spin round and round. It's a great scalability bottleneck for multiple threads doing things in the same directory. This is something I shall look at when the kernel versions turn odd. - ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-16 8:33 ` Andrew Morton @ 2001-11-16 10:47 ` Matthias Andree 0 siblings, 0 replies; 13+ messages in thread From: Matthias Andree @ 2001-11-16 10:47 UTC (permalink / raw) To: lkml On Fri, 16 Nov 2001, Andrew Morton wrote: > A `dirsync' option does make sense though, for the reasons which > Stephen outlined. So we could then have: - (no option) == async, which only syncs file + data on fsync() or O_SYNC (BSD calls this async, it may corrupt file systems because writes are out-of-order) - dirsync, which syncs directories and metadata and causes ordered writes thereof (BSD calls this noasync), no chance of corrupting on-disk structure unrecoverably. - sync, which syncs all filesystem operations (BSD calls this sync also), will have at most 1 dirty block at a time on non-journaled file systems(?) I expect sync to be faster on journalled file systems in that case, because "in-order execution" to journal will probably cause linear writes, while on ext2, it will involve seeking. -- Matthias Andree "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." Benjamin Franklin ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-16 0:19 ` Jeff Garzik 2001-11-16 8:33 ` Andrew Morton @ 2001-11-16 12:28 ` Stephen C. Tweedie 2001-11-16 13:28 ` Jeff Garzik ` (2 more replies) 1 sibling, 3 replies; 13+ messages in thread From: Stephen C. Tweedie @ 2001-11-16 12:28 UTC (permalink / raw) To: Jeff Garzik; +Cc: Stephen C. Tweedie, Andrew Morton, lkml, Neil Brown Hi, On Thu, Nov 15, 2001 at 07:19:43PM -0500, Jeff Garzik wrote: > When working on something likely to crash, I always remount my > filesystems 'sync' with the intention to have the kernel immediately > sync to disk anything and everything it is coded to do. The kernel has, in my memory, never behaved like that on sync mounts. mount -o sync was always intended just to give people the BSD-style sync metadata updates that some users expected. The "mount" man page is wrong on this one. > Since the > kernel is responsible to flushing data to disk, it makes perfect sense > to have an option to sync not only metadata but data to disk > immediately, if the user desires such. If you want to sync _everything_, it's at least 5 seeks per write syscall when you're writing a new file: superblock, group descriptor, block bitmap, inode, data, and potentially inode indirect. There's no point doing all that, especially since some of that data is redundant and will be rebuilt by e2fsck anyway after a crash. Is it really such an important feature that we're willing to suffer a factor-of-100 or more slowdown for it? > Further, expecting all apps to fsync(2) files under the right > circumstances is not reasonable. There are "normal" circumstances where > someone expects non-syncing behavior of "cat foo bar > foobar", and then > there are extentuating circumstances where another expects the shell to > sync that command immediately. Should we rewrite cat/bash/apps to all > fsync, depending on an option? Should we expect people to modify all > their shell scripts to include "/bin/sync" for those times when they > want data-sync? Such is not scalable at all. Not-scalable is doing 5000 seeks to write a 4MB file. The behaviour you are talking about now, "cat foo bar > foobar" and expecting it to be intact on return, is *not the same thing*. The sync mount option is there to order metadata writes for predictable recovery of the directory structure. In the "cat" case, nobody cares what the inode is like during the write. All that is desired in that example is fsync-on-close, and it is insane to implement fsync-on-close by writing every single block of the file synchronously. At ALS, an ext3 user asked why ext3 performance was entirely unusable under mount -o sync (he had a broken config which accidentally set an ext3 mount synchronous), whereas ext2 was OK. I only realised afterwards that this was because of ext3's ordered data writes: whereas ext2 was just syncing the inodes and indirect blocks on write, ext3 was syncing the data too as part of the ordered data guarantees, and performance was totally destroyed by the extra seeks. "sync to keep the fs structures intact" and "sync to keep this file intact" are two totally different things. In the latter case, we only care about the file contents as a whole, so fsync-on-close is far more appropriate. If we want that, lets add it as a new option, but I don't see the benefit in making o- sync do all file data writes 100% synchronously. Cheers, Stephen ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-16 12:28 ` Stephen C. Tweedie @ 2001-11-16 13:28 ` Jeff Garzik 2001-11-16 13:37 ` Jeff Garzik 2001-11-16 22:40 ` Matthias Andree 2 siblings, 0 replies; 13+ messages in thread From: Jeff Garzik @ 2001-11-16 13:28 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Andrew Morton, lkml, Neil Brown "Stephen C. Tweedie" wrote: > On Thu, Nov 15, 2001 at 07:19:43PM -0500, Jeff Garzik wrote: > > When working on something likely to crash, I always remount my > > filesystems 'sync' with the intention to have the kernel immediately > > sync to disk anything and everything it is coded to do. > > The kernel has, in my memory, never behaved like that on sync mounts. > mount -o sync was always intended just to give people the BSD-style > sync metadata updates that some users expected. > > The "mount" man page is wrong on this one. No, that's always been a bug. Occasionally it gets brought up on lkml and people have talked about fixing it. > > Since the > > kernel is responsible to flushing data to disk, it makes perfect sense > > to have an option to sync not only metadata but data to disk > > immediately, if the user desires such. > > If you want to sync _everything_, it's at least 5 seeks per write > syscall when you're writing a new file: superblock, group descriptor, > block bitmap, inode, data, and potentially inode indirect. > > There's no point doing all that, especially since some of that data is > redundant and will be rebuilt by e2fsck anyway after a crash. > > Is it really such an important feature that we're willing to suffer a > factor-of-100 or more slowdown for it? mount -o dirsync, if you don't want the slowdown. I can write a fast, incorrect implementation of 'sync' too. Let's try for a correct implementation. > > Further, expecting all apps to fsync(2) files under the right > > circumstances is not reasonable. There are "normal" circumstances where > > someone expects non-syncing behavior of "cat foo bar > foobar", and then > > there are extentuating circumstances where another expects the shell to > > sync that command immediately. Should we rewrite cat/bash/apps to all > > fsync, depending on an option? Should we expect people to modify all > > their shell scripts to include "/bin/sync" for those times when they > > want data-sync? Such is not scalable at all. > > Not-scalable is doing 5000 seeks to write a 4MB file. > > The behaviour you are talking about now, "cat foo bar > foobar" and > expecting it to be intact on return, is *not the same thing*. The > sync mount option is there to order metadata writes for predictable > recovery of the directory structure. In the "cat" case, nobody cares > what the inode is like during the write. All that is desired in that > example is fsync-on-close, and it is insane to implement > fsync-on-close by writing every single block of the file > synchronously. The "sync" mount option's purpose is and should be this simple: "All I/O to the file system should be done synchronously." If you want different behavior, use a different option. I still do not understand why you seem to think modifying tons of programs and shell scripts is reasonable, in order to get "true" sync behavior. > At ALS, an ext3 user asked why ext3 performance was entirely unusable > under mount -o sync (he had a broken config which accidentally set an > ext3 mount synchronous), whereas ext2 was OK. I only realised > afterwards that this was because of ext3's ordered data writes: > whereas ext2 was just syncing the inodes and indirect blocks on write, > ext3 was syncing the data too as part of the ordered data guarantees, > and performance was totally destroyed by the extra seeks. Sure. Seems perfectly normal and expected for an fs mounted 'sync'. > "sync to keep the fs structures intact" and "sync to keep this file > intact" are two totally different things. In the latter case, we only > care about the file contents as a whole, so fsync-on-close is far more > appropriate. If we want that, lets add it as a new option, but I > don't see the benefit in making o- sync do all file data writes 100% > synchronously. The benefit is a correct implementation.. Jeff -- Jeff Garzik | Only so many songs can be sung Building 1024 | with two lips, two lungs, and one tongue. MandrakeSoft | - nomeansno ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-16 12:28 ` Stephen C. Tweedie 2001-11-16 13:28 ` Jeff Garzik @ 2001-11-16 13:37 ` Jeff Garzik 2001-11-16 22:40 ` Matthias Andree 2 siblings, 0 replies; 13+ messages in thread From: Jeff Garzik @ 2001-11-16 13:37 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Andrew Morton, lkml, Neil Brown May I suggest another way of thinking about this issue. ext3 allows one to journal everything, or just metadata. Why -avoid- this distrinction when it comes to the 'sync' mount option (and 'dirsync' or 'metasync')? -- Jeff Garzik | Only so many songs can be sung Building 1024 | with two lips, two lungs, and one tongue. MandrakeSoft | - nomeansno ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-16 12:28 ` Stephen C. Tweedie 2001-11-16 13:28 ` Jeff Garzik 2001-11-16 13:37 ` Jeff Garzik @ 2001-11-16 22:40 ` Matthias Andree 2 siblings, 0 replies; 13+ messages in thread From: Matthias Andree @ 2001-11-16 22:40 UTC (permalink / raw) To: lkml On Fri, 16 Nov 2001, Stephen Tweedie wrote: > If you want to sync _everything_, it's at least 5 seeks per write > syscall when you're writing a new file: superblock, group descriptor, > block bitmap, inode, data, and potentially inode indirect. > > There's no point doing all that, especially since some of that data is > redundant and will be rebuilt by e2fsck anyway after a crash. The file system cannot judge on what's needed, neither can the file system developer. The developer however can offer the administrator and/or application developer the corresponding interfaces. In the sense of portability, however, it may be rather useful to correspond to BSD semantics, so I vote to change sync and offer dirsync -- it might be useful to use BSD nomenclature though and offer a synonymous noasync option (which also has dirsync semantics as laid out in this thread before). It does not matter whether you look at BSD or Linux man pages, either one documents "sync - All I/O to the file system should be done synchronously." It should behave like documented. Let's not introduce compatibility problems by fixing the documentation. > Is it really such an important feature that we're willing to suffer a > factor-of-100 or more slowdown for it? It depends on the task. If the machine regularly reads data from an external interface and records it to disk, like in a log, synchronous data writes may be important. BTW, Wietse Venema was dazzled when he got to know that Linux values its syslog (defaults to all-synchronous writes) more than its mail (defaults to all-asynchronous writes that not even a file fsync() rectifies). > Not-scalable is doing 5000 seeks to write a 4MB file. If that's what the admin wants, let him do so. > what the inode is like during the write. All that is desired in that > example is fsync-on-close, and it is insane to implement > fsync-on-close by writing every single block of the file > synchronously. Yes, but you cannot generalize this example. > appropriate. If we want that, lets add it as a new option, but I > don't see the benefit in making o- sync do all file data writes 100% > synchronously. Compatibility and reliability are good points. -- Matthias Andree "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." Benjamin Franklin ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-15 21:45 ` Stephen C. Tweedie 2001-11-15 22:02 ` Andrew Morton 2001-11-16 0:19 ` Jeff Garzik @ 2001-11-16 3:07 ` Neil Brown 2001-11-17 7:13 ` Andrew Morton 2 siblings, 1 reply; 13+ messages in thread From: Neil Brown @ 2001-11-16 3:07 UTC (permalink / raw) To: Stephen C. Tweedie; +Cc: Andrew Morton, lkml On Thursday November 15, sct@redhat.com wrote: > Hi, > > On Thu, Nov 15, 2001 at 12:03:57AM -0800, Andrew Morton wrote: > > Linux is not syncing write() data for files on synchronously mounted > > filesystems, and it isn't syncing write() data for ext2/3 files which > > are operating under `chattr +S'. > > In the past, chattr +S and mount -o sync always resulted in sync > metadata with no guarantees about data. > > I'm not sure this makes much sense, but it's what has always happened. > For directories, the behaviour is fine, in particular as it gives us > the same directory sync consistency semantics as synchronous BSD UFS. > > It's not clear to me that chattr/mount sync options make _any_ sense > for regular file metadata. Rather than tightening up the semantics, > I'd actually prefer to restrict them so that they only apply to > directories. Users who set the sync bits are usually doing so for > applications like MTAs where it's directory syncing which is > what matters: the apps typically fsync the files themselves, anyway. Ok, but what about openning with O_SYNC? Surely that should sync the data. This issue came up because I was looking at how much disc activity the various sync options causes on an ext3 filesystem, and on the journal (which I had on a separate drive so I could differentiate journal IO and filesystem IO). You've probably already seen them on ext3-users, but I will repeat the numbers below for linux-kernel readers. If we forget about the "sync" mounts, as there is some question over their intended semantics, the issue becomes that ext3 does not appear to properly honour "fsync" in data=writeback mode (unless the data is going to the journal on fsync) and does not properly honour O_SYNC *except* in data=writeback mode. Andrew's patch fixes O_SYNC in data=journal and data=ordered modes, but does not fix fsync in data=writeback mode. NeilBrown ------------------------------------- Results: For each test I wrote the first 40000 bytes of a pre-existing file 100 times and counted the nubmer of scsi operations (which may be multi-block operations) on the filesytem device (first column) and the journal device (second column). For "async" I open, write, close 100 times. For "fsync" I open, write, fsync, close 100 times. For "osync" I open(,O_SYNC), write, close 100 times. ext2 sync async 0 0 fsync 200 0 osync 101 0 ext2 async async 0 0 fsync 200 0 osync 101 0 ext3 sync,data=journal async 0 200 fsync 0 300 osync 0 2 ext3 async,data=journal async 0 0 fsync 0 200 osync 0 2 ext3 sync,data=ordered async 100 200 fsync 100 300 osync 0 0 ext3 async,data=ordered async 0 0 fsync 100 200 osync 1 2 ext3 sync,data=writeback async 0 200 fsync 0 300 osync 100 2 ext3 async,data=writeback async 0 0 fsync 0 200 osync 100 2 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: synchronous mounts 2001-11-16 3:07 ` Neil Brown @ 2001-11-17 7:13 ` Andrew Morton 0 siblings, 0 replies; 13+ messages in thread From: Andrew Morton @ 2001-11-17 7:13 UTC (permalink / raw) To: Neil Brown; +Cc: Stephen C. Tweedie, lkml Neil Brown wrote: > > Andrew's patch fixes O_SYNC in data=journal and data=ordered modes, > but does not fix fsync in data=writeback mode. > Neil, thanks for spotting this. patch-2.4.10-pre11 introduced a new per-inode dirty buffer queue, inode.i_dirty_data_buffers. generic_commit_write() was changed so that it placed buffers on that queue, rather than i_dirty_buffers. A new function fsync_inode_data_buffers() was added for writing the new buffer list out. That patch changed ext2 and reiserfs to use the new function. The patch was not copied to any mailing list. There was no announcement and no discussion. There are no comments in the code indicating the distinction between these lists and why we have two such. This change broke fsync() for three in-kernel filesystems as well as the then-out-of-kernel ext3 and presumably any other buffer-based Linux filesystems. Ah. google finds this, from April: http://www.uwsg.iu.edu/hypermail/linux/kernel/0104.1/0810.html which attempts to explain why the separate list was added. It appears to be bogus. There is no reason why, if correctly done, syncing a list of ext2 data buffers and indirects should be significantly slower than syncing just the data. Here's a fix. A better fix would be to back out the unnecesary list and its support functions. --- linux-2.4.15-pre5/fs/minix/file.c Mon Sep 10 07:31:30 2001 +++ linux-akpm/fs/minix/file.c Fri Nov 16 22:07:53 2001 @@ -30,8 +30,10 @@ struct inode_operations minix_file_inode int minix_sync_file(struct file * file, struct dentry *dentry, int datasync) { struct inode *inode = dentry->d_inode; - int err = fsync_inode_buffers(inode); + int err; + err = fsync_inode_buffers(inode); + err |= fsync_inode_data_buffers(inode); if (!(inode->i_state & I_DIRTY)) return err; if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) --- linux-2.4.15-pre5/fs/sysv/file.c Thu Nov 15 23:44:29 2001 +++ linux-akpm/fs/sysv/file.c Fri Nov 16 22:09:20 2001 @@ -35,8 +35,10 @@ struct inode_operations sysv_file_inode_ int sysv_sync_file(struct file * file, struct dentry *dentry, int datasync) { struct inode *inode = dentry->d_inode; - int err = fsync_inode_buffers(inode); + int err; + err = fsync_inode_buffers(inode); + err |= fsync_inode_data_buffers(inode); if (!(inode->i_state & I_DIRTY)) return err; if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) --- linux-2.4.15-pre5/fs/udf/fsync.c Mon Jun 11 19:15:27 2001 +++ linux-akpm/fs/udf/fsync.c Fri Nov 16 22:11:03 2001 @@ -45,6 +45,7 @@ int udf_fsync_inode(struct inode *inode, int err; err = fsync_inode_buffers(inode); + err |= fsync_inode_data_buffers(inode); if (!(inode->i_state & I_DIRTY)) return err; if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) --- linux-2.4.15-pre5/fs/ext3/fsync.c Thu Nov 15 23:44:29 2001 +++ linux-akpm/fs/ext3/fsync.c Fri Nov 16 22:09:34 2001 @@ -62,6 +62,7 @@ int ext3_sync_file(struct file * file, s * we'll end up waiting on them in commit. */ ret = fsync_inode_buffers(inode); + ret |= fsync_inode_data_buffers(inode); ext3_force_commit(inode->i_sb); ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2001-11-17 7:14 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-11-15 8:03 synchronous mounts Andrew Morton 2001-11-15 21:45 ` Stephen C. Tweedie 2001-11-15 22:02 ` Andrew Morton 2001-11-15 23:05 ` Stephen C. Tweedie 2001-11-16 0:19 ` Jeff Garzik 2001-11-16 8:33 ` Andrew Morton 2001-11-16 10:47 ` Matthias Andree 2001-11-16 12:28 ` Stephen C. Tweedie 2001-11-16 13:28 ` Jeff Garzik 2001-11-16 13:37 ` Jeff Garzik 2001-11-16 22:40 ` Matthias Andree 2001-11-16 3:07 ` Neil Brown 2001-11-17 7:13 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox