* [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
@ 2005-03-04 6:33 Junfeng Yang
From: Junfeng Yang @ 2005-03-04 6:33 UTC
To: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser; +Cc: mc
Hi,
FiSC (our file system checker) emits several warnings on ext2, jfs and
reiserfs, complaining that directories or files are lost while FiSC
believes they should already be persistent on disk. (ext3 behaves
correctly.)
All warnings boil down to a single cause: when these file systems are
mounted -o sync or dirsync, dirty blocks are still written out
asynchronously. It appears to me that these mount options don't have any
effect on these file systems. Is this the intended behavior?
man mount shows:

       sync   All I/O to the file system should be done synchronously.

       dirsync
              All directory updates within the file system should be
              done synchronously. This affects the following system
              calls: creat, link, unlink, symlink, mkdir, rmdir, mknod
              and rename.

Any clarification on this would be very helpful,
-Junfeng

* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Matt Mackall @ 2005-03-04 7:16 UTC
To: Junfeng Yang
Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

On Thu, Mar 03, 2005 at 10:33:40PM -0800, Junfeng Yang wrote:
>
> Hi,
>
> FiSC (our file system checker) emits several warnings on ext2, jfs and
> reiserfs, complaining that directories or files are lost while FiSC
> believes they should already be persistent on disk. (ext3 behaves
> correctly.)
>
> All warnings boil down to a single cause: when these file systems are
> mounted -o sync or dirsync, dirty blocks are still written out
> asynchronously. It appears to me that these mount options don't have any
> effect on these file systems. Is this the intended behavior?

I don't believe so. The sync option should definitionally make calls
to fsync for integrity redundant. This probably got broken ages ago
for ext2 in one of the many buffer/page cache refactorings.

-- 
Mathematics is the supreme nostalgia of our time.

* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Jan Engelhardt @ 2005-03-04 7:34 UTC
To: Junfeng Yang
Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

>All warnings boil down to a single cause: when these file systems are
>mounted -o sync or dirsync, dirty blocks are still written out
>asynchronously. It appears to me that these mount options don't have any
>effect on these file systems. Is this the intended behavior?

At least my HDD LED flashes regularly when I add -o sync...
(Using `mount / -o remount,sync`)

It may happen that FiSC reads the disk before the write command even finished.
With all the HD head movement optimization in the kernel (block layer,
boiling down to TCQ/NCQ), this sounds possible.

Jan Engelhardt
-- 

* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Junfeng Yang @ 2005-03-04 8:01 UTC
To: Jan Engelhardt
Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

> It may happen that FiSC reads the disk before the write command even finished.
> With all the HD head movement optimization in the kernel (block layer,
> boiling down to TCQ/NCQ), this sounds possible.

FiSC "crashes" the kernel immediately after a file system operation
(creat, mkdir, write, etc.) returns. Presumably, if a file system is
mounted -o sync, all FS operations should be done synchronously, i.e.,
if creat("foo") returns, the file "foo" had better be on disk. It turns
out this is not the case for ext2, jfs and reiserfs.

-Junfeng
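
The invariant described above (once creat() or mkdir() has returned on a sync mount, the name must survive a crash) can be phrased as a comparison between the names whose calls returned before the crash and the names found after fsck. The helper below is a hypothetical illustration of that check, not FiSC's actual implementation. In the trace posted later in this thread, for example, '0003' returned from mkdir() but was cleared by e2fsck, which this check would count as one lost entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * count_lost (hypothetical, not FiSC's code): count how many names whose
 * creating call had returned before the simulated crash are missing from
 * the directory listing obtained after fsck. Under -o sync/dirsync the
 * expected result is 0.
 */
size_t count_lost(const char *const returned[], size_t nreturned,
		  const char *const recovered[], size_t nrecovered)
{
	size_t lost = 0;

	assert(nreturned == 0 || returned != NULL);
	assert(nrecovered == 0 || recovered != NULL);
	for (size_t i = 0; i < nreturned; i++) {
		int found = 0;

		/* linear search is fine for the handful of names a test creates */
		for (size_t j = 0; j < nrecovered && !found; j++)
			found = strcmp(returned[i], recovered[j]) == 0;
		if (!found)
			lost++;
	}
	return lost;
}
```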

* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Alan Cox @ 2005-03-07 17:29 UTC
To: Jan Engelhardt
Cc: Junfeng Yang, Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

The IDE layer default is still unfortunately broken and leaves write
caching enabled. Turn it off with hdparm.

* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Junfeng Yang @ 2005-03-07 22:29 UTC
To: Alan Cox
Cc: Jan Engelhardt, Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiserfs-list, mc

FiSC still gets those warnings with hdparm -W 0, or with a simple ramdisk
that serves disk requests as soon as they are submitted.

Thanks,
-Junfeng

On Mon, 7 Mar 2005, Alan Cox wrote:

> The IDE layer default is still unfortunately broken and leaves write
> caching enabled. Turn it off with hdparm.
>

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Junfeng Yang @ 2005-03-04 8:43 UTC
To: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser; +Cc: mc

On Thu, 3 Mar 2005, Junfeng Yang wrote:

>
> Hi,
>
> FiSC (our file system checker) emits several warnings on ext2, jfs and
> reiserfs, complaining that directories or files are lost while FiSC
> believes they should already be persistent on disk. (ext3 behaves
> correctly.)

I forgot to mention that we are mainly looking for crash-recovery bugs.
The warnings can be triggered this way:

1. do several file system operations
2. "crash" the test machine
3. get the crashed disk image, run fsck to recover
4. mount the recovered disk image

I'm able to reproduce the same warnings on ext2 using the following
program:

#include <fcntl.h>      /* creat() */
#include <stdlib.h>     /* system() */
#include <sys/stat.h>   /* mkdir() */

int main(void)
{
	system("sudo umount /dev/hda9");
	system("/sbin/mke2fs /dev/hda9");
	system("sudo mount -t ext2 /dev/hda9 /mnt/sbd1 -o sync,dirsync");
	creat("/mnt/sbd1/0002", 0777);
	mkdir("/mnt/sbd1/0003", 0777);
	// unplug your power cord here :) then use e2fsck to recover
	return 0;
}

uname -a shows
Linux notus 2.6.8-1-686 #1 Thu Nov 25 04:34:30 UTC 2004 i686 GNU/Linux

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Andrew Morton @ 2005-03-04 9:11 UTC
To: Junfeng Yang; +Cc: linux-kernel, ext2-devel, jfs-discussion, reiser, mc

Junfeng Yang <yjf@stanford.edu> wrote:
>
> On Thu, 3 Mar 2005, Junfeng Yang wrote:
>
> >
> > Hi,
> >
> > FiSC (our file system checker) emits several warnings on ext2, jfs and
> > reiserfs, complaining that directories or files are lost while FiSC
> > believes they should already be persistent on disk. (ext3 behaves
> > correctly.)
>
> I forget to mention, we are mainly looking for crash-recovery bugs. The
> warnings can trigger this way:
> 1. do several file system operations
> 2. "crash" the test machine
> 3. get the crashed disk image, run fsck to recover
> 4. mount the recovered disk image
>
> I'm able to reproduce the same warnings on ext2 using the following
> program:
>
> main()
> {
> 	system("sudo umount /dev/hda9");
> 	system("/sbin/mke2fs /dev/hda9");
> 	system("sudo mount -t ext2 /dev/hda9 /mnt/sbd1 -o sync,dirsync");
> 	creat("/mnt/sbd1/0002", 0777);
> 	mkdir("/mnt/sbd1/0003", 0777);
> 	// unplug your power cord here :) then use e2fsck to recover
> }

That would be a bug. Please send the e2fsck output.

> uname -a shows
> Linux notus 2.6.8-1-686 #1 Thu Nov 25 04:34:30 UTC 2004 i686 GNU/Linux

It would be much better to test vaguely contemporary kernels.

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Junfeng Yang @ 2005-03-04 9:44 UTC
To: Andrew Morton
Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

> That would be a bug. Please send the e2fsck output.

Here is the trace:

1. The file system is made with /sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60
   and mounted with -o sync,dirsync.

2. Operations FiSC did:

	creat(/mnt/sbd0/0001)
	write(/mnt/sbd0/0001)
	rename(/mnt/sbd0/0001, /mnt/sbd0/0002)
	mkdir(/mnt/sbd0/0003)

3. FiSC "crashed" the test machine after mkdir returned. The crashed
   disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2

e2fsck output is:

e2fsck 1.36 (05-Feb-2005)
/dev/hda9 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 12, i_blocks is 16, should be 2. Fix? yes
Pass 2: Checking directory structure
Entry '0003' in / (2) has deleted/unused inode 13. Clear? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -21 Fix? yes
Free blocks count wrong for group #0 (38, counted=39). Fix? yes
Free blocks count wrong (38, counted=39). Fix? yes
Inode bitmap differences: -13 Fix? yes
Free inodes count wrong for group #0 (3, counted=4). Fix? yes
Directories count wrong for group #0 (3, counted=2). Fix? yes
Free inodes count wrong (3, counted=4). Fix? yes

/dev/hda9: ***** FILE SYSTEM WAS MODIFIED *****
/dev/hda9: 12/16 files (0.0% non-contiguous), 21/60 blocks

>
> It would be much better to test vaguely contemporary kernels.
>

I'm going to check 2.6.11 tonight.
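
The four operations in the trace can be replayed against any directory with a plain C function. This is a sketch under assumptions: the helper name run_trace and the four-byte payload are hypothetical (the trace does not say what was written), and no crash is simulated, so on its own it only shows the end state FiSC expects to find on disk: 0002 is a regular file, 0003 is a directory, and 0001 no longer exists.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * run_trace (hypothetical helper): replay creat/write/rename/mkdir from
 * the trace inside directory `mnt`. Returns 0 on success, -1 on the
 * first failing call.
 */
int run_trace(const char *mnt)
{
	static const char payload[] = "fisc";   /* assumed: the trace omits the data */
	char p1[4096], p2[4096], p3[4096];
	int fd;

	assert(mnt != NULL);
	if (strlen(mnt) > sizeof p1 - 8)        /* leave room for "/000N" and NUL */
		return -1;
	snprintf(p1, sizeof p1, "%s/0001", mnt);
	snprintf(p2, sizeof p2, "%s/0002", mnt);
	snprintf(p3, sizeof p3, "%s/0003", mnt);

	fd = creat(p1, 0666);                   /* creat(0001) */
	if (fd < 0)
		return -1;
	if (write(fd, payload, sizeof payload - 1) != (ssize_t)(sizeof payload - 1)) {
		close(fd);                          /* write(0001) failed */
		return -1;
	}
	if (close(fd) < 0)
		return -1;
	if (rename(p1, p2) < 0)                 /* rename(0001, 0002) */
		return -1;
	if (mkdir(p3, 0777) < 0)                /* mkdir(0003) */
		return -1;
	return 0;                               /* FiSC would "crash" the machine here */
}
```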

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Lars Marowsky-Bree @ 2005-03-04 10:27 UTC
To: Junfeng Yang, Andrew Morton
Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

On 2005-03-04T01:44:06, Junfeng Yang <yjf@stanford.edu> wrote:

> > That would be a bug. Please send the e2fsck output.
>
> Here is the trace
>
> 1. file system is made with sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60
> and mounted with -o sync,dirsync
>
> 1. operations FiSC did:
>
> creat(/mnt/sbd0/0001)
> write(/mnt/sbd0/0001)
> rename(/mnt/sbd0/0001, /mnt/sbd0/0002)
> mkdir(/mnt/sbd0/0003)
>
> 2. FiSC "crashed" the test machine after mkdir returns. Crashed
> disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2

I've run into similar issues. For example, a "touch foo" also isn't
synchronous with -o sync, but stays entirely in the cache. Andrea tells
me this is expected behaviour, so I've given up on this one...

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Andrew Morton @ 2005-03-04 11:20 UTC
To: Lars Marowsky-Bree
Cc: yjf, linux-kernel, ext2-devel, jfs-discussion, reiser, mc

Lars Marowsky-Bree <lmb@suse.de> wrote:
>
> On 2005-03-04T01:44:06, Junfeng Yang <yjf@stanford.edu> wrote:
>
> > > That would be a bug. Please send the e2fsck output.
> >
> > Here is the trace
> >
> > 1. file system is made with sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60
> > and mounted with -o sync,dirsync
> >
> > 1. operations FiSC did:
> >
> > creat(/mnt/sbd0/0001)
> > write(/mnt/sbd0/0001)
> > rename(/mnt/sbd0/0001, /mnt/sbd0/0002)
> > mkdir(/mnt/sbd0/0003)
> >
> > 2. FiSC "crashed" the test machine after mkdir returns. Crashed
> > disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2
>
> I've run into similar issues. For example, a "touch foo" also isn't
> synchronous with -o sync, but stays entirely in the cache. Andrea tells
> me this is expected behaviour, so I've given up on this one...
>

Why is that expected behaviour? I have vague memories which agree with
that, but I cannot remember the reasoning.

From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
there might be some O_SYNC ones there as well.

Problem is, it's subtle because we try to defer I/O until the last stage,
to avoid doing extra I/O.

So this wild scattergun patch probably does extra work and possibly extra
I/O all over the place, but I'd be interested if Junfeng could give it a
quick test. It's against 2.6.11.

A real patch would take some painstaking work.

diff -puN fs/ext2/balloc.c~ext2-sync-fix fs/ext2/balloc.c
--- 25/fs/ext2/balloc.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/balloc.c	2005-03-04 02:49:00.000000000 -0800
@@ -139,8 +139,9 @@ static void release_blocks(struct super_
 	}
 }
 
-static int group_reserve_blocks(struct ext2_sb_info *sbi, int group_no,
-	struct ext2_group_desc *desc, struct buffer_head *bh, int count)
+static int group_reserve_blocks(struct super_block *sb,
+	struct ext2_sb_info *sbi, int group_no, struct ext2_group_desc *desc,
+	struct buffer_head *bh, int count)
 {
 	unsigned free_blocks;
 
@@ -154,6 +155,8 @@ static int group_reserve_blocks(struct e
 	desc->bg_free_blocks_count = cpu_to_le16(free_blocks - count);
 	spin_unlock(sb_bgl_lock(sbi, group_no));
 	mark_buffer_dirty(bh);
+	if (sb->s_flags & MS_SYNCHRONOUS)
+		sync_dirty_buffer(bh);
 	return count;
 }
 
@@ -170,6 +173,8 @@ static void group_release_blocks(struct 
 		spin_unlock(sb_bgl_lock(sbi, group_no));
 		sb->s_dirt = 1;
 		mark_buffer_dirty(bh);
+		if (sb->s_flags & MS_SYNCHRONOUS)
+			sync_dirty_buffer(bh);
 	}
 }
 
@@ -377,7 +382,7 @@ int ext2_new_block(struct inode *inode, 
 		goto io_error;
 	}
 
-	group_alloc = group_reserve_blocks(sbi, group_no, desc,
+	group_alloc = group_reserve_blocks(sb, sbi, group_no, desc,
 					gdp_bh, es_alloc);
 	if (group_alloc) {
 		ret_block = ((goal - le32_to_cpu(es->s_first_data_block)) %
@@ -413,7 +418,7 @@ retry:
 		desc = ext2_get_group_desc(sb, group_no, &gdp_bh);
 		if (!desc)
 			goto io_error;
-		group_alloc = group_reserve_blocks(sbi, group_no, desc,
+		group_alloc = group_reserve_blocks(sb, sbi, group_no, desc,
 						gdp_bh, es_alloc);
 	}
 	if (!group_alloc) {
diff -puN fs/ext2/ialloc.c~ext2-sync-fix fs/ext2/ialloc.c
--- 25/fs/ext2/ialloc.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/ialloc.c	2005-03-04 02:54:13.000000000 -0800
@@ -86,6 +86,8 @@ static void ext2_release_inode(struct su
 		percpu_counter_dec(&EXT2_SB(sb)->s_dirs_counter);
 	sb->s_dirt = 1;
 	mark_buffer_dirty(bh);
+	if (sb->s_flags & MS_SYNCHRONOUS)
+		sync_dirty_buffer(bh);
 }
 
 /*
@@ -563,6 +565,8 @@ got:
 
 	sb->s_dirt = 1;
 	mark_buffer_dirty(bh2);
+	if (sb->s_flags & MS_SYNCHRONOUS)
+		sync_dirty_buffer(bh2);
 	inode->i_uid = current->fsuid;
 	if (test_opt (sb, GRPID))
 		inode->i_gid = dir->i_gid;
@@ -614,7 +618,7 @@ got:
 		DQUOT_FREE_INODE(inode);
 		goto fail2;
 	}
-	mark_inode_dirty(inode);
+	ext2_mark_inode_dirty(inode);
 	ext2_debug("allocating inode %lu\n", inode->i_ino);
 	ext2_preread_inode(inode);
 	return inode;
diff -puN fs/ext2/super.c~ext2-sync-fix fs/ext2/super.c
--- 25/fs/ext2/super.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/super.c	2005-03-04 02:49:00.000000000 -0800
@@ -1097,6 +1097,8 @@ static ssize_t ext2_quota_write(struct s
 		set_buffer_uptodate(bh);
 		mark_buffer_dirty(bh);
 		unlock_buffer(bh);
+		if (sb->s_flags & MS_SYNCHRONOUS)
+			sync_dirty_buffer(bh);
 		brelse(bh);
 		offset = 0;
 		towrite -= tocopy;
@@ -1110,8 +1112,8 @@ out:
 		i_size_write(inode, off+len-towrite);
 	inode->i_version++;
 	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
-	mark_inode_dirty(inode);
 	up(&inode->i_sem);
+	ext2_mark_inode_dirty(inode);
 	return len - towrite;
 }
 
diff -puN fs/ext2/xattr.c~ext2-sync-fix fs/ext2/xattr.c
--- 25/fs/ext2/xattr.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/xattr.c	2005-03-04 02:49:00.000000000 -0800
@@ -348,6 +348,8 @@ static void ext2_xattr_update_super_bloc
 	sb->s_dirt = 1;
 	mark_buffer_dirty(EXT2_SB(sb)->s_sbh);
 	unlock_super(sb);
+	if (sb->s_flags & MS_SYNCHRONOUS)
+		sync_dirty_buffer(EXT2_SB(sb)->s_sbh);
 }
 
 /*
diff -puN fs/ext2/dir.c~ext2-sync-fix fs/ext2/dir.c
--- 25/fs/ext2/dir.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/dir.c	2005-03-04 02:49:00.000000000 -0800
@@ -428,7 +428,7 @@ void ext2_set_link(struct inode *dir, st
 	ext2_put_page(page);
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
 	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
-	mark_inode_dirty(dir);
+	ext2_mark_inode_dirty(dir);
 }
 
 /*
@@ -518,7 +518,7 @@ got_it:
 	err = ext2_commit_chunk(page, from, to);
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
 	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
-	mark_inode_dirty(dir);
+	ext2_mark_inode_dirty(dir);
 	/* OFFSET_CACHE */
 out_put:
 	ext2_put_page(page);
@@ -566,7 +566,7 @@ int ext2_delete_entry (struct ext2_dir_e
 	err = ext2_commit_chunk(page, from, to);
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
 	EXT2_I(inode)->i_flags &= ~EXT2_BTREE_FL;
-	mark_inode_dirty(inode);
+	ext2_mark_inode_dirty(inode);
 out:
 	ext2_put_page(page);
 	return err;
diff -puN fs/ext2/inode.c~ext2-sync-fix fs/ext2/inode.c
--- 25/fs/ext2/inode.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/inode.c	2005-03-04 02:49:00.000000000 -0800
@@ -41,6 +41,17 @@ MODULE_LICENSE("GPL");
 static int ext2_update_inode(struct inode * inode, int do_sync);
 
 /*
+ * dirty an ext2 inode and sync it if needed
+ */
+int ext2_mark_inode_dirty(struct inode *inode)
+{
+	mark_inode_dirty(inode);
+	if (inode_needs_sync(inode))
+		return ext2_update_inode(inode, 1);
+	return 0;
+}
+
+/*
  * Test whether an inode is a fast symlink.
  */
 static inline int ext2_inode_is_fast_symlink(struct inode *inode)
@@ -60,8 +71,7 @@ void ext2_delete_inode (struct inode * i
 	if (is_bad_inode(inode))
 		goto no_delete;
 	EXT2_I(inode)->i_dtime	= get_seconds();
-	mark_inode_dirty(inode);
-	ext2_update_inode(inode, inode_needs_sync(inode));
+	ext2_mark_inode_dirty(inode);
 
 	inode->i_size = 0;
 	if (inode->i_blocks)
diff -puN fs/ext2/acl.c~ext2-sync-fix fs/ext2/acl.c
diff -puN fs/ext2/ioctl.c~ext2-sync-fix fs/ext2/ioctl.c
--- 25/fs/ext2/ioctl.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/ioctl.c	2005-03-04 02:49:00.000000000 -0800
@@ -60,7 +60,7 @@ int ext2_ioctl (struct inode * inode, st
 
 		ext2_set_inode_flags(inode);
 		inode->i_ctime = CURRENT_TIME_SEC;
-		mark_inode_dirty(inode);
+		ext2_mark_inode_dirty(inode);
 		return 0;
 	}
 	case EXT2_IOC_GETVERSION:
@@ -73,7 +73,7 @@ int ext2_ioctl (struct inode * inode, st
 		if (get_user(inode->i_generation, (int __user *) arg))
 			return -EFAULT;
 		inode->i_ctime = CURRENT_TIME_SEC;
-		mark_inode_dirty(inode);
+		ext2_mark_inode_dirty(inode);
 		return 0;
 	default:
 		return -ENOTTY;
diff -puN fs/ext2/namei.c~ext2-sync-fix fs/ext2/namei.c
--- 25/fs/ext2/namei.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/namei.c	2005-03-04 02:55:15.000000000 -0800
@@ -132,7 +132,7 @@ static int ext2_create (struct inode * d
 			inode->i_mapping->a_ops = &ext2_nobh_aops;
 		else
 			inode->i_mapping->a_ops = &ext2_aops;
-		mark_inode_dirty(inode);
+		ext2_mark_inode_dirty(inode);
 		err = ext2_add_nondir(dentry, inode);
 	}
 	return err;
@@ -153,7 +153,7 @@ static int ext2_mknod (struct inode * di
 #ifdef CONFIG_EXT2_FS_XATTR
 		inode->i_op = &ext2_special_inode_operations;
 #endif
-		mark_inode_dirty(inode);
+		ext2_mark_inode_dirty(inode);
 		err = ext2_add_nondir(dentry, inode);
 	}
 	return err;
@@ -191,7 +191,7 @@ static int ext2_symlink (struct inode * 
 		memcpy((char*)(EXT2_I(inode)->i_data),symname,l);
 		inode->i_size = l-1;
 	}
-	mark_inode_dirty(inode);
+	ext2_mark_inode_dirty(inode);
 
 	err = ext2_add_nondir(dentry, inode);
 out:
diff -puN fs/ext2/ext2.h~ext2-sync-fix fs/ext2/ext2.h
--- 25/fs/ext2/ext2.h~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/ext2.h	2005-03-04 02:49:00.000000000 -0800
@@ -116,6 +116,7 @@ extern unsigned long ext2_count_free (st
 /* inode.c */
 extern void ext2_read_inode (struct inode *);
 extern int ext2_write_inode (struct inode *, int);
+int ext2_mark_inode_dirty(struct inode *inode);
 extern void ext2_delete_inode (struct inode *);
 extern int ext2_sync_inode (struct inode *);
 extern void ext2_discard_prealloc (struct inode *);
_
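
The recurring change in the patch is "mark the buffer dirty, then write it out immediately if the superblock has MS_SYNCHRONOUS set". The userspace toy model below illustrates why that matters for a crash test: a simulated crash throws away any block that is dirty in the cache but not yet on disk. Everything here (toy_fs, toy_update_block, toy_crash) is hypothetical and only mirrors the shape of mark_buffer_dirty()/sync_dirty_buffer(); it is not kernel code. TOY_MS_SYNCHRONOUS reuses 16, the value of MS_SYNCHRONOUS on Linux.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MS_SYNCHRONOUS 16   /* same bit value as Linux MS_SYNCHRONOUS */
#define TOY_NBLOCKS 8

/* Hypothetical toy "disk" plus write-back cache, one int per block. */
struct toy_fs {
	unsigned long s_flags;
	int disk[TOY_NBLOCKS];      /* what survives a crash */
	int cache[TOY_NBLOCKS];     /* in-memory copies of the blocks */
	bool dirty[TOY_NBLOCKS];
};

/* Analogue of sync_dirty_buffer(): write one dirty block to "disk". */
static void toy_sync_dirty_buffer(struct toy_fs *fs, int blk)
{
	if (fs->dirty[blk]) {
		fs->disk[blk] = fs->cache[blk];
		fs->dirty[blk] = false;
	}
}

/* Modify a cached block, then apply the pattern the patch adds. */
void toy_update_block(struct toy_fs *fs, int blk, int value)
{
	assert(blk >= 0 && blk < TOY_NBLOCKS);
	fs->cache[blk] = value;
	fs->dirty[blk] = true;                  /* like mark_buffer_dirty() */
	if (fs->s_flags & TOY_MS_SYNCHRONOUS)
		toy_sync_dirty_buffer(fs, blk);     /* like sync_dirty_buffer() */
}

/* Simulated crash: the cache is lost, only the disk contents remain. */
void toy_crash(struct toy_fs *fs)
{
	memcpy(fs->cache, fs->disk, sizeof fs->cache);
	memset(fs->dirty, 0, sizeof fs->dirty);
}
```

With the flag set, the update reaches the toy disk before toy_update_block() returns; without it, a crash right after the call loses the update, which is the symptom FiSC reports.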

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Junfeng Yang @ 2005-03-04 23:03 UTC
To: Andrew Morton
Cc: Lars Marowsky-Bree, Linux Kernel Mailing List, ext2-devel, mc

> From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
> there might be some O_SYNC ones there as well.

I should be able to easily add an O_SYNC check to FiSC. Several questions:

1. Does O_SYNC apply to directories as well?
2. For the same file, if I open it twice, once with O_SYNC and once
   without, only writes through the O_SYNC fd will be synchronous, right?
3. I open a file w/o O_SYNC, issue a bunch of writes, then call
   ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
   Is only the second set of writes synchronous?

btw, the man page shows that O_DSYNC and O_RSYNC are just O_SYNC. Is this
true for the current Linux kernel (2.6)?

> So this wild scattergun patch probably does extra work and possibly extra
> I/O all over the place, but I'd be interested if Junfeng could give it a
> quick test. It's against 2.6.11.

I checked 2.6.11 with your patch just now. It looks like the problem is
still there. If you need more information, let me know. The image is at
http://fisc.stanford.edu/bug2/crash-1.img.bz2. Below is the output from
e2fsck.

e2fsck 1.36 (05-Feb-2005)
/dev/ide/host0/bus0/target0/lun0/part9 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 13, i_blocks is 16, should be 2. Fix? yes
Inode 15 is a zero-length directory. Clear? yes
Pass 2: Checking directory structure
Entry '0005' in / (2) has deleted/unused inode 15. Clear? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 2 ref count is 4, should be 3. Fix? yes
Pass 5: Checking group summary information
Block bitmap differences: -21 Fix? yes
Free blocks count wrong for group #0 (38, counted=39). Fix? yes
Free blocks count wrong (38, counted=39). Fix? yes
Inode bitmap differences: -15 Fix? yes
Free inodes count wrong for group #0 (1, counted=2). Fix? yes
Directories count wrong for group #0 (3, counted=2). Fix? yes
Free inodes count wrong (1, counted=2). Fix? yes

/dev/ide/host0/bus0/target0/lun0/part9: ***** FILE SYSTEM WAS MODIFIED *****
/dev/ide/host0/bus0/target0/lun0/part9: 14/16 files (0.0% non-contiguous), 21/60 blocks

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Andrew Morton @ 2005-03-04 23:29 UTC
To: Junfeng Yang; +Cc: lmb, linux-kernel, ext2-devel, mc

Junfeng Yang <yjf@stanford.edu> wrote:
>
> > From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
> > there might be some O_SYNC ones there as well.
>
> I should be able to easily add O_SYNC check to FiSC. Several questions:
> 1. Does O_SYNC apply to directory as well?

Only if you can open directories for writing ;)

> 2. For the same file, if I open twice, once with O_SYNC and another time
> without, only writes through the O_SYNC fd will be synchronous, right?

Yes, O_SYNC is a per-fd thing.

> 3. I open a file w/o O_SYNC, issue a bunch of writes, then call
> ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
> Only the second set of writes are synchronous?

FIOASYNC is unrelated to O_SYNC. O_SYNC can only be set at open().

> btw, man page show that O_DSYNC and O_RSYNC are just O_SYNC. Is this true
> for current linux kernel (2.6)?

The kernel only supports O_SYNC (equivalent behaviour to O_RSYNC|O_DSYNC).
Perhaps glibc does a conversion.

> > So this wild scattergun patch probably does extra work and possibly extra
> > I/O all over the place, but I'd be interested if Junfeng could give it a
> > quick test. It's against 2.6.11.
>
> I checked 2.6.11 with your patch just now. Looks like the problem is
> still there. If you need more information, let me know. Image is at
> http://fisc.stanford.edu/bug2/crash-1.img.bz2. Below is the output from
> e2fsck.

ugh. Thanks.
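
The per-fd answers above can be checked from userspace: O_SYNC is recorded on the open file description and is visible through fcntl(F_GETFL), and on Linux F_SETFL silently ignores O_SYNC, so it can only be chosen at open(). A minimal sketch with hypothetical helper names follows. Note that the "2.6" answer about O_DSYNC is dated: since Linux 2.6.33, O_DSYNC and O_SYNC are implemented as distinct flags.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * fd_has_osync (hypothetical helper): 1 if the open file description
 * behind fd carries O_SYNC, 0 if not, -1 on error.
 */
int fd_has_osync(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl < 0)
		return -1;
	return (fl & O_SYNC) == O_SYNC;   /* O_SYNC may span several bits */
}

/*
 * try_set_osync (hypothetical helper): attempt to turn on O_SYNC after
 * open(). On Linux, F_SETFL ignores O_SYNC, so the flag is expected to
 * stay off. Returns the resulting fd_has_osync() value, or -1 on error.
 */
int try_set_osync(int fd)
{
	int fl;

	assert(fd >= 0);
	fl = fcntl(fd, F_GETFL);
	if (fl < 0 || fcntl(fd, F_SETFL, fl | O_SYNC) < 0)
		return -1;
	return fd_has_osync(fd);
}
```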

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Bernd Eckenfels @ 2005-03-08 0:31 UTC
To: linux-kernel

In article <Pine.GSO.4.44.0503041440030.17155-100000@elaine24.Stanford.EDU> you wrote:
> 3. I open a file w/o O_SYNC, issue a bunch of writes, then call
> ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
> Only the second set of writes are synchronous?

I am also curious whether one can open a file, write to it, close it, open
it again and do fsync()/fdatasync() on it?

Greetings
Bernd

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
From: Florian Weimer @ 2005-03-19 22:34 UTC
To: Bernd Eckenfels; +Cc: linux-kernel

* Bernd Eckenfels:

> In article <Pine.GSO.4.44.0503041440030.17155-100000@elaine24.Stanford.EDU> you wrote:
>> 3. I open a file w/o O_SYNC, issue a bunch of writes, then call
>> ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
>> Only the second set of writes are synchronous?
>
> I also am curious if one can open a file, write to it, close it, open it and
> do fsync()/fdatasync() on it?

Hopefully the fsync/fdatasync call will flush all previous writes (even
from other processes). Berkeley DB relies on this behavior for correct
operation.
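
Bernd's question can be turned into a small program: write through one descriptor, close it, then fsync() a freshly opened one. On Linux, fsync() flushes the file's dirty data and metadata regardless of which descriptor or process dirtied them, which is the behavior Florian says Berkeley DB relies on; the POSIX wording is less explicit. The helper name is hypothetical, fsync() on a read-only descriptor is accepted by Linux but may not be portable, and a userspace run can only confirm that the calls succeed, not that the data reached stable storage (that needs a crash test like FiSC's).

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * write_then_fsync_later (hypothetical helper): write `data` through one
 * descriptor without syncing it, close that descriptor, then reopen the
 * file and fsync() the new descriptor. Returns 0 on success, -1 on error.
 */
int write_then_fsync_later(const char *path, const char *data)
{
	size_t len;
	int fd;

	assert(path != NULL && data != NULL);
	len = strlen(data);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	if (close(fd) < 0)              /* no fsync through this descriptor */
		return -1;

	fd = open(path, O_RDONLY);      /* fresh descriptor; read-only is fine on Linux */
	if (fd < 0)
		return -1;
	if (fsync(fd) < 0) {            /* flushes the inode's dirty pages */
		close(fd);
		return -1;
	}
	return close(fd);
}
```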

Thread overview: 15+ messages
2005-03-04 6:33 [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option? Junfeng Yang
2005-03-04 7:16 ` Matt Mackall
2005-03-04 7:34 ` Jan Engelhardt
2005-03-04 8:01   ` Junfeng Yang
2005-03-07 17:29   ` Alan Cox
2005-03-07 22:29     ` Junfeng Yang
2005-03-04 8:43 ` [MC] " Junfeng Yang
2005-03-04 9:11   ` Andrew Morton
2005-03-04 9:44     ` Junfeng Yang
2005-03-04 10:27       ` Lars Marowsky-Bree
2005-03-04 11:20         ` Andrew Morton
2005-03-04 23:03           ` Junfeng Yang
2005-03-04 23:29             ` Andrew Morton
2005-03-08 0:31             ` Bernd Eckenfels
2005-03-19 22:34               ` Florian Weimer