[CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

public inbox for linux-kernel@vger.kernel.org

* [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
@ 2005-03-04  6:33 Junfeng Yang
  2005-03-04  7:16 ` Matt Mackall
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Junfeng Yang @ 2005-03-04  6:33 UTC (permalink / raw)
  To: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser; +Cc: mc

Hi,

FiSC (our file system checker) emits several warnings on ext2, jfs and
reiserfs, complaining that directories or files are lost while FiSC
believes they should already be persistent on disk. (ext3 behaves
correctly.)

All warnings boil down to a single cause:  when these file systems are
mounted -o sync or dirsync, dirty blocks are still written out
asynchronously.  It appears to me that these mount options don't have any
effect on these file systems.  Is this the intended behavior?

man mount shows:

              sync   All I/O to the file system should be done synchronously.

              dirsync
                     All directory updates within the file system should be
                     done synchronously.  This affects the following system
                     calls: creat, link, unlink, symlink, mkdir, rmdir, mknod
                     and rename.

Any clarification on this would be very helpful,

-Junfeng

* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04  6:33 [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option? Junfeng Yang
@ 2005-03-04  7:16 ` Matt Mackall
  2005-03-04  7:34 ` Jan Engelhardt
  2005-03-04  8:43 ` [MC] " Junfeng Yang
  2 siblings, 0 replies; 15+ messages in thread
From: Matt Mackall @ 2005-03-04  7:16 UTC (permalink / raw)
  To: Junfeng Yang
  Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

On Thu, Mar 03, 2005 at 10:33:40PM -0800, Junfeng Yang wrote:
> 
> Hi,
> 
> FiSC (our file system checker) emits several warnings on ext2, jfs and
> reiserfs, complaining that directories or files are lost while FiSC
> believes they should already be persistent on disk. (ext3 behaves
> correctly.)
> 
> All warnings boil down to a single cause:  when these file systems are
> mounted -o sync or dirsync, dirty blocks are still written out
> asynchronously.  It appears to me that these mount options don't have any
> effect on these file systems.  Is this the intended behavior?

I don't believe so. The sync option should definitionally make calls
to fsync for integrity redundant. This probably got broken ages ago
for ext2 in one of the many buffer/page cache refactorings.

-- 
Mathematics is the supreme nostalgia of our time.

* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04  6:33 [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option? Junfeng Yang
  2005-03-04  7:16 ` Matt Mackall
@ 2005-03-04  7:34 ` Jan Engelhardt
  2005-03-04  8:01   ` Junfeng Yang
  2005-03-07 17:29   ` Alan Cox
  2005-03-04  8:43 ` [MC] " Junfeng Yang
  2 siblings, 2 replies; 15+ messages in thread
From: Jan Engelhardt @ 2005-03-04  7:34 UTC (permalink / raw)
  To: Junfeng Yang
  Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

>All warnings boil down to a single cause:  when these file systems are
>mounted -o sync or dirsync, dirty blocks are still written out
>asynchronously.  It appears to me that these mount options don't have any
>effect on these file systems.  Is this the intended behavior?

At least my HDD LED flashes regularly when I add -o sync...
(Using `mount / -o remount,sync`)

It may be that FiSC reads the disk before the write command has even finished.
With all the HD head movement optimization in the kernel (block layer,
boiling down to TCQ/NCQ), this sounds possible.


Jan Engelhardt
-- 

* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04  7:34 ` Jan Engelhardt
@ 2005-03-04  8:01   ` Junfeng Yang
  2005-03-07 17:29   ` Alan Cox
  1 sibling, 0 replies; 15+ messages in thread
From: Junfeng Yang @ 2005-03-04  8:01 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

> It may be that FiSC reads the disk before the write command has even finished.
> With all the HD head movement optimization in the kernel (block layer,
> boiling down to TCQ/NCQ), this sounds possible.

FiSC "crashes" the kernel immediately after a file system operation
(creat, mkdir, write, etc.) returns.  Presumably, if a file system is
mounted -o sync, all the FS operations should be done synchronously, i.e.,
if creat("foo") returns, the file "foo" had better be on disk.  That turns
out not to be the case for ext2, jfs and reiserfs.

-Junfeng

* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04  7:34 ` Jan Engelhardt
  2005-03-04  8:01   ` Junfeng Yang
@ 2005-03-07 17:29   ` Alan Cox
  2005-03-07 22:29     ` Junfeng Yang
  1 sibling, 1 reply; 15+ messages in thread
From: Alan Cox @ 2005-03-07 17:29 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Junfeng Yang, Linux Kernel Mailing List, ext2-devel,
	jfs-discussion, reiser, mc

The IDE layer default is still unfortunately broken and leaves write
caching enabled. Turn it off with hdparm.


* Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-07 17:29   ` Alan Cox
@ 2005-03-07 22:29     ` Junfeng Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Junfeng Yang @ 2005-03-07 22:29 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jan Engelhardt, Linux Kernel Mailing List, ext2-devel,
	jfs-discussion, reiserfs-list, mc


FiSC can still get those warnings with hdparm -W 0, or with a simple
ramdisk that serves the disk requests whenever they are submitted.

Thanks,
-Junfeng

On Mon, 7 Mar 2005, Alan Cox wrote:

> The IDE layer default is still unfortunately broken and leaves write
> caching enabled. Turn it off with hdparm.
>


* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04  6:33 [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option? Junfeng Yang
  2005-03-04  7:16 ` Matt Mackall
  2005-03-04  7:34 ` Jan Engelhardt
@ 2005-03-04  8:43 ` Junfeng Yang
  2005-03-04  9:11   ` Andrew Morton
  2 siblings, 1 reply; 15+ messages in thread
From: Junfeng Yang @ 2005-03-04  8:43 UTC (permalink / raw)
  To: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser; +Cc: mc

On Thu, 3 Mar 2005, Junfeng Yang wrote:

>
> Hi,
>
> FiSC (our file system checker) emits several warnings on ext2, jfs and
> reiserfs, complaining that directories or files are lost while FiSC
> believes they should already be persistent on disk. (ext3 behaves
> correctly.)

I forgot to mention, we are mainly looking for crash-recovery bugs.  The
warnings can be triggered this way:
1. do several file system operations
2. "crash" the test machine
3. get the crashed disk image, run fsck to recover
4. mount the recovered disk image

I'm able to reproduce the same warnings on ext2 using the following
program:

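#include <fcntl.h>      /* creat */
#include <stdlib.h>     /* system */
#include <sys/stat.h>   /* mkdir */
#include <sys/types.h>

int main(void)
{
        system("sudo umount /dev/hda9");
        system("/sbin/mke2fs /dev/hda9");
        system("sudo mount -t ext2 /dev/hda9 /mnt/sbd1 -o sync,dirsync");
        creat("/mnt/sbd1/0002", 0777);
        mkdir("/mnt/sbd1/0003", 0777);
        // unplug your power cord here :)  then use e2fsck to recover
        return 0;
}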

uname -a shows
Linux notus 2.6.8-1-686 #1 Thu Nov 25 04:34:30 UTC 2004 i686 GNU/Linux


* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04  8:43 ` [MC] " Junfeng Yang
@ 2005-03-04  9:11   ` Andrew Morton
  2005-03-04  9:44     ` Junfeng Yang
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2005-03-04  9:11 UTC (permalink / raw)
  To: Junfeng Yang; +Cc: linux-kernel, ext2-devel, jfs-discussion, reiser, mc

Junfeng Yang <yjf@stanford.edu> wrote:
>
> On Thu, 3 Mar 2005, Junfeng Yang wrote:
> 
> >
> > Hi,
> >
> > FiSC (our file system checker) emits several warnings on ext2, jfs and
> > reiserfs, complaining that directories or files are lost while FiSC
> > believes they should already be persistent on disk. (ext3 behaves
> > correctly.)
> 
> I forgot to mention, we are mainly looking for crash-recovery bugs.  The
> warnings can be triggered this way:
> 1. do several file system operations
> 2. "crash" the test machine
> 3. get the crashed disk image, run fsck to recover
> 4. mount the recovered disk image
>
> I'm able to reproduce the same warnings on ext2 using the following
> program:
> 
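> #include <fcntl.h>      /* creat */
> #include <stdlib.h>     /* system */
> #include <sys/stat.h>   /* mkdir */
> #include <sys/types.h>
>
> int main(void)
> {
>         system("sudo umount /dev/hda9");
>         system("/sbin/mke2fs /dev/hda9");
>         system("sudo mount -t ext2 /dev/hda9 /mnt/sbd1 -o sync,dirsync");
>         creat("/mnt/sbd1/0002", 0777);
>         mkdir("/mnt/sbd1/0003", 0777);
>         // unplug your power cord here :)  then use e2fsck to recover
>         return 0;
> }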

That would be a bug.  Please send the e2fsck output.

> uname -a shows
> Linux notus 2.6.8-1-686 #1 Thu Nov 25 04:34:30 UTC 2004 i686 GNU/Linux

It would be much better to test vaguely contemporary kernels.

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04  9:11   ` Andrew Morton
@ 2005-03-04  9:44     ` Junfeng Yang
  2005-03-04 10:27       ` Lars Marowsky-Bree
  0 siblings, 1 reply; 15+ messages in thread
From: Junfeng Yang @ 2005-03-04  9:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

> That would be a bug.  Please send the e2fsck output.

Here is the trace

1. file system is made with sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60
and mounted with -o sync,dirsync

2. operations FiSC did:

creat(/mnt/sbd0/0001)
write(/mnt/sbd0/0001)
rename(/mnt/sbd0/0001, /mnt/sbd0/0002)
mkdir(/mnt/sbd0/0003)

3. FiSC "crashed" the test machine after mkdir returned.  The crashed
disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2

e2fsck output is:

e2fsck 1.36 (05-Feb-2005)
/dev/hda9 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 12, i_blocks is 16, should be 2.  Fix? yes

Pass 2: Checking directory structure
Entry '0003' in / (2) has deleted/unused inode 13.  Clear? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -21
Fix? yes

Free blocks count wrong for group #0 (38, counted=39).
Fix? yes

Free blocks count wrong (38, counted=39).
Fix? yes

Inode bitmap differences:  -13
Fix? yes

Free inodes count wrong for group #0 (3, counted=4).
Fix? yes

Directories count wrong for group #0 (3, counted=2).
Fix? yes

Free inodes count wrong (3, counted=4).
Fix? yes

/dev/hda9: ***** FILE SYSTEM WAS MODIFIED *****
/dev/hda9: 12/16 files (0.0% non-contiguous), 21/60 blocks

>
>
> It would be much better to test vaguely contemporary kernels.
>

I'm going to check 2.6.11 tonight.

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04  9:44     ` Junfeng Yang
@ 2005-03-04 10:27       ` Lars Marowsky-Bree
  2005-03-04 11:20         ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Lars Marowsky-Bree @ 2005-03-04 10:27 UTC (permalink / raw)
  To: Junfeng Yang, Andrew Morton
  Cc: Linux Kernel Mailing List, ext2-devel, jfs-discussion, reiser, mc

On 2005-03-04T01:44:06, Junfeng Yang <yjf@stanford.edu> wrote:

> > That would be a bug.  Please send the e2fsck output.
> 
> Here is the trace
> 
> 1. file system is made with sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60
> and mounted with -o sync,dirsync
> 
> 2. operations FiSC did:
> 
> creat(/mnt/sbd0/0001)
> write(/mnt/sbd0/0001)
> rename(/mnt/sbd0/0001, /mnt/sbd0/0002)
> mkdir(/mnt/sbd0/0003)
> 
> 3. FiSC "crashed" the test machine after mkdir returned.  The crashed
> disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2

I've run into similar issues. For example, a "touch foo" also isn't
synchronous with -o sync, but stays entirely in the cache. Andrea tells
me this is expected behaviour, so I've given up on this one...


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business


* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04 10:27       ` Lars Marowsky-Bree
@ 2005-03-04 11:20         ` Andrew Morton
  2005-03-04 23:03           ` Junfeng Yang
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2005-03-04 11:20 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: yjf, linux-kernel, ext2-devel, jfs-discussion, reiser, mc

Lars Marowsky-Bree <lmb@suse.de> wrote:
>
> On 2005-03-04T01:44:06, Junfeng Yang <yjf@stanford.edu> wrote:
> 
> > > That would be a bug.  Please send the e2fsck output.
> > 
> > Here is the trace
> > 
> > 1. file system is made with sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60
> > and mounted with -o sync,dirsync
> > 
> > 2. operations FiSC did:
> > 
> > creat(/mnt/sbd0/0001)
> > write(/mnt/sbd0/0001)
> > rename(/mnt/sbd0/0001, /mnt/sbd0/0002)
> > mkdir(/mnt/sbd0/0003)
> > 
> > 3. FiSC "crashed" the test machine after mkdir returned.  The crashed
> > disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2
> 
> I've run into similar issues. For example, a "touch foo" also isn't
> synchronous with -o sync, but stays entirely in the cache. Andrea tells
> me this is expected behaviour, so I've given up on this one...
> 

Why is that expected behaviour?  I have vague memories which agree with
that, but I cannot remember the reasoning.

From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
there might be some O_SYNC ones there as well.

Problem is, it's subtle because we try to defer I/O until the last stage,
to avoid doing extra I/O.

So this wild scattergun patch probably does extra work and possibly extra
I/O all over the place, but I'd be interested if Junfeng could give it a
quick test.   It's against 2.6.11.

A real patch would take some painstaking work.


diff -puN fs/ext2/balloc.c~ext2-sync-fix fs/ext2/balloc.c
--- 25/fs/ext2/balloc.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/balloc.c	2005-03-04 02:49:00.000000000 -0800
@@ -139,8 +139,9 @@ static void release_blocks(struct super_
 	}
 }
 
-static int group_reserve_blocks(struct ext2_sb_info *sbi, int group_no,
-	struct ext2_group_desc *desc, struct buffer_head *bh, int count)
+static int group_reserve_blocks(struct super_block *sb,
+	struct ext2_sb_info *sbi, int group_no, struct ext2_group_desc *desc,
+	struct buffer_head *bh, int count)
 {
 	unsigned free_blocks;
 
@@ -154,6 +155,8 @@ static int group_reserve_blocks(struct e
 	desc->bg_free_blocks_count = cpu_to_le16(free_blocks - count);
 	spin_unlock(sb_bgl_lock(sbi, group_no));
 	mark_buffer_dirty(bh);
+	if (sb->s_flags & MS_SYNCHRONOUS)
+		sync_dirty_buffer(bh);
 	return count;
 }
 
@@ -170,6 +173,8 @@ static void group_release_blocks(struct 
 		spin_unlock(sb_bgl_lock(sbi, group_no));
 		sb->s_dirt = 1;
 		mark_buffer_dirty(bh);
+		if (sb->s_flags & MS_SYNCHRONOUS)
+			sync_dirty_buffer(bh);
 	}
 }
 
@@ -377,7 +382,7 @@ int ext2_new_block(struct inode *inode, 
 		goto io_error;
 	}
 
-	group_alloc = group_reserve_blocks(sbi, group_no, desc,
+	group_alloc = group_reserve_blocks(sb, sbi, group_no, desc,
 					gdp_bh, es_alloc);
 	if (group_alloc) {
 		ret_block = ((goal - le32_to_cpu(es->s_first_data_block)) %
@@ -413,7 +418,7 @@ retry:
 		desc = ext2_get_group_desc(sb, group_no, &gdp_bh);
 		if (!desc)
 			goto io_error;
-		group_alloc = group_reserve_blocks(sbi, group_no, desc,
+		group_alloc = group_reserve_blocks(sb, sbi, group_no, desc,
 						gdp_bh, es_alloc);
 	}
 	if (!group_alloc) {
diff -puN fs/ext2/ialloc.c~ext2-sync-fix fs/ext2/ialloc.c
--- 25/fs/ext2/ialloc.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/ialloc.c	2005-03-04 02:54:13.000000000 -0800
@@ -86,6 +86,8 @@ static void ext2_release_inode(struct su
 		percpu_counter_dec(&EXT2_SB(sb)->s_dirs_counter);
 	sb->s_dirt = 1;
 	mark_buffer_dirty(bh);
+	if (sb->s_flags & MS_SYNCHRONOUS)
+		sync_dirty_buffer(bh);
 }
 
 /*
@@ -563,6 +565,8 @@ got:
 
 	sb->s_dirt = 1;
 	mark_buffer_dirty(bh2);
+	if (sb->s_flags & MS_SYNCHRONOUS)
+		sync_dirty_buffer(bh2);
 	inode->i_uid = current->fsuid;
 	if (test_opt (sb, GRPID))
 		inode->i_gid = dir->i_gid;
@@ -614,7 +618,7 @@ got:
 		DQUOT_FREE_INODE(inode);
 		goto fail2;
 	}
-	mark_inode_dirty(inode);
+	ext2_mark_inode_dirty(inode);
 	ext2_debug("allocating inode %lu\n", inode->i_ino);
 	ext2_preread_inode(inode);
 	return inode;
diff -puN fs/ext2/super.c~ext2-sync-fix fs/ext2/super.c
--- 25/fs/ext2/super.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/super.c	2005-03-04 02:49:00.000000000 -0800
@@ -1097,6 +1097,8 @@ static ssize_t ext2_quota_write(struct s
 		set_buffer_uptodate(bh);
 		mark_buffer_dirty(bh);
 		unlock_buffer(bh);
+		if (sb->s_flags & MS_SYNCHRONOUS)
+			sync_dirty_buffer(bh);
 		brelse(bh);
 		offset = 0;
 		towrite -= tocopy;
@@ -1110,8 +1112,8 @@ out:
 		i_size_write(inode, off+len-towrite);
 	inode->i_version++;
 	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
-	mark_inode_dirty(inode);
 	up(&inode->i_sem);
+	ext2_mark_inode_dirty(inode);
 	return len - towrite;
 }
 
diff -puN fs/ext2/xattr.c~ext2-sync-fix fs/ext2/xattr.c
--- 25/fs/ext2/xattr.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/xattr.c	2005-03-04 02:49:00.000000000 -0800
@@ -348,6 +348,8 @@ static void ext2_xattr_update_super_bloc
 	sb->s_dirt = 1;
 	mark_buffer_dirty(EXT2_SB(sb)->s_sbh);
 	unlock_super(sb);
+	if (sb->s_flags & MS_SYNCHRONOUS)
+		sync_dirty_buffer(EXT2_SB(sb)->s_sbh);
 }
 
 /*
diff -puN fs/ext2/dir.c~ext2-sync-fix fs/ext2/dir.c
--- 25/fs/ext2/dir.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/dir.c	2005-03-04 02:49:00.000000000 -0800
@@ -428,7 +428,7 @@ void ext2_set_link(struct inode *dir, st
 	ext2_put_page(page);
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
 	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
-	mark_inode_dirty(dir);
+	ext2_mark_inode_dirty(dir);
 }
 
 /*
@@ -518,7 +518,7 @@ got_it:
 	err = ext2_commit_chunk(page, from, to);
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
 	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
-	mark_inode_dirty(dir);
+	ext2_mark_inode_dirty(dir);
 	/* OFFSET_CACHE */
 out_put:
 	ext2_put_page(page);
@@ -566,7 +566,7 @@ int ext2_delete_entry (struct ext2_dir_e
 	err = ext2_commit_chunk(page, from, to);
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
 	EXT2_I(inode)->i_flags &= ~EXT2_BTREE_FL;
-	mark_inode_dirty(inode);
+	ext2_mark_inode_dirty(inode);
 out:
 	ext2_put_page(page);
 	return err;
diff -puN fs/ext2/inode.c~ext2-sync-fix fs/ext2/inode.c
--- 25/fs/ext2/inode.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/inode.c	2005-03-04 02:49:00.000000000 -0800
@@ -41,6 +41,17 @@ MODULE_LICENSE("GPL");
 static int ext2_update_inode(struct inode * inode, int do_sync);
 
 /*
+ * dirty an ext2 inode and sync it if needed
+ */
+int ext2_mark_inode_dirty(struct inode *inode)
+{
+	mark_inode_dirty(inode);
+	if (inode_needs_sync(inode))
+		return ext2_update_inode(inode, 1);
+	return 0;
+}
+
+/*
  * Test whether an inode is a fast symlink.
  */
 static inline int ext2_inode_is_fast_symlink(struct inode *inode)
@@ -60,8 +71,7 @@ void ext2_delete_inode (struct inode * i
 	if (is_bad_inode(inode))
 		goto no_delete;
 	EXT2_I(inode)->i_dtime	= get_seconds();
-	mark_inode_dirty(inode);
-	ext2_update_inode(inode, inode_needs_sync(inode));
+	ext2_mark_inode_dirty(inode);
 
 	inode->i_size = 0;
 	if (inode->i_blocks)
diff -puN fs/ext2/acl.c~ext2-sync-fix fs/ext2/acl.c
diff -puN fs/ext2/ioctl.c~ext2-sync-fix fs/ext2/ioctl.c
--- 25/fs/ext2/ioctl.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/ioctl.c	2005-03-04 02:49:00.000000000 -0800
@@ -60,7 +60,7 @@ int ext2_ioctl (struct inode * inode, st
 
 		ext2_set_inode_flags(inode);
 		inode->i_ctime = CURRENT_TIME_SEC;
-		mark_inode_dirty(inode);
+		ext2_mark_inode_dirty(inode);
 		return 0;
 	}
 	case EXT2_IOC_GETVERSION:
@@ -73,7 +73,7 @@ int ext2_ioctl (struct inode * inode, st
 		if (get_user(inode->i_generation, (int __user *) arg))
 			return -EFAULT;	
 		inode->i_ctime = CURRENT_TIME_SEC;
-		mark_inode_dirty(inode);
+		ext2_mark_inode_dirty(inode);
 		return 0;
 	default:
 		return -ENOTTY;
diff -puN fs/ext2/namei.c~ext2-sync-fix fs/ext2/namei.c
--- 25/fs/ext2/namei.c~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/namei.c	2005-03-04 02:55:15.000000000 -0800
@@ -132,7 +132,7 @@ static int ext2_create (struct inode * d
 			inode->i_mapping->a_ops = &ext2_nobh_aops;
 		else
 			inode->i_mapping->a_ops = &ext2_aops;
-		mark_inode_dirty(inode);
+		ext2_mark_inode_dirty(inode);
 		err = ext2_add_nondir(dentry, inode);
 	}
 	return err;
@@ -153,7 +153,7 @@ static int ext2_mknod (struct inode * di
 #ifdef CONFIG_EXT2_FS_XATTR
 		inode->i_op = &ext2_special_inode_operations;
 #endif
-		mark_inode_dirty(inode);
+		ext2_mark_inode_dirty(inode);
 		err = ext2_add_nondir(dentry, inode);
 	}
 	return err;
@@ -191,7 +191,7 @@ static int ext2_symlink (struct inode * 
 		memcpy((char*)(EXT2_I(inode)->i_data),symname,l);
 		inode->i_size = l-1;
 	}
-	mark_inode_dirty(inode);
+	ext2_mark_inode_dirty(inode);
 
 	err = ext2_add_nondir(dentry, inode);
 out:
diff -puN fs/ext2/ext2.h~ext2-sync-fix fs/ext2/ext2.h
--- 25/fs/ext2/ext2.h~ext2-sync-fix	2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/ext2.h	2005-03-04 02:49:00.000000000 -0800
@@ -116,6 +116,7 @@ extern unsigned long ext2_count_free (st
 /* inode.c */
 extern void ext2_read_inode (struct inode *);
 extern int ext2_write_inode (struct inode *, int);
+int ext2_mark_inode_dirty(struct inode *inode);
 extern void ext2_delete_inode (struct inode *);
 extern int ext2_sync_inode (struct inode *);
 extern void ext2_discard_prealloc (struct inode *);
_


* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04 11:20         ` Andrew Morton
@ 2005-03-04 23:03           ` Junfeng Yang
  2005-03-04 23:29             ` Andrew Morton
  2005-03-08  0:31             ` Bernd Eckenfels
  0 siblings, 2 replies; 15+ messages in thread
From: Junfeng Yang @ 2005-03-04 23:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Lars Marowsky-Bree, Linux Kernel Mailing List, ext2-devel, mc

> > From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
> there might be some O_SYNC ones there as well.

I should be able to easily add an O_SYNC check to FiSC.  Several questions:
1. Does O_SYNC apply to directories as well?
2. For the same file, if I open it twice, once with O_SYNC and once
without, only writes through the O_SYNC fd will be synchronous, right?
3. I open a file w/o O_SYNC, issue a bunch of writes, then call
ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
Only the second set of writes is synchronous?

btw, the man page shows that O_DSYNC and O_RSYNC are just O_SYNC.  Is this
true for the current Linux kernel (2.6)?

> So this wild scattergun patch probably does extra work and possibly extra
> I/O all over the place, but I'd be interested if Junfeng could give it a
> quick test.   It's against 2.6.11.

I checked 2.6.11 with your patch just now.  Looks like the problem is
still there.  If you need more information, let me know.  Image is at
http://fisc.stanford.edu/bug2/crash-1.img.bz2.  Below is the output from
e2fsck.

e2fsck 1.36 (05-Feb-2005)
/dev/ide/host0/bus0/target0/lun0/part9 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 13, i_blocks is 16, should be 2.  Fix? yes

Inode 15 is a zero-length directory.  Clear? yes

Pass 2: Checking directory structure
Entry '0005' in / (2) has deleted/unused inode 15.  Clear? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 2 ref count is 4, should be 3.  Fix? yes

Pass 5: Checking group summary information
Block bitmap differences:  -21
Fix? yes

Free blocks count wrong for group #0 (38, counted=39).
Fix? yes

Free blocks count wrong (38, counted=39).
Fix? yes

Inode bitmap differences:  -15
Fix? yes

Free inodes count wrong for group #0 (1, counted=2).
Fix? yes

Directories count wrong for group #0 (3, counted=2).
Fix? yes

Free inodes count wrong (1, counted=2).
Fix? yes

/dev/ide/host0/bus0/target0/lun0/part9: ***** FILE SYSTEM WAS MODIFIED *****
/dev/ide/host0/bus0/target0/lun0/part9: 14/16 files (0.0% non-contiguous), 21/60 blocks

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04 23:03           ` Junfeng Yang
@ 2005-03-04 23:29             ` Andrew Morton
  2005-03-08  0:31             ` Bernd Eckenfels
  1 sibling, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2005-03-04 23:29 UTC (permalink / raw)
  To: Junfeng Yang; +Cc: lmb, linux-kernel, ext2-devel, mc

Junfeng Yang <yjf@stanford.edu> wrote:
>
> > > From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
> > there might be some O_SYNC ones there as well.
> 
> I should be able to easily add an O_SYNC check to FiSC.  Several questions:
> 1. Does O_SYNC apply to directories as well?

Only if you can open directories for writing ;)

> 2. For the same file, if I open it twice, once with O_SYNC and once
> without, only writes through the O_SYNC fd will be synchronous, right?

Yes, O_SYNC is a per-fd thing.
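
A small sketch of the per-fd behaviour, assuming Linux, where
fcntl(F_GETFL) reports the status flags of each open file description.
osync_is_per_fd is a hypothetical name used only for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the same file twice, once with O_SYNC and
 * once without, and compare the flags each descriptor reports.
 * Returns 1 if only the O_SYNC descriptor carries O_SYNC, 0 if not,
 * and -1 on error.
 */
int osync_is_per_fd(const char *path)
{
        int sync_fd = open(path, O_CREAT | O_WRONLY | O_SYNC, 0644);
        int plain_fd = open(path, O_WRONLY);
        int result = -1;

        if (sync_fd >= 0 && plain_fd >= 0) {
                int sf = fcntl(sync_fd, F_GETFL);
                int pf = fcntl(plain_fd, F_GETFL);

                if (sf != -1 && pf != -1)
                        result = ((sf & O_SYNC) == O_SYNC) &&
                                 ((pf & O_SYNC) == 0);
        }

        if (sync_fd >= 0)
                close(sync_fd);
        if (plain_fd >= 0)
                close(plain_fd);
        return result;
}
```

This only inspects the flags; whether writes through sync_fd really reach
the disk before write() returns is what a checker like FiSC would test.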

> 3. I open a file w/o O_SYNC, issue a bunch of writes, then call
> ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
> Only the second set of writes is synchronous?

FIOASYNC is unrelated to O_SYNC.  O_SYNC can only be set at open().
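
A sketch that illustrates the point, assuming the Linux behaviour
documented in fcntl(2): F_SETFL can change only a few flags (O_APPEND,
O_ASYNC, O_DIRECT, O_NOATIME, O_NONBLOCK) and silently ignores O_SYNC.
try_set_osync_late is a hypothetical name.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a file without O_SYNC, then try to turn
 * O_SYNC on afterwards with F_SETFL.  Returns 1 if the flag took
 * effect, 0 if it was ignored, and -1 on error.
 */
int try_set_osync_late(const char *path)
{
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        int flags, result = -1;

        if (fd < 0)
                return -1;

        flags = fcntl(fd, F_GETFL);
        if (flags != -1 && fcntl(fd, F_SETFL, flags | O_SYNC) != -1) {
                /* Re-read the flags to see what the kernel kept. */
                flags = fcntl(fd, F_GETFL);
                if (flags != -1)
                        result = (flags & O_SYNC) != 0;
        }

        close(fd);
        return result;
}
```

On Linux the expected result is 0: the F_SETFL call succeeds but the
descriptor still does not carry O_SYNC.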

> btw, the man page shows that O_DSYNC and O_RSYNC are just O_SYNC.  Is this
> true for the current Linux kernel (2.6)?

The kernel only supports O_SYNC (equivalent behaviour to O_RSYNC|O_DSYNC). 
Perhaps glibc does a conversion.

> > So this wild scattergun patch probably does extra work and possibly extra
> > I/O all over the place, but I'd be interested if Junfeng could give it a
> > quick test.   It's against 2.6.11.
> 
> I checked 2.6.11 with your patch just now.  Looks like the problem is
> still there.  If you need more information, let me know.  Image is at
> http://fisc.stanford.edu/bug2/crash-1.img.bz2.  Below is the output from
> e2fsck.

ugh.  Thanks.


* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-04 23:03           ` Junfeng Yang
  2005-03-04 23:29             ` Andrew Morton
@ 2005-03-08  0:31             ` Bernd Eckenfels
  2005-03-19 22:34               ` Florian Weimer
  1 sibling, 1 reply; 15+ messages in thread
From: Bernd Eckenfels @ 2005-03-08  0:31 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.GSO.4.44.0503041440030.17155-100000@elaine24.Stanford.EDU> you wrote:
> 3. I open a file w/o O_SYNC, issue a bunch of writes, then call
> ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
> Only the second set of writes is synchronous?

I also am curious if one can open a file, write to it, close it, open it and
do fsync()/fdatasync() on it?

Greetings
Bernd

* Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
  2005-03-08  0:31             ` Bernd Eckenfels
@ 2005-03-19 22:34               ` Florian Weimer
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Weimer @ 2005-03-19 22:34 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

* Bernd Eckenfels:

> In article <Pine.GSO.4.44.0503041440030.17155-100000@elaine24.Stanford.EDU> you wrote:
>> 3. I open a file w/o O_SYNC, issue a bunch of writes, then call
>> ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
>> Only the second set of writes is synchronous?
>
> I also am curious if one can open a file, write to it, close it, open it and
> do fsync()/fdatasync() on it?

Hopefully the fsync/fdatasync call will flush all previous writes
(even from other processes).  Berkeley DB relies on this behavior for
correct operation.
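
A minimal sketch of the sequence Bernd describes, with a hypothetical
helper name (write_close_reopen_fsync).  It only shows that the call
sequence is valid and succeeds; whether the fsync() really flushes the
earlier writes to stable storage depends on the kernel and file system.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write through one descriptor, close it, then
 * reopen the file and fsync() the new descriptor.
 * Returns 0 on success, -1 on any failure.
 */
int write_close_reopen_fsync(const char *path)
{
        static const char msg[] = "hello\n";
        int fd, rc;

        fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
                return -1;
        if (write(fd, msg, sizeof(msg) - 1) != (ssize_t)(sizeof(msg) - 1)) {
                close(fd);
                return -1;
        }
        if (close(fd) != 0)
                return -1;

        /* Second open: no writes are issued through this descriptor. */
        fd = open(path, O_WRONLY);
        if (fd < 0)
                return -1;
        rc = fsync(fd);
        close(fd);
        return rc == 0 ? 0 : -1;
}
```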
