* [PATCH] fix problems related to journaling in Reiserfs
@ 2005-08-31 12:39 Hifumi Hisashi
2005-08-31 13:42 ` michael chang
2005-09-01 3:35 ` Hans Reiser
0 siblings, 2 replies; 9+ messages in thread
From: Hifumi Hisashi @ 2005-08-31 12:39 UTC
To: reiser, reiserfs-dev; +Cc: reiserfs-list, linux-fsdevel
Hello.
I noticed that Reiserfs has some problems related to meta-data journaling.
I believe that, in ordered mode, a transaction should be written to disk for
every meta-data change (for example, when i_size increases) while synchronous
writes are being performed. However, it seems to me that Reiserfs does not do that.
I ran the following test.
1, Mount the Reiserfs filesystem in ordered mode.
2, Run a test program that opens a test file with the O_SYNC|O_CREAT flags and
keeps writing 4 bytes of data in a busy loop. Every write() is synchronous, and
the file size keeps increasing.
3, While the program is running, disconnect the SCSI cable.
4, Hexdump the disk and inspect the journal on it.
5, Reboot the system and mount the Reiserfs filesystem in ordered mode again.
I then checked the size of the file that the test program had created. I could
not see the whole content of the file because its size was shorter than it
should have been. The cause of this problem is that, even though i_size had
changed, the meta-data was not logged to the journal area on disk during
synchronous writes.
I ran the same test on Ext3, and there was no such problem. Ext3 (jbd) logged
every change of i_size while O_SYNC writes were being performed.
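The size check after the remount can be sketched as a small helper. The name `records_on_disk` is hypothetical, and the sketch assumes that the number of write() calls that returned before the cable was pulled was recorded somewhere that survives the crash:

```c
#include <sys/stat.h>

/*
 * Sketch only: return how many complete `record_size`-byte records the
 * file at `path` holds according to its size, or -1 if stat() fails or
 * record_size is not positive.
 */
long records_on_disk(const char *path, long record_size)
{
    struct stat st;

    if (record_size <= 0 || stat(path, &st) != 0)
        return -1;
    return (long)(st.st_size / record_size);
}
```

If `records_on_disk(path, 4)` is smaller than the number of completed O_SYNC writes, the i_size update was lost, which is the symptom described above.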
The following patch fixes this problem.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@lab.ntt.co.jp>
diff -Nru linux-2.6.13/fs/reiserfs/file.c linux-2.6.13_fix/fs/reiserfs/file.c
--- linux-2.6.13/fs/reiserfs/file.c 2005-08-29 08:41:01.000000000 +0900
+++ linux-2.6.13_fix/fs/reiserfs/file.c 2005-08-31 16:33:33.000000000 +0900
@@ -819,7 +819,6 @@
int i; // loop counter
int offset; // Writing offset in page.
int orig_write_bytes = write_bytes;
- int sd_update = 0;
for (i = 0, offset = (pos & (PAGE_CACHE_SIZE - 1)); i < num_pages;
i++, offset = 0) {
@@ -855,17 +854,17 @@
if (th->t_trans_id) {
reiserfs_write_lock(inode->i_sb);
- reiserfs_update_sd(th, inode); // And update on-disk metadata
+ status = journal_end(th, th->t_super, th->t_blocks_allocated);
+ if (status)
+ retval = status;
reiserfs_write_unlock(inode->i_sb);
- } else
- inode->i_sb->s_op->dirty_inode(inode);
-
- sd_update = 1;
+ }
+ mark_inode_dirty(inode);
+ th->t_trans_id = 0;
}
if (th->t_trans_id) {
reiserfs_write_lock(inode->i_sb);
- if (!sd_update)
- reiserfs_update_sd(th, inode);
+ reiserfs_update_sd(th, inode);
status = journal_end(th, th->t_super, th->t_blocks_allocated);
if (status)
retval = status;
@@ -1526,10 +1525,13 @@
}
}
- if ((file->f_flags & O_SYNC) || IS_SYNC(inode))
+ if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
res =
generic_osync_inode(inode, file->f_mapping,
OSYNC_METADATA | OSYNC_DATA);
+ if (res)
+ already_written = 0;
+ }
up(&inode->i_sem);
reiserfs_async_progress_wait(inode->i_sb);
diff -Nru linux-2.6.13/fs/reiserfs/inode.c linux-2.6.13_fix/fs/reiserfs/inode.c
--- linux-2.6.13/fs/reiserfs/inode.c 2005-08-29 08:41:01.000000000 +0900
+++ linux-2.6.13_fix/fs/reiserfs/inode.c 2005-08-31 16:33:33.000000000 +0900
@@ -1642,6 +1642,7 @@
{
struct reiserfs_transaction_handle th;
int jbegin_count = 1;
+ int err = 0;
if (inode->i_sb->s_flags & MS_RDONLY)
return -EROFS;
@@ -1654,11 +1655,11 @@
reiserfs_write_lock(inode->i_sb);
if (!journal_begin(&th, inode->i_sb, jbegin_count)) {
reiserfs_update_sd(&th, inode);
- journal_end_sync(&th, inode->i_sb, jbegin_count);
+ err = journal_end_sync(&th, inode->i_sb, jbegin_count);
}
reiserfs_write_unlock(inode->i_sb);
}
- return 0;
+ return err;
}
/* stat data of new object is inserted already, this inserts the item
Thanks,
* Re: [PATCH] fix problems related to journaling in Reiserfs
2005-08-31 12:39 [PATCH] fix problems related to journaling in Reiserfs Hifumi Hisashi
@ 2005-08-31 13:42 ` michael chang
2005-09-01 0:02 ` Hifumi Hisashi
2005-09-01 3:35 ` Hans Reiser
1 sibling, 1 reply; 9+ messages in thread
From: michael chang @ 2005-08-31 13:42 UTC
To: Hifumi Hisashi; +Cc: reiser, reiserfs-dev, reiserfs-list, linux-fsdevel
On 8/31/05, Hifumi Hisashi <hifumi.hisashi@lab.ntt.co.jp> wrote:
> I noticed that the Reiserfs has some problems related to meta-data journaling.
> I suppose that transactions regarding meta-data should be written to a
> disk every meta-data change (for example, i_size is increased) in ordered-mode while
> synchronous writing is performed.
Surely we don't want this. Look at the papers on Namesys's website
about atomicity and the banking example. But that's just my
personal opinion. Besides, I believe a power loss is more likely
than a disconnected SCSI or IDE cable, AFAIK...
--
~Mike
- Just my two cents
- No man is an island, and no man is unable.
* Re: [PATCH] fix problems related to journaling in Reiserfs
2005-08-31 13:42 ` michael chang
@ 2005-09-01 0:02 ` Hifumi Hisashi
2005-09-01 0:37 ` michael chang
0 siblings, 1 reply; 9+ messages in thread
From: Hifumi Hisashi @ 2005-09-01 0:02 UTC
To: michael chang; +Cc: reiser, reiserfs-dev, reiserfs-list, linux-fsdevel
michael chang wrote:
>Surely we don't want this. Look at the papers on Namesys's websites,
>about the atomicaty and the banking example. But that's just my
>personal opinion. Besides, I believe it's more likely that usually
>the power gets lost than the SCSI or IDE cable gets disconnected,
>AFAIK...
>
>
A write() syscall with the O_SYNC flag must ensure that not only the file data
blocks but also the journal (the meta-data update) are written to disk by the
time the syscall returns.
However, the current implementation of Reiserfs does not do that. If a system
crashes, a filesystem recovers from the journal transaction log, but Reiserfs
may not recover in some cases.
I checked other filesystems such as ext3, jfs, and xfs. Those filesystems write
transactions to disk every time a write() with O_SYNC is performed, and with
them I do not see the trouble mentioned above.
I have to say that Reiserfs would be an "un"reliable filesystem.
* Re: [PATCH] fix problems related to journaling in Reiserfs
2005-09-01 0:02 ` Hifumi Hisashi
@ 2005-09-01 0:37 ` michael chang
2005-09-01 3:30 ` Hans Reiser
0 siblings, 1 reply; 9+ messages in thread
From: michael chang @ 2005-09-01 0:37 UTC
To: Hifumi Hisashi; +Cc: reiser, reiserfs-dev, reiserfs-list, linux-fsdevel
On 8/31/05, Hifumi Hisashi <hifumi.hisashi@lab.ntt.co.jp> wrote:
> michael chang wrote:
>
> >Surely we don't want this. Look at the papers on Namesys's websites,
> >about the atomicaty and the banking example. But that's just my
> >personal opinion. Besides, I believe it's more likely that usually
> >the power gets lost than the SCSI or IDE cable gets disconnected,
> >AFAIK...
> >
> >
> A write() syscall with the O_SYNC flag must ensure that not only
> file data block
> but also journal (meta-data update) are written to a disk when this
> syscall end.
> But, current implementation of Reiserfs does not do that. If a system
> crashes,
> a filesystem recovers from the journal transaction log. But, Reiserfs
> may not
> recover in some cases.
> I checked other filesystems like ext3, jfs, xfs. Those filesystem
> write transactions
> to a disk everytime write() with the O_SYNC is performed. In those
> filesystem,
> I have no trouble mentioned above.
>
> I should say, the Reiserfs would be "un"reliable filesystem..........
That said, AFAIK, Reiser(fs) 3.6 patches are somewhat redundant
(although if they solve a "problem", sure, go ahead) since this
functionality should be present in Reiser4 in one form or another --
I don't know if Reiser3.6 is still "supported" per se, anyway. But
don't bash me -- I'm not subscribed to the reiserfs-dev or
linux-fsdevel lists, so forgive me if I'm saying something I shouldn't
(I don't see how removing these lists from the replies
would help, but if that is requested, let me know).
--
~Mike
- Just my two cents
- No man is an island, and no man is unable.
* Re: [PATCH] fix problems related to journaling in Reiserfs
2005-09-01 0:37 ` michael chang
@ 2005-09-01 3:30 ` Hans Reiser
0 siblings, 0 replies; 9+ messages in thread
From: Hans Reiser @ 2005-09-01 3:30 UTC
To: michael chang
Cc: Hifumi Hisashi, reiserfs-dev, reiserfs-list, linux-fsdevel,
Chris Mason
3.6 will always be supported for bug fixes, but will not receive new
features if I can help it. It is stable, supported code, and should be
kept that way.
Regarding the journaling implementation, let's see what Chris says
before I comment.
Hans
* Re: [PATCH] fix problems related to journaling in Reiserfs
2005-08-31 12:39 [PATCH] fix problems related to journaling in Reiserfs Hifumi Hisashi
2005-08-31 13:42 ` michael chang
@ 2005-09-01 3:35 ` Hans Reiser
2005-09-02 2:03 ` Chris Mason
1 sibling, 1 reply; 9+ messages in thread
From: Hans Reiser @ 2005-09-01 3:35 UTC
To: Hifumi Hisashi, Chris Mason; +Cc: reiserfs-dev, reiserfs-list, linux-fsdevel
Thanks much Hifumi!
Chris, please comment on the patch.
Hans
* Re: [PATCH] fix problems related to journaling in Reiserfs
2005-09-01 3:35 ` Hans Reiser
@ 2005-09-02 2:03 ` Chris Mason
2005-10-04 8:47 ` Hifumi Hisashi
0 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2005-09-02 2:03 UTC
To: Hans Reiser; +Cc: Hifumi Hisashi, reiserfs-dev, reiserfs-list, linux-fsdevel
On Wed, 31 Aug 2005 20:35:52 -0700
Hans Reiser <reiser@namesys.com> wrote:
> Thanks much Hifumi!
>
> Chris, please comment on the patch.
The problem is that I'm not always making the inode dirty during the
reiserfs_file_write. The get_block based write function does an
explicit commit during O_SYNC mode. I've got a cleanup related to this
for quotas and other things, but I didn't realize it would help O_SYNC
as well.
I'll diff/test against mainline in the morning and send out.
-chris
* Re: [PATCH] fix problems related to journaling in Reiserfs
2005-09-02 2:03 ` Chris Mason
@ 2005-10-04 8:47 ` Hifumi Hisashi
2005-10-04 10:40 ` Hans Reiser
0 siblings, 1 reply; 9+ messages in thread
From: Hifumi Hisashi @ 2005-10-04 8:47 UTC
To: Chris Mason, Hans Reiser; +Cc: reiserfs-dev, reiserfs-list, linux-fsdevel
Hello Chris.
I noticed that fsync() also has a bug related to meta-data journaling in
Reiserfs.
When write() extends a file (i_size is increased), fsync() must write the
meta-data change to the journal area on disk.
However, it seems to me that Reiserfs does not do this.
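The user-space pattern in question can be sketched as follows. This is only an illustration with a hypothetical helper name `extend_and_fsync`, not the program used to find the bug:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sketch only: append `len` bytes from `buf` to `path`, then call fsync().
 * Returns the file size reported by fstat() after fsync(), or -1 on error.
 */
long long extend_and_fsync(const char *path, const void *buf, size_t len)
{
    struct stat st;
    long long size = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd < 0)
        return -1;

    /* The write() grows i_size; fsync() is expected to make that durable. */
    if (write(fd, buf, len) == (ssize_t)len &&
        fsync(fd) == 0 &&
        fstat(fd, &st) == 0)
        size = (long long)st.st_size;

    close(fd);
    return size;
}
```

Note that fstat() reports the in-memory size, so the returned value only states what fsync() is supposed to have committed. Confirming that the size actually reached the journal still needs a crash test like the one in the first message.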
I believe the following patch would fix this bug.
Thanks.
Signed-off-by: Hifumi Hisashi <hifumi.hisashi@lab.ntt.co.jp>
diff -Nru linux-2.6.14-rc3/fs/reiserfs/file.c linux-2.6.14-rc3_fix/fs/reiserfs/file.c
--- linux-2.6.14-rc3/fs/reiserfs/file.c 2005-10-03 14:13:57.000000000 +0900
+++ linux-2.6.14-rc3_fix/fs/reiserfs/file.c 2005-10-03 15:27:42.000000000 +0900
@@ -1320,7 +1320,6 @@
reiserfs_write_unlock(inode->i_sb);
return err;
}
- reiserfs_update_inode_transaction(inode);
mark_inode_dirty(inode);
err = journal_end(&th, inode->i_sb, 1);
if (err) {
diff -Nru linux-2.6.14-rc3/fs/reiserfs/super.c linux-2.6.14-rc3_fix/fs/reiserfs/super.c
--- linux-2.6.14-rc3/fs/reiserfs/super.c 2005-10-03 14:13:57.000000000 +0900
+++ linux-2.6.14-rc3_fix/fs/reiserfs/super.c 2005-10-03 15:27:42.000000000 +0900
@@ -563,6 +563,7 @@
reiserfs_write_unlock(inode->i_sb);
return;
}
+ reiserfs_update_inode_transaction(inode);
reiserfs_update_sd(&th, inode);
journal_end(&th, inode->i_sb, 1);
reiserfs_write_unlock(inode->i_sb);
* Re: [PATCH] fix problems related to journaling in Reiserfs
2005-10-04 8:47 ` Hifumi Hisashi
@ 2005-10-04 10:40 ` Hans Reiser
0 siblings, 0 replies; 9+ messages in thread
From: Hans Reiser @ 2005-10-04 10:40 UTC
To: Hifumi Hisashi; +Cc: Chris Mason, reiserfs-dev, reiserfs-list, linux-fsdevel
Thanks for reviewing the code like this. Please let me know if you find
any problems in the reiser4 code; it is very generous of you to
donate your time like this.