* BUG in writeback mode - from data journaling patch
@ 2004-08-19 18:32 Vijayan Prabhakaran
2004-08-30 16:03 ` Dieter Nützel
0 siblings, 1 reply; 7+ messages in thread
From: Vijayan Prabhakaran @ 2004-08-19 18:32 UTC (permalink / raw)
To: mason, reiserfs-list; +Cc: vijayan
Dear Chris Mason,
I found a bug in writeback mode. In writeback mode, when a file block
was overwritten, the file's stat information was not getting updated.
For more details on this, please refer to my previous mail on the same
topic.
I tracked down the bug to the data journaling patch.
Bug description:
----------------
In file inode.c, there is a function called __commit_write(). In that
function, there is a "if" condition:
if (reiserfs_data_ordered(inode->i_sb)) {
lock_kernel();
add_to_flushlist(inode, bh);
/* if we don't update the inode trans information,
* an fsync(fd) might not catch these data blocks
*/
reiserfs_update_inode_transaction(inode);
unlock_kernel();
} else {
buffer_insert_inode_data_queue(bh, inode);
}
If you look at the condition, for ordered mode, the inode transaction
information gets updated by the call
reiserfs_update_inode_transaction(). But, for writeback mode, this is
not done.
Since the inode transaction information is not updated, later in
function __commit_trans_jl() (which is in file journal.c), the
following "if" condition fails.
flush_commit_only:
if (journal_list_still_alive(inode->i_sb, id)) {
...
}
Because of this failure, the file's stat data, even though available
in the transaction list, did not get written to the disk.
Bug fix:
--------
If we add reiserfs_update_inode_transaction() call in __commit_write()
for the writeback mode, then it works fine. That is,
if (reiserfs_data_ordered(inode->i_sb)) {
lock_kernel();
add_to_flushlist(inode, bh);
/* if we don't update the inode trans information,
* an fsync(fd) might not catch these data blocks
*/
reiserfs_update_inode_transaction(inode);
unlock_kernel();
} else {
reiserfs_update_inode_transaction(inode);
buffer_insert_inode_data_queue(bh, inode);
}
Could you please verify this fix and if you think it is appropriate,
can you please add this to your patch.
I appreciate your help.
Thanks,
Vijayan
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: BUG in writeback mode - from data journaling patch 2004-08-19 18:32 BUG in writeback mode - from data journaling patch Vijayan Prabhakaran @ 2004-08-30 16:03 ` Dieter Nützel 2004-08-30 16:32 ` Konstantin Besch 2004-08-30 16:42 ` Chris Mason 0 siblings, 2 replies; 7+ messages in thread From: Dieter Nützel @ 2004-08-30 16:03 UTC (permalink / raw) To: mason, vijayan; +Cc: reiserfs-list, Vijayan Prabhakaran Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran: > Dear Chris Mason, > > I found a bug in writeback mode. In writeback mode, when a file block > was overwritten, the file's stat information was not getting updated. > For more details on this, please refer to my previous mail on the same > topic. > > I tracked down the bug to the data journaling patch. > > Bug description: > ---------------- > > In file inode.c, there is a function called __commit_write(). In that > function, there is a "if" condition: > > if (reiserfs_data_ordered(inode->i_sb)) { > lock_kernel(); > add_to_flushlist(inode, bh); > /* if we don't update the inode trans information, > * an fsync(fd) might not catch these data blocks > */ > > reiserfs_update_inode_transaction(inode); > unlock_kernel(); > } else { > buffer_insert_inode_data_queue(bh, inode); > } > > If you look at the condition, for ordered mode, the inode transaction > information gets updated by the call > reiserfs_update_inode_transaction(). But, for writeback mode, this is > not done. > > Since the inode transaction information is not updated, later in > function __commit_trans_jl() (which is in file journal.c), the > following "if" condition fails. > > flush_commit_only: > if (journal_list_still_alive(inode->i_sb, id)) { > ... > } > > Because of this failure, the file's stat data, even though available > in the transaction list, did not get written to the disk. > > Bug fix: > -------- > > If we add reiserfs_update_inode_transaction() call in __commit_write() > for the writeback mode, then it works fine. That is, > > if (reiserfs_data_ordered(inode->i_sb)) { > lock_kernel(); > add_to_flushlist(inode, bh); > /* if we don't update the inode trans information, > * an fsync(fd) might not catch these data blocks > */ > > reiserfs_update_inode_transaction(inode); > unlock_kernel(); > } else { > reiserfs_update_inode_transaction(inode); > buffer_insert_inode_data_queue(bh, inode); > } > > Could you please verify this fix and if you think it is appropriate, > can you please add this to your patch. > > I appreciate your help. Chris, any thoughts on this? -Dieter ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BUG in writeback mode - from data journaling patch 2004-08-30 16:03 ` Dieter Nützel @ 2004-08-30 16:32 ` Konstantin Besch 2004-08-30 16:42 ` Chris Mason 1 sibling, 0 replies; 7+ messages in thread From: Konstantin Besch @ 2004-08-30 16:32 UTC (permalink / raw) To: reiserfs-list The notification about this bug been there for a while - seems like Vijayan spent some time hunting it down ( I know he did ) . Doesn't seem like a lot of time from the developers to verify validity of the bug/fix introduced , unless Chris missed on the information and didn't notice it. On the other hand, since the software is GPL we might not even know the answer , not that we paid for ReiserFS with our credit cards or anything :) --K just my two cents - since I would like to know if there is an answer at all to this question , too . On Monday 30 August 2004 12:03 pm, Dieter Nützel wrote: > Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran: > > Dear Chris Mason, > > > > I found a bug in writeback mode. In writeback mode, when a file block > > was overwritten, the file's stat information was not getting updated. > > For more details on this, please refer to my previous mail on the same > > topic. > > > > I tracked down the bug to the data journaling patch. > > > > Bug description: > > ---------------- > > > > In file inode.c, there is a function called __commit_write(). In that > > function, there is a "if" condition: > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > lock_kernel(); > > add_to_flushlist(inode, bh); > > /* if we don't update the inode trans information, > > * an fsync(fd) might not catch these data blocks > > */ > > > > reiserfs_update_inode_transaction(inode); > > unlock_kernel(); > > } else { > > buffer_insert_inode_data_queue(bh, inode); > > } > > > > If you look at the condition, for ordered mode, the inode transaction > > information gets updated by the call > > reiserfs_update_inode_transaction(). But, for writeback mode, this is > > not done. > > > > Since the inode transaction information is not updated, later in > > function __commit_trans_jl() (which is in file journal.c), the > > following "if" condition fails. > > > > flush_commit_only: > > if (journal_list_still_alive(inode->i_sb, id)) { > > ... > > } > > > > Because of this failure, the file's stat data, even though available > > in the transaction list, did not get written to the disk. > > > > Bug fix: > > -------- > > > > If we add reiserfs_update_inode_transaction() call in __commit_write() > > for the writeback mode, then it works fine. That is, > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > lock_kernel(); > > add_to_flushlist(inode, bh); > > /* if we don't update the inode trans information, > > * an fsync(fd) might not catch these data blocks > > */ > > > > reiserfs_update_inode_transaction(inode); > > unlock_kernel(); > > } else { > > reiserfs_update_inode_transaction(inode); > > buffer_insert_inode_data_queue(bh, inode); > > } > > > > Could you please verify this fix and if you think it is appropriate, > > can you please add this to your patch. > > > > I appreciate your help. > > Chris, > > any thoughts on this? > > -Dieter ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BUG in writeback mode - from data journaling patch 2004-08-30 16:03 ` Dieter Nützel 2004-08-30 16:32 ` Konstantin Besch @ 2004-08-30 16:42 ` Chris Mason 2004-08-30 16:50 ` Vijayan Prabhakaran 2004-08-30 17:09 ` Vijayan Prabhakaran 1 sibling, 2 replies; 7+ messages in thread From: Chris Mason @ 2004-08-30 16:42 UTC (permalink / raw) To: Dieter Nützel; +Cc: vijayan, reiserfs-list, Vijayan Prabhakaran On Mon, 2004-08-30 at 12:03, Dieter Nützel wrote: > Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran: > > Dear Chris Mason, > > > > I found a bug in writeback mode. In writeback mode, when a file block > > was overwritten, the file's stat information was not getting updated. > > For more details on this, please refer to my previous mail on the same > > topic. > > > > I tracked down the bug to the data journaling patch. > > > > Bug description: > > ---------------- > > > > In file inode.c, there is a function called __commit_write(). In that > > function, there is a "if" condition: > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > lock_kernel(); > > add_to_flushlist(inode, bh); > > /* if we don't update the inode trans information, > > * an fsync(fd) might not catch these data blocks > > */ > > > > reiserfs_update_inode_transaction(inode); > > unlock_kernel(); > > } else { > > buffer_insert_inode_data_queue(bh, inode); > > } > > > > If you look at the condition, for ordered mode, the inode transaction > > information gets updated by the call > > reiserfs_update_inode_transaction(). But, for writeback mode, this is > > not done. > > > > Since the inode transaction information is not updated, later in > > function __commit_trans_jl() (which is in file journal.c), the > > following "if" condition fails. > > > > flush_commit_only: > > if (journal_list_still_alive(inode->i_sb, id)) { > > ... > > } > > > > Because of this failure, the file's stat data, even though available > > in the transaction list, did not get written to the disk. > > > > Bug fix: > > -------- > > > > If we add reiserfs_update_inode_transaction() call in __commit_write() > > for the writeback mode, then it works fine. That is, > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > lock_kernel(); > > add_to_flushlist(inode, bh); > > /* if we don't update the inode trans information, > > * an fsync(fd) might not catch these data blocks > > */ > > > > reiserfs_update_inode_transaction(inode); > > unlock_kernel(); > > } else { > > reiserfs_update_inode_transaction(inode); > > buffer_insert_inode_data_queue(bh, inode); > > } > > > > Could you please verify this fix and if you think it is appropriate, > > can you please add this to your patch. > > > > I appreciate your help. [ full quote since it has been a while ] Sorry, I thought I answered this, that was a busy time in terms of releases here. Basically, the inode times are not promised to get updated as part of the fsync (look at ext3 here as well). In cases of a database server doing lots of fsyncs, you want to avoid the transaction commits when the server is only overwriting existing data. The times do get logged in data=ordered because it's the only way to make sure the ordered data blocks are actually written to disk. The 2.6.x code doesn't have that bug, and as a result data=ordered is much more usable in database workloads. -chris ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BUG in writeback mode - from data journaling patch 2004-08-30 16:42 ` Chris Mason @ 2004-08-30 16:50 ` Vijayan Prabhakaran [not found] ` <2f9ccaae04083010074d49a0a0@mail.gmail.com> 2004-08-30 17:09 ` Vijayan Prabhakaran 1 sibling, 1 reply; 7+ messages in thread From: Vijayan Prabhakaran @ 2004-08-30 16:50 UTC (permalink / raw) To: Chris Mason; +Cc: Dieter =?unknown-8bit?q?N=FCtzel?=, vijayan, reiserfs-list The fsync man page reads as: "...fsync copies all in-core parts of a file to disk, and waits until the device reports that all parts are on stable storage. It also updates metadata stat information." So, according to POSIX standards, the stat data must be updated on fsyncs. Any thoughts ? thanks, On Mon, 30 Aug 2004 12:42:17 -0400, Chris Mason <mason@suse.com> wrote: > > > On Mon, 2004-08-30 at 12:03, Dieter Nützel wrote: > > Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran: > > > Dear Chris Mason, > > > > > > I found a bug in writeback mode. In writeback mode, when a file block > > > was overwritten, the file's stat information was not getting updated. > > > For more details on this, please refer to my previous mail on the same > > > topic. > > > > > > I tracked down the bug to the data journaling patch. > > > > > > Bug description: > > > ---------------- > > > > > > In file inode.c, there is a function called __commit_write(). In that > > > function, there is a "if" condition: > > > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > > lock_kernel(); > > > add_to_flushlist(inode, bh); > > > /* if we don't update the inode trans information, > > > * an fsync(fd) might not catch these data blocks > > > */ > > > > > > reiserfs_update_inode_transaction(inode); > > > unlock_kernel(); > > > } else { > > > buffer_insert_inode_data_queue(bh, inode); > > > } > > > > > > If you look at the condition, for ordered mode, the inode transaction > > > information gets updated by the call > > > reiserfs_update_inode_transaction(). But, for writeback mode, this is > > > not done. > > > > > > Since the inode transaction information is not updated, later in > > > function __commit_trans_jl() (which is in file journal.c), the > > > following "if" condition fails. > > > > > > flush_commit_only: > > > if (journal_list_still_alive(inode->i_sb, id)) { > > > ... > > > } > > > > > > Because of this failure, the file's stat data, even though available > > > in the transaction list, did not get written to the disk. > > > > > > Bug fix: > > > -------- > > > > > > If we add reiserfs_update_inode_transaction() call in __commit_write() > > > for the writeback mode, then it works fine. That is, > > > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > > lock_kernel(); > > > add_to_flushlist(inode, bh); > > > /* if we don't update the inode trans information, > > > * an fsync(fd) might not catch these data blocks > > > */ > > > > > > reiserfs_update_inode_transaction(inode); > > > unlock_kernel(); > > > } else { > > > reiserfs_update_inode_transaction(inode); > > > buffer_insert_inode_data_queue(bh, inode); > > > } > > > > > > Could you please verify this fix and if you think it is appropriate, > > > can you please add this to your patch. > > > > > > I appreciate your help. > > [ full quote since it has been a while ] > > Sorry, I thought I answered this, that was a busy time in terms of > releases here. Basically, the inode times are not promised to get > updated as part of the fsync (look at ext3 here as well). In cases of a > database server doing lots of fsyncs, you want to avoid the transaction > commits when the server is only overwriting existing data. > > The times do get logged in data=ordered because it's the only way to > make sure the ordered data blocks are actually written to disk. The > 2.6.x code doesn't have that bug, and as a result data=ordered is much > more usable in database workloads. > > -chris > > ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <2f9ccaae04083010074d49a0a0@mail.gmail.com>]
* Re: BUG in writeback mode - from data journaling patch [not found] ` <2f9ccaae04083010074d49a0a0@mail.gmail.com> @ 2004-08-30 17:12 ` Vijayan Prabhakaran 0 siblings, 0 replies; 7+ messages in thread From: Vijayan Prabhakaran @ 2004-08-30 17:12 UTC (permalink / raw) To: Perry Kundert, reiserfs-list No. That is wrong. If the syntax of the call is to update a stat block, it must be done before the call finishes. On Mon, 30 Aug 2004 11:07:57 -0600, Perry Kundert <perry.kundert@gmail.com> wrote: > On Mon, 30 Aug 2004 11:50:46 -0500, Vijayan Prabhakaran > <pvijayan@gmail.com> wrote: > > The fsync man page reads as: > > > > "...fsync copies all in-core parts of a file to disk, and waits until > > the device reports that all parts are on stable storage. It also > > updates metadata stat information." > > > > So, according to POSIX standards, the stat data must be updated on fsyncs. > > > Looks like it is not absolutely clear on *when* the stat metadata > needs to be updated, though. Perhaps if it is updated later than the > "...parts are on stable storage" that is must wait for... > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BUG in writeback mode - from data journaling patch 2004-08-30 16:42 ` Chris Mason 2004-08-30 16:50 ` Vijayan Prabhakaran @ 2004-08-30 17:09 ` Vijayan Prabhakaran 1 sibling, 0 replies; 7+ messages in thread From: Vijayan Prabhakaran @ 2004-08-30 17:09 UTC (permalink / raw) To: Chris Mason; +Cc: Dieter =?unknown-8bit?q?N=FCtzel?=, vijayan, reiserfs-list BTW, I ran similar workload in ext3 writeback mode. Ext3 does update the stat information on writeback mode when overwriting a file ! thanks, On Mon, 30 Aug 2004 12:42:17 -0400, Chris Mason <mason@suse.com> wrote: > > > On Mon, 2004-08-30 at 12:03, Dieter Nützel wrote: > > Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran: > > > Dear Chris Mason, > > > > > > I found a bug in writeback mode. In writeback mode, when a file block > > > was overwritten, the file's stat information was not getting updated. > > > For more details on this, please refer to my previous mail on the same > > > topic. > > > > > > I tracked down the bug to the data journaling patch. > > > > > > Bug description: > > > ---------------- > > > > > > In file inode.c, there is a function called __commit_write(). In that > > > function, there is a "if" condition: > > > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > > lock_kernel(); > > > add_to_flushlist(inode, bh); > > > /* if we don't update the inode trans information, > > > * an fsync(fd) might not catch these data blocks > > > */ > > > > > > reiserfs_update_inode_transaction(inode); > > > unlock_kernel(); > > > } else { > > > buffer_insert_inode_data_queue(bh, inode); > > > } > > > > > > If you look at the condition, for ordered mode, the inode transaction > > > information gets updated by the call > > > reiserfs_update_inode_transaction(). But, for writeback mode, this is > > > not done. > > > > > > Since the inode transaction information is not updated, later in > > > function __commit_trans_jl() (which is in file journal.c), the > > > following "if" condition fails. > > > > > > flush_commit_only: > > > if (journal_list_still_alive(inode->i_sb, id)) { > > > ... > > > } > > > > > > Because of this failure, the file's stat data, even though available > > > in the transaction list, did not get written to the disk. > > > > > > Bug fix: > > > -------- > > > > > > If we add reiserfs_update_inode_transaction() call in __commit_write() > > > for the writeback mode, then it works fine. That is, > > > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > > lock_kernel(); > > > add_to_flushlist(inode, bh); > > > /* if we don't update the inode trans information, > > > * an fsync(fd) might not catch these data blocks > > > */ > > > > > > reiserfs_update_inode_transaction(inode); > > > unlock_kernel(); > > > } else { > > > reiserfs_update_inode_transaction(inode); > > > buffer_insert_inode_data_queue(bh, inode); > > > } > > > > > > Could you please verify this fix and if you think it is appropriate, > > > can you please add this to your patch. > > > > > > I appreciate your help. > > [ full quote since it has been a while ] > > Sorry, I thought I answered this, that was a busy time in terms of > releases here. Basically, the inode times are not promised to get > updated as part of the fsync (look at ext3 here as well). In cases of a > database server doing lots of fsyncs, you want to avoid the transaction > commits when the server is only overwriting existing data. > > The times do get logged in data=ordered because it's the only way to > make sure the ordered data blocks are actually written to disk. The > 2.6.x code doesn't have that bug, and as a result data=ordered is much > more usable in database workloads. > > -chris > > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2004-08-30 17:12 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-19 18:32 BUG in writeback mode - from data journaling patch Vijayan Prabhakaran
2004-08-30 16:03 ` Dieter Nützel
2004-08-30 16:32 ` Konstantin Besch
2004-08-30 16:42 ` Chris Mason
2004-08-30 16:50 ` Vijayan Prabhakaran
[not found] ` <2f9ccaae04083010074d49a0a0@mail.gmail.com>
2004-08-30 17:12 ` Vijayan Prabhakaran
2004-08-30 17:09 ` Vijayan Prabhakaran
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.