From mboxrd@z Thu Jan 1 00:00:00 1970 From: Vijayan Prabhakaran Subject: Re: BUG in writeback mode - from data journaling patch Date: Mon, 30 Aug 2004 12:09:31 -0500 Message-ID: References: <200408301803.57727.Dieter.Nuetzel@hamburg.de> <1093884137.21878.442.camel@watt.suse.com> Reply-To: Vijayan Prabhakaran Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Return-path: list-help: list-unsubscribe: list-post: Errors-To: flx@namesys.com In-Reply-To: <1093884137.21878.442.camel@watt.suse.com> List-Id: Content-Type: text/plain; charset="iso-8859-1" To: Chris Mason Cc: Dieter =?unknown-8bit?q?N=FCtzel?= , vijayan@cs.wisc.edu, reiserfs-list@namesys.com BTW, I ran similar workload in ext3 writeback mode. Ext3 does update the stat information on writeback mode when overwriting a file ! thanks, On Mon, 30 Aug 2004 12:42:17 -0400, Chris Mason wrote: >=20 >=20 > On Mon, 2004-08-30 at 12:03, Dieter N=FCtzel wrote: > > Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran: > > > Dear Chris Mason, > > > > > > I found a bug in writeback mode. In writeback mode, when a file block > > > was overwritten, the file's stat information was not getting updated. > > > For more details on this, please refer to my previous mail on the sam= e > > > topic. > > > > > > I tracked down the bug to the data journaling patch. > > > > > > Bug description: > > > ---------------- > > > > > > In file inode.c, there is a function called __commit_write(). In that > > > function, there is a "if" condition: > > > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > > lock_kernel(); > > > add_to_flushlist(inode, bh); > > > /* if we don't update the inode trans information, > > > * an fsync(fd) might not catch these data blocks > > > */ > > > > > > reiserfs_update_inode_transaction(inode); > > > unlock_kernel(); > > > } else { > > > buffer_insert_inode_data_queue(bh, inode); > > > } > > > > > > If you look at the condition, for ordered mode, the inode transaction > > > information gets updated by the call > > > reiserfs_update_inode_transaction(). But, for writeback mode, this is > > > not done. > > > > > > Since the inode transaction information is not updated, later in > > > function __commit_trans_jl() (which is in file journal.c), the > > > following "if" condition fails. > > > > > > flush_commit_only: > > > if (journal_list_still_alive(inode->i_sb, id)) { > > > ... > > > } > > > > > > Because of this failure, the file's stat data, even though available > > > in the transaction list, did not get written to the disk. > > > > > > Bug fix: > > > -------- > > > > > > If we add reiserfs_update_inode_transaction() call in __commit_write(= ) > > > for the writeback mode, then it works fine. That is, > > > > > > if (reiserfs_data_ordered(inode->i_sb)) { > > > lock_kernel(); > > > add_to_flushlist(inode, bh); > > > /* if we don't update the inode trans information, > > > * an fsync(fd) might not catch these data blocks > > > */ > > > > > > reiserfs_update_inode_transaction(inode); > > > unlock_kernel(); > > > } else { > > > reiserfs_update_inode_transaction(inode); > > > buffer_insert_inode_data_queue(bh, inode); > > > } > > > > > > Could you please verify this fix and if you think it is appropriate, > > > can you please add this to your patch. > > > > > > I appreciate your help. >=20 > [ full quote since it has been a while ] >=20 > Sorry, I thought I answered this, that was a busy time in terms of > releases here. Basically, the inode times are not promised to get > updated as part of the fsync (look at ext3 here as well). In cases of a > database server doing lots of fsyncs, you want to avoid the transaction > commits when the server is only overwriting existing data. >=20 > The times do get logged in data=3Dordered because it's the only way to > make sure the ordered data blocks are actually written to disk. The > 2.6.x code doesn't have that bug, and as a result data=3Dordered is much > more usable in database workloads. >=20 > -chris >=20 >