All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG in writeback mode - from data journaling patch
@ 2004-08-19 18:32 Vijayan Prabhakaran
  2004-08-30 16:03 ` Dieter Nützel
  0 siblings, 1 reply; 7+ messages in thread
From: Vijayan Prabhakaran @ 2004-08-19 18:32 UTC (permalink / raw)
  To: mason, reiserfs-list; +Cc: vijayan

Dear Chris Mason,

I found a bug in writeback mode. In writeback mode, when a file block
was overwritten, the file's stat information was not getting updated.
For more details on this, please refer to my previous mail on the same
topic.

I tracked down the bug to the data journaling patch.

Bug description:
----------------

In file inode.c, there is a function called __commit_write(). In that
function, there is a "if" condition:

if (reiserfs_data_ordered(inode->i_sb)) {
  lock_kernel();
  add_to_flushlist(inode, bh);
  /* if we don't update the inode trans information,
   * an fsync(fd) might not catch these data blocks
   */

  reiserfs_update_inode_transaction(inode);
  unlock_kernel();
} else {
  buffer_insert_inode_data_queue(bh, inode);
}

If you look at the condition, for ordered mode, the inode transaction
information gets updated by the call
reiserfs_update_inode_transaction(). But, for writeback mode, this is
not done.

Since the inode transaction information is not updated, later in
function __commit_trans_jl() (which is in file journal.c), the
following "if" condition fails.

flush_commit_only:
if (journal_list_still_alive(inode->i_sb, id)) {
  ...
}

Because of this failure, the file's stat data, even though available
in the transaction list, did not get written to the disk.

Bug fix:
--------

If we add reiserfs_update_inode_transaction() call in __commit_write()
for the writeback mode, then it works fine. That is,

if (reiserfs_data_ordered(inode->i_sb)) {
  lock_kernel();
  add_to_flushlist(inode, bh);
  /* if we don't update the inode trans information,
   * an fsync(fd) might not catch these data blocks
   */

  reiserfs_update_inode_transaction(inode);
  unlock_kernel();
} else {
  reiserfs_update_inode_transaction(inode);
  buffer_insert_inode_data_queue(bh, inode);
}

Could you please verify this fix and if you think it is appropriate,
can you please add this to your patch.

I appreciate your help.

Thanks,
Vijayan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: BUG in writeback mode - from data journaling patch
  2004-08-19 18:32 BUG in writeback mode - from data journaling patch Vijayan Prabhakaran
@ 2004-08-30 16:03 ` Dieter Nützel
  2004-08-30 16:32   ` Konstantin Besch
  2004-08-30 16:42   ` Chris Mason
  0 siblings, 2 replies; 7+ messages in thread
From: Dieter Nützel @ 2004-08-30 16:03 UTC (permalink / raw)
  To: mason, vijayan; +Cc: reiserfs-list, Vijayan Prabhakaran

Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran:
> Dear Chris Mason,
>
> I found a bug in writeback mode. In writeback mode, when a file block
> was overwritten, the file's stat information was not getting updated.
> For more details on this, please refer to my previous mail on the same
> topic.
>
> I tracked down the bug to the data journaling patch.
>
> Bug description:
> ----------------
>
> In file inode.c, there is a function called __commit_write(). In that
> function, there is a "if" condition:
>
> if (reiserfs_data_ordered(inode->i_sb)) {
>   lock_kernel();
>   add_to_flushlist(inode, bh);
>   /* if we don't update the inode trans information,
>    * an fsync(fd) might not catch these data blocks
>    */
>
>   reiserfs_update_inode_transaction(inode);
>   unlock_kernel();
> } else {
>   buffer_insert_inode_data_queue(bh, inode);
> }
>
> If you look at the condition, for ordered mode, the inode transaction
> information gets updated by the call
> reiserfs_update_inode_transaction(). But, for writeback mode, this is
> not done.
>
> Since the inode transaction information is not updated, later in
> function __commit_trans_jl() (which is in file journal.c), the
> following "if" condition fails.
>
> flush_commit_only:
> if (journal_list_still_alive(inode->i_sb, id)) {
>   ...
> }
>
> Because of this failure, the file's stat data, even though available
> in the transaction list, did not get written to the disk.
>
> Bug fix:
> --------
>
> If we add reiserfs_update_inode_transaction() call in __commit_write()
> for the writeback mode, then it works fine. That is,
>
> if (reiserfs_data_ordered(inode->i_sb)) {
>   lock_kernel();
>   add_to_flushlist(inode, bh);
>   /* if we don't update the inode trans information,
>    * an fsync(fd) might not catch these data blocks
>    */
>
>   reiserfs_update_inode_transaction(inode);
>   unlock_kernel();
> } else {
>   reiserfs_update_inode_transaction(inode);
>   buffer_insert_inode_data_queue(bh, inode);
> }
>
> Could you please verify this fix and if you think it is appropriate,
> can you please add this to your patch.
>
> I appreciate your help.

Chris,

any thoughts on this?

-Dieter

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: BUG in writeback mode - from data journaling patch
  2004-08-30 16:03 ` Dieter Nützel
@ 2004-08-30 16:32   ` Konstantin Besch
  2004-08-30 16:42   ` Chris Mason
  1 sibling, 0 replies; 7+ messages in thread
From: Konstantin Besch @ 2004-08-30 16:32 UTC (permalink / raw)
  To: reiserfs-list

The notification about this bug been there for a while - seems 
like Vijayan spent some time hunting it down ( I know he did  ) . 
Doesn't seem like a lot of time from the developers to verify validity of the 
bug/fix introduced , unless Chris missed on the information and didn't notice 
it. 
On the other hand, since the software is GPL we might not even know the
answer , not that we paid for ReiserFS with our credit cards or anything :)

             --K

just my two cents - since I would like to know if there is an answer at all to 
this question , too .

On Monday 30 August 2004 12:03 pm, Dieter Nützel wrote: 
> Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran:
> > Dear Chris Mason,
> >
> > I found a bug in writeback mode. In writeback mode, when a file block
> > was overwritten, the file's stat information was not getting updated.
> > For more details on this, please refer to my previous mail on the same
> > topic.
> >
> > I tracked down the bug to the data journaling patch.
> >
> > Bug description:
> > ----------------
> >
> > In file inode.c, there is a function called __commit_write(). In that
> > function, there is a "if" condition:
> >
> > if (reiserfs_data_ordered(inode->i_sb)) {
> >   lock_kernel();
> >   add_to_flushlist(inode, bh);
> >   /* if we don't update the inode trans information,
> >    * an fsync(fd) might not catch these data blocks
> >    */
> >
> >   reiserfs_update_inode_transaction(inode);
> >   unlock_kernel();
> > } else {
> >   buffer_insert_inode_data_queue(bh, inode);
> > }
> >
> > If you look at the condition, for ordered mode, the inode transaction
> > information gets updated by the call
> > reiserfs_update_inode_transaction(). But, for writeback mode, this is
> > not done.
> >
> > Since the inode transaction information is not updated, later in
> > function __commit_trans_jl() (which is in file journal.c), the
> > following "if" condition fails.
> >
> > flush_commit_only:
> > if (journal_list_still_alive(inode->i_sb, id)) {
> >   ...
> > }
> >
> > Because of this failure, the file's stat data, even though available
> > in the transaction list, did not get written to the disk.
> >
> > Bug fix:
> > --------
> >
> > If we add reiserfs_update_inode_transaction() call in __commit_write()
> > for the writeback mode, then it works fine. That is,
> >
> > if (reiserfs_data_ordered(inode->i_sb)) {
> >   lock_kernel();
> >   add_to_flushlist(inode, bh);
> >   /* if we don't update the inode trans information,
> >    * an fsync(fd) might not catch these data blocks
> >    */
> >
> >   reiserfs_update_inode_transaction(inode);
> >   unlock_kernel();
> > } else {
> >   reiserfs_update_inode_transaction(inode);
> >   buffer_insert_inode_data_queue(bh, inode);
> > }
> >
> > Could you please verify this fix and if you think it is appropriate,
> > can you please add this to your patch.
> >
> > I appreciate your help.
>
> Chris,
>
> any thoughts on this?
>
> -Dieter

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: BUG in writeback mode - from data journaling patch
  2004-08-30 16:03 ` Dieter Nützel
  2004-08-30 16:32   ` Konstantin Besch
@ 2004-08-30 16:42   ` Chris Mason
  2004-08-30 16:50     ` Vijayan Prabhakaran
  2004-08-30 17:09     ` Vijayan Prabhakaran
  1 sibling, 2 replies; 7+ messages in thread
From: Chris Mason @ 2004-08-30 16:42 UTC (permalink / raw)
  To: Dieter Nützel; +Cc: vijayan, reiserfs-list, Vijayan Prabhakaran

On Mon, 2004-08-30 at 12:03, Dieter Nützel wrote:
> Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran:
> > Dear Chris Mason,
> >
> > I found a bug in writeback mode. In writeback mode, when a file block
> > was overwritten, the file's stat information was not getting updated.
> > For more details on this, please refer to my previous mail on the same
> > topic.
> >
> > I tracked down the bug to the data journaling patch.
> >
> > Bug description:
> > ----------------
> >
> > In file inode.c, there is a function called __commit_write(). In that
> > function, there is a "if" condition:
> >
> > if (reiserfs_data_ordered(inode->i_sb)) {
> >   lock_kernel();
> >   add_to_flushlist(inode, bh);
> >   /* if we don't update the inode trans information,
> >    * an fsync(fd) might not catch these data blocks
> >    */
> >
> >   reiserfs_update_inode_transaction(inode);
> >   unlock_kernel();
> > } else {
> >   buffer_insert_inode_data_queue(bh, inode);
> > }
> >
> > If you look at the condition, for ordered mode, the inode transaction
> > information gets updated by the call
> > reiserfs_update_inode_transaction(). But, for writeback mode, this is
> > not done.
> >
> > Since the inode transaction information is not updated, later in
> > function __commit_trans_jl() (which is in file journal.c), the
> > following "if" condition fails.
> >
> > flush_commit_only:
> > if (journal_list_still_alive(inode->i_sb, id)) {
> >   ...
> > }
> >
> > Because of this failure, the file's stat data, even though available
> > in the transaction list, did not get written to the disk.
> >
> > Bug fix:
> > --------
> >
> > If we add reiserfs_update_inode_transaction() call in __commit_write()
> > for the writeback mode, then it works fine. That is,
> >
> > if (reiserfs_data_ordered(inode->i_sb)) {
> >   lock_kernel();
> >   add_to_flushlist(inode, bh);
> >   /* if we don't update the inode trans information,
> >    * an fsync(fd) might not catch these data blocks
> >    */
> >
> >   reiserfs_update_inode_transaction(inode);
> >   unlock_kernel();
> > } else {
> >   reiserfs_update_inode_transaction(inode);
> >   buffer_insert_inode_data_queue(bh, inode);
> > }
> >
> > Could you please verify this fix and if you think it is appropriate,
> > can you please add this to your patch.
> >
> > I appreciate your help.

[ full quote since it has been a while ]

Sorry, I thought I answered this, that was a busy time in terms of
releases here.  Basically, the inode times are not promised to get
updated as part of the fsync (look at ext3 here as well).  In cases of a
database server doing lots of fsyncs, you want to avoid the transaction
commits when the server is only overwriting existing data.

The times do get logged in data=ordered because it's the only way to
make sure the ordered data blocks are actually written to disk.  The
2.6.x code doesn't have that bug, and as a result data=ordered is much
more usable in database workloads.

-chris



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: BUG in writeback mode - from data journaling patch
  2004-08-30 16:42   ` Chris Mason
@ 2004-08-30 16:50     ` Vijayan Prabhakaran
       [not found]       ` <2f9ccaae04083010074d49a0a0@mail.gmail.com>
  2004-08-30 17:09     ` Vijayan Prabhakaran
  1 sibling, 1 reply; 7+ messages in thread
From: Vijayan Prabhakaran @ 2004-08-30 16:50 UTC (permalink / raw)
  To: Chris Mason; +Cc: Dieter =?unknown-8bit?q?N=FCtzel?=, vijayan, reiserfs-list

The fsync man page reads as:

"...fsync copies all in-core parts of a file to disk, and waits  until
 the device  reports  that all parts are on stable storage.  It also
updates metadata stat information."

So, according to POSIX standards, the stat data must be updated on fsyncs.

Any thoughts ?

thanks,

On Mon, 30 Aug 2004 12:42:17 -0400, Chris Mason <mason@suse.com> wrote:
> 
> 
> On Mon, 2004-08-30 at 12:03, Dieter Nützel wrote:
> > Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran:
> > > Dear Chris Mason,
> > >
> > > I found a bug in writeback mode. In writeback mode, when a file block
> > > was overwritten, the file's stat information was not getting updated.
> > > For more details on this, please refer to my previous mail on the same
> > > topic.
> > >
> > > I tracked down the bug to the data journaling patch.
> > >
> > > Bug description:
> > > ----------------
> > >
> > > In file inode.c, there is a function called __commit_write(). In that
> > > function, there is a "if" condition:
> > >
> > > if (reiserfs_data_ordered(inode->i_sb)) {
> > >   lock_kernel();
> > >   add_to_flushlist(inode, bh);
> > >   /* if we don't update the inode trans information,
> > >    * an fsync(fd) might not catch these data blocks
> > >    */
> > >
> > >   reiserfs_update_inode_transaction(inode);
> > >   unlock_kernel();
> > > } else {
> > >   buffer_insert_inode_data_queue(bh, inode);
> > > }
> > >
> > > If you look at the condition, for ordered mode, the inode transaction
> > > information gets updated by the call
> > > reiserfs_update_inode_transaction(). But, for writeback mode, this is
> > > not done.
> > >
> > > Since the inode transaction information is not updated, later in
> > > function __commit_trans_jl() (which is in file journal.c), the
> > > following "if" condition fails.
> > >
> > > flush_commit_only:
> > > if (journal_list_still_alive(inode->i_sb, id)) {
> > >   ...
> > > }
> > >
> > > Because of this failure, the file's stat data, even though available
> > > in the transaction list, did not get written to the disk.
> > >
> > > Bug fix:
> > > --------
> > >
> > > If we add reiserfs_update_inode_transaction() call in __commit_write()
> > > for the writeback mode, then it works fine. That is,
> > >
> > > if (reiserfs_data_ordered(inode->i_sb)) {
> > >   lock_kernel();
> > >   add_to_flushlist(inode, bh);
> > >   /* if we don't update the inode trans information,
> > >    * an fsync(fd) might not catch these data blocks
> > >    */
> > >
> > >   reiserfs_update_inode_transaction(inode);
> > >   unlock_kernel();
> > > } else {
> > >   reiserfs_update_inode_transaction(inode);
> > >   buffer_insert_inode_data_queue(bh, inode);
> > > }
> > >
> > > Could you please verify this fix and if you think it is appropriate,
> > > can you please add this to your patch.
> > >
> > > I appreciate your help.
> 
> [ full quote since it has been a while ]
> 
> Sorry, I thought I answered this, that was a busy time in terms of
> releases here.  Basically, the inode times are not promised to get
> updated as part of the fsync (look at ext3 here as well).  In cases of a
> database server doing lots of fsyncs, you want to avoid the transaction
> commits when the server is only overwriting existing data.
> 
> The times do get logged in data=ordered because it's the only way to
> make sure the ordered data blocks are actually written to disk.  The
> 2.6.x code doesn't have that bug, and as a result data=ordered is much
> more usable in database workloads.
> 
> -chris
> 
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: BUG in writeback mode - from data journaling patch
  2004-08-30 16:42   ` Chris Mason
  2004-08-30 16:50     ` Vijayan Prabhakaran
@ 2004-08-30 17:09     ` Vijayan Prabhakaran
  1 sibling, 0 replies; 7+ messages in thread
From: Vijayan Prabhakaran @ 2004-08-30 17:09 UTC (permalink / raw)
  To: Chris Mason; +Cc: Dieter =?unknown-8bit?q?N=FCtzel?=, vijayan, reiserfs-list

BTW, I ran similar workload in ext3 writeback mode. Ext3 does update
the stat information on writeback mode when overwriting a file !

thanks,

On Mon, 30 Aug 2004 12:42:17 -0400, Chris Mason <mason@suse.com> wrote:
> 
> 
> On Mon, 2004-08-30 at 12:03, Dieter Nützel wrote:
> > Am Donnerstag, 19. August 2004 20:32 schrieb Vijayan Prabhakaran:
> > > Dear Chris Mason,
> > >
> > > I found a bug in writeback mode. In writeback mode, when a file block
> > > was overwritten, the file's stat information was not getting updated.
> > > For more details on this, please refer to my previous mail on the same
> > > topic.
> > >
> > > I tracked down the bug to the data journaling patch.
> > >
> > > Bug description:
> > > ----------------
> > >
> > > In file inode.c, there is a function called __commit_write(). In that
> > > function, there is a "if" condition:
> > >
> > > if (reiserfs_data_ordered(inode->i_sb)) {
> > >   lock_kernel();
> > >   add_to_flushlist(inode, bh);
> > >   /* if we don't update the inode trans information,
> > >    * an fsync(fd) might not catch these data blocks
> > >    */
> > >
> > >   reiserfs_update_inode_transaction(inode);
> > >   unlock_kernel();
> > > } else {
> > >   buffer_insert_inode_data_queue(bh, inode);
> > > }
> > >
> > > If you look at the condition, for ordered mode, the inode transaction
> > > information gets updated by the call
> > > reiserfs_update_inode_transaction(). But, for writeback mode, this is
> > > not done.
> > >
> > > Since the inode transaction information is not updated, later in
> > > function __commit_trans_jl() (which is in file journal.c), the
> > > following "if" condition fails.
> > >
> > > flush_commit_only:
> > > if (journal_list_still_alive(inode->i_sb, id)) {
> > >   ...
> > > }
> > >
> > > Because of this failure, the file's stat data, even though available
> > > in the transaction list, did not get written to the disk.
> > >
> > > Bug fix:
> > > --------
> > >
> > > If we add reiserfs_update_inode_transaction() call in __commit_write()
> > > for the writeback mode, then it works fine. That is,
> > >
> > > if (reiserfs_data_ordered(inode->i_sb)) {
> > >   lock_kernel();
> > >   add_to_flushlist(inode, bh);
> > >   /* if we don't update the inode trans information,
> > >    * an fsync(fd) might not catch these data blocks
> > >    */
> > >
> > >   reiserfs_update_inode_transaction(inode);
> > >   unlock_kernel();
> > > } else {
> > >   reiserfs_update_inode_transaction(inode);
> > >   buffer_insert_inode_data_queue(bh, inode);
> > > }
> > >
> > > Could you please verify this fix and if you think it is appropriate,
> > > can you please add this to your patch.
> > >
> > > I appreciate your help.
> 
> [ full quote since it has been a while ]
> 
> Sorry, I thought I answered this, that was a busy time in terms of
> releases here.  Basically, the inode times are not promised to get
> updated as part of the fsync (look at ext3 here as well).  In cases of a
> database server doing lots of fsyncs, you want to avoid the transaction
> commits when the server is only overwriting existing data.
> 
> The times do get logged in data=ordered because it's the only way to
> make sure the ordered data blocks are actually written to disk.  The
> 2.6.x code doesn't have that bug, and as a result data=ordered is much
> more usable in database workloads.
> 
> -chris
> 
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: BUG in writeback mode - from data journaling patch
       [not found]       ` <2f9ccaae04083010074d49a0a0@mail.gmail.com>
@ 2004-08-30 17:12         ` Vijayan Prabhakaran
  0 siblings, 0 replies; 7+ messages in thread
From: Vijayan Prabhakaran @ 2004-08-30 17:12 UTC (permalink / raw)
  To: Perry Kundert, reiserfs-list

No. That is wrong. If the syntax of the call is to update a stat
block, it must be done before the call finishes.

On Mon, 30 Aug 2004 11:07:57 -0600, Perry Kundert
<perry.kundert@gmail.com> wrote:
> On Mon, 30 Aug 2004 11:50:46 -0500, Vijayan Prabhakaran
> <pvijayan@gmail.com> wrote:
> > The fsync man page reads as:
> >
> > "...fsync copies all in-core parts of a file to disk, and waits  until
> >  the device  reports  that all parts are on stable storage.  It also
> > updates metadata stat information."
> >
> > So, according to POSIX standards, the stat data must be updated on fsyncs.
> > 
> Looks like it is not absolutely clear on *when* the stat metadata
> needs to be updated, though.  Perhaps if it is updated later than the
> "...parts are on stable storage" that is must wait for...
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2004-08-30 17:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-19 18:32 BUG in writeback mode - from data journaling patch Vijayan Prabhakaran
2004-08-30 16:03 ` Dieter Nützel
2004-08-30 16:32   ` Konstantin Besch
2004-08-30 16:42   ` Chris Mason
2004-08-30 16:50     ` Vijayan Prabhakaran
     [not found]       ` <2f9ccaae04083010074d49a0a0@mail.gmail.com>
2004-08-30 17:12         ` Vijayan Prabhakaran
2004-08-30 17:09     ` Vijayan Prabhakaran

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.