* ReiserFS post-crash issues
@ 2004-09-21  8:32 Ash
  2004-09-21  9:51 ` Will Smith
  2004-09-21 15:01 ` Hans Reiser
  0 siblings, 2 replies; 6+ messages in thread
From: Ash @ 2004-09-21  8:32 UTC (permalink / raw)
  To: reiserfs-list

Hi

I have been running a few tests on ReiserFS to check durability of
common filesystem operations.
For example, create a certain number of files and crash the machine
(poweroff) immediately after this.
On rebooting, check the number of files actually present on the
filesystem after log replay.

I tried the same with other operations such as rename, link and delete.
I am using a C program that performs these operations with the open,
rename and link system calls, and I crash the system using a network
power switch immediately after the program finishes.
So the delay between completion of the operations and the crash
should be less than 1-2 seconds (the time required to establish a
telnet connection to the power switch).
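
For reference, a minimal sketch of what such a create test might look like.
This is not the actual test program; the function names, the file naming
scheme (f0, f1, ...) and the directory handling are assumptions made for
illustration. create_files() would run before the power cut, and
count_files() after reboot and log replay.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create n empty files named f0, f1, ... inside dir (created if missing).
 * Returns n on success, or -1 on the first failure. */
long create_files(const char *dir, long n)
{
    char path[4096];

    assert(n >= 0);
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;

    for (long i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/f%ld", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY | O_EXCL, 0644);
        if (fd < 0)
            return -1;
        close(fd);   /* no fsync here: the create may still be only in memory */
    }
    return n;
}

/* Count directory entries other than "." and "..".
 * Returns the count, or -1 if the directory cannot be opened. */
long count_files(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            count++;
    }
    closedir(d);
    return count;
}
```

Comparing the value returned by count_files() after the crash with the n
passed to create_files() gives the number of creates that did not survive.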

It seems that ReiserFS operations are not durable for most of the cases I tried.

For file create, when tried with 50K, 100K and 1M files, I got
34224, 99492, and 998594 files respectively after the system rebooted from
the crash. Similarly, for operations like rename and link, the number of files
renamed or linked after reboot is less than what the filesystem reported prior
to the crash.

Introducing an fsync() after every open() call does solve the problem,
but the performance degradation is very high. I did notice
the related discussion in the FAQ at namesys.com.
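
A rough sketch of the fsync variant, under stated assumptions:
create_durable() is a hypothetical helper, not part of the original test.
Besides the fsync() on the new file described above, it also calls fsync()
on the containing directory, which the Linux fsync(2) man page says is
needed if the directory entry itself must reach disk.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Create an empty file dir/name and ask the kernel to flush both the
 * file and the directory that holds its entry.
 * Returns 0 on success, -1 on any failure. */
int create_durable(const char *dir, const char *name)
{
    char path[4096];

    assert(dir != NULL && name != NULL);
    snprintf(path, sizeof path, "%s/%s", dir, name);

    int dirfd = open(dir, O_RDONLY);
    if (dirfd < 0)
        return -1;

    int fd = open(path, O_CREAT | O_WRONLY | O_EXCL, 0644);
    if (fd < 0) {
        close(dirfd);
        return -1;
    }

    int rc = 0;
    if (fsync(fd) != 0)      /* flush the new file */
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    if (fsync(dirfd) != 0)   /* flush the directory entry */
        rc = -1;
    close(dirfd);
    return rc;
}
```

Two synchronous flushes per file is the expensive part; this is the cost
being traded against durability.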

Operations like rename, link and delete seem to have the same problem.

However, with other filesystems like XFS, I get much better results (almost
100% durability) on similar tests.

I am using ReiserFS with Linux kernel 2.6.7.

Any comments/suggestions will be helpful.

Thanks,
Ash


* Re: ReiserFS post-crash issues
  2004-09-21  8:32 ReiserFS post-crash issues Ash
@ 2004-09-21  9:51 ` Will Smith
  2004-09-21 15:09   ` Hans Reiser
  2004-09-21 15:01 ` Hans Reiser
  1 sibling, 1 reply; 6+ messages in thread
From: Will Smith @ 2004-09-21  9:51 UTC (permalink / raw)
  To: reiserfs-list

Sorry if the below is obvious...

In reiser4, there's a parameter tmgr.atom_max_age which is the maximum
time an atomic operation (=transaction in database language, I believe)
can remain 'dirty' before being written to disk.  It defaults to 600
seconds.  I'd argue that's too high; it should be low for safety and
tunable up.  Maybe I don't understand this properly, but I see very high
dirty values, remaining for minutes at a time, in /proc/meminfo when
using reiser4.

In reiserfs, I'm not sure what the default is (but I see
JOURNAL_MAX_TRANS_AGE=30 in reiserfs_fs.h) or how it's tunable.

In ext3, I believe the default is for the journal to be flushed after 5
seconds, and the data after 30 seconds.  The ext3 limits also seem to be
related to the kernel parameters
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500
(from /sbin/sysctl -a).

Maybe if you are concerned about power failure events, you have to
sacrifice a bit of performance with a lower flush interval for the journal
and/or data.
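
The two vm.dirty_* values quoted above are in hundredths of a second, so
3000 and 500 correspond to 30 s and 5 s. A small sketch of reading such a
value and converting it; the helper names are made up for illustration, and
the path in the usage note assumes a Linux system with /proc mounted.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a value in centiseconds (1/100 s) to seconds. */
double centisecs_to_secs(long centisecs)
{
    return centisecs / 100.0;
}

/* Read a single integer from a sysctl file under /proc/sys.
 * Returns the value, or -1 if the file cannot be read or parsed. */
long read_sysctl_long(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, read_sysctl_long("/proc/sys/vm/dirty_expire_centisecs")
followed by centisecs_to_secs() gives the expiry age in seconds on the
running system.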


Will Smith




* Re: ReiserFS post-crash issues
  2004-09-21  8:32 ReiserFS post-crash issues Ash
  2004-09-21  9:51 ` Will Smith
@ 2004-09-21 15:01 ` Hans Reiser
  2004-09-22 11:16   ` Ash
  1 sibling, 1 reply; 6+ messages in thread
From: Hans Reiser @ 2004-09-21 15:01 UTC (permalink / raw)
  To: Ash; +Cc: reiserfs-list

FS operations are not persistent unless you fsync or wait long enough. 
That is the expected norm for unix fs design.

Hans


* Re: ReiserFS post-crash issues
  2004-09-21  9:51 ` Will Smith
@ 2004-09-21 15:09   ` Hans Reiser
  2004-09-30  9:18     ` Ash
  0 siblings, 1 reply; 6+ messages in thread
From: Hans Reiser @ 2004-09-21 15:09 UTC (permalink / raw)
  To: Will Smith; +Cc: reiserfs-list

Will Smith wrote:

> In reiser4, there's a parameter tmgr.atom_max_age which is the maximum
> time an atomic operation (=transaction in database language, I believe)
> can remain 'dirty' before being written to disk.  It defaults to 600
> seconds.  I'd argue that's too high; it should be low for safety and
> tunable up.

There is no right answer to that setting except to let the user control 
it.  Developers doing compiles would want it as it is, as fsync takes 
care of their edits, and repeat compiles are significantly optimized by 
write caching of them.


* Re: ReiserFS post-crash issues
  2004-09-21 15:01 ` Hans Reiser
@ 2004-09-22 11:16   ` Ash
  0 siblings, 0 replies; 6+ messages in thread
From: Ash @ 2004-09-22 11:16 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list

Agreed.
And I did see that introducing fsyncs in my test does make the creates
persistent.
However, my concern is that ReiserFS performance degrades substantially
in this case compared to other filesystems.
For example, creating 50K 0-byte files took less than 2-3 seconds
without the fsyncs, while introducing an fsync after every create
increased the time to 87 seconds.
This was much higher than for other filesystems like XFS or JFS on the
same test.

Thanks,
Ash


On Tue, 21 Sep 2004 08:01:19 -0700, Hans Reiser <reiser@namesys.com> wrote:
> FS operations are not persistent unless you fsync or wait long enough.
> That is the expected norm for unix fs design.
> 
> Hans

* Re: ReiserFS post-crash issues
  2004-09-21 15:09   ` Hans Reiser
@ 2004-09-30  9:18     ` Ash
  0 siblings, 0 replies; 6+ messages in thread
From: Ash @ 2004-09-30  9:18 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Will Smith, reiserfs-list

Hi,

Is there any mount / mkfs option which tunes this maximum transaction age?
I didn't see anything in the man pages for reiserfstune or mkreiserfs,
but maybe I missed something.
I saw a discussion in the list archives about a "commit" option in mount.
What is this about? I don't see anything on it in the docs.

What options do I have, either at compile time or runtime, to tune transaction
commit values?

Thanks,
Ash

On Tue, 21 Sep 2004 08:09:41 -0700, Hans Reiser <reiser@namesys.com> wrote:
> There is no right answer to that setting except to let the user control
> it.  Developers doing compiles would want it as it is, as fsync takes
> care of their edits, and repeat compiles are significantly optimized by
> write caching of them.
>
