reiser4: first impression (vs xfs and jfs)


* reiser4: first impression (vs xfs and jfs)
@ 2006-05-23 15:51 Tom Vier
  2006-05-23 19:08 ` Gregory Maxwell
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Tom Vier @ 2006-05-23 15:51 UTC (permalink / raw)
  To: reiserfs-list

I finally decided to try a few different fs'es on my 250gig raid1. (I use
reiserfs3 most of the time.) Here's some things i noticed, between r4, xfs,
and jfs.

Both r4 and xfs suffer from io pauses. This is on a dual 2.6ghz opteron,
btw. I don't see high cpu usage, but clock throttling could be screwing up
top's % calcs (tho i think all usage is measured by time, so it shouldn't).

What i'm doing is rsyncing from a slower drive (on 1394) to the raid1 dev.
When using r4 (xfs behaves similarly), after several seconds, reading from
the source and writing to the destination stops for 3 or 4 seconds, then
there is a brief burst of writes to the r4 fs (the dest), a 1 second pause,
and then reading and periodic writes resume, until it happens again.

It seems that both r4 and xfs allow a large number of pages to be dirtied,
before queuing them for writeback, and this has a negative effect on
throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
10 minutes slower than jfs.

One thing that surprised me was, once r4 does write out, it is very fast.
Fast enough that i wasn't sure it was actually writing whole files! However,
i did a umount; mount and ran cksum, and sure enough, the files were good.
8)

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE


* Re: reiser4: first impression (vs xfs and jfs)
  2006-05-23 15:51 reiser4: first impression (vs xfs and jfs) Tom Vier
@ 2006-05-23 19:08 ` Gregory Maxwell
  2006-05-23 19:13 ` Alexey Polyakov
  2006-06-06 13:44 ` Tom Vier
  2 siblings, 0 replies; 19+ messages in thread
From: Gregory Maxwell @ 2006-05-23 19:08 UTC (permalink / raw)
  To: Tom Vier; +Cc: reiserfs-list

On 5/23/06, Tom Vier <tmv@comcast.net> wrote:
[snip]
> What i'm doing is rsyncing from a slower drive (on 1394) to the raid1 dev.
> When using r4 (xfs behaves similarly), after several seconds, reading from
> the source and writing to the destination stops for 3 or 4 seconds, then
> brief burst of writes to the r4 fs (the dest), a 1 second pause, and then
> reading and periodic writes resume, until it happens again.
>
> It seems that both r4 and xfs allow a large number of pages to be dirtied,
> before queuing them for writeback, and this has a negative effect on
> throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> 10 minutes slower than jfs.
[snip]

Have you tested a pure write load? It may be that rsync's combined
reading and writing is triggering a corner case for FSes with delayed
allocation. It may not be issuing its checksumming reads far enough
ahead of time, and so ends up disk latency bound.

It's interesting that you saw the same issues with XFS... I use XFS on
my audio workstation computer because it (combined with a low latency
patched kernel) had by far the lowest worst case latencies of all the
FSes I tested at the time.


* Re: reiser4: first impression (vs xfs and jfs)
  2006-05-23 15:51 reiser4: first impression (vs xfs and jfs) Tom Vier
  2006-05-23 19:08 ` Gregory Maxwell
@ 2006-05-23 19:13 ` Alexey Polyakov
       [not found]   ` <20060523201712.GD25889@zero>
  2006-06-06 13:44 ` Tom Vier
  2 siblings, 1 reply; 19+ messages in thread
From: Alexey Polyakov @ 2006-05-23 19:13 UTC (permalink / raw)
  To: Tom Vier; +Cc: reiserfs-list

Hi Tom,

what kind of raid do you use? Is it software md, or a hw raid solution?
Also, what's the size of your r4 partition?

On 5/23/06, Tom Vier <tmv@comcast.net> wrote:
> I finally decided to try a few different fs'es on my 250gig raid1. (I use
> reiserfs3 most of the time.) Here's some things i noticed, between r4, xfs,
> and jfs.
>
> Both r4 and xfs suffer from io pauses. This is on a dual 2.6ghz opteron,
> btw. I don't see high cpu usage, but clock throttling could be screwing up
> top's % calcs (tho i think all usage is measured by time, so it shouldn't).
>
> What i'm doing is rsyncing from a slower drive (on 1394) to the raid1 dev.
> When using r4 (xfs behaves similarly), after several seconds, reading from
> the source and writing to the destination stops for 3 or 4 seconds, then
> brief burst of writes to the r4 fs (the dest), a 1 second pause, and then
> reading and periodic writes resume, until it happens again.
>
> It seems that both r4 and xfs allow a large number of pages to be dirtied,
> before queuing them for writeback, and this has a negative effect on
> throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> 10 minutes slower than jfs.
>
> One thing that surprised me was, once r4 does write out, it is very fast.
> Fast enough that i wasn't sure it was actually writing whole files! However,
> i did a umount; mount and ran cksum, and sure enough, the files were good.
> 8)
>
> --
> Tom Vier <tmv@comcast.net>
> DSA Key ID 0x15741ECE
>


-- 
Alexey Polyakov


* Re: reiser4: first impression (vs xfs and jfs)
       [not found]   ` <20060523201712.GD25889@zero>
@ 2006-05-23 21:00     ` Alexey Polyakov
  0 siblings, 0 replies; 19+ messages in thread
From: Alexey Polyakov @ 2006-05-23 21:00 UTC (permalink / raw)
  To: Tom Vier; +Cc: reiserfs-list

Mine is 2xOpteron280, on a hardware RAID (Adaptec 2010S on 3xSCSI
146Gx15K). It's a heavily loaded web server. It suffers from
write-outs too. I've tested XFS and JFS, and found that R4 behaves
better after a system crash (due to power loss) and gives much better
performance.

What I do for my server is:
1) Get vanilla
2) Do patch-o-matic-ng patches (I wonder why those patches are not
included in vanilla)
3) Apply latest available reiser4

Right now it looks like this:

root@titanic [~]# df -Tm
Filesystem     Type    1M-blocks   Used Available Use% Mounted on
/dev/i2o/hda2  reiser4      9504   4488      5016  48% /
/dev/i2o/hda1  ext3           99     50        44  54% /boot
/dev/i2o/hda3  reiser4     22659  13884      8776  62% /var
/dev/i2o/hda5  reiser4       917     29       889   4% /tmp
/dev/i2o/hda7  reiser4     18135  14140      3995  78% /usr
/dev/i2o/hda6  reiser4     54382  53278      1104  98% /home
/dev/i2o/hda8  reiser4     54382  48583      5799  90% /home2
/dev/i2o/hda9  reiser4    106370  62854     43517  60% /home3

What's most interesting is that I have had (and continue to have) a lot
of hardware crashes. Reiser4 does the best job: XFS would leave some
files (created right before the crash) with length 0, reiserfs would
render the fs unusable, and ext3 would lose up to 30% of files on a FS.


On 5/24/06, Tom Vier <tmv@comcast.net> wrote:
> It's linux software raid1. 250gigs:
>
> md1 : active raid1 sdd1[1] sdc1[0]
>      262156544 blocks [2/2] [UU]
>
> I should've mentioned:
>
> Linux zero 2.6.16.16r4-2 #2 SMP PREEMPT Thu May 18 23:49:20 EDT 2006 i686
> GNU/Linux
>
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
>
> It's a dual 2.6ghz opteron box, running an x86 kernel.
>
> On Tue, May 23, 2006 at 11:13:05PM +0400, Alexey Polyakov wrote:
> > what kind of raid do you use? Is it software md, or a hw raid solution?
> > Also, what's the size of your r4 partition?
> >
> > On 5/23/06, Tom Vier <tmv@comcast.net> wrote:
> > >I finally decided to try a few different fs'es on my 250gig raid1. (I use
> > >reiserfs3 most of the time.) Here's some things i noticed, between r4, xfs,
> > >and jfs.
> > >
> > >Both r4 and xfs suffer from io pauses. This is on a dual 2.6ghz opteron,
> > >btw. I don't see high cpu usage, but clock throttling could be screwing up
> > >top's % calcs (tho i think all usage is measured by time, so it shouldn't).
> > >
> > >What i'm doing is rsyncing from a slower drive (on 1394) to the raid1 dev.
> > >When using r4 (xfs behaves similarly), after several seconds, reading from
> > >the source and writing to the destination stops for 3 or 4 seconds, then
> > >brief burst of writes to the r4 fs (the dest), a 1 second pause, and then
> > >reading and periodic writes resume, until it happens again.
> > >
> > >It seems that both r4 and xfs allow a large number of pages to be dirtied,
> > >before queuing them for writeback, and this has a negative effect on
> > >throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> > >10 minutes slower than jfs.
> > >
> > >One thing that surprised me was, once r4 does write out, it is very fast.
> > >Fast enough that i wasn't sure it was actually writing whole files!
> > >However,
> > >i did a umount; mount and ran cksum, and sure enough, the files were good.
> > >8)
> > >
> > >--
> > >Tom Vier <tmv@comcast.net>
> > >DSA Key ID 0x15741ECE
> > >
> >
> >
> > --
> > Alexey Polyakov
>
> --
> Tom Vier <tmv@comcast.net>
> DSA Key ID 0x15741ECE
>


-- 
Alexey Polyakov


* Re: reiser4: first impression (vs xfs and jfs)
  2006-05-23 15:51 reiser4: first impression (vs xfs and jfs) Tom Vier
  2006-05-23 19:08 ` Gregory Maxwell
  2006-05-23 19:13 ` Alexey Polyakov
@ 2006-06-06 13:44 ` Tom Vier
  2006-06-06 14:38   ` Vladimir V. Saveliev
  2 siblings, 1 reply; 19+ messages in thread
From: Tom Vier @ 2006-06-06 13:44 UTC (permalink / raw)
  To: reiserfs-list

On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> It seems that both r4 and xfs allow a large number of pages to be dirtied,
> before queuing them for writeback, and this has a negative effect on
> throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> 10 minutes slower than jfs.

Just to follow up on this (i've been too busy lately), that's how delayed
allocation works. It waits til the vm forces writeouts.

In my case of copying large files from a slower drive, the delayed allocation
of r4 and xfs is stalling reads from the source, since neither will write
until the vm forces it.

Is there a way in r4 to force sync a mount every so often, ala flushd? ext3
has the commit option. Does r4 have a hard coded sync timer already? If not,
i think it's an important feature that should be added (and made a mount
option). Otherwise, a lot of data can be lost. Does the kernel do a system
wide sync every 30sec, like it used to?

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-06 13:44 ` Tom Vier
@ 2006-06-06 14:38   ` Vladimir V. Saveliev
  2006-06-06 15:30     ` Clay Barnes
  2006-06-07 17:58     ` Tom Vier
  0 siblings, 2 replies; 19+ messages in thread
From: Vladimir V. Saveliev @ 2006-06-06 14:38 UTC (permalink / raw)
  To: Tom Vier; +Cc: reiserfs-list

Hello

On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
> On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> > It seems that both r4 and xfs allow a large number of pages to be dirtied,
> > before queuing them for writeback, and this has a negative effect on
> > throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> > 10 minutes slower than jfs.
> 
> Just to follow up on this (i've been too busy lately), that's how delayed
> allocation works. It waits til the vm forces writeouts.
> 
> In my case of copying large files from a slower drive, the delayed allocation
> of r4 and xfs is stalling reads from the source, since neither will write
> until the vw forces it.
> 
> Is there a way in r4 to force sync a mount every so often, ala flushd? 

reiser4 has an option for that:
mount -o tmgr.atom_max_age=N
N is a decimal number of seconds. Changes older than N seconds will be
forced to commit.

> ext3
> has the commit option. Does r4 have a hard coded sync timer already? If not,
> i think it's an important feature that should be added (and made a mount
> option). Otherwise, a lot of data can be lost. Does the kernel do a system
> wide sync every 30sec, like it used to?
> 



* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-06 14:38   ` Vladimir V. Saveliev
@ 2006-06-06 15:30     ` Clay Barnes
  2006-06-06 17:47       ` PFC
  2006-06-06 19:25       ` Hans Reiser
  2006-06-07 17:58     ` Tom Vier
  1 sibling, 2 replies; 19+ messages in thread
From: Clay Barnes @ 2006-06-06 15:30 UTC (permalink / raw)
  To: Vladimir V. Saveliev; +Cc: reiserfs-list

On 18:38 Tue 06 Jun     , Vladimir V. Saveliev wrote:
> Hello
> 
> On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
> > On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> > > It seems that both r4 and xfs allow a large number of pages to be dirtied,
> > > before queuing them for writeback, and this has a negative effect on
> > > throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> > > 10 minutes slower than jfs.
> > 
> > Just to follow up on this (i've been too busy lately), that's how delayed
> > allocation works. It waits til the vm forces writeouts.
> > 
> > In my case of copying large files from a slower drive, the delayed allocation
> > of r4 and xfs is stalling reads from the source, since neither will write
> > until the vw forces it.
> > 
> > Is there a way in r4 to force sync a mount every so often, ala flushd? 
> 
> reiser4 has an option for that.
> mount -o tmgr.atom_max_age=N
> N is decimal number of seconds. Changes older than N will be forced to
> commit.
This may have been mentioned before, but perhaps there could be a
"trickle-out" option along the lines of "if the hard drive is idle (and
optionally only if it's spun up), slowly write out the changes to the
disk structure."  This could also be paired with keeping as much of the
data in memory as necessary to maintain the speed boost that r4 gets from
temporal locality of reference, possibly just giving it to the system
cache.
> 
> > ext3
> > has the commit option. Does r4 have a hard coded sync timer already? If not,
> > i think it's an important feature that should be added (and made a mount
> > option). Otherwise, a lot of data can be lost. Does the kernel do a system
> > wide sync every 30sec, like it used to?
> > 


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-06 15:30     ` Clay Barnes
@ 2006-06-06 17:47       ` PFC
  2006-06-06 19:26         ` Hans Reiser
  2006-06-06 19:25       ` Hans Reiser
  1 sibling, 1 reply; 19+ messages in thread
From: PFC @ 2006-06-06 17:47 UTC (permalink / raw)
  To: Clay Barnes, Vladimir V. Saveliev; +Cc: reiserfs-list


> This may have been mentioned before, but perhaps there could be a
> "trickle-out" option along the lines of "if the hard drive is idle (and
> optionally only if it's spun up), slowly write out the changes to the
> disk structure."  This could also be paired with keeping as much of the
> data in memory as necessary to mantain the speed boost that r4 gets from
> temporal locality of reference, possibly just giving it to the system
> cache.

Hm actually, this looks a lot like read-ahead algorithms, but instead
it's "write-ahead":

For instance:
- Sequential writes on large files should stream through the cache.
- Random writes or small file writes should be kept as long as possible
  in dirty pages so they can be coalesced into larger writes with a better
  disk layout on flush, or not written at all if it was temp files from a
  make, for instance.

Do the file copying programs open their output files with O_SEQUENTIAL?
If so, there is information to exploit...
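
For reference, a minimal sketch of how a copying program could pass a "sequential" hint today. Linux's open(2) does not appear to define an O_SEQUENTIAL flag; the closest standard hint is posix_fadvise() with POSIX_FADV_SEQUENTIAL, which mainly tunes readahead on the source rather than writeback on the destination. The helper name open_sequential is hypothetical, and nothing here is specific to reiser4.

```python
import os

def open_sequential(path):
    """Open `path` for reading and hint that access will be sequential.

    Hypothetical helper: the hint is advisory, and the kernel is free to
    ignore it.
    """
    fd = os.open(path, os.O_RDONLY)
    # posix_fadvise is not available on every platform; skip the hint if missing.
    if hasattr(os, "posix_fadvise"):
        # offset=0 and length=0 together mean "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    # Wrap the descriptor in a normal binary file object; closing it closes fd.
    return os.fdopen(fd, "rb")
```

Whether a filesystem uses such a hint to change its flushing policy is a separate question; this only shows that the application side of the interface exists.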


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-06 15:30     ` Clay Barnes
  2006-06-06 17:47       ` PFC
@ 2006-06-06 19:25       ` Hans Reiser
  2006-06-07  0:13         ` Clay Barnes
  2006-06-08 14:06         ` Tom Vier
  1 sibling, 2 replies; 19+ messages in thread
From: Hans Reiser @ 2006-06-06 19:25 UTC (permalink / raw)
  To: Clay Barnes; +Cc: Vladimir V. Saveliev, reiserfs-list

Clay Barnes wrote:

>On 18:38 Tue 06 Jun     , Vladimir V. Saveliev wrote:
>  
>
>>Hello
>>
>>On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
>>    
>>
>>>On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
>>>      
>>>
>>>>It seems that both r4 and xfs allow a large number of pages to be dirtied,
>>>>before queuing them for writeback, and this has a negative effect on
>>>>throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
>>>>10 minutes slower than jfs.
>>>>        
>>>>
>>>Just to follow up on this (i've been too busy lately), that's how delayed
>>>allocation works. It waits til the vm forces writeouts.
>>>
>>>In my case of copying large files from a slower drive, the delayed allocation
>>>of r4 and xfs is stalling reads from the source, since neither will write
>>>until the vw forces it.
>>>
>>>Is there a way in r4 to force sync a mount every so often, ala flushd? 
>>>      
>>>
>>reiser4 has an option for that.
>>mount -o tmgr.atom_max_age=N
>>N is decimal number of seconds. Changes older than N will be forced to
>>commit.
>>    
>>
>This may have been mentioned before, but perhaps there could be a
>"trickle-out" option along the lines of "if the hard drive is idle (and
>optionally only if it's spun up), slowly write out the changes to the
>disk structure." 
>
Yes, I will take a patch to do the above as it would be good, but I am
not convinced it explains the problem described.  I don't really
understand how writes to a fast drive can slow reads from a slow drive.
I am missing something.

Maybe I should ask the following: is the slow drive using reiser4?  If
reiser4, was the slow drive image created by copying from a reiser4
image or an ext3 image?   (Standard benchmarking mistake: creating an
image for a test from a filesystem other than the one that is being
tested.  readdir order matters.)

> This could also be paired with keeping as much of the
>data in memory as necessary to mantain the speed boost that r4 gets from
>temporal locality of reference, possibly just giving it to the system
>cache.
>  
>
>>>ext3
>>>has the commit option. Does r4 have a hard coded sync timer already? If not,
>>>i think it's an important feature that should be added (and made a mount
>>>option). Otherwise, a lot of data can be lost. Does the kernel do a system
>>>wide sync every 30sec, like it used to?
>>>
>>>      
>>>
>
>
>  
>



* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-06 17:47       ` PFC
@ 2006-06-06 19:26         ` Hans Reiser
  2006-06-07 17:21           ` PFC
  0 siblings, 1 reply; 19+ messages in thread
From: Hans Reiser @ 2006-06-06 19:26 UTC (permalink / raw)
  To: PFC; +Cc: Clay Barnes, Vladimir V. Saveliev, reiserfs-list

PFC wrote:

>
>> This may have been mentioned before, but perhaps there could be a
>> "trickle-out" option along the lines of "if the hard drive is idle (and
>> optionally only if it's spun up), slowly write out the changes to the
>> disk structure."  This could also be paired with keeping as much of the
>> data in memory as necessary to mantain the speed boost that r4 gets from
>> temporal locality of reference, possibly just giving it to the system
>> cache.
>
>
>     Hm actually, this looks a lot like read-ahead algorithms, but
> instead  it's "write-ahead" :
>
>     For instance :
>     - Sequential writes on large files should stream through the cache.
>     - Random writes or small file writes should be kept as long as
> possible  in dirty pages so they can be coalesced into larger writes
> with a better  disk layout on flush, or not written at all if it was
> temp files from a  make, for instance.
>
>     Do the file copying programs open their output files with
> O_SEQUENTIAL ?  If so, there is information to exploit...
>
>
You can change them to do so....


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-06 19:25       ` Hans Reiser
@ 2006-06-07  0:13         ` Clay Barnes
  2006-06-07  0:42           ` Hans Reiser
  2006-06-08  0:55           ` Nate Diller
  2006-06-08 14:06         ` Tom Vier
  1 sibling, 2 replies; 19+ messages in thread
From: Clay Barnes @ 2006-06-07  0:13 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Vladimir V. Saveliev, reiserfs-list

On 12:25 Tue 06 Jun     , Hans Reiser wrote:
> Clay Barnes wrote:
> 
> >On 18:38 Tue 06 Jun     , Vladimir V. Saveliev wrote:
> >  
> >
> >>Hello
> >>
> >>On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
> >>    
> >>
> >>>On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> >>>      
> >>>
> >>>>It seems that both r4 and xfs allow a large number of pages to be dirtied,
> >>>>before queuing them for writeback, and this has a negative effect on
> >>>>throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> >>>>10 minutes slower than jfs.
> >>>>        
> >>>>
> >>>Just to follow up on this (i've been too busy lately), that's how delayed
> >>>allocation works. It waits til the vm forces writeouts.
> >>>
> >>>In my case of copying large files from a slower drive, the delayed allocation
> >>>of r4 and xfs is stalling reads from the source, since neither will write
> >>>until the vw forces it.
> >>>
> >>>Is there a way in r4 to force sync a mount every so often, ala flushd? 
> >>>      
> >>>
> >>reiser4 has an option for that.
> >>mount -o tmgr.atom_max_age=N
> >>N is decimal number of seconds. Changes older than N will be forced to
> >>commit.
> >>    
> >>
> >This may have been mentioned before, but perhaps there could be a
> >"trickle-out" option along the lines of "if the hard drive is idle (and
> >optionally only if it's spun up), slowly write out the changes to the
> >disk structure." 
> >
> Yes, I will take a patch to do the above as it would be good, but I am
I wish I had the skills to do so.  Perhaps after a few more C classes
and my next degree. :-/
> not convinced it explains the problem described.  I don't really
Well, very possibly (likely) not, but it just happened to bring to mind
the particular thought I had had several times before.
> understand how writes to a fast drive can slow reads from a slow drive. 
> I am missing something.
> 
> Maybe I should ask the following: is the slow drive using reiser4?  If
> reiser4, was the slow drive image created by copying from a reiser4
> image or an ext3 image?   (Standard benchmarking mistake: creating an
> image for a test from a filesystem not the one that is being tested. 
> readdir order matters.)
> 
> > This could also be paired with keeping as much of the
> >data in memory as necessary to mantain the speed boost that r4 gets from
> >temporal locality of reference, possibly just giving it to the system
> >cache.
> >  
> >
> >>>ext3
> >>>has the commit option. Does r4 have a hard coded sync timer already? If not,
> >>>i think it's an important feature that should be added (and made a mount
> >>>option). Otherwise, a lot of data can be lost. Does the kernel do a system
> >>>wide sync every 30sec, like it used to?
> >>>
> >>>      
> >>>
> >
> >
> >  
> >


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-07  0:13         ` Clay Barnes
@ 2006-06-07  0:42           ` Hans Reiser
  2006-06-08  0:55           ` Nate Diller
  1 sibling, 0 replies; 19+ messages in thread
From: Hans Reiser @ 2006-06-07  0:42 UTC (permalink / raw)
  To: Clay Barnes; +Cc: Vladimir V. Saveliev, reiserfs-list

Clay Barnes wrote:

>On 12:25 Tue 06 Jun     , Hans Reiser wrote:
>  
>
>>Clay Barnes wrote:
>>
>>    
>>
>>>On 18:38 Tue 06 Jun     , Vladimir V. Saveliev wrote:
>>> 
>>>
>>>      
>>>
>>>>Hello
>>>>
>>>>On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
>>>>   
>>>>
>>>>        
>>>>
>>>>>On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>>>It seems that both r4 and xfs allow a large number of pages to be dirtied,
>>>>>>before queuing them for writeback, and this has a negative effect on
>>>>>>throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
>>>>>>10 minutes slower than jfs.
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>Just to follow up on this (i've been too busy lately), that's how delayed
>>>>>allocation works. It waits til the vm forces writeouts.
>>>>>
>>>>>In my case of copying large files from a slower drive, the delayed allocation
>>>>>of r4 and xfs is stalling reads from the source, since neither will write
>>>>>until the vw forces it.
>>>>>
>>>>>Is there a way in r4 to force sync a mount every so often, ala flushd? 
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>reiser4 has an option for that.
>>>>mount -o tmgr.atom_max_age=N
>>>>N is decimal number of seconds. Changes older than N will be forced to
>>>>commit.
>>>>   
>>>>
>>>>        
>>>>
>>>This may have been mentioned before, but perhaps there could be a
>>>"trickle-out" option along the lines of "if the hard drive is idle (and
>>>optionally only if it's spun up), slowly write out the changes to the
>>>disk structure." 
>>>
>>>      
>>>
>>Yes, I will take a patch to do the above as it would be good, but I am
>>    
>>
>I wish I had the skills to do so.  Perhaps after a few more C classes
>and my next degree. :-/
>  
>
When we get our wiki going, we should create a desired features page,
and maybe you can add this to that.

Someday we will surely code this, but when I cannot today say.




* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-06 19:26         ` Hans Reiser
@ 2006-06-07 17:21           ` PFC
  0 siblings, 0 replies; 19+ messages in thread
From: PFC @ 2006-06-07 17:21 UTC (permalink / raw)
  To: reiserfs-list


>>     Do the file copying programs open their output files with
>> O_SEQUENTIAL ?  If so, there is information to exploit...
>>
>>
> You can change them to do so....

I rather meant: if a program opens a file for write with O_SEQUENTIAL
(which should be done when copying files), will reiser4 exploit the
information by flushing sooner, and in a more "streaming" fashion? Or
will it use the default algorithm, flushing under memory pressure?

The current behaviour of reiser4, flushing dirty pages late in case they
are modified again before being written, is excellent for many use cases
(random writes, small file writes, temporary files, copying files inside a
single spindle etc); but it isn't optimal for writing large amounts of
data in sequential fashion, like copying large files between disks, for
instance. Adapting this would be quite tricky I guess...

Consider the two following scenarios:

- A database is doing an UPDATE query on a table. It will issue a lot of
  reads and writes, probably in random order. In this case, if the working
  set fits in RAM, it pays big time to flush as late as possible, ideally
  when the query is finished, because some pages may be written to multiple
  times. Also, delayed allocation will reduce fragmentation of the files.
  It's the same thing for doing a Make, unzipping an archive, copying many
  small files, etc.

- A process acquires audio or video in realtime and streams it to disk,
  or copies files from one disk to another. In this case it is better to
  stream the data directly to disk, especially if the files are large.

Guessing is a pain in the ass. How can the application inform the
filesystem of its intentions?
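
One application-side answer for the streaming case, sketched under assumptions: instead of waiting for memory pressure, the copier can force writeback itself at a fixed interval. The function name streaming_copy and the chunk_size / flush_every defaults are illustrative, not taken from any tool mentioned in this thread, and whether this actually removes the observed pauses is untested.

```python
import os

# Fall back to fsync where fdatasync is unavailable (an assumption made for
# portability; on Linux fdatasync is the call of interest).
_flush = getattr(os, "fdatasync", os.fsync)

def streaming_copy(src_path, dst_path, chunk_size=1 << 20, flush_every=16 << 20):
    """Copy src to dst, forcing writeback every `flush_every` bytes.

    Returns the number of explicit flushes issued.
    """
    flushes = 0
    unflushed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            unflushed += len(chunk)
            if unflushed >= flush_every:
                dst.flush()            # push Python's buffer to the kernel
                _flush(dst.fileno())   # ask the kernel to write the data out
                flushes += 1
                unflushed = 0
        dst.flush()
        _flush(dst.fileno())           # final flush so nothing is left dirty
        flushes += 1
    return flushes
```

This trades the coalescing benefit of late flushing for a steadier write stream, so it only suits the second scenario above (large sequential copies), not the database or make workloads.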




* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-06 14:38   ` Vladimir V. Saveliev
  2006-06-06 15:30     ` Clay Barnes
@ 2006-06-07 17:58     ` Tom Vier
  2006-06-08  0:41       ` Nate Diller
  1 sibling, 1 reply; 19+ messages in thread
From: Tom Vier @ 2006-06-07 17:58 UTC (permalink / raw)
  To: reiserfs-list

On Tue, Jun 06, 2006 at 06:38:26PM +0400, Vladimir V. Saveliev wrote:
> reiser4 has an option for that.
> mount -o tmgr.atom_max_age=N
> N is decimal number of seconds. Changes older than N will be forced to
> commit.

Unfortunately, this causes even more read pauses from the source, when
running rsync. I also tried cpio (-p mode) and cp -a, same pauses. When
syncing, r4 seems to have about a 5 second pause, then a burst of seeks and
writes. I also tried disabling clock throttling, no difference. I have a
fast system. Two single core 2.6ghz opterons. CPU time during the pause
after sync and before the writes doesn't seem to be very high. I thought r4
might be cpu bound, but it doesn't seem to be. I'm not sure what's causing
this pause. If i had more free time, i'd set up kernel profiling.

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-07 17:58     ` Tom Vier
@ 2006-06-08  0:41       ` Nate Diller
  0 siblings, 0 replies; 19+ messages in thread
From: Nate Diller @ 2006-06-08  0:41 UTC (permalink / raw)
  To: Tom Vier; +Cc: reiserfs-list

On 6/7/06, Tom Vier <tmv@comcast.net> wrote:
> On Tue, Jun 06, 2006 at 06:38:26PM +0400, Vladimir V. Saveliev wrote:
> > reiser4 has an option for that.
> > mount -o tmgr.atom_max_age=N
> > N is decimal number of seconds. Changes older than N will be forced to
> > commit.
>
> Unfortunetly, this causes even more read pauses from the source, when
> running rsync. I also tried cpio (-p mode) and cp -a, same pauses. When
> syncing, r4 seems to have about a 5 second pause, then a burst of seeks and
> writes. I also tried disabling clock throttling, no difference. I have a
> fast system. Two single core 2.6ghz opterons. CPU time during the pause
> after sync and before the writes doesn't seem to very high. I thought r4
> might be cpu bound, but it doesn't seem to be. I'm not sure what's causing
> this pause. If i had more free time, i'd setup kernel profiling.

this could easily be a disk scheduler issue.  in particular, reiser4
may be interacting with the block device congestion code, leading it
to fail to submit all the writes at once.  this may be helped by the
upcoming patch for creating bio structures more than 4k at a time, or
by using a different scheduler.

try 'echo 512 > /sys/block/[dev]/queue/nr_requests' and see if the
behaviour changes
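
a small sketch for recording the queue settings before and after such an experiment, so the old value can be restored.  the function names are hypothetical, and the sysfs_root parameter exists only so the helpers can be pointed at a scratch directory.  on an md array the relevant queues are presumably those of the member disks (sdc and sdd in this thread), which is an assumption and not something verified here.

```python
from pathlib import Path

def queue_file(dev, name, sysfs_root="/sys/block"):
    """Path of a block-queue tunable, e.g. /sys/block/sdc/queue/nr_requests."""
    return Path(sysfs_root) / dev / "queue" / name

def get_nr_requests(dev, sysfs_root="/sys/block"):
    """Current request-queue depth for `dev` as an integer."""
    return int(queue_file(dev, "nr_requests", sysfs_root).read_text())

def set_nr_requests(dev, value, sysfs_root="/sys/block"):
    """Same effect as `echo value > .../nr_requests`; needs root on a real system."""
    queue_file(dev, "nr_requests", sysfs_root).write_text(f"{value}\n")

def active_scheduler(dev, sysfs_root="/sys/block"):
    """The kernel marks the active elevator with brackets, e.g. 'noop [cfq]'."""
    for word in queue_file(dev, "scheduler", sysfs_root).read_text().split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None
```

typical use would be to save get_nr_requests("sdc"), call set_nr_requests("sdc", 512), rerun the copy, and then write the saved value back.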

NATE


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-07  0:13         ` Clay Barnes
  2006-06-07  0:42           ` Hans Reiser
@ 2006-06-08  0:55           ` Nate Diller
  2006-06-08 14:18             ` Tom Vier
  1 sibling, 1 reply; 19+ messages in thread
From: Nate Diller @ 2006-06-08  0:55 UTC (permalink / raw)
  To: Clay Barnes; +Cc: Hans Reiser, Vladimir V. Saveliev, reiserfs-list

On 6/6/06, Clay Barnes <clay.barnes@gmail.com> wrote:
> On 12:25 Tue 06 Jun     , Hans Reiser wrote:
> > Clay Barnes wrote:
> >
> > >On 18:38 Tue 06 Jun     , Vladimir V. Saveliev wrote:
> > >
> > >
> > >>Hello
> > >>
> > >>On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
> > >>
> > >>
> > >>>On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> > >>>
> > >>>
> > >>>>It seems that both r4 and xfs allow a large number of pages to be dirtied,
> > >>>>before queuing them for writeback, and this has a negative effect on
> > >>>>throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> > >>>>10 minutes slower than jfs.
> > >>>>
> > >>>>
> > >>>Just to follow up on this (i've been too busy lately), that's how delayed
> > >>>allocation works. It waits til the vm forces writeouts.
> > >>>
> > >>>In my case of copying large files from a slower drive, the delayed allocation
> > >>>of r4 and xfs is stalling reads from the source, since neither will write
> > >>>until the vm forces it.
> > >>>
> > >>>Is there a way in r4 to force sync a mount every so often, ala flushd?
> > >>>
> > >>>
> > >>reiser4 has an option for that.
> > >>mount -o tmgr.atom_max_age=N
> > >>N is decimal number of seconds. Changes older than N will be forced to
> > >>commit.
> > >>
> > >>
> > >This may have been mentioned before, but perhaps there could be a
> > >"trickle-out" option along the lines of "if the hard drive is idle (and
> > >optionally only if it's spun up), slowly write out the changes to the
> > >disk structure."
> > >
> > Yes, I will take a patch to do the above as it would be good, but I am
> I wish I had the skills to do so.  Perhaps after a few more C classes
> and my next degree. :-/
> > not convinced it explains the problem described.  I don't really
> Well, very possibly (likely) not, but it just happened to bring to mind
> the particular thought I had had several times before.

this is something i've been wanting to address for a while now.  at
the moment, the VM starts to flush pages when
/proc/sys/vm/dirty_background_ratio is exceeded, and it flushes pages
without regard to which process dirtied them or which file they are
in.  it begins to throttle processes (make them wait for some dirty
pages to be cleaned before they can dirty more of them) when
dirty_ratio is exceeded.
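
To put rough numbers on those two thresholds, here is a sketch that turns the ratios into byte counts. It assumes the percentages apply to total RAM, which is a simplification (the kernel applies them to the memory it considers dirtyable), and the function name and the example figures are made up for illustration.

```python
def dirty_thresholds(mem_bytes, background_ratio, dirty_ratio):
    """Approximate the byte counts at which background flushing and throttling begin.

    Simplification: both ratios are applied to mem_bytes directly.
    """
    background = mem_bytes * background_ratio // 100
    throttle = mem_bytes * dirty_ratio // 100
    return background, throttle

# Example: 4 GiB of RAM with illustrative ratios of 10% and 40%.
bg, hard = dirty_thresholds(4 * 1024**3, 10, 40)
print(f"background flush after ~{bg // 1024**2} MiB dirty, "
      f"throttling after ~{hard // 1024**2} MiB dirty")
```

On a machine with several gigs of ram, the gap between the two numbers is how much a fast writer can dirty before anything makes it wait.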

PFC <lists@peufeu.com> said (forgive me for cross-quoting):

        Consider the two following scenarios :

       - A database is doing an UPDATE query on a table. It will issue a lot of
reads and writes, probably in random order. In this case, if the working
set fits in RAM, it pays big time to flush as late as possible, ideally
when the query is finished, because some pages may be written to multiple
times. Also, delayed allocation will reduce fragmentation of the files.
       It's the same thing for doing a Make, unzipping an archive, copying many
small files, etc.

       - A process acquires audio or video in realtime and streams it to disk,
or copies files from one disk to another. In this case it is better to
stream the data directly to disk, especially if the files are large.

       Guessing is a pain in the ass. How can the application inform the
filesystem of its intentions ?

---

the real problem is, what if both of those processes are running at
once?  the correct behaviour would be to leave the database pages
dirty in memory, and sequentially clean the streaming ones.
furthermore, the database shouldn't be throttled unless its working
set is bigger than available memory, but the stream should always be.
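
Until the kernel can tell these two cases apart, a streaming writer can throttle itself from user space. A minimal sketch, assuming a plain file-to-file copy; the function name, the 1 MiB chunk size, and the 64 MiB sync interval are illustrative choices, not values from this thread.

```python
import os

def stream_copy(src_path, dst_path, chunk_size=1 << 20, sync_every=64 << 20):
    """Copy src to dst, forcing writeback after every sync_every bytes.

    This bounds how much dirty data the copy can accumulate, at the cost
    of giving up write-behind caching for this file.
    """
    copied = 0
    since_sync = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
            since_sync += len(chunk)
            if since_sync >= sync_every:
                dst.flush()                 # push Python's buffer to the kernel
                os.fdatasync(dst.fileno())  # wait until the data has been written out
                since_sync = 0
        dst.flush()
        os.fdatasync(dst.fileno())
    return copied
```

This only helps the streaming process that opts in; it does nothing for the database case, which still wants its pages left dirty as long as possible.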

i've been pondering this problem for the past year, and haven't found
the silver bullet yet.  i am still resolved to re-write the throttling
and congestion code at some point, because it's fundamentally broken.

but i bet hans is right, these pauses you're seeing could probably be
alleviated by fixing something simple in reiser4.

NATE


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-06 19:25       ` Hans Reiser
  2006-06-07  0:13         ` Clay Barnes
@ 2006-06-08 14:06         ` Tom Vier
  2006-06-09  8:05           ` Hans Reiser
  1 sibling, 1 reply; 19+ messages in thread
From: Tom Vier @ 2006-06-08 14:06 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list

On Tue, Jun 06, 2006 at 12:25:15PM -0700, Hans Reiser wrote:
> Maybe I should ask the following: is the slow drive using reiser4?  If

No, it was ext2.

> reiser4, was the slow drive image created by copying from a reiser4
> image or an ext3 image?   (Standard benchmarking mistake: creating an
> image for a test from a filesystem not the one that is being tested. 
> readdir order matters.)

Would that really make much difference?

I think the problem here is a general problem with delayed allocation,
regardless of which fs implements it. The fs'es need to stream out writes.
If it's possible (i don't know if fs'es are allowed this info from the vfs),
i think after a short timeout of a file no longer being open for writing, it
should be written. Maybe have a longer delay for smaller files, so they pack
better. Past a certain size threshold, once a file is closed (or only opened
read-only) i think it should be flushed without much delay. Especially if
the blk dev is idle (but knowing that at the fs level may well be impossible
w/o modding the vfs api). I think linux (and other os'es) are in need of
more intelligent io scheduling (higher level than just sector elevators).

One problem with my suggestion is that apps don't always close or reopen
read-only after they write a file.
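
The heuristic above could be written down as a small policy function. This is only a sketch of the idea: the function name, the 1 MiB threshold, and the delay values are hypothetical, and nothing here corresponds to an existing vfs interface.

```python
LARGE_FILE_BYTES = 1 << 20   # hypothetical size threshold
SHORT_DELAY_S = 1.0          # hypothetical delay for large files once closed
LONG_DELAY_S = 30.0          # hypothetical delay for small files, so they pack better

def flush_delay(size_bytes, open_for_write, device_idle):
    """Return seconds to wait before flushing a file, or None to leave it to the vm."""
    if open_for_write:
        return None                      # still being written; don't force anything
    if size_bytes >= LARGE_FILE_BYTES:
        # Large and no longer open for writing: flush right away if the
        # block device is idle, otherwise after a short delay.
        return 0.0 if device_idle else SHORT_DELAY_S
    return LONG_DELAY_S                  # small file: wait longer before writing
```

The `open_for_write` input is exactly the weak spot mentioned above: an app that never closes or reopens read-only would always get `None`.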

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-08  0:55           ` Nate Diller
@ 2006-06-08 14:18             ` Tom Vier
  0 siblings, 0 replies; 19+ messages in thread
From: Tom Vier @ 2006-06-08 14:18 UTC (permalink / raw)
  To: Nate Diller; +Cc: reiserfs-list

On Wed, Jun 07, 2006 at 05:55:25PM -0700, Nate Diller wrote:
> this is something i've been wanting to address for a while now.  at
> the moment, the VM starts to flush pages when
> /proc/sys/vm/dirty_background_ratio is exceeded, and it flushes pages
> without regard to which process dirtied them or which file they are
> in.  it begins to throttle processes (make them wait for some dirty
> pages to be cleaned before they can dirty more of them) when
> dirty_ratio is exceeded.

That ratio is something i'll have to play with.
/proc/sys/vm/dirty_writeback_centisecs looks like it could help me as well.
With two (and eventually ext3, as well) fs'es using delayed allocation, it
may be best to change those values (especially when you have several gigs of
ram). I don't suppose there's a way to make it per fs, tho. fs'es don't walk
dirty pages, do they? Only the vm decides when to write out? ext3 and r4 do
have write timeouts, tho.
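
A small sketch for looking at the current values before changing anything. The `proc_root` parameter and the function name are assumptions for illustration; the three file names are the ones mentioned in this thread.

```python
import os

VM_TUNABLES = ("dirty_background_ratio", "dirty_ratio", "dirty_writeback_centisecs")

def read_vm_tunables(proc_root="/proc"):
    """Return the writeback tunables as a dict of name -> int."""
    values = {}
    for name in VM_TUNABLES:
        with open(os.path.join(proc_root, "sys", "vm", name)) as f:
            values[name] = int(f.read().strip())
    return values
```

These are system-wide settings, so whatever is chosen applies to every mounted fs at once.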

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE


* Re: reiser4: first impression (vs xfs and jfs)
  2006-06-08 14:06         ` Tom Vier
@ 2006-06-09  8:05           ` Hans Reiser
  0 siblings, 0 replies; 19+ messages in thread
From: Hans Reiser @ 2006-06-09  8:05 UTC (permalink / raw)
  To: Tom Vier; +Cc: reiserfs-list

Make both filesystems XFS or both reiser4, and speed will likely increase.  Please
try and report the result if you could.


Tom Vier wrote:

>On Tue, Jun 06, 2006 at 12:25:15PM -0700, Hans Reiser wrote:
>
>>Maybe I should ask the following: is the slow drive using reiser4?  If
>
>No, it was ext2.
>
>>reiser4, was the slow drive image created by copying from a reiser4
>>image or an ext3 image?   (Standard benchmarking mistake: creating an
>>image for a test from a filesystem not the one that is being tested.
>>readdir order matters.)
>
>Would that really make much difference?
>
>I think the problem here is a general problem with delayed allocation,
>regardless of which fs implements it. The fs'es need to stream out writes.
>If it's possible (i don't know if fs'es are allowed this info from the vfs),
>i think after a short timeout of a file no longer being open for writing, it
>should be written. Maybe have a longer delay for smaller files, so they pack
>better. Past a certain size threshold, once a file is closed (or only opened
>read-only) i think it should be flushed without much delay. Especially if
>the blk dev is idle (but knowing that at the fs level may well be impossible
>w/o modding the vfs api). I think linux (and other os'es) are in need of
>more intelligent io scheduling (higher level than just sector elevators).
>
>One problem with my suggestion is that apps don't always close or reopen
>read-only after they write a file.



end of thread, other threads:[~2006-06-09  8:05 UTC | newest]

Thread overview: 19+ messages
2006-05-23 15:51 reiser4: first impression (vs xfs and jfs) Tom Vier
2006-05-23 19:08 ` Gregory Maxwell
2006-05-23 19:13 ` Alexey Polyakov
     [not found]   ` <20060523201712.GD25889@zero>
2006-05-23 21:00     ` Alexey Polyakov
2006-06-06 13:44 ` Tom Vier
2006-06-06 14:38   ` Vladimir V. Saveliev
2006-06-06 15:30     ` Clay Barnes
2006-06-06 17:47       ` PFC
2006-06-06 19:26         ` Hans Reiser
2006-06-07 17:21           ` PFC
2006-06-06 19:25       ` Hans Reiser
2006-06-07  0:13         ` Clay Barnes
2006-06-07  0:42           ` Hans Reiser
2006-06-08  0:55           ` Nate Diller
2006-06-08 14:18             ` Tom Vier
2006-06-08 14:06         ` Tom Vier
2006-06-09  8:05           ` Hans Reiser
2006-06-07 17:58     ` Tom Vier
2006-06-08  0:41       ` Nate Diller
