* reiser4: first impression (vs xfs and jfs)
@ 2006-05-23 15:51 Tom Vier
2006-05-23 19:08 ` Gregory Maxwell
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Tom Vier @ 2006-05-23 15:51 UTC (permalink / raw)
To: reiserfs-list
I finally decided to try a few different fs'es on my 250gig raid1. (I use
reiserfs3 most of the time.) Here's some things i noticed, between r4, xfs,
and jfs.
Both r4 and xfs suffer from io pauses. This is on a dual 2.6ghz opteron,
btw. I don't see high cpu usage, but clock throttling could be screwing up
top's % calcs (tho i think all usage is measured by time, so it shouldn't).
What i'm doing is rsyncing from a slower drive (on 1394) to the raid1 dev.
When using r4 (xfs behaves similarly), after several seconds, reading from
the source and writing to the destination stop for 3 or 4 seconds. Then
there is a brief burst of writes to the r4 fs (the dest), a 1 second pause,
and then reading and periodic writes resume, until it happens again.
It seems that both r4 and xfs allow a large number of pages to be dirtied,
before queuing them for writeback, and this has a negative effect on
throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
10 minutes slower than jfs.
One thing that surprised me was, once r4 does write out, it is very fast.
Fast enough that i wasn't sure it was actually writing whole files! However,
i did a umount; mount and ran cksum, and sure enough, the files were good.
8)
--
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: reiser4: first impression (vs xfs and jfs)
2006-05-23 15:51 reiser4: first impression (vs xfs and jfs) Tom Vier
@ 2006-05-23 19:08 ` Gregory Maxwell
2006-05-23 19:13 ` Alexey Polyakov
2006-06-06 13:44 ` Tom Vier
2 siblings, 0 replies; 19+ messages in thread
From: Gregory Maxwell @ 2006-05-23 19:08 UTC (permalink / raw)
To: Tom Vier; +Cc: reiserfs-list
On 5/23/06, Tom Vier <tmv@comcast.net> wrote:
[snip]
> What i'm doing is rsyncing from a slower drive (on 1394) to the raid1 dev.
> When using r4 (xfs behaves similarly), after several seconds, reading from
> the source and writing to the destination stops for 3 or 4 seconds, then
> brief burst of writes to the r4 fs (the dest), a 1 second pause, and then
> reading and periodic writes resume, until it happens again.
>
> It seems that both r4 and xfs allow a large number of pages to be dirtied,
> before queuing them for writeback, and this has a negative effect on
> throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> 10 minutes slower than jfs.
[snip]

Have you tested a pure write load?

It may be that rsync's combined reading and writing is triggering a corner
case for FSes with delayed allocation. It may not be issuing its
checksumming reads far enough ahead of time, and so ends up disk latency
bound.

It's interesting that you saw the same issues with XFS... I use XFS on my
audio workstation computer because it (combined with a low latency patched
kernel) had by far the lowest worst case latencies of all the FSes I tested
at the time.
^ permalink raw reply [flat|nested] 19+ messages in thread
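A pure write load of the kind Gregory asks about can be approximated with a short script. The following is only a sketch, not a benchmark anyone in the thread ran: the function name, chunk size and file name are made up for illustration. It writes a file in fixed-size chunks and records how long each write() call takes, so that multi-second stalls show up as outliers. Point the path at the filesystem under test; the temporary directory used in the demo is often a tmpfs and will not show disk behaviour.

```python
import os
import tempfile
import time


def timed_write_load(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of zeros to path in chunks.

    Returns a list with the duration (in seconds) of each write() call.
    A long pause in the kernel's writeback path shows up as one very
    slow entry in this list.
    """
    chunk = b"\0" * chunk_bytes
    durations = []
    written = 0
    # buffering=0 so every write() below is a real system call.
    with open(path, "wb", buffering=0) as out:
        while written < total_bytes:
            piece = chunk[: min(chunk_bytes, total_bytes - written)]
            start = time.monotonic()
            n = out.write(piece)
            durations.append(time.monotonic() - start)
            written += n
        # Make sure the data really reached the device before returning.
        os.fsync(out.fileno())
    return durations


if __name__ == "__main__":
    # Hypothetical demo target; replace with a path on the fs being tested.
    with tempfile.TemporaryDirectory() as tmp:
        d = timed_write_load(os.path.join(tmp, "load.bin"), 16 << 20)
        print(f"chunks={len(d)} worst={max(d):.4f}s total={sum(d):.2f}s")
```

Comparing the worst per-chunk time across r4, xfs and jfs would separate "writes alone stall" from "reads and writes interfere", which is the distinction Gregory is after.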
* Re: reiser4: first impression (vs xfs and jfs)
2006-05-23 15:51 reiser4: first impression (vs xfs and jfs) Tom Vier
2006-05-23 19:08 ` Gregory Maxwell
@ 2006-05-23 19:13 ` Alexey Polyakov
[not found] ` <20060523201712.GD25889@zero>
2006-06-06 13:44 ` Tom Vier
2 siblings, 1 reply; 19+ messages in thread
From: Alexey Polyakov @ 2006-05-23 19:13 UTC (permalink / raw)
To: Tom Vier; +Cc: reiserfs-list
Hi Tom,

what kind of raid do you use? Is it software md, or a hw raid solution?
Also, what's the size of your r4 partition?

On 5/23/06, Tom Vier <tmv@comcast.net> wrote:
> I finally decided to try a few different fs'es on my 250gig raid1. (I use
> reiserfs3 most of the time.) Here's some things i noticed, between r4, xfs,
> and jfs.
>
> Both r4 and xfs suffer from io pauses. This is on a dual 2.6ghz opteron,
> btw. I don't see high cpu usage, but clock throttling could be screwing up
> top's % calcs (tho i think all usage is measured by time, so it shouldn't).
>
> What i'm doing is rsyncing from a slower drive (on 1394) to the raid1 dev.
> When using r4 (xfs behaves similarly), after several seconds, reading from
> the source and writing to the destination stops for 3 or 4 seconds, then
> brief burst of writes to the r4 fs (the dest), a 1 second pause, and then
> reading and periodic writes resume, until it happens again.
>
> It seems that both r4 and xfs allow a large number of pages to be dirtied,
> before queuing them for writeback, and this has a negative effect on
> throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> 10 minutes slower than jfs.
>
> One thing that surprised me was, once r4 does write out, it is very fast.
> Fast enough that i wasn't sure it was actually writing whole files! However,
> i did a umount; mount and ran cksum, and sure enough, the files were good.
> 8)
>
> --
> Tom Vier <tmv@comcast.net>
> DSA Key ID 0x15741ECE
>

--
Alexey Polyakov
^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <20060523201712.GD25889@zero>]
* Re: reiser4: first impression (vs xfs and jfs)
[not found] ` <20060523201712.GD25889@zero>
@ 2006-05-23 21:00 ` Alexey Polyakov
0 siblings, 0 replies; 19+ messages in thread
From: Alexey Polyakov @ 2006-05-23 21:00 UTC (permalink / raw)
To: Tom Vier; +Cc: reiserfs-list
Mine is 2xOpteron280, on a hardware RAID (Adaptec 2010S on 3xSCSI
146Gx15K). It's a heavily loaded web server. It suffers from write-outs too.
I've tested XFS and JFS, and found that R4 behaves better after a system
crash (due to power loss), and it gives much better performance.

What I do for my server is:
1) Get a vanilla kernel
2) Apply the patch-o-matic-ng patches (I wonder why those patches are not
included in vanilla)
3) Apply the latest available reiser4

Right now it looks like this:

root@titanic [~]# df -Tm
Filesystem    Type    1M-blocks  Used Available Use% Mounted on
/dev/i2o/hda2 reiser4      9504  4488      5016  48% /
/dev/i2o/hda1 ext3           99    50        44  54% /boot
/dev/i2o/hda3 reiser4     22659 13884      8776  62% /var
/dev/i2o/hda5 reiser4       917    29       889   4% /tmp
/dev/i2o/hda7 reiser4     18135 14140      3995  78% /usr
/dev/i2o/hda6 reiser4     54382 53278      1104  98% /home
/dev/i2o/hda8 reiser4     54382 48583      5799  90% /home2
/dev/i2o/hda9 reiser4    106370 62854     43517  60% /home3

What's most interesting is that I have had (and continue to have) a lot of
hardware crashes. Reiser4 does the best job - XFS would make some files
(created right before the crash) length 0, reiserfs would render the fs
unusable, and ext3 would lose up to 30% of files on a FS.

On 5/24/06, Tom Vier <tmv@comcast.net> wrote:
> It's linux software raid1. 250gigs:
>
> md1 : active raid1 sdd1[1] sdc1[0]
>       262156544 blocks [2/2] [UU]
>
> I should've mentioned:
>
> Linux zero 2.6.16.16r4-2 #2 SMP PREEMPT Thu May 18 23:49:20 EDT 2006 i686
> GNU/Linux
>
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
>
> It's a dual 2.6ghz opteron box, running an x86 kernel.
>
> On Tue, May 23, 2006 at 11:13:05PM +0400, Alexey Polyakov wrote:
> > what kind of raid do you use? Is it software md, or a hw raid solution?
> > Also, what's the size of your r4 partition?
> >
> > On 5/23/06, Tom Vier <tmv@comcast.net> wrote:
> > >I finally decided to try a few different fs'es on my 250gig raid1. (I use
> > >reiserfs3 most of the time.) Here's some things i noticed, between r4, xfs,
> > >and jfs.
> > >
> > >Both r4 and xfs suffer from io pauses. This is on a dual 2.6ghz opteron,
> > >btw. I don't see high cpu usage, but clock throttling could be screwing up
> > >top's % calcs (tho i think all usage is measured by time, so it shouldn't).
> > >
> > >What i'm doing is rsyncing from a slower drive (on 1394) to the raid1 dev.
> > >When using r4 (xfs behaves similarly), after several seconds, reading from
> > >the source and writing to the destination stops for 3 or 4 seconds, then
> > >brief burst of writes to the r4 fs (the dest), a 1 second pause, and then
> > >reading and periodic writes resume, until it happens again.
> > >
> > >It seems that both r4 and xfs allow a large number of pages to be dirtied,
> > >before queuing them for writeback, and this has a negative effect on
> > >throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> > >10 minutes slower than jfs.
> > >
> > >One thing that surprised me was, once r4 does write out, it is very fast.
> > >Fast enough that i wasn't sure it was actually writing whole files!
> > >However,
> > >i did a umount; mount and ran cksum, and sure enough, the files were good.
> > >8)
> > >
> > >--
> > >Tom Vier <tmv@comcast.net>
> > >DSA Key ID 0x15741ECE
> > >
> >
> > --
> > Alexey Polyakov
>
> --
> Tom Vier <tmv@comcast.net>
> DSA Key ID 0x15741ECE
>

--
Alexey Polyakov
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: reiser4: first impression (vs xfs and jfs)
2006-05-23 15:51 reiser4: first impression (vs xfs and jfs) Tom Vier
2006-05-23 19:08 ` Gregory Maxwell
2006-05-23 19:13 ` Alexey Polyakov
@ 2006-06-06 13:44 ` Tom Vier
2006-06-06 14:38 ` Vladimir V. Saveliev
2 siblings, 1 reply; 19+ messages in thread
From: Tom Vier @ 2006-06-06 13:44 UTC (permalink / raw)
To: reiserfs-list
On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> It seems that both r4 and xfs allow a large number of pages to be dirtied,
> before queuing them for writeback, and this has a negative effect on
> throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> 10 minutes slower than jfs.

Just to follow up on this (i've been too busy lately), that's how delayed
allocation works. It waits til the vm forces writeouts.

In my case of copying large files from a slower drive, the delayed allocation
of r4 and xfs is stalling reads from the source, since neither will write
until the vm forces it.

Is there a way in r4 to force sync a mount every so often, ala flushd? ext3
has the commit option. Does r4 have a hard coded sync timer already? If not,
i think it's an important feature that should be added (and made a mount
option). Otherwise, a lot of data can be lost. Does the kernel do a system
wide sync every 30sec, like it used to?

--
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-06 13:44 ` Tom Vier
@ 2006-06-06 14:38 ` Vladimir V. Saveliev
2006-06-06 15:30 ` Clay Barnes
2006-06-07 17:58 ` Tom Vier
0 siblings, 2 replies; 19+ messages in thread
From: Vladimir V. Saveliev @ 2006-06-06 14:38 UTC (permalink / raw)
To: Tom Vier; +Cc: reiserfs-list
Hello

On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
> On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> > It seems that both r4 and xfs allow a large number of pages to be dirtied,
> > before queuing them for writeback, and this has a negative effect on
> > throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> > 10 minutes slower than jfs.
>
> Just to follow up on this (i've been too busy lately), that's how delayed
> allocation works. It waits til the vm forces writeouts.
>
> In my case of copying large files from a slower drive, the delayed allocation
> of r4 and xfs is stalling reads from the source, since neither will write
> until the vw forces it.
>
> Is there a way in r4 to force sync a mount every so often, ala flushd?

reiser4 has an option for that:
mount -o tmgr.atom_max_age=N
N is a decimal number of seconds. Changes older than N seconds will be
forced to commit.

> ext3
> has the commit option. Does r4 have a hard coded sync timer already? If not,
> i think it's an important feature that should be added (and made a mount
> option). Otherwise, a lot of data can be lost. Does the kernel do a system
> wide sync every 30sec, like it used to?
>
^ permalink raw reply [flat|nested] 19+ messages in thread
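To make Vladimir's answer concrete, here is a small sketch that builds the corresponding mount command line. Only the option name tmgr.atom_max_age and its unit (seconds) come from the message above; the helper name, the mount point /mnt/r4 and the 30 second value are assumptions chosen for illustration (the device /dev/md1 matches the md1 array Tom mentions elsewhere in the thread). The script only prints the command, it does not run it.

```python
import shlex


def reiser4_mount_argv(device, mountpoint, atom_max_age_seconds):
    """Build a mount(8) argument list for a reiser4 fs with a commit age limit.

    atom_max_age_seconds is passed as tmgr.atom_max_age=N, which the thread
    describes as: changes older than N seconds are forced to commit.
    """
    if atom_max_age_seconds <= 0:
        raise ValueError("atom_max_age_seconds must be a positive number of seconds")
    options = f"tmgr.atom_max_age={int(atom_max_age_seconds)}"
    return ["mount", "-t", "reiser4", "-o", options, device, mountpoint]


if __name__ == "__main__":
    # Hypothetical device and mount point; adjust for the real system.
    argv = reiser4_mount_argv("/dev/md1", "/mnt/r4", 30)
    print(shlex.join(argv))
```

A smaller N bounds how much uncommitted data can be lost in a crash, at the cost of giving delayed allocation less time to batch writes.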
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-06 14:38 ` Vladimir V. Saveliev
@ 2006-06-06 15:30 ` Clay Barnes
2006-06-06 17:47 ` PFC
2006-06-06 19:25 ` Hans Reiser
2006-06-07 17:58 ` Tom Vier
1 sibling, 2 replies; 19+ messages in thread
From: Clay Barnes @ 2006-06-06 15:30 UTC (permalink / raw)
To: Vladimir V. Saveliev; +Cc: reiserfs-list
On 18:38 Tue 06 Jun , Vladimir V. Saveliev wrote:
> Hello
>
> On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
> > On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> > > It seems that both r4 and xfs allow a large number of pages to be dirtied,
> > > before queuing them for writeback, and this has a negative effect on
> > > throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> > > 10 minutes slower than jfs.
> >
> > Just to follow up on this (i've been too busy lately), that's how delayed
> > allocation works. It waits til the vm forces writeouts.
> >
> > In my case of copying large files from a slower drive, the delayed allocation
> > of r4 and xfs is stalling reads from the source, since neither will write
> > until the vw forces it.
> >
> > Is there a way in r4 to force sync a mount every so often, ala flushd?
>
> reiser4 has an option for that.
> mount -o tmgr.atom_max_age=N
> N is decimal number of seconds. Changes older than N will be forced to
> commit.

This may have been mentioned before, but perhaps there could be a
"trickle-out" option along the lines of "if the hard drive is idle (and
optionally only if it's spun up), slowly write out the changes to the
disk structure." This could also be paired with keeping as much of the
data in memory as necessary to maintain the speed boost that r4 gets from
temporal locality of reference, possibly just giving it to the system
cache.

> > ext3
> > has the commit option. Does r4 have a hard coded sync timer already? If not,
> > i think it's an important feature that should be added (and made a mount
> > option). Otherwise, a lot of data can be lost. Does the kernel do a system
> > wide sync every 30sec, like it used to?
> >
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-06 15:30 ` Clay Barnes
@ 2006-06-06 17:47 ` PFC
2006-06-06 19:26 ` Hans Reiser
2006-06-06 19:25 ` Hans Reiser
1 sibling, 1 reply; 19+ messages in thread
From: PFC @ 2006-06-06 17:47 UTC (permalink / raw)
To: Clay Barnes, Vladimir V. Saveliev; +Cc: reiserfs-list
> This may have been mentioned before, but perhaps there could be a
> "trickle-out" option along the lines of "if the hard drive is idle (and
> optionally only if it's spun up), slowly write out the changes to the
> disk structure." This could also be paired with keeping as much of the
> data in memory as necessary to mantain the speed boost that r4 gets from
> temporal locality of reference, possibly just giving it to the system
> cache.

Hm, actually, this looks a lot like read-ahead algorithms, but instead
it's "write-ahead".

For instance:
- Sequential writes on large files should stream through the cache.
- Random writes or small file writes should be kept as long as possible
  in dirty pages so they can be coalesced into larger writes with a better
  disk layout on flush, or not written at all if they were temp files from
  a make, for instance.

Do the file copying programs open their output files with O_SEQUENTIAL?
If so, there is information to exploit...
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-06 17:47 ` PFC
@ 2006-06-06 19:26 ` Hans Reiser
2006-06-07 17:21 ` PFC
0 siblings, 1 reply; 19+ messages in thread
From: Hans Reiser @ 2006-06-06 19:26 UTC (permalink / raw)
To: PFC; +Cc: Clay Barnes, Vladimir V. Saveliev, reiserfs-list
PFC wrote:

>
>> This may have been mentioned before, but perhaps there could be a
>> "trickle-out" option along the lines of "if the hard drive is idle (and
>> optionally only if it's spun up), slowly write out the changes to the
>> disk structure." This could also be paired with keeping as much of the
>> data in memory as necessary to mantain the speed boost that r4 gets from
>> temporal locality of reference, possibly just giving it to the system
>> cache.
>
>
> Hm actually, this looks a lot like read-ahead algorithms, but
> instead it's "write-ahead" :
>
> For instance :
> - Sequential writes on large files should stream through the cache.
> - Random writes or small file writes should be kept as long as
> possible in dirty pages so they can be coalesced into larger writes
> with a better disk layout on flush, or not written at all if it was
> temp files from a make, for instance.
>
> Do the file copying programs open their output files with
> O_SEQUENTIAL ? If so, there is information to exploit...
>
>
You can change them to do so....
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-06 19:26 ` Hans Reiser
@ 2006-06-07 17:21 ` PFC
0 siblings, 0 replies; 19+ messages in thread
From: PFC @ 2006-06-07 17:21 UTC (permalink / raw)
To: reiserfs-list
>> Do the file copying programs open their output files with
>> O_SEQUENTIAL ? If so, there is information to exploit...
>>
>>
> You can change them to do so....

I rather meant: if a program opens a file for write with O_SEQUENTIAL
(which should be done when copying files), will reiser4 exploit the
information by flushing sooner, and in a more "streaming" fashion? Or will
it use the default algorithm, flushing under memory pressure?

The current behaviour of reiser4, flushing dirty pages late in case they
are modified again before being written, is excellent for many use cases
(random writes, small file writes, temporary files, copying files inside a
single spindle, etc.), but it isn't optimal for writing large amounts of
data in sequential fashion, like copying large files between disks, for
instance.

Adapting this would be quite tricky, I guess... Consider the two following
scenarios:

- A database is doing an UPDATE query on a table. It will issue a lot of
  reads and writes, probably in random order. In this case, if the working
  set fits in RAM, it pays big time to flush as late as possible, ideally
  when the query is finished, because some pages may be written to multiple
  times. Also, delayed allocation will reduce fragmentation of the files.
  It's the same thing for doing a Make, unzipping an archive, copying many
  small files, etc.

- A process acquires audio or video in realtime and streams it to disk, or
  copies files from one disk to another. In this case it is better to
  stream the data directly to disk, especially if the files are large.

Guessing is a pain in the ass. How can the application inform the
filesystem of its intentions?
^ permalink raw reply [flat|nested] 19+ messages in thread
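On PFC's closing question: as far as I know, Linux open(2) has no O_SEQUENTIAL flag; the nearest standard mechanism is posix_fadvise() with POSIX_FADV_SEQUENTIAL, which is a read-ahead hint and says nothing about write-out. What a copying program can do for itself is bound the amount of unwritten data by calling fsync() at intervals. The sketch below combines the two. It is an application-level workaround under those assumptions, not something reiser4 or XFS is stated in this thread to act on, and the function name and the 32 MiB interval are arbitrary illustration values.

```python
import os
import tempfile


def streaming_copy(src_path, dst_path, chunk_bytes=1 << 20, sync_every_bytes=32 << 20):
    """Copy src to dst front to back, hinting sequential reads and
    forcing the destination to disk every sync_every_bytes.

    Returns the number of bytes copied.
    """
    copied = 0
    since_sync = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if hasattr(os, "posix_fadvise"):
            # Read-ahead hint for the source; offset 0, length 0 means
            # "from the start to the end of the file".
            os.posix_fadvise(src.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
            since_sync += len(chunk)
            if since_sync >= sync_every_bytes:
                # Push what we have written so far to the device, so dirty
                # data for this file never grows past sync_every_bytes.
                dst.flush()
                os.fsync(dst.fileno())
                since_sync = 0
        dst.flush()
        os.fsync(dst.fileno())
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(3 << 20))
        print(streaming_copy(src, os.path.join(tmp, "dst.bin")), "bytes copied")
```

The trade-off is the one PFC describes: periodic fsync() gives the streaming behaviour that suits large sequential copies, but it would hurt the database or many-small-files case, where late flushing is the better policy.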
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-06 15:30 ` Clay Barnes
2006-06-06 17:47 ` PFC
@ 2006-06-06 19:25 ` Hans Reiser
2006-06-07 0:13 ` Clay Barnes
2006-06-08 14:06 ` Tom Vier
1 sibling, 2 replies; 19+ messages in thread
From: Hans Reiser @ 2006-06-06 19:25 UTC (permalink / raw)
To: Clay Barnes; +Cc: Vladimir V. Saveliev, reiserfs-list
Clay Barnes wrote:

>On 18:38 Tue 06 Jun , Vladimir V. Saveliev wrote:
>
>>Hello
>>
>>On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
>>
>>>On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
>>>
>>>>It seems that both r4 and xfs allow a large number of pages to be dirtied,
>>>>before queuing them for writeback, and this has a negative effect on
>>>>throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
>>>>10 minutes slower than jfs.
>>>>
>>>Just to follow up on this (i've been too busy lately), that's how delayed
>>>allocation works. It waits til the vm forces writeouts.
>>>
>>>In my case of copying large files from a slower drive, the delayed allocation
>>>of r4 and xfs is stalling reads from the source, since neither will write
>>>until the vw forces it.
>>>
>>>Is there a way in r4 to force sync a mount every so often, ala flushd?
>>>
>>reiser4 has an option for that.
>>mount -o tmgr.atom_max_age=N
>>N is decimal number of seconds. Changes older than N will be forced to
>>commit.
>>
>This may have been mentioned before, but perhaps there could be a
>"trickle-out" option along the lines of "if the hard drive is idle (and
>optionally only if it's spun up), slowly write out the changes to the
>disk structure."
>
Yes, I will take a patch to do the above as it would be good, but I am
not convinced it explains the problem described. I don't really
understand how writes to a fast drive can slow reads from a slow drive.
I am missing something.

Maybe I should ask the following: is the slow drive using reiser4? If
reiser4, was the slow drive image created by copying from a reiser4
image or an ext3 image? (Standard benchmarking mistake: creating an
image for a test from a filesystem not the one that is being tested.
readdir order matters.)

> This could also be paired with keeping as much of the
>data in memory as necessary to mantain the speed boost that r4 gets from
>temporal locality of reference, possibly just giving it to the system
>cache.
>
>>>ext3
>>>has the commit option. Does r4 have a hard coded sync timer already? If not,
>>>i think it's an important feature that should be added (and made a mount
>>>option). Otherwise, a lot of data can be lost. Does the kernel do a system
>>>wide sync every 30sec, like it used to?
>>>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-06 19:25 ` Hans Reiser
@ 2006-06-07 0:13 ` Clay Barnes
2006-06-07 0:42 ` Hans Reiser
2006-06-08 0:55 ` Nate Diller
2006-06-08 14:06 ` Tom Vier
1 sibling, 2 replies; 19+ messages in thread
From: Clay Barnes @ 2006-06-07 0:13 UTC (permalink / raw)
To: Hans Reiser; +Cc: Vladimir V. Saveliev, reiserfs-list
On 12:25 Tue 06 Jun , Hans Reiser wrote:
> Clay Barnes wrote:
>
> >On 18:38 Tue 06 Jun , Vladimir V. Saveliev wrote:
> >
> >>Hello
> >>
> >>On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
> >>
> >>>On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> >>>
> >>>>It seems that both r4 and xfs allow a large number of pages to be dirtied,
> >>>>before queuing them for writeback, and this has a negative effect on
> >>>>throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> >>>>10 minutes slower than jfs.
> >>>>
> >>>Just to follow up on this (i've been too busy lately), that's how delayed
> >>>allocation works. It waits til the vm forces writeouts.
> >>>
> >>>In my case of copying large files from a slower drive, the delayed allocation
> >>>of r4 and xfs is stalling reads from the source, since neither will write
> >>>until the vw forces it.
> >>>
> >>>Is there a way in r4 to force sync a mount every so often, ala flushd?
> >>>
> >>reiser4 has an option for that.
> >>mount -o tmgr.atom_max_age=N
> >>N is decimal number of seconds. Changes older than N will be forced to
> >>commit.
> >>
> >This may have been mentioned before, but perhaps there could be a
> >"trickle-out" option along the lines of "if the hard drive is idle (and
> >optionally only if it's spun up), slowly write out the changes to the
> >disk structure."
> >
> Yes, I will take a patch to do the above as it would be good, but I am

I wish I had the skills to do so. Perhaps after a few more C classes
and my next degree. :-/

> not convinced it explains the problem described. I don't really

Well, very possibly (likely) not, but it just happened to bring to mind
the particular thought I had had several times before.

> understand how writes to a fast drive can slow reads from a slow drive.
> I am missing something.
>
> Maybe I should ask the following: is the slow drive using reiser4? If
> reiser4, was the slow drive image created by copying from a reiser4
> image or an ext3 image? (Standard benchmarking mistake: creating an
> image for a test from a filesystem not the one that is being tested.
> readdir order matters.)
>
> > This could also be paired with keeping as much of the
> >data in memory as necessary to mantain the speed boost that r4 gets from
> >temporal locality of reference, possibly just giving it to the system
> >cache.
> >
> >>>ext3
> >>>has the commit option. Does r4 have a hard coded sync timer already? If not,
> >>>i think it's an important feature that should be added (and made a mount
> >>>option). Otherwise, a lot of data can be lost. Does the kernel do a system
> >>>wide sync every 30sec, like it used to?
> >>>
> >
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-07 0:13 ` Clay Barnes
@ 2006-06-07 0:42 ` Hans Reiser
2006-06-08 0:55 ` Nate Diller
1 sibling, 0 replies; 19+ messages in thread
From: Hans Reiser @ 2006-06-07 0:42 UTC (permalink / raw)
To: Clay Barnes; +Cc: Vladimir V. Saveliev, reiserfs-list
Clay Barnes wrote:

>On 12:25 Tue 06 Jun , Hans Reiser wrote:
>
>>Clay Barnes wrote:
>>
>>>On 18:38 Tue 06 Jun , Vladimir V. Saveliev wrote:
>>>
>>>>Hello
>>>>
>>>>On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
>>>>
>>>>>On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
>>>>>
>>>>>>It seems that both r4 and xfs allow a large number of pages to be dirtied,
>>>>>>before queuing them for writeback, and this has a negative effect on
>>>>>>throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
>>>>>>10 minutes slower than jfs.
>>>>>>
>>>>>Just to follow up on this (i've been too busy lately), that's how delayed
>>>>>allocation works. It waits til the vm forces writeouts.
>>>>>
>>>>>In my case of copying large files from a slower drive, the delayed allocation
>>>>>of r4 and xfs is stalling reads from the source, since neither will write
>>>>>until the vw forces it.
>>>>>
>>>>>Is there a way in r4 to force sync a mount every so often, ala flushd?
>>>>>
>>>>reiser4 has an option for that.
>>>>mount -o tmgr.atom_max_age=N
>>>>N is decimal number of seconds. Changes older than N will be forced to
>>>>commit.
>>>>
>>>This may have been mentioned before, but perhaps there could be a
>>>"trickle-out" option along the lines of "if the hard drive is idle (and
>>>optionally only if it's spun up), slowly write out the changes to the
>>>disk structure."
>>>
>>Yes, I will take a patch to do the above as it would be good, but I am
>>
>I wish I had the skills to do so. Perhaps after a few more C classes
>and my next degree. :-/
>
When we get our wiki going, we should create a desired features page,
and maybe you can add this to that. Someday we will surely code this,
but when I cannot today say.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-07 0:13 ` Clay Barnes
2006-06-07 0:42 ` Hans Reiser
@ 2006-06-08 0:55 ` Nate Diller
2006-06-08 14:18 ` Tom Vier
1 sibling, 1 reply; 19+ messages in thread
From: Nate Diller @ 2006-06-08 0:55 UTC (permalink / raw)
To: Clay Barnes; +Cc: Hans Reiser, Vladimir V. Saveliev, reiserfs-list
On 6/6/06, Clay Barnes <clay.barnes@gmail.com> wrote:
> On 12:25 Tue 06 Jun , Hans Reiser wrote:
> > Clay Barnes wrote:
> >
> > >On 18:38 Tue 06 Jun , Vladimir V. Saveliev wrote:
> > >
> > >>Hello
> > >>
> > >>On Tue, 2006-06-06 at 09:44 -0400, Tom Vier wrote:
> > >>
> > >>>On Tue, May 23, 2006 at 11:51:02AM -0400, Tom Vier wrote:
> > >>>
> > >>>>It seems that both r4 and xfs allow a large number of pages to be dirtied,
> > >>>>before queuing them for writeback, and this has a negative effect on
> > >>>>throughput. In my test (rsync'ing ~50gigs of flacs), r4 and xfs are almost
> > >>>>10 minutes slower than jfs.
> > >>>>
> > >>>Just to follow up on this (i've been too busy lately), that's how delayed
> > >>>allocation works. It waits til the vm forces writeouts.
> > >>>
> > >>>In my case of copying large files from a slower drive, the delayed allocation
> > >>>of r4 and xfs is stalling reads from the source, since neither will write
> > >>>until the vw forces it.
> > >>>
> > >>>Is there a way in r4 to force sync a mount every so often, ala flushd?
> > >>>
> > >>reiser4 has an option for that.
> > >>mount -o tmgr.atom_max_age=N
> > >>N is decimal number of seconds. Changes older than N will be forced to
> > >>commit.
> > >>
> > >This may have been mentioned before, but perhaps there could be a
> > >"trickle-out" option along the lines of "if the hard drive is idle (and
> > >optionally only if it's spun up), slowly write out the changes to the
> > >disk structure."
> > >
> > Yes, I will take a patch to do the above as it would be good, but I am
> I wish I had the skills to do so. Perhaps after a few more C classes
> and my next degree. :-/
> > not convinced it explains the problem described. I don't really
> Well, very possibly (likely) not, but it just happened to bring to mind
> the particular thought I had had several times before.

this is something i've been wanting to address for a while now. at
the moment, the VM starts to flush pages when
/proc/sys/vm/dirty_background_ratio is exceeded, and it flushes pages
without regard to which process dirtied them or which file they are
in. it begins to throttle processes (make them wait for some dirty
pages to be cleaned before they can dirty more of them) when
dirty_ratio is exceeded.

PFC <lists@peufeu.com> said (forgive me for cross-quoting):

Consider the two following scenarios :

- A database is doing an UPDATE query on a table. It will issue a lot of
  reads and writes, probably in random order. In this case, if the working
  set fits in RAM, it pays big time to flush as late as possible, ideally
  when the query is finished, because some pages may be written to multiple
  times. Also, delayed allocation will reduce fragmentation of the files.
  It's the same thing for doing a Make, unzipping an archive, copying many
  small files, etc.

- A process acquires audio or video in realtime and streams it to disk, or
  copies files from one disk to another. In this case it is better to
  stream the data directly to disk, especially if the files are large.

Guessing is a pain in the ass. How can the application inform the
filesystem of its intentions ?

---

the real problem is, what if both of those processes are running at
once? the correct behaviour would be to leave the database pages
dirty in memory, and sequentially clean the streaming ones.
furthermore, the database shouldn't be throttled unless its working
set is bigger than available memory, but the stream should always be.

i've been pondering this problem for the past year, and haven't found
the silver bullet yet. i am still resolved to re-write the throttling
and congestion code at some point, because it's fundamentally broken

but i bet hans is right, these pauses you're seeing could probably be
alleviated by fixing something simple in reiser4.

NATE
^ permalink raw reply [flat|nested] 19+ messages in thread
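The two thresholds Nate describes can be illustrated with a toy model. This is a deliberately simplified sketch of the policy as stated in his message, not the kernel's actual accounting: the function name is made up, the ratio values are illustrative parameters rather than any particular kernel's defaults, and the real VM measures against its own notion of dirtyable memory.

```python
def writeback_state(dirty_bytes, dirtyable_bytes, background_ratio=10, dirty_ratio=40):
    """Classify a dirty-page level against the two VM thresholds.

    Returns one of:
      "idle"       - below dirty_background_ratio, nothing is forced out
      "background" - above dirty_background_ratio, background flushing starts
      "throttle"   - above dirty_ratio, writers are made to wait
    """
    background_limit = dirtyable_bytes * background_ratio // 100
    throttle_limit = dirtyable_bytes * dirty_ratio // 100
    if dirty_bytes > throttle_limit:
        return "throttle"
    if dirty_bytes > background_limit:
        return "background"
    return "idle"


if __name__ == "__main__":
    gib = 1 << 30
    ram = 4 * gib  # hypothetical amount of dirtyable memory
    for dirty in (gib // 4, gib, 2 * gib):
        print(f"{dirty / gib:.2f} GiB dirty -> {writeback_state(dirty, ram)}")
```

With several gigabytes of RAM, percentage-based limits translate into hundreds of megabytes of dirty data before anything is flushed, which is consistent with the long pause followed by a burst of writes that Tom observed.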
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-08 0:55 ` Nate Diller
@ 2006-06-08 14:18 ` Tom Vier
0 siblings, 0 replies; 19+ messages in thread
From: Tom Vier @ 2006-06-08 14:18 UTC (permalink / raw)
To: Nate Diller; +Cc: reiserfs-list
On Wed, Jun 07, 2006 at 05:55:25PM -0700, Nate Diller wrote:
> this is something i've been wanting to address for a while now. at
> the moment, the VM starts to flush pages when
> /proc/sys/vm/dirty_background_ratio is exceeded, and it flushes pages
> without regard to which process dirtied them or which file they are
> in. it begins to throttle processes (make them wait for some dirty
> pages to be cleaned before they can dirty more of them) when
> dirty_ratio is exceeded.

That ratio is something i'll have to play with.
/proc/sys/vm/dirty_writeback_centisecs looks like it could help me as well.
With two fs'es (and eventually ext3, as well) using delayed allocation, it
may be best to change those values (especially when you have several gigs
of ram). I don't suppose there's a way to make it per fs, tho.

fs'es don't walk dirty pages, do they? Only the vm decides when to write
out? ext3 and r4 do have write timeouts, tho.

--
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE
^ permalink raw reply [flat|nested] 19+ messages in thread
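Before changing the tunables Tom and Nate mention, it helps to see their current values. The sketch below reads them from /proc/sys/vm and skips any that are missing. dirty_background_ratio, dirty_ratio and dirty_writeback_centisecs are named in the thread; dirty_expire_centisecs is an addition of mine (it controls how old dirty data may get before it is written), and the helper name is hypothetical.

```python
import os

# The first three names appear in the thread; dirty_expire_centisecs is an
# extra, related tunable added here for completeness.
VM_TUNABLES = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
)


def read_vm_tunables(base="/proc/sys/vm", names=VM_TUNABLES):
    """Return {name: int_value} for each tunable that exists and parses.

    Missing or unreadable files are skipped, so this also runs (and
    returns an empty dict) on systems without /proc/sys/vm.
    """
    values = {}
    for name in names:
        try:
            with open(os.path.join(base, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print(f"{name} = {value}")
```

These settings are system wide, which matches Tom's remark that there is no per-filesystem knob; lowering them affects every mounted filesystem at once.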
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-06 19:25 ` Hans Reiser
2006-06-07 0:13 ` Clay Barnes
@ 2006-06-08 14:06 ` Tom Vier
2006-06-09 8:05 ` Hans Reiser
1 sibling, 1 reply; 19+ messages in thread
From: Tom Vier @ 2006-06-08 14:06 UTC
To: Hans Reiser; +Cc: reiserfs-list

On Tue, Jun 06, 2006 at 12:25:15PM -0700, Hans Reiser wrote:
> Maybe I should ask the following: is the slow drive using reiser4? If

No, it was ext2.

> reiser4, was the slow drive image created by copying from a reiser4
> image or an ext3 image? (Standard benchmarking mistake: creating an
> image for a test from a filesystem not the one that is being tested.
> readdir order matters.)

Would that really make much difference?

I think the problem here is a general problem with delayed allocation,
regardless of which fs implements it. The fs'es need to stream out writes.
If it's possible (i don't know if fs'es are allowed this info from the vfs),
i think after a short timeout of a file no longer being open for writing, it
should be written. Maybe have a longer delay for smaller files, so they pack
better. Past a certain size threshold, once a file is closed (or only opened
read-only) i think it should be flushed without much delay. Especially if
the blk dev is idle (but knowing that at the fs level may well be impossible
w/o modding the vfs api). I think linux (and other os'es) are in need of
more intelligent io scheduling (higher level than just sector elevators).

One problem with my suggestion is that apps don't always close or reopen
read-only after they write a file.

--
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE
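The flush policy proposed in the message above can be written down as a small decision function. This is only a sketch of the idea as stated: the size threshold and the delay values are hypothetical placeholders, and no filesystem discussed in this thread is claimed to implement it.

```python
# Hypothetical tuning values, chosen only to make the sketch concrete.
LARGE_FILE_BYTES = 16 * 1024 * 1024  # "certain size threshold"
SHORT_DELAY_S = 1.0                  # large files: flush without much delay
LONG_DELAY_S = 10.0                  # small files: wait so they pack better


def flush_delay(size_bytes, open_for_write, device_idle=False):
    """Return seconds to wait before writing a file out, or None to defer.

    open_for_write: True while any writer still has the file open.
    device_idle:    True if the block device has nothing queued (the message
                    notes a filesystem may not be able to know this).
    """
    if open_for_write:
        # Still being written: keep delaying allocation.
        return None
    if size_bytes >= LARGE_FILE_BYTES:
        # Large, closed file: stream it out, immediately if the disk is idle.
        return 0.0 if device_idle else SHORT_DELAY_S
    # Small, closed file: give neighbours time to arrive so they pack together.
    return LONG_DELAY_S
```

The caveat raised at the end of the message shows up directly in the first branch: an application that never closes the file after writing keeps the function returning None.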
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-08 14:06 ` Tom Vier
@ 2006-06-09 8:05 ` Hans Reiser
0 siblings, 0 replies; 19+ messages in thread
From: Hans Reiser @ 2006-06-09 8:05 UTC
To: Tom Vier; +Cc: reiserfs-list

Make both XFS or both reiser4, and speed will likely increase. Please
try and report the result if you could.

Tom Vier wrote:

> On Tue, Jun 06, 2006 at 12:25:15PM -0700, Hans Reiser wrote:
>
> > Maybe I should ask the following: is the slow drive using reiser4? If
>
> No, it was ext2.
>
> > reiser4, was the slow drive image created by copying from a reiser4
> > image or an ext3 image? (Standard benchmarking mistake: creating an
> > image for a test from a filesystem not the one that is being tested.
> > readdir order matters.)
>
> Would that really make much difference?
>
> I think the problem here is a general problem with delayed allocation,
> regardless of which fs implements it. The fs'es need to stream out writes.
> If it's possible (i don't know if fs'es are allowed this info from the vfs),
> i think after a short timeout of a file no longer being open for writing, it
> should be written. Maybe have a longer delay for smaller files, so they pack
> better. Past a certain size threshold, once a file is closed (or only opened
> read-only) i think it should be flushed without much delay. Especially if
> the blk dev is idle (but knowing that at the fs level may well be impossible
> w/o modding the vfs api). I think linux (and other os'es) are in need of
> more intelligent io scheduling (higher level than just sector elevators).
>
> One problem with my suggestion is that apps don't always close or reopen
> read-only after they write a file.
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-06 14:38 ` Vladimir V. Saveliev
2006-06-06 15:30 ` Clay Barnes
@ 2006-06-07 17:58 ` Tom Vier
2006-06-08 0:41 ` Nate Diller
1 sibling, 1 reply; 19+ messages in thread
From: Tom Vier @ 2006-06-07 17:58 UTC
To: reiserfs-list

On Tue, Jun 06, 2006 at 06:38:26PM +0400, Vladimir V. Saveliev wrote:
> reiser4 has an option for that.
> mount -o tmgr.atom_max_age=N
> N is decimal number of seconds. Changes older than N will be forced to
> commit.

Unfortunately, this causes even more read pauses from the source, when
running rsync. I also tried cpio (-p mode) and cp -a, same pauses. When
syncing, r4 seems to have about a 5 second pause, then a burst of seeks and
writes. I also tried disabling clock throttling, no difference. I have a
fast system. Two single core 2.6ghz opterons. CPU time during the pause
after sync and before the writes doesn't seem to be very high. I thought r4
might be cpu bound, but it doesn't seem to be. I'm not sure what's causing
this pause. If i had more free time, i'd set up kernel profiling.

--
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE
* Re: reiser4: first impression (vs xfs and jfs)
2006-06-07 17:58 ` Tom Vier
@ 2006-06-08 0:41 ` Nate Diller
0 siblings, 0 replies; 19+ messages in thread
From: Nate Diller @ 2006-06-08 0:41 UTC
To: Tom Vier; +Cc: reiserfs-list

On 6/7/06, Tom Vier <tmv@comcast.net> wrote:
> On Tue, Jun 06, 2006 at 06:38:26PM +0400, Vladimir V. Saveliev wrote:
> > reiser4 has an option for that.
> > mount -o tmgr.atom_max_age=N
> > N is decimal number of seconds. Changes older than N will be forced to
> > commit.
>
> Unfortunately, this causes even more read pauses from the source, when
> running rsync. I also tried cpio (-p mode) and cp -a, same pauses. When
> syncing, r4 seems to have about a 5 second pause, then a burst of seeks and
> writes. I also tried disabling clock throttling, no difference. I have a
> fast system. Two single core 2.6ghz opterons. CPU time during the pause
> after sync and before the writes doesn't seem to be very high. I thought r4
> might be cpu bound, but it doesn't seem to be. I'm not sure what's causing
> this pause. If i had more free time, i'd set up kernel profiling.

this could easily be a disk scheduler issue. in particular, reiser4
may be interacting with the block device congestion code, leading it
to fail to submit all the writes at once. this may be helped by the
upcoming patch for creating bio structures more than 4k at a time, or
by using a different scheduler.

try 'echo 512 > /sys/block/[dev]/queue/nr_requests' and see if the
behaviour changes

NATE
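For anyone repeating the nr_requests experiment suggested above, a small helper that remembers the old value makes it easier to undo. This is a sketch under assumptions: the function names and the sysfs_root parameter are illustrative, the device name is left to the caller, and writing to the real sysfs file needs root.

```python
from pathlib import Path


def nr_requests_path(dev, sysfs_root="/sys"):
    """Build the queue-depth path for a block device such as 'sda'."""
    return Path(sysfs_root) / "block" / dev / "queue" / "nr_requests"


def set_nr_requests(dev, value, sysfs_root="/sys"):
    """Write a new nr_requests value and return the previous one.

    Equivalent in effect to: echo VALUE > /sys/block/DEV/queue/nr_requests
    """
    if int(value) <= 0:
        raise ValueError("nr_requests must be a positive integer")
    path = nr_requests_path(dev, sysfs_root)
    previous = int(path.read_text().strip())
    path.write_text(f"{int(value)}\n")
    return previous
```

A typical use would be old = set_nr_requests("sda", 512), rerunning the copy, then set_nr_requests("sda", old) to restore the earlier setting; "sda" here is only an example device name.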
end of thread, other threads:[~2006-06-09 8:05 UTC | newest]
Thread overview: 19+ messages
2006-05-23 15:51 reiser4: first impression (vs xfs and jfs) Tom Vier
2006-05-23 19:08 ` Gregory Maxwell
2006-05-23 19:13 ` Alexey Polyakov
[not found] ` <20060523201712.GD25889@zero>
2006-05-23 21:00 ` Alexey Polyakov
2006-06-06 13:44 ` Tom Vier
2006-06-06 14:38 ` Vladimir V. Saveliev
2006-06-06 15:30 ` Clay Barnes
2006-06-06 17:47 ` PFC
2006-06-06 19:26 ` Hans Reiser
2006-06-07 17:21 ` PFC
2006-06-06 19:25 ` Hans Reiser
2006-06-07 0:13 ` Clay Barnes
2006-06-07 0:42 ` Hans Reiser
2006-06-08 0:55 ` Nate Diller
2006-06-08 14:18 ` Tom Vier
2006-06-08 14:06 ` Tom Vier
2006-06-09 8:05 ` Hans Reiser
2006-06-07 17:58 ` Tom Vier
2006-06-08 0:41 ` Nate Diller