* some hard numbers on ext3 & batching performance issue
From: Ric Wheeler @ 2008-03-05 19:19 UTC
To: David Chinner
Cc: Josef Bacik, Theodore Ts'o, adilger, jack, Feld, Andy, linux-fsdevel, linux-scsi

After the IO/FS workshop last week, I posted some details on the slowdown
we see with ext3 when we have a low-latency back end instead of a normal
local disk (SCSI/S-ATA/etc).

As a follow-up to that thread, I wanted to post some real numbers that
Andy from our performance team pulled together. Andy tested various
patches using three classes of storage (S-ATA, RAM disk and Clariion
array).

Note that this testing was done on a SLES10/SP1 kernel; the code in
question has not changed in mainline, but we should probably retest on
something newer just to clear up any doubts.

The workload is generated using fs_mark
(http://sourceforge.net/projects/fsmark/), which is basically a write
workload with small files where each file gets fsync'ed before close.
The metric is "files/sec".

The clearest result used a ramdisk to store 4k files.

We modified ext3 and jbd to accept a new mount option, bdelay. Use it like:

        mount -o bdelay=n dev mountpoint

n is passed to schedule_timeout_interruptible() in the jbd code. If n == 0,
the whole batching loop is skipped. If n is "yield", the
schedule_timeout_interruptible(n) call is replaced with yield().

Note that the first column is the value of the delay (in jiffies, on a
250HZ build) and the remaining column headings are the number of
concurrent threads writing 4KB files.

Ramdisk test:

bdelay      1     2     4     8    10    20
0        4640  4498  3226  1721  1436   664
yield    4640  4078  2977  1611  1136   551
1        4647   250   482   588   629   483
2        4522   149   233   422   450   389
3        4504    86   165   271   308   334
4        4425    84   128   222   253   293

Midrange Clariion:

bdelay      1     2     4     8    10    20
0         778   923  1567  1424  1276   785
yield     791   931  1551  1473  1328   806
1         793   304   499   714   751   760
2         789   132   201   382   441   589
3         792   124   168   298   342   471
4         786    71   116   237   277   393

Local disk:

bdelay      1     2     4     8    10    20
0          47    51    81   135   160   234
yield      36    45    74   117   138   214
1          44    52    86   148   183   258
2          40    60   109   163   184   265
3          40    52    97   148   171   264
4          35    42    83   149   169   246

Apologies for mangling the nicely formatted tables.

Note that the justification for the batching as we have it today is
basically this last local drive test case.

It would be really interesting to rerun some of these tests on xfs, which,
as Dave explained in the thread last week, has a more self-tuning way to
batch up transactions....

Note that all of those poor users who have a synchronous write workload
today are in the "1" row for each of the above tables.

ric
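(The bdelay patch itself is not included in this thread. Purely as a sketch
of what the knob presumably does, the experiment would wrap the stock
journal_stop() batching loop -- visible as the removed lines in Josef's
diff below -- roughly as follows. The j_bdelay and j_bdelay_yield fields
are hypothetical names for the mount-option plumbing, not taken from the
real patch.)

        /*
         * Hypothetical sketch only: batching wait in journal_stop(),
         * parameterised by the bdelay mount option.
         */
        pid = current->pid;
        if (handle->h_sync && journal->j_last_sync_writer != pid) {
                journal->j_last_sync_writer = pid;
                if (journal->j_bdelay_yield) {
                        /* bdelay=yield: give up the CPU between checks */
                        do {
                                old_handle_count = transaction->t_handle_count;
                                yield();
                        } while (old_handle_count != transaction->t_handle_count);
                } else if (journal->j_bdelay > 0) {
                        /* bdelay=n: sleep n jiffies (4ms each at 250HZ) per pass */
                        do {
                                old_handle_count = transaction->t_handle_count;
                                schedule_timeout_interruptible(journal->j_bdelay);
                        } while (old_handle_count != transaction->t_handle_count);
                }
                /* bdelay=0: skip the wait entirely and commit immediately */
        }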
* Re: some hard numbers on ext3 & batching performance issue
From: Josef Bacik @ 2008-03-05 20:20 UTC
To: ric
Cc: David Chinner, Theodore Ts'o, adilger, jack, Feld, Andy, linux-fsdevel, linux-scsi

On Wednesday 05 March 2008 2:19:48 pm Ric Wheeler wrote:
> After the IO/FS workshop last week, I posted some details on the slowdown
> we see with ext3 when we have a low-latency back end instead of a normal
> local disk (SCSI/S-ATA/etc).
>
> As a follow-up to that thread, I wanted to post some real numbers that
> Andy from our performance team pulled together. Andy tested various
> patches using three classes of storage (S-ATA, RAM disk and Clariion
> array).
>
> Note that this testing was done on a SLES10/SP1 kernel; the code in
> question has not changed in mainline, but we should probably retest on
> something newer just to clear up any doubts.
>
> The workload is generated using fs_mark
> (http://sourceforge.net/projects/fsmark/), which is basically a write
> workload with small files where each file gets fsync'ed before close.
> The metric is "files/sec".
>
> The clearest result used a ramdisk to store 4k files.
>
> We modified ext3 and jbd to accept a new mount option, bdelay. Use it like:
>
>         mount -o bdelay=n dev mountpoint
>
> n is passed to schedule_timeout_interruptible() in the jbd code. If n == 0,
> the whole batching loop is skipped. If n is "yield", the
> schedule_timeout_interruptible(n) call is replaced with yield().
>
> Note that the first column is the value of the delay (in jiffies, on a
> 250HZ build) and the remaining column headings are the number of
> concurrent threads writing 4KB files.
>
> Ramdisk test:
>
> bdelay      1     2     4     8    10    20
> 0        4640  4498  3226  1721  1436   664
> yield    4640  4078  2977  1611  1136   551
> 1        4647   250   482   588   629   483
> 2        4522   149   233   422   450   389
> 3        4504    86   165   271   308   334
> 4        4425    84   128   222   253   293
>
> Midrange Clariion:
>
> bdelay      1     2     4     8    10    20
> 0         778   923  1567  1424  1276   785
> yield     791   931  1551  1473  1328   806
> 1         793   304   499   714   751   760
> 2         789   132   201   382   441   589
> 3         792   124   168   298   342   471
> 4         786    71   116   237   277   393
>
> Local disk:
>
> bdelay      1     2     4     8    10    20
> 0          47    51    81   135   160   234
> yield      36    45    74   117   138   214
> 1          44    52    86   148   183   258
> 2          40    60   109   163   184   265
> 3          40    52    97   148   171   264
> 4          35    42    83   149   169   246
>
> Apologies for mangling the nicely formatted tables.
>
> Note that the justification for the batching as we have it today is
> basically this last local drive test case.
>
> It would be really interesting to rerun some of these tests on xfs, which,
> as Dave explained in the thread last week, has a more self-tuning way to
> batch up transactions....
>
> Note that all of those poor users who have a synchronous write workload
> today are in the "1" row for each of the above tables.

Mind giving this a whirl? The fastest thing I've got here is an Apple X
RAID and it's being used for something else atm, so I've only tested this
on local disk to make sure it didn't make local performance suck (which it
doesn't, btw). This should be equivalent to what David says XFS does.

Thanks much,

Josef

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index c6cbb6c..4596e1c 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1333,8 +1333,7 @@ int journal_stop(handle_t *handle)
 {
         transaction_t *transaction = handle->h_transaction;
         journal_t *journal = transaction->t_journal;
-        int old_handle_count, err;
-        pid_t pid;
+        int err;

         J_ASSERT(journal_current_handle() == handle);

@@ -1353,32 +1352,22 @@ int journal_stop(handle_t *handle)

         jbd_debug(4, "Handle %p going down\n", handle);

-        /*
-         * Implement synchronous transaction batching.  If the handle
-         * was synchronous, don't force a commit immediately.  Let's
-         * yield and let another thread piggyback onto this transaction.
-         * Keep doing that while new threads continue to arrive.
-         * It doesn't cost much - we're about to run a commit and sleep
-         * on IO anyway.  Speeds up many-threaded, many-dir operations
-         * by 30x or more...
-         *
-         * But don't do this if this process was the most recent one to
-         * perform a synchronous write.  We do this to detect the case where a
-         * single process is doing a stream of sync writes.  No point in waiting
-         * for joiners in that case.
-         */
-        pid = current->pid;
-        if (handle->h_sync && journal->j_last_sync_writer != pid) {
-                journal->j_last_sync_writer = pid;
-                do {
-                        old_handle_count = transaction->t_handle_count;
-                        schedule_timeout_uninterruptible(1);
-                } while (old_handle_count != transaction->t_handle_count);
-        }
-
         current->journal_info = NULL;
         spin_lock(&journal->j_state_lock);
         spin_lock(&transaction->t_handle_lock);
+
+        if (journal->j_committing_transaction && handle->h_sync) {
+                tid_t tid = journal->j_committing_transaction->t_tid;
+
+                spin_unlock(&transaction->t_handle_lock);
+                spin_unlock(&journal->j_state_lock);
+
+                err = log_wait_commit(journal, tid);
+
+                spin_lock(&journal->j_state_lock);
+                spin_lock(&transaction->t_handle_lock);
+        }
+
         transaction->t_outstanding_credits -= handle->h_buffer_credits;
         transaction->t_updates--;
         if (!transaction->t_updates) {
* Re: some hard numbers on ext3 & batching performance issue
From: Ric Wheeler @ 2008-03-07 20:08 UTC
To: Josef Bacik
Cc: David Chinner, Theodore Ts'o, adilger, jack, Feld, Andy, linux-fsdevel, linux-scsi

Josef Bacik wrote:
> On Wednesday 05 March 2008 2:19:48 pm Ric Wheeler wrote:
>> After the IO/FS workshop last week, I posted some details on the slowdown
>> we see with ext3 when we have a low-latency back end instead of a normal
>> local disk (SCSI/S-ATA/etc).
...
...
...
>> It would be really interesting to rerun some of these tests on xfs, which,
>> as Dave explained in the thread last week, has a more self-tuning way to
>> batch up transactions....
>>
>> Note that all of those poor users who have a synchronous write workload
>> today are in the "1" row for each of the above tables.
>
> Mind giving this a whirl? The fastest thing I've got here is an Apple X
> RAID and it's being used for something else atm, so I've only tested this
> on local disk to make sure it didn't make local performance suck (which it
> doesn't, btw). This should be equivalent to what David says XFS does.
>
> Thanks much,
>
> Josef
>
> diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> index c6cbb6c..4596e1c 100644
> --- a/fs/jbd/transaction.c
> +++ b/fs/jbd/transaction.c
> @@ -1333,8 +1333,7 @@ int journal_stop(handle_t *handle)
>  {
>          transaction_t *transaction = handle->h_transaction;
>          journal_t *journal = transaction->t_journal;
> -        int old_handle_count, err;
> -        pid_t pid;
> +        int err;
>
>          J_ASSERT(journal_current_handle() == handle);
>
> @@ -1353,32 +1352,22 @@ int journal_stop(handle_t *handle)
>
>          jbd_debug(4, "Handle %p going down\n", handle);
>
> -        /*
> -         * Implement synchronous transaction batching.  If the handle
> -         * was synchronous, don't force a commit immediately.  Let's
> -         * yield and let another thread piggyback onto this transaction.
> -         * Keep doing that while new threads continue to arrive.
> -         * It doesn't cost much - we're about to run a commit and sleep
> -         * on IO anyway.  Speeds up many-threaded, many-dir operations
> -         * by 30x or more...
> -         *
> -         * But don't do this if this process was the most recent one to
> -         * perform a synchronous write.  We do this to detect the case where a
> -         * single process is doing a stream of sync writes.  No point in waiting
> -         * for joiners in that case.
> -         */
> -        pid = current->pid;
> -        if (handle->h_sync && journal->j_last_sync_writer != pid) {
> -                journal->j_last_sync_writer = pid;
> -                do {
> -                        old_handle_count = transaction->t_handle_count;
> -                        schedule_timeout_uninterruptible(1);
> -                } while (old_handle_count != transaction->t_handle_count);
> -        }
> -
>          current->journal_info = NULL;
>          spin_lock(&journal->j_state_lock);
>          spin_lock(&transaction->t_handle_lock);
> +
> +        if (journal->j_committing_transaction && handle->h_sync) {
> +                tid_t tid = journal->j_committing_transaction->t_tid;
> +
> +                spin_unlock(&transaction->t_handle_lock);
> +                spin_unlock(&journal->j_state_lock);
> +
> +                err = log_wait_commit(journal, tid);
> +
> +                spin_lock(&journal->j_state_lock);
> +                spin_lock(&transaction->t_handle_lock);
> +        }
> +
>          transaction->t_outstanding_credits -= handle->h_buffer_credits;
>          transaction->t_updates--;
>          if (!transaction->t_updates) {

Running with Josef's patch, I was able to see a clear improvement for
batching these synchronous operations on ext3 with the RAM disk and array.
It is not too often that you get to do a simple change and see a 27 times
improvement ;-)

On the bad side, the local disk case took as much as a 30% drop in
performance. The specific disk is not one that I have a lot of experience
with; I would like to retry on a disk that has been qualified by our group
(i.e., we have reasonable confidence that there are no firmware issues, etc).

Now for the actual results.

The results are the average value of 5 runs for each number of threads.

Type        Threads  Baseline     Josef   Speedup (Josef/Baseline)
array           1       320.5     325.4    1.01
array           2       174.9     351.9    2.01
array           4       382.7     593.5    1.55
array           8       644.1     963.0    1.49
array          10       842.9    1038.7    1.23
array          20      1319.6    1432.3    1.08

RAM disk        1      5621.4    5595.1    0.99
RAM disk        2       281.5    7613.3   27.04
RAM disk        4       579.9    9111.5   15.71
RAM disk        8       891.1    9357.3   10.50
RAM disk       10      1116.3    9873.6    8.84
RAM disk       20      1952.0   10703.6    5.48

S-ATA disk      1        19.0      15.1    0.79
S-ATA disk      2        19.9      14.4    0.72
S-ATA disk      4        41.0      27.9    0.68
S-ATA disk      8        60.4      43.2    0.71
S-ATA disk     10        67.1      48.7    0.72
S-ATA disk     20       102.7      74.0    0.72

Background on the tests:

All of this is measured on three devices - a relatively old & slow array,
the local (slow!) 2.5" S-ATA disk in the box and a RAM disk.

These runs used fs_mark to write 4096 byte files with the following
commands:

fs_mark -d /home/test/t -s 4096 -n 40000 -N 50 -D 64 -t 1
...
fs_mark -d /home/test/t -s 4096 -n 20000 -N 50 -D 64 -t 2
...
fs_mark -d /home/test/t -s 4096 -n 10000 -N 50 -D 64 -t 4
...
fs_mark -d /home/test/t -s 4096 -n 5000 -N 50 -D 64 -t 8
...
fs_mark -d /home/test/t -s 4096 -n 4000 -N 50 -D 64 -t 10
...
fs_mark -d /home/test/t -s 4096 -n 2000 -N 50 -D 64 -t 20
...

Note that this spreads the files across 64 subdirectories; each thread
writes 50 files to one directory and then moves on to the next in a round
robin.

ric
* Re: some hard numbers on ext3 & batching performance issue
From: Josef Bacik @ 2008-03-07 20:40 UTC
To: ric
Cc: David Chinner, Theodore Ts'o, adilger, jack, Feld, Andy, linux-fsdevel, linux-scsi

On Friday 07 March 2008 3:08:32 pm Ric Wheeler wrote:
> Josef Bacik wrote:
> > On Wednesday 05 March 2008 2:19:48 pm Ric Wheeler wrote:
> >> After the IO/FS workshop last week, I posted some details on the slowdown
> >> we see with ext3 when we have a low-latency back end instead of a normal
> >> local disk (SCSI/S-ATA/etc).
> ...
> ...
> ...
> >> It would be really interesting to rerun some of these tests on xfs, which,
> >> as Dave explained in the thread last week, has a more self-tuning way to
> >> batch up transactions....
> >>
> >> Note that all of those poor users who have a synchronous write workload
> >> today are in the "1" row for each of the above tables.
> >
> > Mind giving this a whirl? The fastest thing I've got here is an Apple X
> > RAID and it's being used for something else atm, so I've only tested this
> > on local disk to make sure it didn't make local performance suck (which
> > it doesn't, btw). This should be equivalent to what David says XFS does.
> >
> > Thanks much,
> >
> > Josef
> >
> > diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> > index c6cbb6c..4596e1c 100644
> > --- a/fs/jbd/transaction.c
> > +++ b/fs/jbd/transaction.c
> > @@ -1333,8 +1333,7 @@ int journal_stop(handle_t *handle)
> >  {
> >          transaction_t *transaction = handle->h_transaction;
> >          journal_t *journal = transaction->t_journal;
> > -        int old_handle_count, err;
> > -        pid_t pid;
> > +        int err;
> >
> >          J_ASSERT(journal_current_handle() == handle);
> >
> > @@ -1353,32 +1352,22 @@ int journal_stop(handle_t *handle)
> >
> >          jbd_debug(4, "Handle %p going down\n", handle);
> >
> > -        /*
> > -         * Implement synchronous transaction batching.  If the handle
> > -         * was synchronous, don't force a commit immediately.  Let's
> > -         * yield and let another thread piggyback onto this transaction.
> > -         * Keep doing that while new threads continue to arrive.
> > -         * It doesn't cost much - we're about to run a commit and sleep
> > -         * on IO anyway.  Speeds up many-threaded, many-dir operations
> > -         * by 30x or more...
> > -         *
> > -         * But don't do this if this process was the most recent one to
> > -         * perform a synchronous write.  We do this to detect the case where a
> > -         * single process is doing a stream of sync writes.  No point in waiting
> > -         * for joiners in that case.
> > -         */
> > -        pid = current->pid;
> > -        if (handle->h_sync && journal->j_last_sync_writer != pid) {
> > -                journal->j_last_sync_writer = pid;
> > -                do {
> > -                        old_handle_count = transaction->t_handle_count;
> > -                        schedule_timeout_uninterruptible(1);
> > -                } while (old_handle_count != transaction->t_handle_count);
> > -        }
> > -
> >          current->journal_info = NULL;
> >          spin_lock(&journal->j_state_lock);
> >          spin_lock(&transaction->t_handle_lock);
> > +
> > +        if (journal->j_committing_transaction && handle->h_sync) {
> > +                tid_t tid = journal->j_committing_transaction->t_tid;
> > +
> > +                spin_unlock(&transaction->t_handle_lock);
> > +                spin_unlock(&journal->j_state_lock);
> > +
> > +                err = log_wait_commit(journal, tid);
> > +
> > +                spin_lock(&journal->j_state_lock);
> > +                spin_lock(&transaction->t_handle_lock);
> > +        }
> > +
> >          transaction->t_outstanding_credits -= handle->h_buffer_credits;
> >          transaction->t_updates--;
> >          if (!transaction->t_updates) {
>
> Running with Josef's patch, I was able to see a clear improvement for
> batching these synchronous operations on ext3 with the RAM disk and array.
> It is not too often that you get to do a simple change and see a 27 times
> improvement ;-)
>
> On the bad side, the local disk case took as much as a 30% drop in
> performance. The specific disk is not one that I have a lot of experience
> with; I would like to retry on a disk that has been qualified by our group
> (i.e., we have reasonable confidence that there are no firmware issues, etc).
>
> Now for the actual results.
>
> The results are the average value of 5 runs for each number of threads.
>
> Type        Threads  Baseline     Josef   Speedup (Josef/Baseline)
> array           1       320.5     325.4    1.01
> array           2       174.9     351.9    2.01
> array           4       382.7     593.5    1.55
> array           8       644.1     963.0    1.49
> array          10       842.9    1038.7    1.23
> array          20      1319.6    1432.3    1.08
>
> RAM disk        1      5621.4    5595.1    0.99
> RAM disk        2       281.5    7613.3   27.04
> RAM disk        4       579.9    9111.5   15.71
> RAM disk        8       891.1    9357.3   10.50
> RAM disk       10      1116.3    9873.6    8.84
> RAM disk       20      1952.0   10703.6    5.48
>
> S-ATA disk      1        19.0      15.1    0.79
> S-ATA disk      2        19.9      14.4    0.72
> S-ATA disk      4        41.0      27.9    0.68
> S-ATA disk      8        60.4      43.2    0.71
> S-ATA disk     10        67.1      48.7    0.72
> S-ATA disk     20       102.7      74.0    0.72
>
> Background on the tests:
>
> All of this is measured on three devices - a relatively old & slow array,
> the local (slow!) 2.5" S-ATA disk in the box and a RAM disk.
>
> These runs used fs_mark to write 4096 byte files with the following
> commands:
>
> fs_mark -d /home/test/t -s 4096 -n 40000 -N 50 -D 64 -t 1
> ...
> fs_mark -d /home/test/t -s 4096 -n 20000 -N 50 -D 64 -t 2
> ...
> fs_mark -d /home/test/t -s 4096 -n 10000 -N 50 -D 64 -t 4
> ...
> fs_mark -d /home/test/t -s 4096 -n 5000 -N 50 -D 64 -t 8
> ...
> fs_mark -d /home/test/t -s 4096 -n 4000 -N 50 -D 64 -t 10
> ...
> fs_mark -d /home/test/t -s 4096 -n 2000 -N 50 -D 64 -t 20
> ...
>
> Note that this spreads the files across 64 subdirectories; each thread
> writes 50 files to one directory and then moves on to the next in a round
> robin.

I'm starting to wonder about the disks I have, because my files/second is
spanking yours, and it's just a local Samsung 3Gb/s SATA drive. With those
commands I'm consistently getting over 700 files/sec. I'm seeing about a
1-5% increase in speed locally with my patch. I guess I'll start looking
around for some other hardware and check there in case this box is more
badass than I think it is.

Thanks much,

Josef
* Re: some hard numbers on ext3 & batching performance issue
From: Ric Wheeler @ 2008-03-07 20:45 UTC
To: Josef Bacik
Cc: David Chinner, Theodore Ts'o, adilger, jack, Feld, Andy, linux-fsdevel, linux-scsi

Josef Bacik wrote:
> On Friday 07 March 2008 3:08:32 pm Ric Wheeler wrote:
>> Josef Bacik wrote:
>>> On Wednesday 05 March 2008 2:19:48 pm Ric Wheeler wrote:
>>>> After the IO/FS workshop last week, I posted some details on the slowdown
>>>> we see with ext3 when we have a low-latency back end instead of a normal
>>>> local disk (SCSI/S-ATA/etc).
>> ...
>> ...
>> ...
>>>> It would be really interesting to rerun some of these tests on xfs, which,
>>>> as Dave explained in the thread last week, has a more self-tuning way to
>>>> batch up transactions....
>>>>
>>>> Note that all of those poor users who have a synchronous write workload
>>>> today are in the "1" row for each of the above tables.
>>>
>>> Mind giving this a whirl? The fastest thing I've got here is an Apple X
>>> RAID and it's being used for something else atm, so I've only tested this
>>> on local disk to make sure it didn't make local performance suck (which
>>> it doesn't, btw). This should be equivalent to what David says XFS does.
>>>
>>> Thanks much,
>>>
>>> Josef
>>>
>>> diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
>>> index c6cbb6c..4596e1c 100644
>>> --- a/fs/jbd/transaction.c
>>> +++ b/fs/jbd/transaction.c
>>> @@ -1333,8 +1333,7 @@ int journal_stop(handle_t *handle)
>>>  {
>>>          transaction_t *transaction = handle->h_transaction;
>>>          journal_t *journal = transaction->t_journal;
>>> -        int old_handle_count, err;
>>> -        pid_t pid;
>>> +        int err;
>>>
>>>          J_ASSERT(journal_current_handle() == handle);
>>>
>>> @@ -1353,32 +1352,22 @@ int journal_stop(handle_t *handle)
>>>
>>>          jbd_debug(4, "Handle %p going down\n", handle);
>>>
>>> -        /*
>>> -         * Implement synchronous transaction batching.  If the handle
>>> -         * was synchronous, don't force a commit immediately.  Let's
>>> -         * yield and let another thread piggyback onto this transaction.
>>> -         * Keep doing that while new threads continue to arrive.
>>> -         * It doesn't cost much - we're about to run a commit and sleep
>>> -         * on IO anyway.  Speeds up many-threaded, many-dir operations
>>> -         * by 30x or more...
>>> -         *
>>> -         * But don't do this if this process was the most recent one to
>>> -         * perform a synchronous write.  We do this to detect the case where a
>>> -         * single process is doing a stream of sync writes.  No point in waiting
>>> -         * for joiners in that case.
>>> -         */
>>> -        pid = current->pid;
>>> -        if (handle->h_sync && journal->j_last_sync_writer != pid) {
>>> -                journal->j_last_sync_writer = pid;
>>> -                do {
>>> -                        old_handle_count = transaction->t_handle_count;
>>> -                        schedule_timeout_uninterruptible(1);
>>> -                } while (old_handle_count != transaction->t_handle_count);
>>> -        }
>>> -
>>>          current->journal_info = NULL;
>>>          spin_lock(&journal->j_state_lock);
>>>          spin_lock(&transaction->t_handle_lock);
>>> +
>>> +        if (journal->j_committing_transaction && handle->h_sync) {
>>> +                tid_t tid = journal->j_committing_transaction->t_tid;
>>> +
>>> +                spin_unlock(&transaction->t_handle_lock);
>>> +                spin_unlock(&journal->j_state_lock);
>>> +
>>> +                err = log_wait_commit(journal, tid);
>>> +
>>> +                spin_lock(&journal->j_state_lock);
>>> +                spin_lock(&transaction->t_handle_lock);
>>> +        }
>>> +
>>>          transaction->t_outstanding_credits -= handle->h_buffer_credits;
>>>          transaction->t_updates--;
>>>          if (!transaction->t_updates) {
>>
>> Running with Josef's patch, I was able to see a clear improvement for
>> batching these synchronous operations on ext3 with the RAM disk and array.
>> It is not too often that you get to do a simple change and see a 27 times
>> improvement ;-)
>>
>> On the bad side, the local disk case took as much as a 30% drop in
>> performance. The specific disk is not one that I have a lot of experience
>> with; I would like to retry on a disk that has been qualified by our group
>> (i.e., we have reasonable confidence that there are no firmware issues, etc).
>>
>> Now for the actual results.
>>
>> The results are the average value of 5 runs for each number of threads.
>>
>> Type        Threads  Baseline     Josef   Speedup (Josef/Baseline)
>> array           1       320.5     325.4    1.01
>> array           2       174.9     351.9    2.01
>> array           4       382.7     593.5    1.55
>> array           8       644.1     963.0    1.49
>> array          10       842.9    1038.7    1.23
>> array          20      1319.6    1432.3    1.08
>>
>> RAM disk        1      5621.4    5595.1    0.99
>> RAM disk        2       281.5    7613.3   27.04
>> RAM disk        4       579.9    9111.5   15.71
>> RAM disk        8       891.1    9357.3   10.50
>> RAM disk       10      1116.3    9873.6    8.84
>> RAM disk       20      1952.0   10703.6    5.48
>>
>> S-ATA disk      1        19.0      15.1    0.79
>> S-ATA disk      2        19.9      14.4    0.72
>> S-ATA disk      4        41.0      27.9    0.68
>> S-ATA disk      8        60.4      43.2    0.71
>> S-ATA disk     10        67.1      48.7    0.72
>> S-ATA disk     20       102.7      74.0    0.72
>>
>> Background on the tests:
>>
>> All of this is measured on three devices - a relatively old & slow array,
>> the local (slow!) 2.5" S-ATA disk in the box and a RAM disk.
>>
>> These runs used fs_mark to write 4096 byte files with the following
>> commands:
>>
>> fs_mark -d /home/test/t -s 4096 -n 40000 -N 50 -D 64 -t 1
>> ...
>> fs_mark -d /home/test/t -s 4096 -n 20000 -N 50 -D 64 -t 2
>> ...
>> fs_mark -d /home/test/t -s 4096 -n 10000 -N 50 -D 64 -t 4
>> ...
>> fs_mark -d /home/test/t -s 4096 -n 5000 -N 50 -D 64 -t 8
>> ...
>> fs_mark -d /home/test/t -s 4096 -n 4000 -N 50 -D 64 -t 10
>> ...
>> fs_mark -d /home/test/t -s 4096 -n 2000 -N 50 -D 64 -t 20
>> ...
>>
>> Note that this spreads the files across 64 subdirectories; each thread
>> writes 50 files to one directory and then moves on to the next in a round
>> robin.
>
> I'm starting to wonder about the disks I have, because my files/second is
> spanking yours, and it's just a local Samsung 3Gb/s SATA drive. With those
> commands I'm consistently getting over 700 files/sec. I'm seeing about a
> 1-5% increase in speed locally with my patch. I guess I'll start looking
> around for some other hardware and check there in case this box is more
> badass than I think it is.
>
> Thanks much,
>
> Josef

Sounds like you might be running with write cache on & barriers off ;-)

Make sure you have write cache & barriers enabled on the drive. With a good
S-ATA drive, you should be seeing about 35-50 files/sec with a
single-threaded writer.

The local disk that I tested on is a relatively slow S-ATA disk that is
more laptop quality/performance than server.

One thought I had about the results is that we might be flipping the IO
sequence with the local disk case. It is the only device of the three that
I tested which is seek/head movement sensitive for small files.

ric
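(For anyone following along, checking and setting those two knobs might
look roughly like the commands below; the device name is illustrative
only, and this is a sketch rather than anything posted in the thread.)

        hdparm -W /dev/sda          # report whether the drive write cache is on
        hdparm -W1 /dev/sda         # enable the write cache (-W0 disables it)
        mount -o barrier=1 /dev/sda1 /home/test   # ext3 with write barriers enabled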
* Re: some hard numbers on ext3 & batching performance issue
From: Josef Bacik @ 2008-03-12 18:37 UTC
To: Ric Wheeler
Cc: Josef Bacik, David Chinner, Theodore Ts'o, adilger, jack, Feld, Andy, linux-fsdevel, linux-scsi

On Fri, Mar 07, 2008 at 03:45:58PM -0500, Ric Wheeler wrote:
> Josef Bacik wrote:
>> On Friday 07 March 2008 3:08:32 pm Ric Wheeler wrote:
>>> Josef Bacik wrote:
>>>> On Wednesday 05 March 2008 2:19:48 pm Ric Wheeler wrote:
>>>>> After the IO/FS workshop last week, I posted some details on the slowdown
>>>>> we see with ext3 when we have a low-latency back end instead of a normal
>>>>> local disk (SCSI/S-ATA/etc).
>>> ...
>>> ...
>>>
>>> Note that this spreads the files across 64 subdirectories; each thread
>>> writes 50 files to one directory and then moves on to the next in a round
>>> robin.
>>>
>>
>> I'm starting to wonder about the disks I have, because my files/second is
>> spanking yours, and it's just a local Samsung 3Gb/s SATA drive. With those
>> commands I'm consistently getting over 700 files/sec. I'm seeing about a
>> 1-5% increase in speed locally with my patch. I guess I'll start looking
>> around for some other hardware and check there in case this box is more
>> badass than I think it is.
>>
>> Thanks much,
>>
>> Josef
>>
>
> Sounds like you might be running with write cache on & barriers off ;-)
>
> Make sure you have write cache & barriers enabled on the drive. With a good
> S-ATA drive, you should be seeing about 35-50 files/sec with a
> single-threaded writer.
>
> The local disk that I tested on is a relatively slow S-ATA disk that is
> more laptop quality/performance than server.
>
> One thought I had about the results is that we might be flipping the IO
> sequence with the local disk case. It is the only device of the three that
> I tested which is seek/head movement sensitive for small files.
>

Ahh yes, with the write cache turned off and barriers on I get your
numbers; however, I'm not seeing the slowdown that you are - with and
without my patch I'm seeing the same performance. It's just a plain-jane
Intel SATA controller with a Samsung SATA disk set at 1.5Gbps. Same thing
with an NVIDIA SATA controller. I'll think about this some more and see if
there is something better that could be done that may help you.

Thanks much,

Josef
* Re: some hard numbers on ext3 & batching performance issue
From: Ric Wheeler @ 2008-03-13 11:26 UTC
To: Josef Bacik
Cc: David Chinner, Theodore Ts'o, adilger, jack, Feld, Andy, linux-fsdevel, linux-scsi

Josef Bacik wrote:
> On Fri, Mar 07, 2008 at 03:45:58PM -0500, Ric Wheeler wrote:
>> Josef Bacik wrote:
>>> On Friday 07 March 2008 3:08:32 pm Ric Wheeler wrote:
>>>> Josef Bacik wrote:
>>>>> On Wednesday 05 March 2008 2:19:48 pm Ric Wheeler wrote:
>>>>>> After the IO/FS workshop last week, I posted some details on the slowdown
>>>>>> we see with ext3 when we have a low-latency back end instead of a normal
>>>>>> local disk (SCSI/S-ATA/etc).
>>>> ...
>>>> ...
>>>>
>>>> Note that this spreads the files across 64 subdirectories; each thread
>>>> writes 50 files to one directory and then moves on to the next in a round
>>>> robin.
>>>>
>>>
>>> I'm starting to wonder about the disks I have, because my files/second is
>>> spanking yours, and it's just a local Samsung 3Gb/s SATA drive. With those
>>> commands I'm consistently getting over 700 files/sec. I'm seeing about a
>>> 1-5% increase in speed locally with my patch. I guess I'll start looking
>>> around for some other hardware and check there in case this box is more
>>> badass than I think it is.
>>>
>>> Thanks much,
>>>
>>> Josef
>>>
>>
>> Sounds like you might be running with write cache on & barriers off ;-)
>>
>> Make sure you have write cache & barriers enabled on the drive. With a good
>> S-ATA drive, you should be seeing about 35-50 files/sec with a
>> single-threaded writer.
>>
>> The local disk that I tested on is a relatively slow S-ATA disk that is
>> more laptop quality/performance than server.
>>
>> One thought I had about the results is that we might be flipping the IO
>> sequence with the local disk case. It is the only device of the three that
>> I tested which is seek/head movement sensitive for small files.
>>
>
> Ahh yes, with the write cache turned off and barriers on I get your
> numbers; however, I'm not seeing the slowdown that you are - with and
> without my patch I'm seeing the same performance. It's just a plain-jane
> Intel SATA controller with a Samsung SATA disk set at 1.5Gbps. Same thing
> with an NVIDIA SATA controller. I'll think about this some more and see if
> there is something better that could be done that may help you.
>
> Thanks much,
>
> Josef
>

Thanks - you should see the numbers with write cache enabled and barriers
on as well, but for small files, write cache disabled is quite close ;-)

I am happy to rerun the tests at any point; I have a variety of disk types
and controllers (lots of Intel AHCI boxes) to use.

ric
* Re: some hard numbers on ext3 & batching performance issue
From: David Chinner @ 2008-03-06 0:28 UTC
To: Ric Wheeler
Cc: David Chinner, Josef Bacik, Theodore Ts'o, adilger, jack, Feld, Andy, linux-fsdevel, linux-scsi

On Wed, Mar 05, 2008 at 02:19:48PM -0500, Ric Wheeler wrote:
> The workload is generated using fs_mark
> (http://sourceforge.net/projects/fsmark/), which is basically a write
> workload with small files where each file gets fsync'ed before close.
> The metric is "files/sec".
.......
> It would be really interesting to rerun some of these tests on xfs, which,
> as Dave explained in the thread last week, has a more self-tuning way to
> batch up transactions....

Ok, so XFS numbers. Note these are all on a CONFIG_XFS_DEBUG=y kernel, so
there's lots of extra checks in the code as compared to a normal
production kernel.

Local disk (15krpm SCSI, WCD, CONFIG_XFS_DEBUG=y):

threads   files/s
1           97
2          117
4          109
8          110
10         113
20         116

Local disk (15krpm SCSI, WCE, nobarrier, CONFIG_XFS_DEBUG=y):

threads   files/s
1          203
2          216
4          243
8          332
10         405
20         424

Ramdisk (nobarrier, CONFIG_XFS_DEBUG=y):

          agcount=4   agcount=16
threads    files/s      files/s
1            1298         1298
2            2073         2394
4            3296         3321
8            3464         4199
10           3394         3937
20           3251         3691

Note the difference the amount of parallel allocation in the filesystem
makes - agcount=4 only allows up to 4 parallel allocations at once, so
even if they are all aggregated into the one log I/O, no further
allocation can take place until that log I/O is complete.

And at about 4000 files/s the system (4p ia64) is becoming CPU bound due
to all the debug checks in XFS.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
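(The exact mkfs options Dave used are not shown in the thread; as an
assumption only, the two ramdisk runs above could be reproduced with
something along these lines, where only the agcount values and the
nobarrier mount option come from his mail.)

        mkfs.xfs -f -d agcount=4  /dev/ram0   # at most 4 allocations in flight
        mkfs.xfs -f -d agcount=16 /dev/ram0   # allows more parallel allocation
        mount -o nobarrier /dev/ram0 /home/test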