From: Josef Bacik <jbacik@redhat.com>
To: ric@emc.com
Cc: David Chinner <dgc@sgi.com>, "Theodore Ts'o" <tytso@mit.edu>,
adilger@sun.com, jack@ucw.cz, "Feld, Andy" <Feld_Andy@emc.com>,
linux-fsdevel@vger.kernel.org,
linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: some hard numbers on ext3 & batching performance issue
Date: Fri, 7 Mar 2008 15:40:13 -0500
Message-ID: <200803071540.13958.jbacik@redhat.com>
In-Reply-To: <47D1A0C0.8010908@emc.com>
On Friday 07 March 2008 3:08:32 pm Ric Wheeler wrote:
> Josef Bacik wrote:
> > On Wednesday 05 March 2008 2:19:48 pm Ric Wheeler wrote:
> >> After the IO/FS workshop last week, I posted some details on the
> >> slowdown we see with ext3 when we have a low-latency back end instead
> >> of a normal local disk (SCSI/S-ATA/etc).
>
> ...
> ...
> ...
>
> >> It would be really interesting to rerun some of these tests on xfs,
> >> which Dave explained in the thread last week has a more self-tuning
> >> way of batching up transactions...
> >>
> >> Note that all of those poor users who have a synchronous write workload
> >> today are in the "1" row for each of the above tables.
> >
> > Mind giving this a whirl? The fastest thing I've got here is an Apple X
> > RAID and it's being used for something else atm, so I've only tested this
> > on a local disk to make sure it didn't make local performance suck (which
> > it doesn't, btw). This should be equivalent to what David says XFS does.
> > Thanks much,
> >
> > Josef
> >
> > diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> > index c6cbb6c..4596e1c 100644
> > --- a/fs/jbd/transaction.c
> > +++ b/fs/jbd/transaction.c
> > @@ -1333,8 +1333,7 @@ int journal_stop(handle_t *handle)
> >  {
> >          transaction_t *transaction = handle->h_transaction;
> >          journal_t *journal = transaction->t_journal;
> > -        int old_handle_count, err;
> > -        pid_t pid;
> > +        int err;
> >
> >          J_ASSERT(journal_current_handle() == handle);
> >
> > @@ -1353,32 +1352,22 @@ int journal_stop(handle_t *handle)
> >
> >          jbd_debug(4, "Handle %p going down\n", handle);
> >
> > -        /*
> > -         * Implement synchronous transaction batching. If the handle
> > -         * was synchronous, don't force a commit immediately. Let's
> > -         * yield and let another thread piggyback onto this transaction.
> > -         * Keep doing that while new threads continue to arrive.
> > -         * It doesn't cost much - we're about to run a commit and sleep
> > -         * on IO anyway. Speeds up many-threaded, many-dir operations
> > -         * by 30x or more...
> > -         *
> > -         * But don't do this if this process was the most recent one to
> > -         * perform a synchronous write. We do this to detect the case where a
> > -         * single process is doing a stream of sync writes. No point in waiting
> > -         * for joiners in that case.
> > -         */
> > -        pid = current->pid;
> > -        if (handle->h_sync && journal->j_last_sync_writer != pid) {
> > -                journal->j_last_sync_writer = pid;
> > -                do {
> > -                        old_handle_count = transaction->t_handle_count;
> > -                        schedule_timeout_uninterruptible(1);
> > -                } while (old_handle_count != transaction->t_handle_count);
> > -        }
> > -
> >          current->journal_info = NULL;
> >          spin_lock(&journal->j_state_lock);
> >          spin_lock(&transaction->t_handle_lock);
> > +
> > +        if (journal->j_committing_transaction && handle->h_sync) {
> > +                tid_t tid = journal->j_committing_transaction->t_tid;
> > +
> > +                spin_unlock(&transaction->t_handle_lock);
> > +                spin_unlock(&journal->j_state_lock);
> > +
> > +                err = log_wait_commit(journal, tid);
> > +
> > +                spin_lock(&journal->j_state_lock);
> > +                spin_lock(&transaction->t_handle_lock);
> > +        }
> > +
> >          transaction->t_outstanding_credits -= handle->h_buffer_credits;
> >          transaction->t_updates--;
> >          if (!transaction->t_updates) {
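
For anyone skimming the diff, the behavioural change is easier to see with
the two paths laid out side by side. This is only an illustrative sketch
reusing the identifiers from the patch above, not the literal jbd code:

        /*
         * Old behaviour (removed): after a sync handle, speculatively
         * sleep a jiffy at a time for as long as new handles keep
         * joining the running transaction, unless this pid was also the
         * previous sync writer.  On a low-latency backend each jiffy of
         * speculative sleep can easily exceed the cost of the commit
         * itself, which is the slowdown this thread is about.
         */
        if (handle->h_sync && journal->j_last_sync_writer != pid) {
                journal->j_last_sync_writer = pid;
                do {
                        old_handle_count = transaction->t_handle_count;
                        schedule_timeout_uninterruptible(1);
                } while (old_handle_count != transaction->t_handle_count);
        }

        /*
         * New behaviour (added): never sleep speculatively.  If a commit
         * is already in flight, drop the locks, wait for it to finish
         * with log_wait_commit(), and retake the locks; our own commit
         * then runs right behind it and picks up any handles that joined
         * the running transaction in the meantime.
         */
        if (journal->j_committing_transaction && handle->h_sync) {
                tid_t tid = journal->j_committing_transaction->t_tid;

                spin_unlock(&transaction->t_handle_lock);
                spin_unlock(&journal->j_state_lock);
                err = log_wait_commit(journal, tid);
                spin_lock(&journal->j_state_lock);
                spin_lock(&transaction->t_handle_lock);
        }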
>
> Running with Josef's patch, I was able to see a clear improvement from
> batching these synchronous operations on ext3 with the RAM disk and the
> array. It is not too often that you get to make a simple change and see a
> 27x improvement ;-)
>
> On the bad side, the local disk case saw as much as a 30% drop in
> performance. The specific disk is not one that I have a lot of
> experience with, so I would like to retry on a disk that has been
> qualified by our group (i.e., one where we have reasonable confidence
> that there are no firmware issues, etc.).
>
> Now for the actual results.
>
> The results (in files/sec) are the average of 5 runs for each thread count.
>
> Type          Threads   Baseline     Josef   Speedup (Josef/Baseline)
> array               1      320.5     325.4      1.01
> array               2      174.9     351.9      2.01
> array               4      382.7     593.5      1.55
> array               8      644.1     963.0      1.49
> array              10      842.9    1038.7      1.23
> array              20     1319.6    1432.3      1.08
>
> RAM disk            1     5621.4    5595.1      0.99
> RAM disk            2      281.5    7613.3     27.04
> RAM disk            4      579.9    9111.5     15.71
> RAM disk            8      891.1    9357.3     10.50
> RAM disk           10     1116.3    9873.6      8.84
> RAM disk           20     1952.0   10703.6      5.48
>
> S-ATA disk          1       19.0      15.1      0.79
> S-ATA disk          2       19.9      14.4      0.72
> S-ATA disk          4       41.0      27.9      0.68
> S-ATA disk          8       60.4      43.2      0.71
> S-ATA disk         10       67.1      48.7      0.72
> S-ATA disk         20      102.7      74.0      0.72
>
> Background on the tests:
>
> All of this is measured on three devices - a relatively old & slow
> array, the local (slow!) 2.5" S-ATA disk in the box and a RAM disk.
>
> These numbers were generated using fs_mark to write 4096-byte files with
> the following commands:
>
> fs_mark -d /home/test/t -s 4096 -n 40000 -N 50 -D 64 -t 1
> ...
> fs_mark -d /home/test/t -s 4096 -n 20000 -N 50 -D 64 -t 2
> ...
> fs_mark -d /home/test/t -s 4096 -n 10000 -N 50 -D 64 -t 4
> ...
> fs_mark -d /home/test/t -s 4096 -n 5000 -N 50 -D 64 -t 8
> ...
> fs_mark -d /home/test/t -s 4096 -n 4000 -N 50 -D 64 -t 10
> ...
> fs_mark -d /home/test/t -s 4096 -n 2000 -N 50 -D 64 -t 20
> ...
>
> Note that this spreads the files across 64 subdirectories; each thread
> writes 50 files in one directory and then moves on to the next in a
> round-robin fashion.
>
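Those six invocations boil down to one sweep; purely as a sketch (it assumes
fs_mark is on $PATH and that /home/test/t sits on the filesystem under test),
the whole series can be driven like this, with -n scaled so that every run
creates 40000 files in total:

        #!/bin/sh
        # Drive the six fs_mark runs quoted above: the total file count
        # per run stays at 40000, split evenly across the worker threads.
        for t in 1 2 4 8 10 20; do
                n=$((40000 / t))
                fs_mark -d /home/test/t -s 4096 -n $n -N 50 -D 64 -t $t
        done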
I'm starting to wonder about the disks I have, because my files/second is
spanking yours, and it's just a local Samsung 3Gb/s SATA drive. With those
commands I'm consistently getting over 700 files/sec. I'm seeing about a 1-5%
increase in speed locally with my patch. I guess I'll start looking around for
some other hardware and test there, in case this box is more badass than I
think it is. Thanks much,
Josef