From: Josef Bacik <jbacik@redhat.com>
To: ric@emc.com
Cc: David Chinner <dgc@sgi.com>, "Theodore Ts'o" <tytso@mit.edu>,
adilger@sun.com, jack@ucw.cz, "Feld, Andy" <Feld_Andy@emc.com>,
linux-fsdevel@vger.kernel.org,
linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: some hard numbers on ext3 & batching performance issue
Date: Fri, 7 Mar 2008 15:40:13 -0500
Message-ID: <200803071540.13958.jbacik@redhat.com>
In-Reply-To: <47D1A0C0.8010908@emc.com>
On Friday 07 March 2008 3:08:32 pm Ric Wheeler wrote:
> Josef Bacik wrote:
> > On Wednesday 05 March 2008 2:19:48 pm Ric Wheeler wrote:
> >> After the IO/FS workshop last week, I posted some details on the
> >> slowdown we see with ext3 when we have a low-latency back end instead
> >> of a normal local disk (SCSI/S-ATA/etc).
>
> ...
> ...
> ...
>
> >> It would be really interesting to rerun some of these tests on xfs,
> >> which Dave explained in the thread last week has a more self-tuning
> >> way of batching up transactions...
> >>
> >> Note that all of those poor users who have a synchronous write workload
> >> today are in the "1" row for each of the above tables.
> >
> > Mind giving this a whirl? The fastest thing I've got here is an Apple X
> > RAID and it's being used for something else atm, so I've only tested this
> > on a local disk to make sure it didn't make local performance suck (which
> > it doesn't, btw). This should be equivalent to what David says XFS does.
> > Thanks much,
> >
> > Josef
> >
> > diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> > index c6cbb6c..4596e1c 100644
> > --- a/fs/jbd/transaction.c
> > +++ b/fs/jbd/transaction.c
> > @@ -1333,8 +1333,7 @@ int journal_stop(handle_t *handle)
> >  {
> >          transaction_t *transaction = handle->h_transaction;
> >          journal_t *journal = transaction->t_journal;
> > -        int old_handle_count, err;
> > -        pid_t pid;
> > +        int err;
> >
> >          J_ASSERT(journal_current_handle() == handle);
> >
> > @@ -1353,32 +1352,22 @@ int journal_stop(handle_t *handle)
> >
> >          jbd_debug(4, "Handle %p going down\n", handle);
> >
> > -        /*
> > -         * Implement synchronous transaction batching. If the handle
> > -         * was synchronous, don't force a commit immediately. Let's
> > -         * yield and let another thread piggyback onto this transaction.
> > -         * Keep doing that while new threads continue to arrive.
> > -         * It doesn't cost much - we're about to run a commit and sleep
> > -         * on IO anyway. Speeds up many-threaded, many-dir operations
> > -         * by 30x or more...
> > -         *
> > -         * But don't do this if this process was the most recent one to
> > -         * perform a synchronous write. We do this to detect the case where a
> > -         * single process is doing a stream of sync writes. No point in waiting
> > -         * for joiners in that case.
> > -         */
> > -        pid = current->pid;
> > -        if (handle->h_sync && journal->j_last_sync_writer != pid) {
> > -                journal->j_last_sync_writer = pid;
> > -                do {
> > -                        old_handle_count = transaction->t_handle_count;
> > -                        schedule_timeout_uninterruptible(1);
> > -                } while (old_handle_count != transaction->t_handle_count);
> > -        }
> > -
> >          current->journal_info = NULL;
> >          spin_lock(&journal->j_state_lock);
> >          spin_lock(&transaction->t_handle_lock);
> > +
> > +        if (journal->j_committing_transaction && handle->h_sync) {
> > +                tid_t tid = journal->j_committing_transaction->t_tid;
> > +
> > +                spin_unlock(&transaction->t_handle_lock);
> > +                spin_unlock(&journal->j_state_lock);
> > +
> > +                err = log_wait_commit(journal, tid);
> > +
> > +                spin_lock(&journal->j_state_lock);
> > +                spin_lock(&transaction->t_handle_lock);
> > +        }
> > +
> >          transaction->t_outstanding_credits -= handle->h_buffer_credits;
> >          transaction->t_updates--;
> >          if (!transaction->t_updates) {
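
For anyone skimming the diff, the behavioural change is easier to see with
the two paths laid out side by side. This is only an illustrative sketch
reusing the identifiers from the patch above, not the literal jbd code:

        /*
         * Old behaviour (removed): after a sync handle, speculatively
         * sleep a jiffy at a time for as long as new handles keep
         * joining the running transaction, unless this pid was also the
         * previous sync writer.  On a low-latency backend each jiffy of
         * speculative sleep can easily exceed the cost of the commit
         * itself, which is the slowdown this thread is about.
         */
        if (handle->h_sync && journal->j_last_sync_writer != pid) {
                journal->j_last_sync_writer = pid;
                do {
                        old_handle_count = transaction->t_handle_count;
                        schedule_timeout_uninterruptible(1);
                } while (old_handle_count != transaction->t_handle_count);
        }

        /*
         * New behaviour (added): never sleep speculatively.  If a commit
         * is already in flight, drop the locks, wait for it to finish
         * with log_wait_commit(), and retake the locks; our own commit
         * then runs right behind it and picks up any handles that joined
         * the running transaction in the meantime.
         */
        if (journal->j_committing_transaction && handle->h_sync) {
                tid_t tid = journal->j_committing_transaction->t_tid;

                spin_unlock(&transaction->t_handle_lock);
                spin_unlock(&journal->j_state_lock);
                err = log_wait_commit(journal, tid);
                spin_lock(&journal->j_state_lock);
                spin_lock(&transaction->t_handle_lock);
        }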
>
> Running with Josef's patch, I was able to see a clear improvement from
> batching these synchronous operations on ext3 with the RAM disk and the
> array. It is not too often that you get to make a simple change and see a
> 27x improvement ;-)
>
> On the bad side, the local disk case saw as much as a 30% drop in
> performance. The specific disk is not one that I have a lot of
> experience with, so I would like to retry on a disk that has been
> qualified by our group (i.e., one where we have reasonable confidence
> that there are no firmware issues, etc.).
>
> Now for the actual results.
>
> The results (in files/sec) are the average of 5 runs for each thread count.
>
> Type          Threads   Baseline     Josef   Speedup (Josef/Baseline)
> array               1      320.5     325.4      1.01
> array               2      174.9     351.9      2.01
> array               4      382.7     593.5      1.55
> array               8      644.1     963.0      1.49
> array              10      842.9    1038.7      1.23
> array              20     1319.6    1432.3      1.08
>
> RAM disk            1     5621.4    5595.1      0.99
> RAM disk            2      281.5    7613.3     27.04
> RAM disk            4      579.9    9111.5     15.71
> RAM disk            8      891.1    9357.3     10.50
> RAM disk           10     1116.3    9873.6      8.84
> RAM disk           20     1952.0   10703.6      5.48
>
> S-ATA disk          1       19.0      15.1      0.79
> S-ATA disk          2       19.9      14.4      0.72
> S-ATA disk          4       41.0      27.9      0.68
> S-ATA disk          8       60.4      43.2      0.71
> S-ATA disk         10       67.1      48.7      0.72
> S-ATA disk         20      102.7      74.0      0.72
>
> Background on the tests:
>
> All of this is measured on three devices - a relatively old & slow
> array, the local (slow!) 2.5" S-ATA disk in the box and a RAM disk.
>
> These numbers were generated using fs_mark to write 4096-byte files with
> the following commands:
>
> fs_mark -d /home/test/t -s 4096 -n 40000 -N 50 -D 64 -t 1
> ...
> fs_mark -d /home/test/t -s 4096 -n 20000 -N 50 -D 64 -t 2
> ...
> fs_mark -d /home/test/t -s 4096 -n 10000 -N 50 -D 64 -t 4
> ...
> fs_mark -d /home/test/t -s 4096 -n 5000 -N 50 -D 64 -t 8
> ...
> fs_mark -d /home/test/t -s 4096 -n 4000 -N 50 -D 64 -t 10
> ...
> fs_mark -d /home/test/t -s 4096 -n 2000 -N 50 -D 64 -t 20
> ...
>
> Note that this spreads the files across 64 subdirectories; each thread
> writes 50 files in one directory and then moves on to the next in a
> round-robin fashion.
>
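Those six invocations boil down to one sweep; purely as a sketch (it assumes
fs_mark is on $PATH and that /home/test/t sits on the filesystem under test),
the whole series can be driven like this, with -n scaled so that every run
creates 40000 files in total:

        #!/bin/sh
        # Drive the six fs_mark runs quoted above: the total file count
        # per run stays at 40000, split evenly across the worker threads.
        for t in 1 2 4 8 10 20; do
                n=$((40000 / t))
                fs_mark -d /home/test/t -s 4096 -n $n -N 50 -D 64 -t $t
        done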
I'm starting to wonder about the disks I have, because my files/second is
spanking yours, and it's just a local Samsung 3Gb/s SATA drive. With those
commands I'm consistently getting over 700 files/sec. I'm seeing about a 1-5%
increase in speed locally with my patch. I guess I'll start looking around for
some other hardware and test there, in case this box is more badass than I
think it is. Thanks much,
Josef