From: Josef Bacik <jbacik@redhat.com>
To: Ric Wheeler <ric@emc.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
adilger@sun.com, David Chinner <dgc@sgi.com>,
jack@ucw.cz, "Feld, Andy" <Feld_Andy@emc.com>,
linux-fsdevel@vger.kernel.org
Subject: Re: background on the ext3 batching performance issue
Date: Thu, 28 Feb 2008 10:41:00 -0500
Message-ID: <200802281041.01411.jbacik@redhat.com>
In-Reply-To: <200802281005.13068.jbacik@redhat.com>

On Thursday 28 February 2008 10:05:11 am Josef Bacik wrote:
> On Thursday 28 February 2008 7:09:17 am Ric Wheeler wrote:
> > At the LSF workshop, I mentioned that we have tripped across an
> > embarrassing performance issue in the jbd transaction code which is
> > clearly not tuned for low latency devices.
> >
> > The short summary is that we can do say 800 10k files/sec in a
> > write/fsync/close loop with a single thread, but drop down to under 250
> > files/sec with 2 or more threads.
> >
> > This is pretty easy to reproduce with any small file write synchronous
> > workload (i.e., fsync() each file before close). We used my fs_mark
> > tool to reproduce.
> >
> > The core of the issue is the call in the jbd transaction code to
> > schedule_timeout_uninterruptible(1), which causes us to sleep for 4ms:
> >
> >     pid = current->pid;
> >     if (handle->h_sync && journal->j_last_sync_writer != pid) {
> >         journal->j_last_sync_writer = pid;
> >         do {
> >             old_handle_count = transaction->t_handle_count;
> >             schedule_timeout_uninterruptible(1);
> >         } while (old_handle_count != transaction->t_handle_count);
> >     }
> >
> > This is quite topical to the concern we had with low latency devices in
> > general, and specifically with things like SSDs.
>
> Your testcase does in fact show a weakness in this optimization, but look
> at the more likely case, where you have multiple writers on the same
> filesystem rather than one guy doing write/fsync. If we wait we could
> potentially add quite a few more buffers to this transaction before
> flushing it, rather than flushing a buffer or two at a time. What would
> you propose as a solution?
>
Forgive me, I said that badly. Now that I've had my morning coffee, let me try
again. You are ping-ponging j_last_sync_writer back and forth between the two
threads, so you don't get the speedup you would get with one thread, where we
just bypass the next sleep because we know there is a single thread doing
write/sync. That raises the question: should we try to detect the case where
multiple threads are doing write/sync, and so hitting the weakness in this
optimization, and if so, how would we do it properly? The only thing I can
think of is to track sync writers on a transaction and, if there is more than
one, bypass this little snippet. In fact, I think I'll go ahead and do that and
see what fs_mark comes up with.
Thank you,
Josef