linux-ext4.vger.kernel.org archive mirror
From: Ted Ts'o <tytso@mit.edu>
To: torn5 <torn5@shiftmail.org>
Cc: Josef Bacik <josef@redhat.com>,
	Jon Leighton <j@jonathanleighton.com>,
	linux-ext4@vger.kernel.org
Subject: Re: Severe slowdown caused by jbd2 process
Date: Fri, 21 Jan 2011 20:34:15 -0500
Message-ID: <20110122013415.GN3043@thunk.org>
In-Reply-To: <4D3A2EC6.3020700@shiftmail.org>

On Sat, Jan 22, 2011 at 02:11:34AM +0100, torn5 wrote:
> I think that currently the fsyncs have a double meaning: they are
> used to make a filesystem operation happen before another filesystem
> operation, and to make a filesystem operation happen before a
> network operation. I don't think the second case can be speeded up
> (there can be a distributed transaction involved) 

It all depends on the application.  If you have many simultaneous
transactions with different peers (SMTP, for example), you could
simply have the server batch the commits for multiple incoming mail
messages into the database before sending the 250 acknowledgements,
which mean "yes, I have this mail message", to the various MTAs.  In
other cases, if you are sending a huge number of transactions from one
server to another, maybe you change things so that your transactions
get acknowledged in batches.  That might require an application
protocol change, but it could be done (if you control both ends of
the connection).
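A minimal sketch of the batching idea, assuming a simple append-only
spool file with one message per line: every message in the batch is
written first, a single fsync() covers all of them, and only after it
returns are the acknowledgements handed back.  The helper name
commit_batch and the "250 OK" reply strings are illustrative
assumptions, not code from any real MTA.

```python
import os


def commit_batch(path, messages):
    """Append every message, make the whole batch durable with ONE
    fsync(), and only then return an acknowledgement per message.

    Hypothetical helper for illustration; the file format and the
    reply strings are assumptions.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        for msg in messages:
            data = msg.encode() + b"\n"
            # os.write may write fewer bytes than asked; loop until done.
            while data:
                n = os.write(fd, data)
                data = data[n:]
        # One fsync for the batch: the journal commit and cache flush
        # it triggers are shared by every message written above.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Acknowledge only after fsync() has returned successfully.
    return ["250 OK" for _ in messages]
```

With N messages per batch, the cost of a forced commit is paid once
instead of N times, at the price of each peer waiting for the batch to
close before it hears back.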

At the end of the day, though, if the application protocol design is
stupid, there's not much you can do.  That's like the difference
between XMODEM (for those old enough to remember it), which waited for
an acknowledgement after every block, and ZMODEM, which had a
sliding-window acknowledgement system.
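To put rough numbers on that difference, here is a toy model, assuming
an idealized link with no losses, a fixed transmit time per packet and
a fixed delay between finishing a packet and receiving its
acknowledgement.  A window of 1 behaves like stop-and-wait (XMODEM
style); a larger window lets the sender keep that many packets
unacknowledged (ZMODEM style).  The function and parameter names are
hypothetical.

```python
def transfer_time(n_packets, tx_time, ack_delay, window):
    """Time until the last packet is acknowledged (toy model).

    Packet i may start only when the link is free AND the ack for
    packet i - window has arrived.  window=1 is stop-and-wait.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    link_free = 0
    acks = []  # acks[i] = time the ack for packet i arrives
    for i in range(n_packets):
        start = link_free
        if i >= window:
            # Window full: wait for the oldest outstanding ack.
            start = max(start, acks[i - window])
        link_free = start + tx_time
        acks.append(link_free + ack_delay)
    return acks[-1] if acks else 0
```

For example, 10 packets with tx_time=1 and ack_delay=9 take 100 time
units with window=1, but only 19 with window=10: the acknowledgement
latency is paid roughly once instead of once per packet, which is the
same win as batching commits before acknowledging them.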

> Do you think nobarrier + data=journal would provide the same
> guarantees of barrier and almost the same performances of nobarrier
> (for random I/O)?

No.  Fundamentally, barriers are about making sure the data actually
hits the disk platters.  If you don't use a barrier operation, the
hard drive could potentially delay writing disk sectors for seconds,
perhaps even minutes, in order to optimize disk head movements.  So
without barriers, even though you *think* you had sent the commit to
disk, and had told your network partner, "I have it, and commit not to
lose it", if you drop power at precisely the wrong time, data could be
lost.  Using data=journal doesn't change this fact.

> But then there should be a mount option (barriersonlyjournal?) so
> that barriers are only generated every so many seconds and only for
> committing a big transaction to the journal, while applications'
> fsyncs would be made with nobarriers.

In general, an fsync() has to force a journal commit.  There are a few
cases where an fdatasync() could avoid needing a journal commit, but
usually when applications use fdatasync(), they really want to ensure
that their data writes are actually pushed out to the disk platters,
and a barriersonlyjournal mount option would defeat that need for a
database which is trying to provide ACID semantics.
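One of those fdatasync() cases can be sketched as follows, assuming a
database-style log made of fixed-size slots that is created at its
final size up front (with real zero bytes rather than, say,
fallocate(), whose unwritten extents may need a metadata update on
first write).  Later records overwrite slots in place, so the file size
never changes and fdatasync() has no size metadata to flush; whether
the filesystem can then skip a journal commit is up to the filesystem.
The data itself still has to be flushed to stable storage.  The names
RECORD_SIZE, preallocate and write_record are hypothetical.

```python
import os

RECORD_SIZE = 512  # hypothetical fixed-size log slot

# os.fdatasync is not available on every platform (e.g. macOS);
# fall back to os.fsync there.
_datasync = getattr(os, "fdatasync", os.fsync)


def _pwrite_all(fd, buf, offset):
    """Write all of buf at offset, retrying on short writes."""
    view = memoryview(buf)
    while view:
        n = os.pwrite(fd, view, offset)
        view = view[n:]
        offset += n


def preallocate(path, n_slots):
    """Create the log at its final size with real zero bytes, then make
    the data and the new size durable once with fsync()."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        _pwrite_all(fd, b"\0" * (RECORD_SIZE * n_slots), 0)
        os.fsync(fd)
    finally:
        os.close(fd)


def write_record(fd, slot, payload):
    """Overwrite one slot in place and fdatasync() it.

    The file size is unchanged, so only the data blocks (and the drive
    cache, when barriers are on) must be flushed; timestamp-only
    metadata changes do not have to be written by fdatasync().
    """
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload larger than one slot")
    _pwrite_all(fd, payload.ljust(RECORD_SIZE, b"\0"), slot * RECORD_SIZE)
    _datasync(fd)
```

Note that this only removes the metadata side of the cost; the cache
flush that makes the record survive a power drop is exactly the part a
barriersonlyjournal option would throw away.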

      	 					- Ted


