linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: tytso@mit.edu
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 1/2] ext4/jbd2: fix io-barrier logic in case of external journal
Date: Mon, 22 Mar 2010 11:03:42 -0400	[thread overview]
Message-ID: <20100322150342.GG11560@thunk.org> (raw)
In-Reply-To: <87iq8o4fmk.fsf@openvz.org>

On Mon, Mar 22, 2010 at 05:04:19PM +0300, Dmitry Monakhov wrote:
> tytso@mit.edu writes:
> 
> > On Fri, Mar 12, 2010 at 08:26:49PM +0300, Dmitry Monakhov wrote:
> >>  start_journal_io:
> >> +			if (bufs)
> >> +				commit_transaction->t_flushed_data_blocks = 1;
> >> +
> >
> > I'm not convinced this is right.
> >
> > From your test case, the problem isn't because we have journaled
> > metadata blocks (which is what bufs) counts, but because fsync()
> > depends on data blocks also getting flushed out to disks.
> >
> > However, if we aren't closing the transaction because of fsync(), I
> > don't think we need to do a barrier in the case of an external
> > journal.  So instead of effectively unconditionally setting
> > t_flushed_data_blocks (since bufs is nearly always going to be
> > non-zero), I think the better fix is to test to see if the journal
> > device != to the fs data device in fsync(), and if so, start the
> > barrier operation there.
> >
> > Do you agree?
> Yes.

Just to be clear, since I realized I wrote fsync() when I should have
written fs/ext4/fsync.c, my proposal was to put this check in
ext4_sync_file().

> BTW Would it be correct to update j_tail in
> jbd2_journal_commit_transaction() to something more recent
> if we have issued an io-barrier to j_fs_dev?
> This will helps to reduce journal_recovery time which may be really
> painful in some slow devices.

Um, maybe.  We are already calling __jbd2_journal_clean_checkpoint_list(),
and the barrier operation *is* expensive.

On the other hand, updating the journal superblock on every sync is
another seek that would have to made before the barrier operation, and
I'm a bit concerned that this seek would be noticeable.  If it is
noticeable, is it worth it to optimize for the uncommon case (a power
failure requiring a journal replay) when it might cost us something,
however, small, on every single journal update?

Do we really think the journal replay time is really something that is
a major pain point.  I can think of optimizations that involve
skipping writes that will get updated later in future transactions,
but it means complicating the replay code, which has been stable for a
long time, and it's not clear to me that the costs are worth the
benefits.

> I've take a look at async commit logic: fs/jbd2/commit.c
> void jbd2_journal_commit_transaction(journal_t *journal)
> {
> 725:    /* Done it all: now write the commit record asynchronously. */
>         if (JBD2_HAS_INCOMPAT_FEATURE(journal,
>                                       JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT))
>                                       {
>                 err = journal_submit_commit_record(journal,
>                 commit_transaction,
>                                                  &cbh, crc32_sum);
>                 if (err)
>                         __jbd2_journal_abort_hard(journal);
>                 if (journal->j_flags & JBD2_BARRIER)
>                         blkdev_issue_flush(journal->j_dev, NULL);
> <<< blkdev_issue_flush is wait for barrier to complete by default, but
> <<< in fact we don't have to wait for barrier here. I've prepared a
> <<< patch wich add flags to control blkdev_issue_flush() wait
> <<< behavior, and this is the place for no-wait variant.

I think that's right, as long as we're confident that subsequent
writes won't get scheduled before the no-wait barrier.  If it did, it
would be a bug in the block I/O layer, so it should be OK.

      	       	      	    	- Ted
 

  reply	other threads:[~2010-03-22 15:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-12 17:26 [PATCH 1/2] ext4/jbd2: fix io-barrier logic in case of external journal Dmitry Monakhov
2010-03-12 17:26 ` [PATCH 2/2] ext3, jbd: Add barriers for file systems with exernal journals Dmitry Monakhov
2010-03-22  1:20 ` [PATCH 1/2] ext4/jbd2: fix io-barrier logic in case of external journal tytso
2010-03-22 14:04   ` Dmitry Monakhov
2010-03-22 15:03     ` tytso [this message]
2010-03-22 16:14       ` Dmitry Monakhov
2010-03-22 20:22         ` tytso
2010-03-30  2:14           ` Jan Kara
2010-03-30  1:47         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100322150342.GG11560@thunk.org \
    --to=tytso@mit.edu \
    --cc=dmonakhov@openvz.org \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).