linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Mingming Cao <cmm@us.ibm.com>
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] JBD ordered mode rewrite
Date: Mon, 10 Mar 2008 19:29:03 +0100	[thread overview]
Message-ID: <20080310182903.GH30435@duck.suse.cz> (raw)
In-Reply-To: <1204887301.3627.25.camel@localhost.localdomain>

  Hi Mingming,

On Fri 07-03-08 02:55:01, Mingming Cao wrote:
> On Thu, 2008-03-06 at 18:42 +0100, Jan Kara wrote:
> >   Below is my rewrite of ordered mode in JBD. Now we don't have a list of
> > data buffers that need syncing on transaction commit but a list of inodes
> > that need writeout during commit. This brings all sorts of advantages such
> > as possibility to get rid of journal heads and buffer heads for data
> > buffers in ordered mode, better ordering of writes on transaction commit,
> > simplification of some JBD code, no more anonymous pages when truncate of
> > data being committed happens. The patch has survived some light testing
> > but it still has some potential of eating your data so beware :) I've run
> > dbench to see whether we didn't decrease performance by different handling
> > of truncate and the throughput I'm getting on my machine is the same (OK,
> > is lower by 0.5%) if I disable the code in truncate waiting for commit to
> > finish... Also the throughput of dbench is about 2% better with my patch
> > than with current JBD.
> 
> I know ext4 is keep changing that it's a bit hard to create patch
> against ext4, but I feel features like especially rewrite the default
> ordered mode should done in ext4/jbd2. I could port to current ext4 and
> JBD2 if you agrees with this.
  I definitely agree with you :). This was just the first version of the
patch, more like a proof-of-concept, and it was easier to code against
jbd/ext3 for me :) But when I have something more definitive, I'll port it
against jbd2/ext4.

> Also, would it make sense to create a new ordered mode writepage
> routines, and keep the old ordered mode code there for a while, to allow
> easy comparison? This could a good transition for people to start
> experiment this ordered mode without worrying about put data in danger
> by default.
  This is an interesting idea. I'll have a look how hard would it be to do
that. It would be also good because we could separate the patch into
several chunks and it would still compile / work between them.

> >   Any comments or testing most welcom
> 
> [...]
> >  /*
> >   * Note that we always start a transaction even if we're not journalling
> >   * data.  This is to preserve ordering: any hole instantiation within
> > @@ -1465,15 +1444,11 @@ static int journal_dirty_data_fn(handle_t *handle, struct buffer_head *bh)
> >   * We don't honour synchronous mounts for writepage().  That would be
> >   * disastrous.  Any write() or metadata operation will sync the fs for
> >   * us.
> > - *
> > - * AKPM2: if all the page's buffers are mapped to disk and !data=journal,
> > - * we don't need to open a transaction here.
> >   */
> >  static int ext3_ordered_writepage(struct page *page,
> >  				struct writeback_control *wbc)
> >  {
> >  	struct inode *inode = page->mapping->host;
> > -	struct buffer_head *page_bufs;
> >  	handle_t *handle = NULL;
> >  	int ret = 0;
> >  	int err;
> > @@ -1487,46 +1462,49 @@ static int ext3_ordered_writepage(struct page *page,
> >  	if (ext3_journal_current_handle())
> >  		goto out_fail;
> > 
> > -	handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
> > -
> > -	if (IS_ERR(handle)) {
> > -		ret = PTR_ERR(handle);
> > -		goto out_fail;
> > +	/*
> > +	 * Now there are two different reasons why we can be called:
> > +	 *   1) write out during commit
> > +	 *   2) fsync / writeout to free memory
> > +	 *
> > +	 * In the first case, we just need to write the buffer to disk, in the
> > +	 * second case we may need to do hole filling and attach the inode to
> > +	 * the transaction. Note that even in the first case, we may get an
> > +	 * unmapped buffer (hole fill with data via mmap) but we don't have to
> > +	 * write it - actually, we can't because from a transaction commit we
> > +	 * cannot start a new transaction or we could deadlock.
> > +	 */
> 
> Any thoughts how to handle the unmapped page under case 1)? Right now I
> see it fails. Your comments here saying that we still have the issue
> that "can't start a new transaction while commiting", but likely, with
> delayed allocation, starting a new transaction could to happen a lot to
> do defered block allocation. 
> 
> I really hope this new mode could be easy to add delayed allocation
> support.  Any thoughts that we could workaround the locking in the JBD2
> layer?
  "can't start a new transaction while commiting" is a fundamental problem
that you cannot add to a journal when you are trying to clean it up. It's
not just a locking problem :). As Mark pointed out, there's another problem
with the fact that we cannot afford to take page_lock while committing
because that's essentially a clasical lock inversion between a transaction
start and a page lock. How to solve that one, I don't know yet.
  To your question about delayed allocation - currently, we just skip those
buffers so as you write in some other email, you basically get the
guarantees of writeback mode. Actually, I guess the result is the same with
the old way of ordered-mode handling since you cannot afford to do the block
allocation from the commit code as well... We could get around this
limitation (actually easier than in the old code) if we were sure we have
enough credits in the transaction to really do the allocation - we would
do the allocation in the writepage() and attach changed buffers to the
committing transaction.

> > +	if (!wbc->skip_unmapped) {
> > +		handle = ext3_journal_start(inode,
> > +				ext3_writepage_trans_blocks(inode));
> > +		if (IS_ERR(handle)) {
> > +			ret = PTR_ERR(handle);
> > +			goto out_fail;
> > +		}
> >  	}
> > +	else if (!PageMappedToDisk(page))
> > +		goto out_fail;
> > 

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2008-03-10 18:29 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-06 17:42 [RFC] JBD ordered mode rewrite Jan Kara
2008-03-06 19:05 ` Josef Bacik
2008-03-10 16:30   ` Jan Kara
2008-03-06 23:53 ` Andrew Morton
2008-03-10 17:38   ` Jan Kara
2008-03-07  1:34 ` Mark Fasheh
2008-03-10 18:00   ` Jan Kara
2008-03-07 10:55 ` Mingming Cao
2008-03-10 18:29   ` Jan Kara [this message]
2008-03-07 23:52 ` Andreas Dilger
2008-03-08  0:08   ` Mingming Cao
2008-03-08 12:14   ` Christoph Hellwig
2008-03-10 19:54   ` Jan Kara
2008-03-10 21:37     ` Andreas Dilger
2008-04-25 23:38 ` Possible race between direct IO and JBD? Mingming Cao
2008-04-26 10:41   ` Andrew Morton
2008-04-28 12:26   ` Jan Kara
2008-04-28 17:11     ` Badari Pulavarty
2008-04-28 18:09       ` Jan Kara
2008-04-28 19:09         ` Mingming Cao
2008-04-29 12:43           ` Jan Kara
2008-04-29 17:49             ` Mingming Cao
2008-05-01 15:16             ` [PATCH] jbd_commit_transaction() races with journal_try_to_drop_buffers() causing DIO failures Badari Pulavarty
2008-05-01 22:08               ` Mingming Cao
2008-05-05 17:06               ` Jan Kara
2008-05-05 17:53                 ` Mingming Cao
2008-05-06  0:10                 ` Badari Pulavarty
2008-05-09 22:27                 ` Mingming Cao
2008-05-09 22:39                   ` [PATCH] JBD:need hold j_state_lock to updates to transaction t_state to T_COMMIT Mingming Cao
2008-05-12  9:34                     ` Jan Kara
2008-05-12 15:54                   ` [PATCH] jbd_commit_transaction() races with journal_try_to_drop_buffers() causing DIO failures Jan Kara
2008-05-12 19:23                     ` Mingming Cao
2008-05-13 14:20                       ` Jan Kara
2008-05-13  0:39                     ` Mingming Cao
2008-05-13 14:54                       ` Jan Kara
2008-05-13 16:37                         ` Mingming Cao
2008-05-13 22:23                         ` Mingming Cao
2008-05-14 17:08                           ` Jan Kara
2008-05-14 17:41                             ` Mingming Cao
2008-05-14 18:14                               ` Jan Kara
2008-05-16 14:13                                 ` Mingming Cao
2008-05-16 14:14                                 ` [PATCH] Fix DIO EIO error caused by race between jbd_commit_transaction() and journal_try_to_drop_buffers() Mingming Cao
2008-05-16 15:01                                   ` Josef Bacik
2008-05-16 17:11                                     ` Mingming Cao
2008-05-16 17:17                                       ` Badari Pulavarty
2008-05-16 17:30                                         ` Mingming Cao
2008-05-16 17:12                                   ` Badari Pulavarty
2008-05-16 21:01                                     ` [PATCH] JBD: Fix DIO EIO error caused by race between free buffer and commit trasanction Mingming Cao
2008-05-18 22:37                                       ` Jan Kara
2008-05-19 19:59                                         ` Mingming Cao
2008-05-19 20:25                                           ` Andrew Morton
2008-05-19 22:07                                             ` Mingming Cao
2008-05-20  9:30                                               ` Jens Axboe
2008-05-20 17:47                                                 ` Mingming Cao
2008-05-20 18:02                                               ` [PATCH-v2] JBD: Fix " Mingming Cao
2008-05-20 23:53                                                 ` Jan Kara
2008-05-21 17:14                                                   ` Mingming
2008-05-24 22:44                                                     ` Jan Kara
2008-05-28 18:18                                                       ` Mingming Cao
2008-05-28 18:55                                                         ` Jan Kara
2008-05-29  0:15                                                           ` Mingming Cao
2008-05-29  0:16                                                           ` [PATCH][take 5] " Mingming Cao
2008-05-29  0:18                                                             ` [PATCH][take 5] JBD2: " Mingming Cao
2008-05-30  6:24                                                               ` Aneesh Kumar K.V
2008-05-30 15:17                                                                 ` Mingming Cao
2008-05-21 23:38                                                 ` [PATCH 1/2][TAKE3] JBD: " Mingming
2008-05-22  5:57                                                   ` Andrew Morton
2008-05-21 23:39                                                 ` [PATCH 2/2][TAKE3] JBD2: " Mingming
2008-05-20 18:03                                               ` [PATCH -v2] JBD2: Fix race between journal " Mingming Cao
2008-05-16 21:01                                     ` [PATCH] JBD2: Fix DIO EIO error caused by race between " Mingming Cao
2008-05-09 22:39                 ` [PATCH] JBD2:need hold j_state_lock to updates to transaction t_state to T_COMMIT Mingming Cao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080310182903.GH30435@duck.suse.cz \
    --to=jack@suse.cz \
    --cc=cmm@us.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).