From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Alex Tomas <alex@clusterfs.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>,
sct@redhat.com, akpmext2-devel@lists.sourceforge.net,
linux-fsdevel@vger.kernel.org
Subject: Delayed alloc for ordered-mode
Date: Sun, 13 Mar 2005 20:11:17 +0530
Message-ID: <20050313144117.GA4471@in.ibm.com>
In-Reply-To: <20050304180235.0a8ff966.alex@clusterfs.com>
What would be really nice is if we could do this in a way that
enables reuse of the generic paths even for ordered mode. One thought
that comes to mind is to have journal commit wait for writeback to
complete on the data pages that must be flushed to disk before the
metadata can be committed, much like we do for O_SYNC.

I realise that JBD is intended to work at a level of abstraction
where it has no awareness of filesystems, hence the correspondence
with buffer heads throughout. So would the above be a complete
no-no?
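As a user-space analogy only (this is not JBD code), the ordering being
asked for is the same one an application gets by flushing its data
before it records a commit. The sketch below assumes POSIX; the names
ordered_commit() and write_all() and the two file paths are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write len bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: make the data durable first, and only then
 * append the commit record. If we crash before the second step, no
 * commit record refers to data that never reached the disk.
 * Returns 0 on success, -1 on error.
 */
int ordered_commit(const char *data_path, const char *journal_path,
		   const char *data, const char *commit_rec)
{
	int fd, ret;

	assert(data_path && journal_path && data && commit_rec);

	/* Step 1: write the data and wait for it to reach stable storage. */
	fd = open(data_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	ret = write_all(fd, data, strlen(data));
	if (ret == 0 && fsync(fd) < 0)
		ret = -1;
	if (close(fd) < 0)
		ret = -1;
	if (ret < 0)
		return -1;

	/* Step 2: only now write (and flush) the commit record. */
	fd = open(journal_path, O_WRONLY | O_CREAT | O_APPEND, 0600);
	if (fd < 0)
		return -1;
	ret = write_all(fd, commit_rec, strlen(commit_rec));
	if (ret == 0 && fsync(fd) < 0)
		ret = -1;
	if (close(fd) < 0)
		ret = -1;
	return ret;
}
```

A caller would use it as ordered_commit("data.bin", "journal.log",
"payload", "commit 1\n"). The point is only the ordering of the two
flushes; how JBD would learn which pages to wait on is exactly the
open question above.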
Regards
Suparna
On Fri, Mar 04, 2005 at 06:02:35PM +0300, Alex Tomas wrote:
> On 03 Mar 2005 17:12:14 -0800
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> > One more thing, we need to keep in mind is - we need to make sure
> > that "ordered" mode also improved - since all our testcode
> > focuses on "writeback" mode and the default mode is "ordered" :(
> >
>
> I've just cooked the patch to implement ordered mode for delayed
> allocation path. please take it:
>
> ftp://ftp.clusterfs.com/pub/people/alex/2.6.11/ext3-delalloc-ordered-2.6.11-0.1.patch
>
> Stephen, Andrew could you review it, please?
>
> thanks, Alex
>
>
> Index: linux-2.6.11/include/linux/jbd.h
> ===================================================================
> --- linux-2.6.11.orig/include/linux/jbd.h 2005-03-02 20:49:13.000000000 +0300
> +++ linux-2.6.11/include/linux/jbd.h 2005-03-04 17:03:52.000000000 +0300
> @@ -486,6 +486,12 @@
> struct journal_head *t_sync_datalist;
>
> /*
> + * Number of BIOs submitted in the context of the transaction that
> + * we want to complete before committing
> + */
> + atomic_t t_bios_in_flight;
> +
> + /*
> * Doubly-linked circular list of all forget buffers (superseded
> * buffers which we can un-checkpoint once this transaction commits)
> * [j_list_lock]
> @@ -678,6 +684,9 @@
> /* Wait queue to wait for updates to complete */
> wait_queue_head_t j_wait_updates;
>
> + /* Wait queue to wait for all BIOs to complete */
> + wait_queue_head_t j_wait_bios;
> +
> /* Semaphore for locking against concurrent checkpoints */
> struct semaphore j_checkpoint_sem;
>
> Index: linux-2.6.11/fs/jbd/commit.c
> ===================================================================
> --- linux-2.6.11.orig/fs/jbd/commit.c 2005-03-02 20:49:09.000000000 +0300
> +++ linux-2.6.11/fs/jbd/commit.c 2005-03-04 17:53:52.000000000 +0300
> @@ -619,6 +620,13 @@
> if (is_journal_aborted(journal))
> goto skip_commit;
>
> + /*
> + * Before the commit record, we have to wait for all bio's
> + * ext3_wb_writepages() issued against newly-allocated blocks
> + */
> + wait_event(journal->j_wait_bios,
> + atomic_read(&commit_transaction->t_bios_in_flight) == 0);
> +
> /* Done it all: now write the commit record. We should have
> * cleaned up our previous buffers by now, so if we are in abort
> * mode we can now just skip the rest of the journal write
> Index: linux-2.6.11/fs/jbd/transaction.c
> ===================================================================
> --- linux-2.6.11.orig/fs/jbd/transaction.c 2005-03-02 20:49:09.000000000 +0300
> +++ linux-2.6.11/fs/jbd/transaction.c 2005-03-04 17:05:28.000000000 +0300
> @@ -51,6 +51,7 @@
> transaction->t_tid = journal->j_transaction_sequence++;
> transaction->t_expires = jiffies + journal->j_commit_interval;
> spin_lock_init(&transaction->t_handle_lock);
> + atomic_set(&transaction->t_bios_in_flight, 0);
>
> /* Set up the commit timer for the new transaction. */
> journal->j_commit_timer->expires = transaction->t_expires;
> Index: linux-2.6.11/fs/jbd/journal.c
> ===================================================================
> --- linux-2.6.11.orig/fs/jbd/journal.c 2005-03-04 17:04:29.000000000 +0300
> +++ linux-2.6.11/fs/jbd/journal.c 2005-03-04 17:04:40.000000000 +0300
> @@ -671,6 +671,7 @@
> init_waitqueue_head(&journal->j_wait_checkpoint);
> init_waitqueue_head(&journal->j_wait_commit);
> init_waitqueue_head(&journal->j_wait_updates);
> + init_waitqueue_head(&journal->j_wait_bios);
> init_MUTEX(&journal->j_barrier);
> init_MUTEX(&journal->j_checkpoint_sem);
> spin_lock_init(&journal->j_revoke_lock);
> Index: linux-2.6.11/fs/ext3/writeback.c
> ===================================================================
> --- linux-2.6.11.orig/fs/ext3/writeback.c 2005-03-04 15:10:01.000000000 +0300
> +++ linux-2.6.11/fs/ext3/writeback.c 2005-03-04 17:33:05.000000000 +0300
> @@ -145,6 +145,17 @@
> if (bio->bi_size)
> return 1;
>
> + if (bio->bi_private) {
> + transaction_t *transaction = bio->bi_private;
> +
> + /*
> + * journal_commit_transaction() may be awaiting
> + * the bio to complete.
> + */
> + if (atomic_dec_and_test(&transaction->t_bios_in_flight))
> + wake_up(&transaction->t_journal->j_wait_bios);
> + }
> +
> do {
> struct page *page = bvec->bv_page;
>
> @@ -162,6 +173,16 @@
> static struct bio *ext3_wb_bio_submit(struct bio *bio, handle_t *handle)
> {
> bio->bi_end_io = ext3_wb_end_io;
> + if (handle) {
> + /*
> + * In data=ordered we shouldn't commit the transaction
> + * until all data related to the transaction get on a
> + * platter.
> + */
> + atomic_inc(&handle->h_transaction->t_bios_in_flight);
> + bio->bi_private = handle->h_transaction;
> + } else
> + bio->bi_private = NULL;
> submit_bio(WRITE, bio);
> return NULL;
> }
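For readers following the patch, the counting scheme it adds can be
modelled in user space as below. This is a single-threaded sketch
using C11 atomics, not kernel code: struct txn_model and the function
names are hypothetical stand-ins for t_bios_in_flight,
ext3_wb_bio_submit(), ext3_wb_end_io() and the wait_event() condition
in the commit path, and the wait queue itself is reduced to a boolean
"should wake the committer" return value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-transaction counter in the patch. */
struct txn_model {
	atomic_int bios_in_flight;
};

/* Mirrors atomic_set(&transaction->t_bios_in_flight, 0). */
void txn_init(struct txn_model *t)
{
	atomic_init(&t->bios_in_flight, 0);
}

/* Mirrors the submit path: count the BIO before it is issued. */
void bio_submit(struct txn_model *t)
{
	atomic_fetch_add(&t->bios_in_flight, 1);
}

/*
 * Mirrors the completion path: drop the count and report whether this
 * was the last outstanding BIO, i.e. whether the committer should be
 * woken (atomic_dec_and_test() followed by wake_up() in the patch).
 */
bool bio_end_io(struct txn_model *t)
{
	int prev = atomic_fetch_sub(&t->bios_in_flight, 1);

	assert(prev > 0);	/* a completion without a submit is a bug */
	return prev == 1;
}

/* Mirrors the wait_event() condition checked before the commit record. */
bool commit_may_proceed(struct txn_model *t)
{
	return atomic_load(&t->bios_in_flight) == 0;
}
```

With two BIOs submitted, commit_may_proceed() stays false until the
second bio_end_io() call, which is also the only call that returns
true.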
--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India