From: Suparna Bhattacharya
Subject: Delayed alloc for ordered-mode
Date: Sun, 13 Mar 2005 20:11:17 +0530
Message-ID: <20050313144117.GA4471@in.ibm.com>
References: <20050303083349.GA4896@in.ibm.com> <1109898734.4961.11.camel@dyn318077bld.beaverton.ibm.com> <20050304180235.0a8ff966.alex@clusterfs.com>
In-Reply-To: <20050304180235.0a8ff966.alex@clusterfs.com>
Reply-To: suparna@in.ibm.com
To: Alex Tomas
Cc: Badari Pulavarty, sct@redhat.com, akpmext2-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

What would be really nice is if we could do this in a way that enables
reuse of generic paths even for ordered mode.

One thought that comes to mind is to have journal commit wait for
writeback to complete on the data pages which need to be flushed to
disk before metadata can be committed, much like we do for O_SYNC.
I realise that JBD is intended to work at a level of abstraction where
it has no awareness of filesystems, hence the correspondence with
buffer heads all through. So would the above be a complete no-no?
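
To make the idea a little more concrete, here is a rough userspace
sketch (not kernel code) of a commit that waits for outstanding data
writes. Everything in it is an assumption made for illustration:
struct txn, txn_write_submitted(), txn_write_completed(), txn_commit()
and demo_commit_after_writes() are hypothetical names, a pthread mutex
and condition variable stand in for the kernel's atomic counter and
wait queue, and short-lived threads stand in for asynchronous I/O
completions.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WRITES 64

/* Hypothetical stand-in for a transaction that tracks its data writes. */
struct txn {
	pthread_mutex_t lock;
	pthread_cond_t all_done;	/* plays the role of a wait queue */
	int in_flight;			/* writes submitted, not yet completed */
	int completed;			/* writes whose completion has run */
};

static void txn_init(struct txn *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->all_done, NULL);
	t->in_flight = 0;
	t->completed = 0;
}

/* Submission side: count the write before it is handed to the "device". */
static void txn_write_submitted(struct txn *t)
{
	pthread_mutex_lock(&t->lock);
	t->in_flight++;
	pthread_mutex_unlock(&t->lock);
}

/* Completion side: the last write to finish wakes up the committer. */
static void txn_write_completed(struct txn *t)
{
	pthread_mutex_lock(&t->lock);
	t->completed++;
	if (--t->in_flight == 0)
		pthread_cond_broadcast(&t->all_done);
	pthread_mutex_unlock(&t->lock);
}

/*
 * Commit side: block until no data write is in flight; only then would
 * the commit record be written. Returns the number of completed writes
 * observed at that point.
 */
static int txn_commit(struct txn *t)
{
	int done;

	pthread_mutex_lock(&t->lock);
	while (t->in_flight != 0)
		pthread_cond_wait(&t->all_done, &t->lock);
	assert(t->in_flight == 0);
	done = t->completed;
	pthread_mutex_unlock(&t->lock);
	return done;
}

/* Simulated asynchronous I/O: each thread completes exactly one write. */
static void *io_thread(void *arg)
{
	txn_write_completed(arg);
	return NULL;
}

/* Submit n writes, commit, and report how many had completed by then. */
int demo_commit_after_writes(int n)
{
	struct txn t;
	pthread_t tid[MAX_WRITES];
	int started[MAX_WRITES];
	int i, done;

	if (n < 0 || n > MAX_WRITES)
		return -1;

	txn_init(&t);
	for (i = 0; i < n; i++) {
		txn_write_submitted(&t);
		started[i] = (pthread_create(&tid[i], NULL, io_thread, &t) == 0);
		if (!started[i])
			io_thread(&t);	/* no thread: complete the write inline */
	}

	done = txn_commit(&t);

	for (i = 0; i < n; i++)
		if (started[i])
			pthread_join(tid[i], NULL);
	pthread_cond_destroy(&t.all_done);
	pthread_mutex_destroy(&t.lock);
	return done;
}
```

Calling demo_commit_after_writes(8) from a small main() should return
8, since the commit cannot proceed until every submitted write has run
its completion. The counter is bumped before the write is handed off,
so a completion can never run ahead of its own submission.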

Regards
Suparna

On Fri, Mar 04, 2005 at 06:02:35PM +0300, Alex Tomas wrote:
> On 03 Mar 2005 17:12:14 -0800
> Badari Pulavarty wrote:
>
> > One more thing, we need to keep in mind is - we need to make sure
> > that "ordered" mode also improved - since all our testcode
> > focuses on "writeback" mode and the default mode is "ordered" :(
> >
>
> I've just cooked the patch to implement ordered mode for delayed
> allocation path. please take it:
>
> ftp://ftp.clusterfs.com/pub/people/alex/2.6.11/ext3-delalloc-ordered-2.6.11-0.1.patch
>
> Stephen, Andrew could you review it, please?
>
> thanks, Alex
>
>
> Index: linux-2.6.11/include/linux/jbd.h
> ===================================================================
> --- linux-2.6.11.orig/include/linux/jbd.h	2005-03-02 20:49:13.000000000 +0300
> +++ linux-2.6.11/include/linux/jbd.h	2005-03-04 17:03:52.000000000 +0300
> @@ -486,6 +486,12 @@
>  	struct journal_head *t_sync_datalist;
>
>  	/*
> +	 * Number of BIO's submited in context of the transaction we
> +	 * want to complete before committing
> +	 */
> +	atomic_t t_bios_in_flight;
> +
> +	/*
>  	 * Doubly-linked circular list of all forget buffers (superseded
>  	 * buffers which we can un-checkpoint once this transaction commits)
>  	 * [j_list_lock]
> @@ -678,6 +684,9 @@
>  	/* Wait queue to wait for updates to complete */
>  	wait_queue_head_t j_wait_updates;
>
> +	/* Wait queue to wait for all BIOs to complete */
> +	wait_queue_head_t j_wait_bios;
> +
>  	/* Semaphore for locking against concurrent checkpoints */
>  	struct semaphore j_checkpoint_sem;
>
> Index: linux-2.6.11/fs/jbd/commit.c
> ===================================================================
> --- linux-2.6.11.orig/fs/jbd/commit.c	2005-03-02 20:49:09.000000000 +0300
> +++ linux-2.6.11/fs/jbd/commit.c	2005-03-04 17:53:52.000000000 +0300
> @@ -619,6 +620,13 @@
>  	if (is_journal_aborted(journal))
>  		goto skip_commit;
>
> +	/*
> +	 * Before the commit record, we have to wait for all bio's
> +	 * ext3_wb_writepages() issued against newly-allocated blocks
> +	 */
> +	wait_event(journal->j_wait_bios,
> +		atomic_read(&commit_transaction->t_bios_in_flight) == 0);
> +
>  	/* Done it all: now write the commit record. We should have
>  	 * cleaned up our previous buffers by now, so if we are in abort
>  	 * mode we can now just skip the rest of the journal write
> Index: linux-2.6.11/fs/jbd/transaction.c
> ===================================================================
> --- linux-2.6.11.orig/fs/jbd/transaction.c	2005-03-02 20:49:09.000000000 +0300
> +++ linux-2.6.11/fs/jbd/transaction.c	2005-03-04 17:05:28.000000000 +0300
> @@ -51,6 +51,7 @@
>  	transaction->t_tid = journal->j_transaction_sequence++;
>  	transaction->t_expires = jiffies + journal->j_commit_interval;
>  	spin_lock_init(&transaction->t_handle_lock);
> +	atomic_set(&transaction->t_bios_in_flight, 0);
>
>  	/* Set up the commit timer for the new transaction. */
>  	journal->j_commit_timer->expires = transaction->t_expires;
> Index: linux-2.6.11/fs/jbd/journal.c
> ===================================================================
> --- linux-2.6.11.orig/fs/jbd/journal.c	2005-03-04 17:04:29.000000000 +0300
> +++ linux-2.6.11/fs/jbd/journal.c	2005-03-04 17:04:40.000000000 +0300
> @@ -671,6 +671,7 @@
>  	init_waitqueue_head(&journal->j_wait_checkpoint);
>  	init_waitqueue_head(&journal->j_wait_commit);
>  	init_waitqueue_head(&journal->j_wait_updates);
> +	init_waitqueue_head(&journal->j_wait_bios);
>  	init_MUTEX(&journal->j_barrier);
>  	init_MUTEX(&journal->j_checkpoint_sem);
>  	spin_lock_init(&journal->j_revoke_lock);
> Index: linux-2.6.11/fs/ext3/writeback.c
> ===================================================================
> --- linux-2.6.11.orig/fs/ext3/writeback.c	2005-03-04 15:10:01.000000000 +0300
> +++ linux-2.6.11/fs/ext3/writeback.c	2005-03-04 17:33:05.000000000 +0300
> @@ -145,6 +145,17 @@
>  	if (bio->bi_size)
>  		return 1;
>
> +	if (bio->bi_private) {
> +		transaction_t *transaction = bio->bi_private;
> +
> +		/*
> +		 * journal_commit_transaction() may be awaiting
> +		 * the bio to complete.
> +		 */
> +		if (atomic_dec_and_test(&transaction->t_bios_in_flight))
> +			wake_up(&transaction->t_journal->j_wait_bios);
> +	}
> +
>  	do {
>  		struct page *page = bvec->bv_page;
>
> @@ -162,6 +173,16 @@
>  static struct bio *ext3_wb_bio_submit(struct bio *bio, handle_t *handle)
>  {
>  	bio->bi_end_io = ext3_wb_end_io;
> +	if (handle) {
> +		/*
> +		 * In data=ordered we shouldn't commit the transaction
> +		 * until all data related to the transaction get on a
> +		 * platter.
> +		 */
> +		atomic_inc(&handle->h_transaction->t_bios_in_flight);
> +		bio->bi_private = handle->h_transaction;
> +	} else
> +		bio->bi_private = NULL;
>  	submit_bio(WRITE, bio);
>  	return NULL;
>  }

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India