From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Tomas
Subject: Re: [Ext2-devel] Reviewing ext3 improvement patches (delalloc, mballoc, extents)
Date: Fri, 4 Mar 2005 18:02:35 +0300
Message-ID: <20050304180235.0a8ff966.alex@clusterfs.com>
References: <20050303083349.GA4896@in.ibm.com> <1109898734.4961.11.camel@dyn318077bld.beaverton.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: suparna@in.ibm.com, sct@redhat.com, akpmext2-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org
Received: from [83.102.214.158] ([83.102.214.158]:36561 "EHLO gw.home.net") by vger.kernel.org with ESMTP id S262904AbVCDPEW (ORCPT ); Fri, 4 Mar 2005 10:04:22 -0500
To: Badari Pulavarty
In-Reply-To: <1109898734.4961.11.camel@dyn318077bld.beaverton.ibm.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On 03 Mar 2005 17:12:14 -0800 Badari Pulavarty wrote:

> One more thing, we need to keep in mind is - we need to make sure
> that "ordered" mode also improved - since all our testcode
> focuses on "writeback" mode and the default mode is "ordered" :(
>

I've just cooked up a patch that implements ordered mode for the
delayed allocation path. Please take it:

ftp://ftp.clusterfs.com/pub/people/alex/2.6.11/ext3-delalloc-ordered-2.6.11-0.1.patch

Stephen, Andrew, could you review it, please?

thanks,
Alex

Index: linux-2.6.11/include/linux/jbd.h
===================================================================
--- linux-2.6.11.orig/include/linux/jbd.h	2005-03-02 20:49:13.000000000 +0300
+++ linux-2.6.11/include/linux/jbd.h	2005-03-04 17:03:52.000000000 +0300
@@ -486,6 +486,12 @@
 	struct journal_head	*t_sync_datalist;
 
 	/*
+	 * Number of BIO's submitted in context of the transaction we
+	 * want to complete before committing
+	 */
+	atomic_t		t_bios_in_flight;
+
+	/*
 	 * Doubly-linked circular list of all forget buffers (superseded
 	 * buffers which we can un-checkpoint once this transaction commits)
 	 * [j_list_lock]
@@ -678,6 +684,9 @@
 	/* Wait queue to wait for updates to complete */
 	wait_queue_head_t	j_wait_updates;
 
+	/* Wait queue to wait for all BIOs to complete */
+	wait_queue_head_t	j_wait_bios;
+
 	/* Semaphore for locking against concurrent checkpoints */
 	struct semaphore	j_checkpoint_sem;
 
Index: linux-2.6.11/fs/jbd/commit.c
===================================================================
--- linux-2.6.11.orig/fs/jbd/commit.c	2005-03-02 20:49:09.000000000 +0300
+++ linux-2.6.11/fs/jbd/commit.c	2005-03-04 17:53:52.000000000 +0300
@@ -619,6 +620,13 @@
 	if (is_journal_aborted(journal))
 		goto skip_commit;
 
+	/*
+	 * Before the commit record, we have to wait for all bio's
+	 * ext3_wb_writepages() issued against newly-allocated blocks
+	 */
+	wait_event(journal->j_wait_bios,
+		atomic_read(&commit_transaction->t_bios_in_flight) == 0);
+
 	/* Done it all: now write the commit record.  We should have
 	 * cleaned up our previous buffers by now, so if we are in abort
 	 * mode we can now just skip the rest of the journal write
Index: linux-2.6.11/fs/jbd/transaction.c
===================================================================
--- linux-2.6.11.orig/fs/jbd/transaction.c	2005-03-02 20:49:09.000000000 +0300
+++ linux-2.6.11/fs/jbd/transaction.c	2005-03-04 17:05:28.000000000 +0300
@@ -51,6 +51,7 @@
 	transaction->t_tid = journal->j_transaction_sequence++;
 	transaction->t_expires = jiffies + journal->j_commit_interval;
 	spin_lock_init(&transaction->t_handle_lock);
+	atomic_set(&transaction->t_bios_in_flight, 0);
 
 	/* Set up the commit timer for the new transaction. */
 	journal->j_commit_timer->expires = transaction->t_expires;
Index: linux-2.6.11/fs/jbd/journal.c
===================================================================
--- linux-2.6.11.orig/fs/jbd/journal.c	2005-03-04 17:04:29.000000000 +0300
+++ linux-2.6.11/fs/jbd/journal.c	2005-03-04 17:04:40.000000000 +0300
@@ -671,6 +671,7 @@
 	init_waitqueue_head(&journal->j_wait_checkpoint);
 	init_waitqueue_head(&journal->j_wait_commit);
 	init_waitqueue_head(&journal->j_wait_updates);
+	init_waitqueue_head(&journal->j_wait_bios);
 	init_MUTEX(&journal->j_barrier);
 	init_MUTEX(&journal->j_checkpoint_sem);
 	spin_lock_init(&journal->j_revoke_lock);
Index: linux-2.6.11/fs/ext3/writeback.c
===================================================================
--- linux-2.6.11.orig/fs/ext3/writeback.c	2005-03-04 15:10:01.000000000 +0300
+++ linux-2.6.11/fs/ext3/writeback.c	2005-03-04 17:33:05.000000000 +0300
@@ -145,6 +145,17 @@
 	if (bio->bi_size)
 		return 1;
 
+	if (bio->bi_private) {
+		transaction_t *transaction = bio->bi_private;
+
+		/*
+		 * journal_commit_transaction() may be awaiting
+		 * the bio to complete.
+		 */
+		if (atomic_dec_and_test(&transaction->t_bios_in_flight))
+			wake_up(&transaction->t_journal->j_wait_bios);
+	}
+
 	do {
 		struct page *page = bvec->bv_page;
 
@@ -162,6 +173,16 @@
 static struct bio *ext3_wb_bio_submit(struct bio *bio, handle_t *handle)
 {
 	bio->bi_end_io = ext3_wb_end_io;
+	if (handle) {
+		/*
+		 * In data=ordered we shouldn't commit the transaction
+		 * until all data related to the transaction get on a
+		 * platter.
+		 */
+		atomic_inc(&handle->h_transaction->t_bios_in_flight);
+		bio->bi_private = handle->h_transaction;
+	} else
+		bio->bi_private = NULL;
 	submit_bio(WRITE, bio);
 	return NULL;
 }
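
For anyone who wants to play with the counting scheme outside the kernel, here is a rough userspace sketch of the same idea: bump a counter when I/O is submitted, drop it on completion, and have the commit side sleep until it reaches zero. This is an illustration only, not part of the patch. All names below (struct txn, txn_io_submit(), txn_io_complete(), txn_wait_for_io(), run_demo()) are hypothetical, and a pthread mutex plus condition variable is assumed as a stand-in for the kernel wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_IOS 64

/* Hypothetical userspace stand-in for a transaction and its wait queue. */
struct txn {
	atomic_int	ios_in_flight;	/* plays the role of t_bios_in_flight */
	atomic_int	ios_completed;	/* demo bookkeeping only */
	pthread_mutex_t	lock;
	pthread_cond_t	all_done;	/* plays the role of j_wait_bios */
};

static void txn_init(struct txn *t)
{
	atomic_init(&t->ios_in_flight, 0);
	atomic_init(&t->ios_completed, 0);
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->all_done, NULL);
}

/* Submission side: account for the I/O before it is issued. */
static void txn_io_submit(struct txn *t)
{
	atomic_fetch_add(&t->ios_in_flight, 1);
}

/* Completion side: a completion that drops the count to zero wakes the waiter. */
static void txn_io_complete(struct txn *t)
{
	atomic_fetch_add(&t->ios_completed, 1);
	/* atomic_fetch_sub returns the old value, so 1 means "now zero". */
	if (atomic_fetch_sub(&t->ios_in_flight, 1) == 1) {
		pthread_mutex_lock(&t->lock);
		pthread_cond_broadcast(&t->all_done);
		pthread_mutex_unlock(&t->lock);
	}
}

/* Commit side: block until no I/O is in flight. */
static void txn_wait_for_io(struct txn *t)
{
	pthread_mutex_lock(&t->lock);
	while (atomic_load(&t->ios_in_flight) != 0)
		pthread_cond_wait(&t->all_done, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

static void *io_thread(void *arg)
{
	txn_io_complete(arg);
	return NULL;
}

/*
 * Submit n fake I/Os, wait as the commit path would, and return how many
 * completions had been seen by the time the wait returned (-1 on bad n).
 */
int run_demo(int n)
{
	struct txn t;
	pthread_t th[MAX_IOS];
	int started[MAX_IOS] = {0};
	int i, seen;

	if (n < 0 || n > MAX_IOS)
		return -1;
	txn_init(&t);
	for (i = 0; i < n; i++) {
		txn_io_submit(&t);
		if (pthread_create(&th[i], NULL, io_thread, &t) == 0)
			started[i] = 1;
		else
			txn_io_complete(&t);	/* fall back to inline completion */
	}
	txn_wait_for_io(&t);
	seen = atomic_load(&t.ios_completed);
	assert(atomic_load(&t.ios_in_flight) == 0);
	for (i = 0; i < n; i++)
		if (started[i])
			pthread_join(th[i], NULL);
	pthread_mutex_destroy(&t.lock);
	pthread_cond_destroy(&t.all_done);
	return seen;
}
```

Build with -pthread and call run_demo(n) from a main(). Since every submission is counted before the wait starts, the wait can only return once all n completions have run. The sketch takes the mutex around the broadcast so that a completion cannot slip in between the waiter's check and its sleep; in the kernel, wait_event() and wake_up() handle that race themselves.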