From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [JBD] change batching logic to improve O_SYNC performance Date: Thu, 15 Dec 2005 15:55:52 -0800 Message-ID: <20051215155552.1f71a16e.akpm@osdl.org> References: <20051215145951.GB2444@kvack.org> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: sct@redhat.com, linux-fsdevel@vger.kernel.org Return-path: Received: from smtp.osdl.org ([65.172.181.4]:23757 "EHLO smtp.osdl.org") by vger.kernel.org with ESMTP id S1751204AbVLOXzI (ORCPT ); Thu, 15 Dec 2005 18:55:08 -0500 To: Benjamin LaHaise In-Reply-To: <20051215145951.GB2444@kvack.org> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org Benjamin LaHaise wrote: > > Hello folks, > > When writing files out using O_SYNC, jbd's 1 jiffy delay results in a > significant drop in throughput as the disk sits idle. The patch below > results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on > my IDE test box) when writing out files using O_SYNC. That's really sad. Thanks for working that out. > Instead of always > delaying for 1 jiffy when trying to batch, merely do a yield() to allow > other processes to execute and potentially batch requests. Yeah, 2.4 has yield(). The O(1) yield semantics resulted in a performance catastrophe in ext3 when the system was busy, so the batching code got changed to a one-jiffy-sleep. I don't think we can go back to yield(). Worst-case we should just dump the batching code: single-threaded O_SYNC/fsync is probably a commoner case than multi-threaded, dunno. But surely we can do better than that. How's about something simple like just saying "if the last process which did a synchronous write is not this process, do the batching thing". Something like this? fs/jbd/transaction.c | 10 +++++++++- include/linux/jbd.h | 4 ++++ 2 files changed, 13 insertions(+), 1 deletion(-) diff -puN fs/jbd/transaction.c~jbd-fix-transaction-batching fs/jbd/transaction.c --- 25/fs/jbd/transaction.c~jbd-fix-transaction-batching Thu Dec 15 15:47:52 2005 +++ 25-akpm/fs/jbd/transaction.c Thu Dec 15 15:53:28 2005 @@ -1308,6 +1308,7 @@ int journal_stop(handle_t *handle) transaction_t *transaction = handle->h_transaction; journal_t *journal = transaction->t_journal; int old_handle_count, err; + pid_t pid; J_ASSERT(transaction->t_updates > 0); J_ASSERT(journal_current_handle() == handle); @@ -1333,8 +1334,15 @@ int journal_stop(handle_t *handle) * It doesn't cost much - we're about to run a commit and sleep * on IO anyway. Speeds up many-threaded, many-dir operations * by 30x or more... + * + * But don't do this if this process was the most recent one to + * perform a synchronous write. We do this to detect the case where a + * single process is doing a stream of sync writes. No point in waiting + * for joiners in that case. */ - if (handle->h_sync) { + pid = current->pid; + if (handle->h_sync && journal->j_last_sync_writer != pid) { + journal->j_last_sync_writer = pid; do { old_handle_count = transaction->t_handle_count; schedule_timeout_uninterruptible(1); diff -puN include/linux/jbd.h~jbd-fix-transaction-batching include/linux/jbd.h --- 25/include/linux/jbd.h~jbd-fix-transaction-batching Thu Dec 15 15:48:25 2005 +++ 25-akpm/include/linux/jbd.h Thu Dec 15 15:53:46 2005 @@ -23,6 +23,7 @@ #define jfs_debug jbd_debug #else +#include #include #include #include @@ -618,6 +619,7 @@ struct transaction_s * @j_wbuf: array of buffer_heads for journal_commit_transaction * @j_wbufsize: maximum number of buffer_heads allowed in j_wbuf, the * number that will fit in j_blocksize + * @j_last_sync_writer: most recent pid which did a synchronous write * @j_private: An opaque pointer to fs-private information. */ @@ -807,6 +809,8 @@ struct journal_s struct buffer_head **j_wbuf; int j_wbufsize; + pid_t j_last_sync_writer; + /* * An opaque pointer to fs-private information. ext3 puts its * superblock pointer here _