From: Ric Wheeler
Subject: Re: [JBD] change batching logic to improve O_SYNC performance
Date: Thu, 15 Dec 2005 16:39:12 -0500
To: Andrew Morton
Cc: Benjamin LaHaise, sct@redhat.com, linux-fsdevel@vger.kernel.org
In-Reply-To: <20051215155552.1f71a16e.akpm@osdl.org>

Andrew Morton wrote:
> Benjamin LaHaise wrote:
>
>>Hello folks,
>>
>>When writing files out using O_SYNC, jbd's 1 jiffy delay results in a
>>significant drop in throughput as the disk sits idle. The patch below
>>results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on
>>my IDE test box) when writing out files using O_SYNC.
>
> That's really sad. Thanks for working that out.
>
>>Instead of always delaying for 1 jiffy when trying to batch, merely do a
>>yield() to allow other processes to execute and potentially batch
>>requests.
>
> Yeah, 2.4 has yield(). The O(1) yield semantics resulted in a performance
> catastrophe in ext3 when the system was busy, so the batching code got
> changed to a one-jiffy-sleep. I don't think we can go back to yield().
>
> Worst-case we should just dump the batching code: single-threaded
> O_SYNC/fsync is probably a commoner case than multi-threaded, dunno.

I think that assumption may hold for a single-threaded O_SYNC process, but
it is not normally true for fsync()-heavy workloads. We run a
multi-threaded write workload precisely because it boosts files/sec to
about 4-5x the single-threaded write rate. With a properly configured
write barrier (highly recommended if you care about your data ;-)), an
fsync() call is quite expensive, so batching is a huge win.

I think NFS servers and other multi-threaded applications (mail servers?)
have a similar profile. In these cases you clearly benefit from combining
multiple fsync() requests into one disk operation.

> But surely we can do better than that.
>
> How's about something simple like just saying "if the last process which
> did a synchronous write is not this process, do the batching thing".

Despite some obvious complexity, I still think that adjusting the delay
based on the rate of synchronous requests would be the best approach. For
example, even in the O_SYNC write case, if a single thread is writing to
disk in rapid succession, any delay is probably a waste.

Another way to attack this is to expose some of the transaction mechanisms
to applications, so they can exercise explicit control over the commit
phase; that could be used to build a batched fsync(), etc.
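
To make that a bit more concrete, here is the kind of decision I have in
mind: skip the wait entirely when a single task is issuing back-to-back
synchronous writes, otherwise wait roughly as long as commits have been
taking so other threads can pile onto the same transaction. This is just a
toy userspace sketch, not against the real jbd code; all of the names and
numbers below are made up.

/* Hypothetical sketch of the batching heuristic discussed above.
 * Illustrative only; types and names do not match the jbd sources.
 */
#include <stdio.h>
#include <sys/types.h>

struct journal_batch_state {
	pid_t last_sync_pid;     /* pid of the last task that forced a commit */
	long  avg_commit_usecs;  /* running estimate of time between sync commits */
};

/* Return how long (in usecs) the caller should wait for other threads to
 * join the transaction before forcing a commit. */
static long batch_wait_usecs(struct journal_batch_state *js, pid_t pid)
{
	/* A single stream of O_SYNC writes from one task: waiting only
	 * adds latency, so don't. */
	if (js->last_sync_pid == pid)
		return 0;

	js->last_sync_pid = pid;

	/* Several tasks are fsync()ing: wait roughly as long as commits
	 * have been taking, so their updates can share one disk write. */
	return js->avg_commit_usecs;
}

int main(void)
{
	struct journal_batch_state js = {
		.last_sync_pid = 0,
		.avg_commit_usecs = 2000,	/* made-up starting estimate */
	};

	printf("task 100: wait %ld usecs\n", batch_wait_usecs(&js, 100));
	printf("task 100: wait %ld usecs\n", batch_wait_usecs(&js, 100));
	printf("task 200: wait %ld usecs\n", batch_wait_usecs(&js, 200));
	return 0;
}

The real version would also have to decay the estimate and cope with pid
reuse, but it shows why the single-threaded writer never pays the one-jiffy
penalty while the multi-threaded fsync() case still gets batched.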