Date: Tue, 5 May 2009 17:00:11 +0100
From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH 2/3] barriers: block-raw-posix barrier support
Message-ID: <20090505160011.GD31100@shareable.org>
References: <20090505120804.GA30651@lst.de> <20090505120836.GB30721@lst.de> <20090505123311.GD25328@shareable.org> <20090505132944.GA3416@lst.de>
In-Reply-To: <20090505132944.GA3416@lst.de>
List-Id: qemu-devel.nongnu.org
To: Christoph Hellwig
Cc: qemu-devel@nongnu.org

Christoph Hellwig wrote:
> On Tue, May 05, 2009 at 01:33:11PM +0100, Jamie Lokier wrote:
> > You don't need two fdatasyncs if the barrier request is just a
> > barrier, no data write, used only to flush previously written data by
> > a guest's fsync/fdatasync implementation.
>
> Yeah.  I'll put that optimization in after some testing.

I suggest keeping a flag "flush_needed".  Set it whenever a write is
submitted; don't submit fsync/fdatasync when the flag is clear; clear
it whenever an fsync/fdatasync is submitted.  That provides a few more
optimisation opportunities.
> > This is the best argument yet for having distinct "barrier" and
> > "sync" operations.  "Barrier" is for ordering I/O, such as
> > journalling filesystems.
>
> Doesn't really help as long as we're using the normal Posix filesystem
> APIs on the host.  The only way to guarantee ordering of multiple
> *write* system calls is to call f(data)sync between them.

It doesn't help with journalling barriers, which I agree are dominant
in a lot of workloads, but it does help guest fsync-heavy workloads.

When "Sync && !Barrier", the guest doesn't require the full ordering
guarantee.  Therefore you can call f(data)sync _and_ call some writes
on other I/O threads in parallel.  The f(data)sync mustn't be started
until previously-queued writes are complete, but later-queued writes
can be called in parallel with the f(data)sync.  (Or, if using Linux
AIO, the same with aio_fsync and later-queued aio_writes in parallel.)

In other words, with a guest fdatasync-heavy workload, like a
database, it could keep the I/O pipeline busy instead of draining it
as the full barrier does.

It won't help with a journalling-barrier-heavy workload without
changes to the host to expose the distinct barrier types - i.e. a more
flexible alternative to f(data)sync, such as is occasionally discussed
elsewhere.

-- Jamie