Date: Thu, 28 Jul 2011 14:15:56 +0200
From: Christoph Hellwig
To: Kevin Wolf
Cc: qemu-devel@nongnu.org, Christoph Hellwig, Frediano Ziglio
Subject: Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
Message-ID: <20110728121556.GA17125@lst.de>
In-Reply-To: <4E3113F9.9090902@redhat.com>
References: <1311791126-11383-1-git-send-email-freddy77@gmail.com>
 <1311791126-11383-2-git-send-email-freddy77@gmail.com>
 <20110727183122.GA14736@lst.de>
 <77083DC7-E37C-4B44-9A59-DB19E34D20E2@gmail.com>
 <20110727195718.GA16212@lst.de>
 <4E3113F9.9090902@redhat.com>

On Thu, Jul 28, 2011 at 09:47:05AM +0200, Kevin Wolf wrote:
> > Indeed. This has come up a few times, and actually is a mostly trivial
> > task. Maybe we should give up waiting for -blockdev and separate cache
> > mode settings and allow a nocache-writethrough or similar mode now? It's
> > going to be around 10 lines of code + documentation.
>
> I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
> what is the explanation for O_DSYNC improving performance?

There isn't any, at least for modern Linux.
O_DSYNC at this point is equivalent to a range fdatasync for each write
call, and given that we're doing O_DIRECT the range flush doesn't matter.
If you do have a modern host and an old guest it might end up being
faster because the barrier implementation in Linux used to suck so badly,
but that's not inherent to the I/O model. If your guest however doesn't
support cache flushes at all, O_DIRECT | O_DSYNC is the only sane model to
use for local filesystems and block devices.

> Christoph, on another note: Can we rely on Linux AIO never returning
> short writes except on EOF? Currently we return -EINVAL in this case, so
> I hope it's true or we wouldn't return the correct error code.

More or less. There's one corner case for all Linux I/O: only writes up
to INT_MAX bytes are supported, and larger writes (and reads) get
truncated to that size. It's pretty nasty, but Linux has been vocally
opposed to fixing this issue.
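
To illustrate the truncation corner case, here is a rough sketch of a caller that treats a short write as "continue from where it stopped" rather than as an error. It is not QEMU code and it uses synchronous pwrite() instead of Linux AIO, purely to keep the example small. The helper name write_full_at and the choice of -EIO for a zero-byte result are assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): write exactly len bytes from
 * buf to fd starting at offset, resubmitting the remainder whenever the
 * kernel accepts fewer bytes than requested (for example because an
 * oversized request was truncated).
 *
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_full_at(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);

        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted before any data moved: retry */
            }
            return -errno;      /* real failure: pass the errno to the caller */
        }
        if (n == 0) {
            /* No progress at all. Assumption for this sketch: report -EIO
             * instead of looping forever. */
            return -EIO;
        }

        /* Short or full write: advance past the bytes that were accepted. */
        p += n;
        offset += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would use it as `ret = write_full_at(fd, buf, len, offset);` and check for `ret < 0`. An AIO completion handler could apply the same idea by resubmitting the remaining range when the completed byte count is smaller than the request, though that path is not shown here.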