From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 29 Mar 2008 03:49:16 -0300
From: Marcelo Tosatti
Subject: Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
Message-ID: <20080329064916.GA23947@dmt>
References: <20080328150517.GA18077@dmt> <20080328150703.GA19624@shareable.org> <20080328163116.GA18853@dmt> <20080328180324.GA22555@shareable.org> <20080328183628.GB19547@dmt> <20080329010930.GA30219@shareable.org>
In-Reply-To: <20080329010930.GA30219@shareable.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: Paul Brook, qemu-devel@nongnu.org, kvm-devel, Jamie Lokier

On Sat, Mar 29, 2008 at 01:09:30AM +0000, Jamie Lokier wrote:
> Marcelo Tosatti wrote:
> > I don't think the first qemu_aio_flush() is necessary because the fsync
> > request will be enqueued after pending ones:
> >
> >        aio_fsync() function does a sync on all outstanding
> >        asynchronous I/O operations associated with
> >        aiocbp->aio_fildes.
> >
> >        More precisely, if op is O_SYNC, then all currently queued
> >        I/O operations shall be completed as if by a call of
> >        fsync(2), and if op is O_DSYNC, this call is the asynchronous
> >        analog of fdatasync(2).  Note that this is a request only -
> >        this call does not wait for I/O completion.
> >
> > glibc sets the priority for fsync to 0, which is the same priority at
> > which QEMU submits AIO reads and writes.
>
> Do AIO operations always get executed in the order they are submitted?

With glibc AIO on the same file descriptor, yes. See
sysdeps/pthreads/aio_misc.c.

The kernel AIO implementation (used for direct I/O) does not implement
aio_fsync.

> I was under the impression this is not guaranteed, as relaxed ordering
> permits better I/O scheduling (e.g. to reduce disk seeks) - which is
> one of the most useful points of AIO.  (Otherwise you might as well
> just have one worker thread doing synchronous IO in order).
>
> And because of that, I was under the impression the only way to
> implement a write barrier+flush in AIO was (1) wait for pending writes
> to complete, then (2) aio_fsync, then (3) wait for the aio_fsync.
>
> I could be wrong, but I haven't seen any documentation which says
> otherwise, and it's what I'd expect of an implementation.  I.e. it's
> just an asynchronous version of fsync().

All documentation (and source code) I can find indicates that
aio_fsync() will wait for pending AIO requests to finish before doing
the fsync/fdatasync system call.

I find it hard to see much purpose in such an interface otherwise.

http://www.opengroup.org/onlinepubs/009695399/functions/aio_fsync.html

http://docs.hp.com/en/B9106-90012/aio.5.html "Synchronizing Permanent Storage"

http://www.gnu.org/software/libtool/manual/libc/Synchronizing-AIO-Operations.html#Synchronizing-AIO-Operations

"When dealing with asynchronous operations it is sometimes necessary to
get into a consistent state. This would mean for AIO that one wants to
know whether a certain request or a group of requests were processed.
This could be done by waiting for the notification sent by the system
after the operation terminated, but this sometimes would mean wasting
resources (mainly computation time). Instead POSIX.1b defines two
functions which will help with most kinds of consistency."

http://www.governmentsecurity.org/articles/articles2/B2355-90693.pdf_fl/B2355-90693-29.html

This is the HP-UX man page, much better than the Linux one.

> The quoted man page doesn't convince me.  It says "all currently
> queued I/O operations shall be completed" which _could_ mean that
> aio_fsync is an AIO barrier too.
>
> But then "as if by a call of fsync(2)" implies that aio_fsync+aio_suspend
> could just be replaced by fsync() with no change of semantics.  So
> "queued I/O operations" means what fsync() handles: dirty file data,
> not in-flight AIO writes.
>
> And you already noticed that fsync() is _not_ guaranteed to flush
> in-flight AIO writes.  Being the asynchronous analog, aio_fsync()
> would not either.

Flushing in-flight AIO writes seems to be the whole point of aio_fsync().
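
For reference, the conservative three-step sequence Jamie describes
(wait for pending writes, then aio_fsync, then wait for the aio_fsync)
could be sketched with POSIX AIO roughly as below. This is only an
illustration under stated assumptions, not QEMU's code: wait_for(),
write_barrier_flush() and demo_flush() are hypothetical names, error
handling is minimal, and short writes are not checked.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: block until one AIO request has finished.
 * Returns 0 on success, otherwise an errno value. */
static int wait_for(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };
    int err;

    assert(cb != NULL);
    while ((err = aio_error(cb)) == EINPROGRESS) {
        /* Sleep until completion; retry if a signal interrupts us. */
        if (aio_suspend(list, 1, NULL) == -1 && errno != EINTR)
            return errno;
    }
    if (err == -1)
        return errno;       /* aio_error() itself failed, e.g. EINVAL */
    (void)aio_return(cb);   /* reap the finished request */
    return err;             /* 0, or the errno value of the request */
}

/* Hypothetical helper: write barrier plus flush for requests that were
 * already submitted with aio_write() on fd. */
static int write_barrier_flush(int fd, struct aiocb *writes[], size_t nwrites)
{
    struct aiocb sync_cb;
    int err, first_err = 0;
    size_t i;

    /* (1) Wait for every pending write, remembering the first failure. */
    for (i = 0; i < nwrites; i++) {
        err = wait_for(writes[i]);
        if (err != 0 && first_err == 0)
            first_err = err;
    }
    if (first_err != 0)
        return first_err;

    /* (2) Queue the asynchronous fsync. */
    memset(&sync_cb, 0, sizeof(sync_cb));
    sync_cb.aio_fildes = fd;
    sync_cb.aio_sigevent.sigev_notify = SIGEV_NONE;
    if (aio_fsync(O_SYNC, &sync_cb) == -1)
        return errno;

    /* (3) Wait for the aio_fsync request itself. */
    return wait_for(&sync_cb);
}

/* Hypothetical usage: submit one write to path, then barrier and flush. */
int demo_flush(const char *path)
{
    static const char data[] = "guest data\n";
    struct aiocb wr;
    struct aiocb *writes[1] = { &wr };
    int fd, err;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return errno;

    memset(&wr, 0, sizeof(wr));
    wr.aio_fildes = fd;
    wr.aio_buf = (void *)data;
    wr.aio_nbytes = sizeof(data) - 1;
    wr.aio_offset = 0;
    wr.aio_sigevent.sigev_notify = SIGEV_NONE;

    err = (aio_write(&wr) == -1) ? errno : write_barrier_flush(fd, writes, 1);
    close(fd);
    return err;
}
```

If the reading argued for in the reply holds (aio_fsync() queued behind
earlier requests on the same descriptor), step (1) would be redundant
with glibc AIO; the sketch keeps it because it is safe under either
interpretation. Older glibc versions may need -lrt at link time.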