From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
Date: Sat, 29 Mar 2008 01:09:30 +0000
Message-ID: <20080329010930.GA30219@shareable.org>
In-Reply-To: <20080328183628.GB19547@dmt>
References: <20080328150517.GA18077@dmt> <20080328150703.GA19624@shareable.org> <20080328163116.GA18853@dmt> <20080328180324.GA22555@shareable.org> <20080328183628.GB19547@dmt>
To: Marcelo Tosatti
Cc: kvm-devel, Paul Brook, qemu-devel@nongnu.org

Marcelo Tosatti wrote:
> I don't think the first qemu_aio_flush() is necessary because the fsync
> request will be enqueued after pending ones:
>
>        aio_fsync() function does a sync on all outstanding
>        asynchronous I/O operations associated with
>        aiocbp->aio_fildes.
>
>        More precisely, if op is O_SYNC, then all currently queued
>        I/O operations shall be completed as if by a call of
>        fsync(2), and if op is O_DSYNC, this call is the asynchronous
>        analog of fdatasync(2).  Note that this is a request only --
>        this call does not wait for I/O completion.
>
> glibc sets the priority for fsync as 0, which is the same priority AIO
> reads and writes are submitted by QEMU.

Do AIO operations always get executed in the order they are submitted?

I was under the impression this is not guaranteed, as relaxed ordering
permits better I/O scheduling (e.g. to reduce disk seeks) - which is
one of the most useful points of AIO.  (Otherwise you might as well
just have one worker thread doing synchronous I/O in order.)

And because of that, I was under the impression the only way to
implement a write barrier+flush in AIO was (1) wait for pending writes
to complete, then (2) aio_fsync, then (3) wait for the aio_fsync.

I could be wrong, but I haven't seen any documentation which says
otherwise, and it's what I'd expect of an implementation.  I.e.
aio_fsync() is just an asynchronous version of fsync().

The quoted man page doesn't convince me.  It says "all currently
queued I/O operations shall be completed" which _could_ mean that
aio_fsync is an AIO barrier too.

But then "as if by a call of fsync(2)" implies that aio_fsync+aio_suspend
could just be replaced by fsync() with no change of semantics.  So
"queued I/O operations" means what fsync() handles: dirty file data,
not in-flight AIO writes.

And you already noticed that fsync() is _not_ guaranteed to flush
in-flight AIO writes.  Being the asynchronous analog, aio_fsync()
would not either.

-- Jamie
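
For illustration, the three-step sequence described above ((1) wait for
pending writes, (2) aio_fsync, (3) wait for the aio_fsync) could be
sketched roughly as below.  This is only a sketch, not QEMU code: the
names barrier_flush() and wait_for_aio() are hypothetical, and it
assumes the caller keeps an array of the aiocbs for the writes it has
submitted and that every write must complete before the sync is queued.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: block until one AIO request has finished, reap it
 * with aio_return(), and return 0 on success or the request's errno value. */
static int wait_for_aio(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };
    int err;

    while ((err = aio_error(cb)) == EINPROGRESS) {
        /* Sleep until the request completes; retry if a signal interrupts. */
        if (aio_suspend(list, 1, NULL) == -1 && errno != EINTR)
            return errno;
    }
    (void)aio_return(cb);   /* reap the request exactly once */
    return err;
}

/* Hypothetical write barrier + flush for `fd`.  `writes` holds the aiocbs
 * of the writes submitted so far.  Returns 0 or an errno value. */
static int barrier_flush(int fd, struct aiocb *const writes[], int nwrites)
{
    struct aiocb sync_cb;
    int first_err = 0;

    assert(fd >= 0 && nwrites >= 0);

    /* (1) Wait for every pending write, without relying on AIO ordering.
     *     All requests are reaped even if one of them failed. */
    for (int i = 0; i < nwrites; i++) {
        int err = wait_for_aio(writes[i]);
        if (err != 0 && first_err == 0)
            first_err = err;
    }
    if (first_err != 0)
        return first_err;

    /* (2) Queue the asynchronous fsync; this call returns immediately. */
    memset(&sync_cb, 0, sizeof sync_cb);
    sync_cb.aio_fildes = fd;
    sync_cb.aio_sigevent.sigev_notify = SIGEV_NONE;
    if (aio_fsync(O_SYNC, &sync_cb) == -1)
        return errno;

    /* (3) Wait for the fsync request itself to complete. */
    return wait_for_aio(&sync_cb);
}
```

Note that in this sketch the helper calls aio_return() on each write, so
the caller must not reap those requests again.  With glibc versions that
keep POSIX AIO in librt, it would need to be linked with -lrt.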