* [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
@ 2008-03-28 15:05 Marcelo Tosatti
2008-03-28 15:07 ` Jamie Lokier
2008-03-28 17:25 ` Ian Jackson
0 siblings, 2 replies; 17+ messages in thread
From: Marcelo Tosatti @ 2008-03-28 15:05 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel, qemu-devel
It's necessary to guarantee that pending AIO writes have reached stable
storage when the flush request returns.
Also change fsync() to fdatasync(), since the modification time is not
critical data.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm-userspace.io/qemu/block-raw-posix.c
===================================================================
--- kvm-userspace.io.orig/qemu/block-raw-posix.c
+++ kvm-userspace.io/qemu/block-raw-posix.c
@@ -557,10 +557,40 @@ static int raw_create(const char *filena
return 0;
}
+static void raw_aio_flush_complete(void *opaque, int ret)
+{
+ if (ret)
+ printf("WARNING: aio_fsync failed (completion)\n");
+}
+
+static void raw_aio_flush(BlockDriverState *bs)
+{
+ RawAIOCB *acb;
+
+ acb = raw_aio_setup(bs, 0, NULL, 0, raw_aio_flush_complete, NULL);
+ if (!acb)
+ return;
+
+ if (aio_fsync(O_DSYNC, &acb->aiocb) < 0) {
+ qemu_aio_release(acb);
+ perror("aio_fsync");
+ printf("WARNING: aio_fsync failed\n");
+ return;
+ }
+}
+
static void raw_flush(BlockDriverState *bs)
{
BDRVRawState *s = bs->opaque;
- fsync(s->fd);
+ raw_aio_flush(bs);
+ fdatasync(s->fd);
+
+ /* We rely on the fact that no other AIO will be submitted
+ * in parallel, but this should be fixed by per-device
+ * AIO queues when allowing multiple CPU's to process IO
+ * in QEMU.
+ */
+ qemu_aio_flush();
}
BlockDriver bdrv_raw = {
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 15:05 [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request Marcelo Tosatti
@ 2008-03-28 15:07 ` Jamie Lokier
2008-03-28 16:31 ` [kvm-devel] " Marcelo Tosatti
2008-03-28 17:25 ` Ian Jackson
1 sibling, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-03-28 15:07 UTC (permalink / raw)
To: qemu-devel; +Cc: kvm-devel
Marcelo Tosatti wrote:
> It's necessary to guarantee that pending AIO writes have reached stable
> storage when the flush request returns.
>
> Also change fsync() to fdatasync(), since the modification time is not
> critical data.
> + if (aio_fsync(O_DSYNC, &acb->aiocb) < 0) {
> BDRVRawState *s = bs->opaque;
> - fsync(s->fd);
> + raw_aio_flush(bs);
> + fdatasync(s->fd);
> +
> + /* We rely on the fact that no other AIO will be submitted
> + * in parallel, but this should be fixed by per-device
> + * AIO queues when allowing multiple CPU's to process IO
> + * in QEMU.
> + */
> + qemu_aio_flush();
I'm a bit confused by this. Why do you need aio_fsync(O_DSYNC) _and_
synchronous fdatasync() calls? Aren't they equivalent?
-- Jamie
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 15:07 ` Jamie Lokier
@ 2008-03-28 16:31 ` Marcelo Tosatti
2008-03-28 16:40 ` Paul Brook
2008-03-28 18:03 ` Jamie Lokier
0 siblings, 2 replies; 17+ messages in thread
From: Marcelo Tosatti @ 2008-03-28 16:31 UTC (permalink / raw)
To: Jamie Lokier; +Cc: kvm-devel, qemu-devel
On Fri, Mar 28, 2008 at 03:07:03PM +0000, Jamie Lokier wrote:
> Marcelo Tosatti wrote:
> > It's necessary to guarantee that pending AIO writes have reached stable
> > storage when the flush request returns.
> >
> > Also change fsync() to fdatasync(), since the modification time is not
> > critical data.
> > + if (aio_fsync(O_DSYNC, &acb->aiocb) < 0) {
>
> > BDRVRawState *s = bs->opaque;
> > - fsync(s->fd);
> > + raw_aio_flush(bs);
> > + fdatasync(s->fd);
> > +
> > + /* We rely on the fact that no other AIO will be submitted
> > + * in parallel, but this should be fixed by per-device
> > + * AIO queues when allowing multiple CPU's to process IO
> > + * in QEMU.
> > + */
> > + qemu_aio_flush();
>
> I'm a bit confused by this. Why do you need aio_fsync(O_DSYNC) _and_
> synchronous fdatasync() calls? Aren't they equivalent?
fdatasync() will write and wait for completion of dirty file data
present in memory.
aio_write() only queues data for submission:
The "asynchronous" means that this call returns as soon as the request
has been enqueued; the write may or may not have completed when the
call returns. One tests for completion using aio_error(3).
So fdatasync() is not enough because data written via AIO may not
have been reflected as "dirty file data" through write() by the time
raw_flush() is called.
The SCSI and IDE drivers use flush() in response to a "flush cache"
request, which is used by the guest OS to implement barriers, for
example.
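The distinction can be sketched with plain POSIX AIO (an illustrative helper, not code from the patch; the function name is made up): an aio_write() must be observed to complete, via aio_error()/aio_return(), before fdatasync() is guaranteed to cover its data.

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Illustrative helper (not QEMU code): queue an AIO write, wait for it
 * to complete, then make the data durable.  fdatasync() alone is not
 * enough, because a still-queued aio_write() has not yet become dirty
 * file data that fdatasync() is obliged to flush. */
static int write_then_sync(int fd, const void *buf, size_t len)
{
    struct aiocb cb;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = (void *)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (aio_write(&cb) < 0)
        return -1;

    /* aio_write() only enqueues the request; poll for completion. */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);
    if (aio_return(&cb) != (ssize_t)len)
        return -1;

    /* Now the data is ordinary dirty file data, which fdatasync()
     * writes out and waits for. */
    return fdatasync(fd);
}
```

Skipping the wait loop would leave a window where fdatasync() returns before the queued write has even been issued.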
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 16:31 ` [kvm-devel] " Marcelo Tosatti
@ 2008-03-28 16:40 ` Paul Brook
2008-03-28 16:59 ` Marcelo Tosatti
2008-03-28 18:03 ` Jamie Lokier
1 sibling, 1 reply; 17+ messages in thread
From: Paul Brook @ 2008-03-28 16:40 UTC (permalink / raw)
To: qemu-devel; +Cc: Marcelo Tosatti, kvm-devel
On Friday 28 March 2008, Marcelo Tosatti wrote:
> On Fri, Mar 28, 2008 at 03:07:03PM +0000, Jamie Lokier wrote:
> > Marcelo Tosatti wrote:
> > > It's necessary to guarantee that pending AIO writes have reached stable
> > > storage when the flush request returns.
> > >
> > > Also change fsync() to fdatasync(), since the modification time is not
> > > critical data.
> > > + if (aio_fsync(O_DSYNC, &acb->aiocb) < 0) {
> > >
> > > BDRVRawState *s = bs->opaque;
> > > - fsync(s->fd);
> > > + raw_aio_flush(bs);
> > > + fdatasync(s->fd);
> > > +
> > > + /* We rely on the fact that no other AIO will be submitted
> > > + * in parallel, but this should be fixed by per-device
> > > + * AIO queues when allowing multiple CPU's to process IO
> > > + * in QEMU.
> > > + */
> > > + qemu_aio_flush();
> >
> > I'm a bit confused by this. Why do you need aio_fsync(O_DSYNC) _and_
> > synchronous fdatasync() calls? Aren't they equivalent?
>
> fdatasync() will write and wait for completion of dirty file data
> present in memory.
>
> aio_write() only queues data for submission:
>
> The "asynchronous" means that this call returns as soon as the
> request has been enqueued; the write may or may not have completed when
> the call returns. One tests for completion using aio_error(3).
Surely you should be using the normal aio notification to wait for the
aio_fsync to complete before reporting success to the device.
Paul
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 16:40 ` Paul Brook
@ 2008-03-28 16:59 ` Marcelo Tosatti
2008-03-28 17:00 ` Paul Brook
0 siblings, 1 reply; 17+ messages in thread
From: Marcelo Tosatti @ 2008-03-28 16:59 UTC (permalink / raw)
To: Paul Brook; +Cc: Marcelo Tosatti, kvm-devel, qemu-devel
On Fri, Mar 28, 2008 at 04:40:54PM +0000, Paul Brook wrote:
> On Friday 28 March 2008, Marcelo Tosatti wrote:
> > On Fri, Mar 28, 2008 at 03:07:03PM +0000, Jamie Lokier wrote:
> > > Marcelo Tosatti wrote:
> > > > It's necessary to guarantee that pending AIO writes have reached stable
> > > > storage when the flush request returns.
> > > >
> > > > Also change fsync() to fdatasync(), since the modification time is not
> > > > critical data.
> > > > + if (aio_fsync(O_DSYNC, &acb->aiocb) < 0) {
> > > >
> > > > BDRVRawState *s = bs->opaque;
> > > > - fsync(s->fd);
> > > > + raw_aio_flush(bs);
> > > > + fdatasync(s->fd);
> > > > +
> > > > + /* We rely on the fact that no other AIO will be submitted
> > > > + * in parallel, but this should be fixed by per-device
> > > > + * AIO queues when allowing multiple CPU's to process IO
> > > > + * in QEMU.
> > > > + */
> > > > + qemu_aio_flush();
> > >
> > > I'm a bit confused by this. Why do you need aio_fsync(O_DSYNC) _and_
> > > synchronous fdatasync() calls? Aren't they equivalent?
> >
> > fdatasync() will write and wait for completion of dirty file data
> > present in memory.
> >
> > aio_write() only queues data for submission:
> >
> > The "asynchronous" means that this call returns as soon as the
> > request has been enqueued; the write may or may not have completed when
> > the call returns. One tests for completion using aio_error(3).
>
> Surely you should be using the normal aio notification to wait for the
> aio_fsync to complete before reporting success to the device.
qemu_aio_flush() will wait for all pending AIO requests (including
aio_fsync) to complete.
Or do you mean something else?
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 16:59 ` Marcelo Tosatti
@ 2008-03-28 17:00 ` Paul Brook
2008-03-28 18:13 ` Marcelo Tosatti
0 siblings, 1 reply; 17+ messages in thread
From: Paul Brook @ 2008-03-28 17:00 UTC (permalink / raw)
To: qemu-devel; +Cc: Marcelo Tosatti, kvm-devel
> > Surely you should be using the normal aio notification to wait for the
> > aio_fsync to complete before reporting success to the device.
>
> qemu_aio_flush() will wait for all pending AIO requests (including
> aio_fsync) to complete.
Then why do you need to separate fdatasync?
Paul
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 17:00 ` Paul Brook
@ 2008-03-28 18:13 ` Marcelo Tosatti
2008-03-29 1:17 ` Jamie Lokier
0 siblings, 1 reply; 17+ messages in thread
From: Marcelo Tosatti @ 2008-03-28 18:13 UTC (permalink / raw)
To: Paul Brook; +Cc: kvm-devel, qemu-devel
On Fri, Mar 28, 2008 at 05:00:39PM +0000, Paul Brook wrote:
> > > Surely you should be using the normal aio notification to wait for the
> > > aio_fsync to complete before reporting success to the device.
> >
> > qemu_aio_flush() will wait for all pending AIO requests (including
> > aio_fsync) to complete.
>
> Then why do you need to separate fdatasync?
Oh, I see what Jamie means now: fdatasync() is redundant with
aio_fsync(O_DSYNC).
How's this?
Index: kvm-userspace.io/qemu/block-raw-posix.c
===================================================================
--- kvm-userspace.io.orig/qemu/block-raw-posix.c
+++ kvm-userspace.io/qemu/block-raw-posix.c
@@ -557,10 +557,39 @@ static int raw_create(const char *filena
return 0;
}
+static void raw_aio_flush_complete(void *opaque, int ret)
+{
+ if (ret)
+ printf("WARNING: aio_fsync failed (completion)\n");
+}
+
+static void raw_aio_flush(BlockDriverState *bs)
+{
+ RawAIOCB *acb;
+
+ acb = raw_aio_setup(bs, 0, NULL, 0, raw_aio_flush_complete, NULL);
+ if (!acb)
+ return;
+
+ if (aio_fsync(O_DSYNC, &acb->aiocb) < 0) {
+ qemu_aio_release(acb);
+ perror("aio_fsync");
+ printf("WARNING: aio_fsync failed\n");
+ return;
+ }
+}
+
static void raw_flush(BlockDriverState *bs)
{
BDRVRawState *s = bs->opaque;
- fsync(s->fd);
+ raw_aio_flush(bs);
+
+ /* We rely on the fact that no other AIO will be submitted
+ * in parallel, but this should be fixed by per-device
+ * AIO queues when allowing multiple CPU's to process IO
+ * in QEMU.
+ */
+ qemu_aio_flush();
}
BlockDriver bdrv_raw = {
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 18:13 ` Marcelo Tosatti
@ 2008-03-29 1:17 ` Jamie Lokier
2008-03-29 2:02 ` Paul Brook
0 siblings, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-03-29 1:17 UTC (permalink / raw)
To: qemu-devel; +Cc: kvm-devel, Paul Brook
Marcelo Tosatti wrote:
> static void raw_flush(BlockDriverState *bs)
> {
> BDRVRawState *s = bs->opaque;
> - fsync(s->fd);
> + raw_aio_flush(bs);
> +
> + /* We rely on the fact that no other AIO will be submitted
> + * in parallel, but this should be fixed by per-device
> + * AIO queues when allowing multiple CPU's to process IO
> + * in QEMU.
> + */
> + qemu_aio_flush();
> }
It depends what raw_flush() is used for.
If you want to be sure this flushes AIO writes in-flight at the time
of the call, I still reckon you need an extra qemu_aio_flush() before
raw_aio_flush() - on at least some POSIX AIO implementations. (I'd be
very interested to find out otherwise, if you know better).
But if, as Ian Jackson suggests, raw_flush() is _only_ used when the
guest driver issues a CACHE FLUSH command _and_ the guest driver
either cannot overlap operations, or cannot depend on overlapping
operations occurring in order, then you don't need it.
That'll depend on what kind of device is emulated. Does the SCSI
emulation handle multiple in-flight commands with any guarantee on
order? To be on the safe side, I'd include the extra qemu_aio_flush,
as I expect it's very unlikely to harm performance and might save
someone's data.
-- Jamie
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-29 1:17 ` Jamie Lokier
@ 2008-03-29 2:02 ` Paul Brook
2008-03-29 2:11 ` Jamie Lokier
0 siblings, 1 reply; 17+ messages in thread
From: Paul Brook @ 2008-03-29 2:02 UTC (permalink / raw)
To: qemu-devel; +Cc: kvm-devel
> That'll depend on what kind of device is emulated. Does the SCSI
> emulation handle multiple in-flight commands with any guarantee on
> order?
SCSI definitely allows (and we emulate) multiple in-flight commands.
I can't find any requirement that writes must complete before a subsequent
SYNCHRONISE_CACHE. However I don't claim to know the spec that well, and it's
probably not a bad idea to have them complete anyway. Preferably this would be
a completely asynchronous operation, i.e. the sync command returns immediately
but only completes when all preceding writes have completed and been flushed
to disk.
Paul
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-29 2:02 ` Paul Brook
@ 2008-03-29 2:11 ` Jamie Lokier
2008-03-29 2:43 ` Paul Brook
0 siblings, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-03-29 2:11 UTC (permalink / raw)
To: Paul Brook; +Cc: kvm-devel, qemu-devel
Paul Brook wrote:
> > That'll depend on what kind of device is emulated. Does the SCSI
> > emulation handle multiple in-flight commands with any guarantee on
> > order?
>
> SCSI definitely allows (and we emulate) multiple in-flight commands.
> I can't find any requirement that writes must complete before a
> subsequent SYNCHRONISE_CACHE. However I don't claim to know the spec
> that well,
Aren't there SCSI tagged barrier commands or something like that,
which allow a host to request ordering between commands?
> it's probably not a bad idea have them complete anyway. Preferably
> this would be a completely asynchronous operation. i.e. the sync
> command returns immediately, but only completes when all preceding
> writes have completed and been flushed to disk.
I agree, that seems the optimal implementation.
-- Jamie
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-29 2:11 ` Jamie Lokier
@ 2008-03-29 2:43 ` Paul Brook
0 siblings, 0 replies; 17+ messages in thread
From: Paul Brook @ 2008-03-29 2:43 UTC (permalink / raw)
To: Jamie Lokier; +Cc: kvm-devel, qemu-devel
On Saturday 29 March 2008, Jamie Lokier wrote:
> Paul Brook wrote:
> > > That'll depend on what kind of device is emulated. Does the SCSI
> > > emulation handle multiple in-flight commands with any guarantee on
> > > order?
> >
> > SCSI definitely allows (and we emulate) multiple in-flight commands.
> > I can't find any requirement that writes must complete before a
> > subsequent SYNCHRONISE_CACHE. However I don't claim to know the spec
> > that well,
>
> Aren't there SCSI tagged barrier commands or something like that,
> which allow a host to request ordering between commands?
Ah, yes. Ordering is defined as part of the task management model, not part of
the commands themselves. For parallel SCSI, barrier tasks are created using
the ORDERED message (the implementation for this in the LSI HBA is currently
absent).
So in theory we don't care about this for SCSI because ordering will already
have been enforced by upper levels.
Paul
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 16:31 ` [kvm-devel] " Marcelo Tosatti
2008-03-28 16:40 ` Paul Brook
@ 2008-03-28 18:03 ` Jamie Lokier
2008-03-28 18:36 ` Marcelo Tosatti
1 sibling, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-03-28 18:03 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm-devel, qemu-devel
Marcelo Tosatti wrote:
> On Fri, Mar 28, 2008 at 03:07:03PM +0000, Jamie Lokier wrote:
> > Marcelo Tosatti wrote:
> > > It's necessary to guarantee that pending AIO writes have reached stable
> > > storage when the flush request returns.
> > >
> > > Also change fsync() to fdatasync(), since the modification time is not
> > > critical data.
> > > + if (aio_fsync(O_DSYNC, &acb->aiocb) < 0) {
> >
> > > BDRVRawState *s = bs->opaque;
> > > - fsync(s->fd);
> > > + raw_aio_flush(bs);
> > > + fdatasync(s->fd);
> > > +
> > > + /* We rely on the fact that no other AIO will be submitted
> > > + * in parallel, but this should be fixed by per-device
> > > + * AIO queues when allowing multiple CPU's to process IO
> > > + * in QEMU.
> > > + */
> > > + qemu_aio_flush();
> >
> > I'm a bit confused by this. Why do you need aio_fsync(O_DSYNC) _and_
> > synchronous fdatasync() calls? Aren't they equivalent?
>
> fdatasync() will write and wait for completion of dirty file data
> present in memory.
>
> aio_write() only queues data for submission:
>
> The "asynchronous" means that this call returns as soon as the request
> has been enqueued; the write may or may not have completed when the
> call returns. One tests for completion using aio_error(3).
>
> So fdatasync() is not enough because data written via AIO may not
> have been reflected as "dirty file data" through write() by the time
> raw_flush() is called.
Sure. But why isn't the aio_fsync(O_DSYNC) enough by itself?
It seems to me you should have something like this:
/* Flush pending aio_writes until they are dirty data,
and wait before the aio_fsync. */
qemu_aio_flush();
/* Call aio_fsync(O_DSYNC). */
raw_aio_flush(bs);
/* Wait for the aio_fsync to complete. */
qemu_aio_flush();
What am I missing?
-- Jamie
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 18:03 ` Jamie Lokier
@ 2008-03-28 18:36 ` Marcelo Tosatti
2008-03-29 1:09 ` Jamie Lokier
0 siblings, 1 reply; 17+ messages in thread
From: Marcelo Tosatti @ 2008-03-28 18:36 UTC (permalink / raw)
To: Jamie Lokier, Paul Brook; +Cc: kvm-devel, qemu-devel
On Fri, Mar 28, 2008 at 06:03:25PM +0000, Jamie Lokier wrote:
> Marcelo Tosatti wrote:
> > On Fri, Mar 28, 2008 at 03:07:03PM +0000, Jamie Lokier wrote:
> > > Marcelo Tosatti wrote:
> > > > It's necessary to guarantee that pending AIO writes have reached stable
> > > > storage when the flush request returns.
> > > >
> > > > Also change fsync() to fdatasync(), since the modification time is not
> > > > critical data.
> > > > + if (aio_fsync(O_DSYNC, &acb->aiocb) < 0) {
> > >
> > > > BDRVRawState *s = bs->opaque;
> > > > - fsync(s->fd);
> > > > + raw_aio_flush(bs);
> > > > + fdatasync(s->fd);
> > > > +
> > > > + /* We rely on the fact that no other AIO will be submitted
> > > > + * in parallel, but this should be fixed by per-device
> > > > + * AIO queues when allowing multiple CPU's to process IO
> > > > + * in QEMU.
> > > > + */
> > > > + qemu_aio_flush();
> > >
> > > I'm a bit confused by this. Why do you need aio_fsync(O_DSYNC) _and_
> > > synchronous fdatasync() calls? Aren't they equivalent?
> >
> > fdatasync() will write and wait for completion of dirty file data
> > present in memory.
> >
> > aio_write() only queues data for submission:
> >
> > The "asynchronous" means that this call returns as soon as the request
> > has been enqueued; the write may or may not have completed when the
> > call returns. One tests for completion using aio_error(3).
> >
>
> > So fdatasync() is not enough because data written via AIO may not
> > have been reflected as "dirty file data" through write() by the time
> > raw_flush() is called.
>
> Sure. But why isn't the aio_fsync(O_DSYNC) enough by itself?
It is enough, fdatasync() becomes redundant.
> It seems to me you should have something like this:
>
> /* Flush pending aio_writes until they are dirty data,
> and wait before the aio_fsync. */
> qemu_aio_flush();
>
> /* Call aio_fsync(O_DSYNC). */
> raw_aio_flush(bs);
>
> /* Wait for the aio_fsync to complete. */
> qemu_aio_flush();
>
> What am I missing?
I don't think the first qemu_aio_flush() is necessary because the fsync
request will be enqueued after pending ones:
aio_fsync() function does a sync on all outstanding asynchronous
I/O operations associated with aiocbp->aio_fildes.
More precisely, if op is O_SYNC, then all currently queued I/O operations
shall be completed as if by a call of fsync(2), and if op is O_DSYNC, this
call is the asynchronous analog of fdatasync(2). Note that this is a
request only — this call does not wait for I/O completion.
glibc submits the fsync at priority 0, which is the same priority at which
QEMU submits its AIO reads and writes.
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 18:36 ` Marcelo Tosatti
@ 2008-03-29 1:09 ` Jamie Lokier
2008-03-29 6:49 ` Marcelo Tosatti
0 siblings, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-03-29 1:09 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm-devel, Paul Brook, qemu-devel
Marcelo Tosatti wrote:
> I don't think the first qemu_aio_flush() is necessary because the fsync
> request will be enqueued after pending ones:
>
> aio_fsync() function does a sync on all outstanding
> asynchronous I/O operations associated with
> aiocbp->aio_fildes.
>
> More precisely, if op is O_SYNC, then all currently queued
> I/O operations shall be completed as if by a call of
> fsync(2), and if op is O_DSYNC, this call is the asynchronous
> analog of fdatasync(2). Note that this is a request only —
> this call does not wait for I/O completion.
>
> glibc sets the priority for fsync as 0, which is the same priority AIO
> reads and writes are submitted by QEMU.
Do AIO operations always get executed in the order they are submitted?
I was under the impression this is not guaranteed, as relaxed ordering
permits better I/O scheduling (e.g. to reduce disk seeks) - which is
one of the most useful points of AIO. (Otherwise you might as well
just have one worker thread doing synchronous IO in order).
And because of that, I was under the impression the only way to
implement a write barrier+flush in AIO was (1) wait for pending writes
to complete, then (2) aio_fsync, then (3) wait for the aio_fsync.
I could be wrong, but I haven't seen any documentation which says
otherwise, and it's what I'd expect of an implementation. I.e. it's
just an asynchronous version of fsync().
The quoted man page doesn't convince me. It says "all currently
queued I/O operations shall be completed" which _could_ mean that
aio_fsync is an AIO barrier too.
But then "if by a call of fsync(2)" implies that aio_fsync+aio_suspend
could just be replaced by fsync() with no change of semantics. So
"queued I/O operations" means what fsync() handles: dirty file data,
not in-flight AIO writes.
And you already noticed that fsync() is _not_ guaranteed to flush
in-flight AIO writes. Being the asynchronous analog, aio_fsync()
would not either.
-- Jamie
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-29 1:09 ` Jamie Lokier
@ 2008-03-29 6:49 ` Marcelo Tosatti
0 siblings, 0 replies; 17+ messages in thread
From: Marcelo Tosatti @ 2008-03-29 6:49 UTC (permalink / raw)
To: Paul Brook, qemu-devel, kvm-devel, Jamie Lokier
On Sat, Mar 29, 2008 at 01:09:30AM +0000, Jamie Lokier wrote:
> Marcelo Tosatti wrote:
> > I don't think the first qemu_aio_flush() is necessary because the fsync
> > request will be enqueued after pending ones:
> >
> > aio_fsync() function does a sync on all outstanding
> > asynchronous I/O operations associated with
> > aiocbp->aio_fildes.
> >
> > More precisely, if op is O_SYNC, then all currently queued
> > I/O operations shall be completed as if by a call of
> > fsync(2), and if op is O_DSYNC, this call is the asynchronous
> > analog of fdatasync(2). Note that this is a request only —
> > this call does not wait for I/O completion.
> >
> > glibc sets the priority for fsync as 0, which is the same priority AIO
> > reads and writes are submitted by QEMU.
>
> Do AIO operations always get executed in the order they are submitted?
With glibc AIO on the same file descriptor, yes. See
sysdeps/pthreads/aio_misc.c.
The kernel AIO implementation (used for O_DIRECT I/O) does not implement
aio_fsync.
> I was under the impression this is not guaranteed, as relaxed ordering
> permits better I/O scheduling (e.g. to reduce disk seeks) - which is
> one of the most useful points of AIO. (Otherwise you might as well
> just have one worker thread doing synchronous IO in order).
>
> And because of that, I was under the impression the only way to
> implement a write barrier+flush in AIO was (1) wait for pending writes
> to complete, then (2) aio_fsync, then (3) wait for the aio_fsync.
>
> I could be wrong, but I haven't seen any documentation which says
> otherwise, and it's what I'd expect of an implementation. I.e. it's
> just an asynchronous version of fsync().
All documentation (and source code) I can find indicates that
aio_fsync() will wait for pending AIO requests to finish before doing
the fsync/fdatasync system call.
I find it hard to see much purpose in such an interface otherwise.
http://www.opengroup.org/onlinepubs/009695399/functions/aio_fsync.html
http://docs.hp.com/en/B9106-90012/aio.5.html "Synchronizing Permanent Storage"
http://www.gnu.org/software/libtool/manual/libc/Synchronizing-AIO-Operations.html#Synchronizing-AIO-Operations
"When dealing with asynchronous operations it is sometimes necessary to
get into a consistent state. This would mean for AIO that one wants to
know whether a certain request or a group of requests were processed.
This could be done by waiting for the notification sent by the system
after the operation terminated, but this sometimes would mean wasting
resources (mainly computation time). Instead POSIX.1b defines two
functions which will help with most kinds of consistency."
http://www.governmentsecurity.org/articles/articles2/B2355-90693.pdf_fl/B2355-90693-29.html
This is the HPUX man page, much better than the Linux one.
> The quoted man page doesn't convince me. It says "all currently
> queued I/O operations shall be completed" which _could_ mean that
> aio_fsync is an AIO barrier too.
>
> But then "if by a call of fsync(2)" implies that aio_fsync+aio_suspend
> could just be replaced by fsync() with no change of semantics. So
> "queued I/O operations" means what fsync() handles: dirty file data,
> not in-flight AIO writes.
>
> And you already noticed that fsync() is _not_ guaranteed to flush
> in-flight AIO writes. Being the asynchronous analog, aio_fsync()
> would not either.
That seems to be the whole point of aio_fsync().
* Re: [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 15:05 [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request Marcelo Tosatti
2008-03-28 15:07 ` Jamie Lokier
@ 2008-03-28 17:25 ` Ian Jackson
2008-03-28 19:11 ` [kvm-devel] " Marcelo Tosatti
1 sibling, 1 reply; 17+ messages in thread
From: Ian Jackson @ 2008-03-28 17:25 UTC (permalink / raw)
To: qemu-devel; +Cc: kvm-devel
Marcelo Tosatti writes ("[Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request "):
> It's necessary to guarantee that pending AIO writes have reached stable
> storage when the flush request returns.
Surely it isn't necessary to call qemu_aio_flush? Because those
pending AIO writes have not yet been returned to the guest as
complete, the guest is not entitled to assume that a FLUSH CACHE
command (issued before those writes have completed) completing
successfully means that those interleaved writes have reached stable
storage.
Also, this patch does a synchronous flush (which is bad because it
stalls the guest while the flush takes place) and it ignores any error
return (which is quite bad - see my other messages about bdrv_flush,
caches, etc.)
So I think it would be better to apply
- my bdrv_flush patch from February which I've reposted today
and then
- the asynchronous FLUSH CACHE patch which I've posted today
I think we concluded last time that the change of fsync to fdatasync
is correct but I think we should wait for the dust to settle before
introducing another change on top of all this ...
Ian.
* Re: [kvm-devel] [Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request
2008-03-28 17:25 ` Ian Jackson
@ 2008-03-28 19:11 ` Marcelo Tosatti
0 siblings, 0 replies; 17+ messages in thread
From: Marcelo Tosatti @ 2008-03-28 19:11 UTC (permalink / raw)
To: Ian Jackson; +Cc: qemu-devel
On Fri, Mar 28, 2008 at 05:25:41PM +0000, Ian Jackson wrote:
> Marcelo Tosatti writes ("[Qemu-devel] [PATCH] QEMU: fsync AIO writes on flush request "):
> > It's necessary to guarantee that pending AIO writes have reached stable
> > storage when the flush request returns.
>
> Surely it isn't necessary to call qemu_aio_flush? Because those
> pending AIO writes have not yet been returned to the guest as
> complete, the guest is not entitled to assume that a FLUSH CACHE
> command (issued before those writes have completed) completing
> successfully means that those interleaved writes have reached stable
> storage.
By the time qemu_aio_flush() finishes, all pending requests submitted
before aio_fsync() will have reached stable storage (aio_fsync will be
queued in the last position of the priority 0 AIO queue).
Unless the scheduling priority of the QEMU process is changed between
the AIO writes and the fsync().
> Also, this patch does a synchronous flush (which is bad because it
> stalls the guest while the flush takes place) and it ignores any error
> return (which is quite bad - see my other messages about bdrv_flush,
> caches, etc.)
>
> So I think it would be better to apply
> - my bdrv_flush patch from February which I've reposted today
> and then
> - the asynchronous FLUSH CACHE patch which I've posted today
>
> I think we concluded last time that the change of fsync to fdatasync
> is correct but I think we should wait for the dust to settle before
> introducing another change on top of all this ...
Alright, please also fix the SCSI driver to perform asynchronous flush
with bdrv_aio_flush() and complete the SYNC_CACHE command on the callback.