* [Qemu-devel] [PATCH 0/2] improve Linux AIO support
@ 2011-07-27 18:25 Frediano Ziglio
2011-07-27 18:25 ` [Qemu-devel] [PATCH 1/2] linux aio: support flush operation Frediano Ziglio
2011-07-27 18:25 ` [Qemu-devel] [PATCH 2/2] aio: use Linux AIO even if nocache is not specified Frediano Ziglio
0 siblings, 2 replies; 12+ messages in thread
From: Frediano Ziglio @ 2011-07-27 18:25 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, Frediano Ziglio
These patches avoid many fallbacks to POSIX AIO and enable Linux AIO even
if nocache is not specified.
They also add flush support for Linux AIO.
Frediano Ziglio (2):
linux aio: support flush operation
aio: use Linux AIO even if nocache is not specified
block/raw-posix.c | 30 +++++++++++++++++-------------
linux-aio.c | 3 +++
2 files changed, 20 insertions(+), 13 deletions(-)
* [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
2011-07-27 18:25 [Qemu-devel] [PATCH 0/2] improve Linux AIO support Frediano Ziglio
@ 2011-07-27 18:25 ` Frediano Ziglio
2011-07-27 18:31 ` Christoph Hellwig
2011-07-27 18:25 ` [Qemu-devel] [PATCH 2/2] aio: use Linux AIO even if nocache is not specified Frediano Ziglio
1 sibling, 1 reply; 12+ messages in thread
From: Frediano Ziglio @ 2011-07-27 18:25 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, Frediano Ziglio
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
---
block/raw-posix.c | 7 +++++++
linux-aio.c | 3 +++
2 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3c6bd4b..27ae81e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -628,6 +628,13 @@ static BlockDriverAIOCB *raw_aio_flush(BlockDriverState *bs,
     if (fd_open(bs) < 0)
         return NULL;
+#ifdef CONFIG_LINUX_AIO
+    if (s->use_aio) {
+        return laio_submit(bs, s->aio_ctx, s->fd, 0, NULL,
+                           0, cb, opaque, QEMU_AIO_FLUSH);
+    }
+#endif
+
     return paio_submit(bs, s->fd, 0, NULL, 0, cb, opaque, QEMU_AIO_FLUSH);
 }
diff --git a/linux-aio.c b/linux-aio.c
index 68f4b3d..d07c435 100644
--- a/linux-aio.c
+++ b/linux-aio.c
@@ -215,6 +215,9 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
     case QEMU_AIO_READ:
         io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
         break;
+    case QEMU_AIO_FLUSH:
+        io_prep_fdsync(iocbs, fd);
+        break;
     default:
         fprintf(stderr, "%s: invalid AIO request type 0x%x.\n",
                         __func__, type);
--
1.7.1
* [Qemu-devel] [PATCH 2/2] aio: use Linux AIO even if nocache is not specified
2011-07-27 18:25 [Qemu-devel] [PATCH 0/2] improve Linux AIO support Frediano Ziglio
2011-07-27 18:25 ` [Qemu-devel] [PATCH 1/2] linux aio: support flush operation Frediano Ziglio
@ 2011-07-27 18:25 ` Frediano Ziglio
2011-07-27 18:32 ` Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Frediano Ziglio @ 2011-07-27 18:25 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, Frediano Ziglio
Currently Linux AIO is used only if nocache is specified.
Linux AIO works in all cases. The only problem is that Linux AIO currently
does not align data, so I add a test that uses POSIX AIO in this case.
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
---
block/raw-posix.c | 23 ++++++++++-------------
1 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 27ae81e..078a256 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -236,21 +236,16 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
     }
 #ifdef CONFIG_LINUX_AIO
-    if ((bdrv_flags & (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) ==
-                      (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) {
+    s->use_aio = 0;
+    if ((bdrv_flags & BDRV_O_NATIVE_AIO)) {
         s->aio_ctx = laio_init();
         if (!s->aio_ctx) {
             goto out_free_buf;
         }
         s->use_aio = 1;
-    } else
-#endif
-    {
-#ifdef CONFIG_LINUX_AIO
-        s->use_aio = 0;
-#endif
     }
+#endif
 #ifdef CONFIG_XFS
     if (platform_test_xfs_fd(s->fd)) {
@@ -592,14 +587,16 @@ static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs,
     if (s->aligned_buf) {
         if (!qiov_is_aligned(bs, qiov)) {
             type |= QEMU_AIO_MISALIGNED;
-#ifdef CONFIG_LINUX_AIO
-        } else if (s->use_aio) {
-            return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
-                               nb_sectors, cb, opaque, type);
-#endif
         }
     }
+#ifdef CONFIG_LINUX_AIO
+    if (s->use_aio && !(type & QEMU_AIO_MISALIGNED)) {
+        return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
+                           nb_sectors, cb, opaque, type);
+    }
+#endif
+
     return paio_submit(bs, s->fd, sector_num, qiov, nb_sectors,
                        cb, opaque, type);
 }
--
1.7.1
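
For readers following the dispatch logic in the hunk above, here is a minimal standalone sketch of the same idea: use the native backend only when it is enabled and every buffer segment is aligned, otherwise fall back. It is not QEMU code. The names iov_is_aligned, choose_backend and enum aio_backend are hypothetical, and unlike the patch (which only checks alignment when s->aligned_buf is set) this simplified version always checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical helper: true if the base address and length of every
 * segment are multiples of `align` (which must be a power of two). */
static bool iov_is_aligned(const struct iovec *iov, int niov, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    for (int i = 0; i < niov; i++) {
        if (((uintptr_t)iov[i].iov_base & (align - 1)) != 0 ||
            (iov[i].iov_len & (align - 1)) != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical stand-ins for the two submission paths in the patch. */
enum aio_backend { BACKEND_POSIX_AIO, BACKEND_LINUX_AIO };

/* Pick the native backend only for aligned requests; everything else
 * goes to the thread-pool (POSIX AIO) path. */
static enum aio_backend choose_backend(bool use_native_aio,
                                       const struct iovec *iov, int niov,
                                       size_t align)
{
    if (use_native_aio && iov_is_aligned(iov, niov, align)) {
        return BACKEND_LINUX_AIO;
    }
    return BACKEND_POSIX_AIO;
}
```

A caller would pass the request's iovec array and the device's required alignment (512 is a common value, assumed here) and then submit through whichever path is returned.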
* Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
2011-07-27 18:25 ` [Qemu-devel] [PATCH 1/2] linux aio: support flush operation Frediano Ziglio
@ 2011-07-27 18:31 ` Christoph Hellwig
2011-07-27 19:52 ` Frediano Ziglio
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2011-07-27 18:31 UTC (permalink / raw)
To: Frediano Ziglio; +Cc: Kevin Wolf, qemu-devel
Did you test this at all?
On Wed, Jul 27, 2011 at 08:25:25PM +0200, Frediano Ziglio wrote:
> + case QEMU_AIO_FLUSH:
> + io_prep_fdsync(iocbs, fd);
> + break;
Looks great, but doesn't work as expected.
Hint: grep for aio_fsync in the linux kernel.
* Re: [Qemu-devel] [PATCH 2/2] aio: use Linux AIO even if nocache is not specified
2011-07-27 18:25 ` [Qemu-devel] [PATCH 2/2] aio: use Linux AIO even if nocache is not specified Frediano Ziglio
@ 2011-07-27 18:32 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2011-07-27 18:32 UTC (permalink / raw)
To: Frediano Ziglio; +Cc: Kevin Wolf, qemu-devel
On Wed, Jul 27, 2011 at 08:25:26PM +0200, Frediano Ziglio wrote:
> Currently Linux AIO is used only if nocache is specified.
> Linux AIO works in all cases. The only problem is that Linux AIO currently
> does not align data, so I add a test that uses POSIX AIO in this case.
The kernel will accept buffered I/O requests, and even handle them 100%
correctly. The only thing it won't do is to handle them asynchronously,
so with your patch you're back to executing I/O synchronously in guest
context.
* Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
2011-07-27 18:31 ` Christoph Hellwig
@ 2011-07-27 19:52 ` Frediano Ziglio
2011-07-27 19:57 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Frediano Ziglio @ 2011-07-27 19:52 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Kevin Wolf, qemu-devel@nongnu.org
On 27 Jul 2011, at 20:31, Christoph Hellwig <hch@lst.de> wrote:
> Did you test this at all?
>
Yes! Not at kernel level :-)
Usually I trust documentation and man pages.
> On Wed, Jul 27, 2011 at 08:25:25PM +0200, Frediano Ziglio wrote:
>> + case QEMU_AIO_FLUSH:
>> + io_prep_fdsync(iocbs, fd);
>> + break;
>
> Looks great, but doesn't work as expected.
>
> Hint: grep for aio_fsync in the linux kernel.
>
Thanks. I'll try to port misaligned access handling to Linux AIO. I'll also add some comments to the code so that nobody else makes the same mistake I did.
The main point, however, is that -k qemu-img and aio=native in blockdev options are silently ignored if nocache is not enabled. I also noticed that combining XFS, Linux AIO, O_DIRECT and O_DSYNC gives impressive performance, but there is currently no way to specify all those flags together, because nocache enables O_DIRECT while O_DSYNC is enabled by writethrough.
Frediano
* Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
2011-07-27 19:52 ` Frediano Ziglio
@ 2011-07-27 19:57 ` Christoph Hellwig
2011-07-28 7:47 ` Kevin Wolf
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2011-07-27 19:57 UTC (permalink / raw)
To: Frediano Ziglio; +Cc: Kevin Wolf, Christoph Hellwig, qemu-devel@nongnu.org
On Wed, Jul 27, 2011 at 09:52:51PM +0200, Frediano Ziglio wrote:
> >
>
> Yes! Not at kernel level :-)
In that case we have a bad error handling problem somewhere in qemu.
The IOCB_CMD_FDSYNC aio opcode will always return EINVAL currently,
and we really should have caught that somewhere in qemu.
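
For illustration only (this is not from the original mail and not QEMU code): one way to keep such a failure from going unnoticed is to check the result of the AIO flush and fall back to a plain fdatasync() when the kernel rejects it. The sketch below uses raw Linux syscalls instead of libaio so that it is self-contained. try_aio_fdsync and flush_with_fallback are hypothetical names, and whether IOCB_CMD_FDSYNC succeeds depends on the kernel in use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/aio_abi.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Submit one IOCB_CMD_FDSYNC request and wait for it.
 * Returns 0 on success or a negative errno value. */
static int try_aio_fdsync(int fd)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long n;
    int ret = 0;

    if (syscall(SYS_io_setup, 1, &ctx) < 0) {
        return -errno;
    }

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = (__u32)fd;
    cb.aio_lio_opcode = IOCB_CMD_FDSYNC;

    n = syscall(SYS_io_submit, ctx, 1, list);
    if (n < 0) {
        ret = -errno;              /* e.g. EINVAL if the opcode is unsupported */
    } else if (n != 1) {
        ret = -EAGAIN;             /* nothing was queued */
    } else if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1) {
        ret = -EIO;                /* could not collect the completion */
    } else if (ev.res < 0) {
        ret = (int)ev.res;         /* the request itself failed */
    }

    syscall(SYS_io_destroy, ctx);
    return ret;
}

/* Prefer the AIO flush, but never report success unless some flush
 * actually succeeded: on any AIO error, use a synchronous fdatasync(). */
static int flush_with_fallback(int fd)
{
    if (try_aio_fdsync(fd) == 0) {
        return 0;
    }
    return fdatasync(fd) < 0 ? -errno : 0;
}
```

The point of the design is that the error from the AIO path is always examined; a real implementation would submit on a long-lived context and complete asynchronously rather than creating a context per flush and waiting.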
> Thanks. I'll try to port misaligned access handling to Linux AIO. I'll also add some comments to the code so that nobody else makes the same mistake I did.
It's direct I/O code in general that doesn't handle misaligned access.
Given that we should never get misaligned I/O from guests I just didn't
bother duplicating the read-modify-write code for that code path as well.
> The main point, however, is that -k qemu-img and aio=native in blockdev options are silently ignored if nocache is not enabled.
Maybe we should indeed error out instead. Care to prepare a patch for that?
> I also noticed that combining XFS, Linux AIO, O_DIRECT and O_DSYNC gives impressive performance, but there is currently no way to specify all those flags together, because nocache enables O_DIRECT while O_DSYNC is enabled by writethrough.
Indeed. This has come up a few times, and actually is a mostly trivial
task. Maybe we should give up waiting for -blockdev and separate cache
mode settings and allow a nocache-writethrough or similar mode now? It's
going to be around 10 lines of code + documentation.
* Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
2011-07-27 19:57 ` Christoph Hellwig
@ 2011-07-28 7:47 ` Kevin Wolf
2011-07-28 12:15 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2011-07-28 7:47 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Frediano Ziglio, qemu-devel@nongnu.org
On 27.07.2011 21:57, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 09:52:51PM +0200, Frediano Ziglio wrote:
>> I also noticed that combining XFS, Linux AIO, O_DIRECT and O_DSYNC gives impressive performance, but there is currently no way to specify all those flags together, because nocache enables O_DIRECT while O_DSYNC is enabled by writethrough.
>
> Indeed. This has come up a few times, and actually is a mostly trivial
> task. Maybe we should give up waiting for -blockdev and separate cache
> mode settings and allow a nocache-writethrough or similar mode now? It's
> going to be around 10 lines of code + documentation.
I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
what is the explanation for O_DSYNC improving performance?
Christoph, on another note: Can we rely on Linux AIO never returning
short writes except on EOF? Currently we return -EINVAL in this case, so
I hope it's true or we wouldn't return the correct error code.
The reason why I'm asking is because I want to allow reads across EOF
for growable images and pad with zeros (the synchronous code does this
today in order to allow bdrv_pread/pwrite to work, and when we start
using coroutines in the block layer, these cases will hit the AIO paths).
Kevin
* Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
2011-07-28 7:47 ` Kevin Wolf
@ 2011-07-28 12:15 ` Christoph Hellwig
2011-07-28 12:41 ` Kevin Wolf
2011-07-29 15:33 ` Stefan Hajnoczi
0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2011-07-28 12:15 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel@nongnu.org, Christoph Hellwig, Frediano Ziglio
On Thu, Jul 28, 2011 at 09:47:05AM +0200, Kevin Wolf wrote:
> > Indeed. This has come up a few times, and actually is a mostly trivial
> > task. Maybe we should give up waiting for -blockdev and separate cache
> > mode settings and allow a nocache-writethrough or similar mode now? It's
> > going to be around 10 lines of code + documentation.
>
> I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
> what is the explanation for O_DSYNC improving performance?
There isn't any, at least for modern Linux. O_DSYNC at this point is
equivalent to a range fdatasync for each write call, and given that we're
doing O_DIRECT the range flush doesn't matter. If you do have a modern
host and an old guest it might end up being faster because the barrier
implementation in Linux used to suck so badly, but that's not inherent
to the I/O model. If your guest, however, doesn't support cache flushes
at all, O_DIRECT | O_DSYNC is the only sane model to use for local filesystems
and block devices.
> Christoph, on another note: Can we rely on Linux AIO never returning
> short writes except on EOF? Currently we return -EINVAL in this case, so
> I hope it's true or we wouldn't return the correct error code.
More or less. There's one corner case for all Linux I/O, and that is
only writes up to INT_MAX are supported, and larger writes (and reads)
get truncated to it. It's pretty nasty, but Linux has been vocally
opposed to fixing this issue.
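
As a rough illustration of what the truncation corner case means for a caller (a sketch, not code from QEMU): a transfer that may come back short has to be continued from where it stopped. The example uses synchronous write() for brevity, although the discussion is about the AIO path; write_all is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly `len` bytes, resuming after short writes (such as a
 * request the kernel truncated). Returns 0 or a negative errno value. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;          /* interrupted before any data moved */
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO;           /* no progress; avoid spinning forever */
        }
        assert((size_t)n <= len);
        p += n;                    /* short write: continue with the rest */
        len -= (size_t)n;
    }
    return 0;
}
```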
* Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
2011-07-28 12:15 ` Christoph Hellwig
@ 2011-07-28 12:41 ` Kevin Wolf
2011-07-29 14:24 ` Christoph Hellwig
2011-07-29 15:33 ` Stefan Hajnoczi
1 sibling, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2011-07-28 12:41 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Frediano Ziglio, qemu-devel@nongnu.org
On 28.07.2011 14:15, Christoph Hellwig wrote:
>> Christoph, on another note: Can we rely on Linux AIO never returning
>> short writes except on EOF? Currently we return -EINVAL in this case, so
"short reads" I meant, of course.
>> I hope it's true or we wouldn't return the correct error code.
>
> More or less. There's one corner case for all Linux I/O, and that is
> only writes up to INT_MAX are supported, and larger writes (and reads)
> get truncated to it. It's pretty nasty, but Linux has been vocally
> opposed to fixing this issue.
I think we can safely ignore this. So just replacing the current
ret = -EINVAL; by a memset(buf + ret, 0, len - ret); ret = 0; should be
okay, right? (Of course using the qiov versions, but you get the idea)
Kevin
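
The replacement proposed above can be sketched for a flat buffer as follows. This is only an illustration under that simplification (the real code would operate on a qiov); pad_short_read is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* `ret` is the result of a read of `len` bytes into `buf`: a negative
 * errno value, or the number of bytes actually read. A short read (for
 * example one that crossed EOF) is turned into success by zero-filling
 * the part of the buffer that was not read. */
static int pad_short_read(void *buf, size_t len, ssize_t ret)
{
    if (ret < 0) {
        return (int)ret;           /* real errors are passed through */
    }
    assert((size_t)ret <= len);
    memset((char *)buf + ret, 0, len - (size_t)ret);
    return 0;
}
```

With an 8-byte buffer and a read that returned 3, the first 3 bytes keep the data that was read, the remaining 5 become zero, and the caller sees 0 instead of -EINVAL.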
* Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
2011-07-28 12:41 ` Kevin Wolf
@ 2011-07-29 14:24 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2011-07-29 14:24 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel@nongnu.org, Christoph Hellwig, Frediano Ziglio
On Thu, Jul 28, 2011 at 02:41:02PM +0200, Kevin Wolf wrote:
> > More or less. There's one corner case for all Linux I/O, and that is
> > only writes up to INT_MAX are supported, and larger writes (and reads)
> > get truncated to it. It's pretty nasty, but Linux has been vocally
> > opposed to fixing this issue.
>
> I think we can safely ignore this. So just replacing the current
> ret = -EINVAL; by a memset(buf + ret, 0, len - ret); ret = 0; should be
> okay, right? (Of course using the qiov versions, but you get the idea)
This should be safe.
* Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
2011-07-28 12:15 ` Christoph Hellwig
2011-07-28 12:41 ` Kevin Wolf
@ 2011-07-29 15:33 ` Stefan Hajnoczi
1 sibling, 0 replies; 12+ messages in thread
From: Stefan Hajnoczi @ 2011-07-29 15:33 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Kevin Wolf, qemu-devel@nongnu.org, Frediano Ziglio
On Thu, Jul 28, 2011 at 1:15 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Jul 28, 2011 at 09:47:05AM +0200, Kevin Wolf wrote:
>> > Indeed. This has come up a few times, and actually is a mostly trivial
>> > task. Maybe we should give up waiting for -blockdev and separate cache
>> > mode settings and allow a nocache-writethrough or similar mode now? It's
>> > going to be around 10 lines of code + documentation.
>>
>> I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
>> what is the explanation for O_DSYNC improving performance?
>
> There isn't any, at least for modern Linux. O_DSYNC at this point is
> equivalent to a range fdatasync for each write call, and given that we're
> doing O_DIRECT the range flush doesn't matter. If you do have a modern
> host and an old guest it might end up being faster because the barrier
> implementation in Linux used to suck so badly, but that's not inherent
> to the I/O model. If your guest, however, doesn't support cache flushes
> at all, O_DIRECT | O_DSYNC is the only sane model to use for local filesystems
> and block devices.
I can rebase this cache=directsync patch and send it:
http://repo.or.cz/w/qemu/stefanha.git/commitdiff/6756719a46ac9876ac6d0460a33ad98e96a3a923
The other weird caching-related option I was playing with is -drive
...,readahead=on|off. It lets you disable the host kernel readahead
on buffered modes (cache=writeback|writethrough):
http://repo.or.cz/w/qemu/stefanha.git/commitdiff/f2fc2b297a2b2dd0cccd1dc2f7c519f3b0374e0d
Stefan
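
To make the flag combinations discussed in this thread concrete, here is a small sketch of a cache-mode to open(2) flag mapping. It follows what the thread states (nocache enables O_DIRECT, writethrough enables O_DSYNC, directsync would combine both); the enum, the function name, and the assumption that writeback adds neither flag are illustrative and not taken from the QEMU source.

```c
#define _GNU_SOURCE                /* makes O_DIRECT visible with glibc */
#include <assert.h>
#include <fcntl.h>

/* Hypothetical names for the cache modes mentioned in the thread. */
enum cache_mode {
    CACHE_WRITEBACK,               /* buffered, no sync on write (assumed) */
    CACHE_WRITETHROUGH,            /* buffered, O_DSYNC */
    CACHE_NONE,                    /* "nocache": O_DIRECT */
    CACHE_DIRECTSYNC               /* proposed: O_DIRECT | O_DSYNC */
};

/* Extra open(2) flags implied by each cache mode. */
static int cache_mode_to_open_flags(enum cache_mode mode)
{
    switch (mode) {
    case CACHE_WRITEBACK:
        return 0;
    case CACHE_WRITETHROUGH:
        return O_DSYNC;
    case CACHE_NONE:
        return O_DIRECT;
    case CACHE_DIRECTSYNC:
        return O_DIRECT | O_DSYNC;
    }
    assert(!"unknown cache mode");
    return 0;
}
```

The returned value would be OR-ed into the flags passed to open(), for example open(path, O_RDWR | cache_mode_to_open_flags(CACHE_DIRECTSYNC)).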