From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
    id 1MeQoS-0003ou-81
    for qemu-devel@nongnu.org; Fri, 21 Aug 2009 05:53:36 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
    id 1MeQoN-0003k8-0M
    for qemu-devel@nongnu.org; Fri, 21 Aug 2009 05:53:35 -0400
Received: from [199.232.76.173] (port=55402 helo=monty-python.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43)
    id 1MeQoM-0003js-Jj
    for qemu-devel@nongnu.org; Fri, 21 Aug 2009 05:53:30 -0400
Received: from mx1.redhat.com ([209.132.183.28]:56844)
    by monty-python.gnu.org with esmtp (Exim 4.60)
    (envelope-from )
    id 1MeQoL-0001o6-SJ
    for qemu-devel@nongnu.org; Fri, 21 Aug 2009 05:53:30 -0400
Message-ID: <4A8E6EAD.3030809@redhat.com>
Date: Fri, 21 Aug 2009 12:53:49 +0300
From: Avi Kivity
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH 2/2] raw-posix: add Linux native AIO support
References: <20090820145803.GA23578@lst.de> <20090820145835.GB24183@lst.de>
In-Reply-To: <20090820145835.GB24183@lst.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Christoph Hellwig
Cc: qemu-devel@nongnu.org

On 08/20/2009 05:58 PM, Christoph Hellwig wrote:
> Now that we have a nicer interface to work against we can add Linux native
> AIO support.  It's an extremely thin layer just setting up an iocb for
> the io_submit system call in the submission path, and registering an
> eventfd with the qemu poll handler to complete the iocbs directly
> from there.
>
> This started out based on Anthony's earlier AIO patch, but after
> an estimated 42,000 rewrites and just as many build system changes
> there's not much left of it.
>
> To enable native kernel aio use the aio=native sub-command on the
> drive command line.  I have also added an option to qemu-io to
> test the aio support without needing a guest.
>
>
> Signed-off-by: Christoph Hellwig
>
> Index: qemu/Makefile
> ===================================================================
> --- qemu.orig/Makefile 2009-08-19 22:49:08.789354196 -0300
> +++ qemu/Makefile 2009-08-19 22:51:25.293352541 -0300
> @@ -56,6 +56,7 @@ recurse-all: $(SUBDIR_RULES) $(ROMSUBDIR
>  block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
>  block-obj-y += nbd.o block.o aio.o aes.o
>  block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
> +block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>
>  block-nested-y += cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
> Index: qemu/block/raw-posix.c
> ===================================================================
> --- qemu.orig/block/raw-posix.c 2009-08-19 22:49:08.793352540 -0300
> +++ qemu/block/raw-posix.c 2009-08-19 23:00:21.157402768 -0300
> @@ -115,6 +115,7 @@ typedef struct BDRVRawState {
>      int fd_got_error;
>      int fd_media_changed;
>  #endif
> +    int use_aio;
>      uint8_t* aligned_buf;
>  } BDRVRawState;
>
> @@ -159,6 +160,7 @@ static int raw_open_common(BlockDriverSt
>      }
>      s->fd = fd;
>      s->aligned_buf = NULL;
> +
>      if ((bdrv_flags & BDRV_O_NOCACHE)) {
>          s->aligned_buf = qemu_blockalign(bs, ALIGNED_BUFFER_SIZE);
>          if (s->aligned_buf == NULL) {
> @@ -166,9 +168,22 @@ static int raw_open_common(BlockDriverSt
>          }
>      }
>
> -    s->aio_ctx = paio_init();
> -    if (!s->aio_ctx) {
> -        goto out_free_buf;
> +#ifdef CONFIG_LINUX_AIO
> +    if ((bdrv_flags & (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) ==
> +                      (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) {
> +        s->aio_ctx = laio_init();
> +        if (!s->aio_ctx) {
> +            goto out_free_buf;
> +        }
> +        s->use_aio = 1;
> +    } else
> +#endif
> +    {
> +        s->aio_ctx = paio_init();
> +        if (!s->aio_ctx) {
> +            goto out_free_buf;
> +        }
> +        s->use_aio = 0;
>      }
>
>      return 0;
> @@ -524,8 +539,13 @@ static BlockDriverAIOCB *raw_aio_submit(
>       * boundary.  Check if this is the case or telll the low-level
>       * driver that it needs to copy the buffer.
>       */
> -    if (s->aligned_buf && !qiov_is_aligned(qiov)) {
> -        type |= QEMU_AIO_MISALIGNED;
> +    if (s->aligned_buf) {
> +        if (!qiov_is_aligned(qiov)) {
> +            type |= QEMU_AIO_MISALIGNED;
> +        } else if (s->use_aio) {
> +            return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
> +                               nb_sectors, cb, opaque, type);
> +        }
>      }
>
>      return paio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov, nb_sectors,
> Index: qemu/configure
> ===================================================================
> --- qemu.orig/configure 2009-08-19 22:49:08.801352719 -0300
> +++ qemu/configure 2009-08-19 22:51:25.305393736 -0300
> @@ -197,6 +197,7 @@ build_docs="yes"
>  uname_release=""
>  curses="yes"
>  curl="yes"
> +linux_aio="yes"
>  io_thread="no"
>  nptl="yes"
>  mixemu="no"
> @@ -499,6 +500,8 @@ for opt do
>    ;;
>    --enable-mixemu) mixemu="yes"
>    ;;
> +  --disable-linux-aio) linux_aio="no"
> +  ;;
>    --enable-io-thread) io_thread="yes"
>    ;;
>    --disable-blobs) blobs="no"
> @@ -636,6 +639,7 @@ echo " --oss-lib path to
>  echo " --enable-uname-release=R Return R for uname -r in usermode emulation"
>  echo " --sparc_cpu=V Build qemu for Sparc architecture v7, v8, v8plus, v8plusa, v9"
>  echo " --disable-vde disable support for vde network"
> +echo " --disable-linux-aio disable Linux AIO support"
>  echo " --enable-io-thread enable IO thread"
>  echo " --disable-blobs disable installing provided firmware blobs"
>  echo " --kerneldir=PATH look for kernel includes in PATH"
> @@ -1197,6 +1201,23 @@ if test "$pthread" = no; then
>  fi
>
>  ##########################################
> +# linux-aio probe
> +AIOLIBS=""
> +
> +if test "$linux_aio" = "yes" ; then
> +  linux_aio=no
> +  cat > $TMPC <<EOF
> +#include
> +#include
> +int main(void) { io_setup(0, NULL); io_set_eventfd(NULL, 0); eventfd(0, 0); return 0; }
> +EOF
> +  if compile_prog "" "-laio" ; then
> +    linux_aio=yes
> +    LIBS="$LIBS -laio"
> +  fi
> +fi
> +
> +##########################################
>  # iovec probe
>  cat > $TMPC <<EOF
>  #include
> @@ -1527,6 +1548,7 @@ echo "NPTL support $nptl"
>  echo "GUEST_BASE $guest_base"
>  echo "vde support $vde"
>  echo "IO thread $io_thread"
> +echo "Linux AIO support $linux_aio"
>  echo "Install blobs $blobs"
>  echo -e "KVM support $kvm"
>  echo "fdt support $fdt"
> @@ -1700,6 +1722,9 @@ fi
>  if test "$io_thread" = "yes" ; then
>    echo "CONFIG_IOTHREAD=y" >> $config_host_mak
>  fi
> +if test "$linux_aio" = "yes" ; then
> +  echo "CONFIG_LINUX_AIO=y" >> $config_host_mak
> +fi
>  if test "$blobs" = "yes" ; then
>    echo "INSTALL_BLOBS=yes" >> $config_host_mak
>  fi
> Index: qemu/linux-aio.c
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ qemu/linux-aio.c 2009-08-20 10:54:10.924375300 -0300
> @@ -0,0 +1,204 @@
> +/*
> + * Linux native AIO support.
> + *
> + * Copyright (C) 2009 IBM, Corp.
> + * Copyright (C) 2009 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qemu-common.h"
> +#include "qemu-aio.h"
> +#include "block_int.h"
> +#include "block/raw-posix-aio.h"
> +
> +#include
> +#include
> +
> +/*
> + * Queue size (per-device).
> + *
> + * XXX: eventually we need to communicate this to the guest and/or make it
> + * tunable by the guest.  If we get more outstanding requests at a time
> + * than this we will get EAGAIN from io_submit which is communicated to
> + * the guest as an I/O error.
> + */
> +#define MAX_EVENTS 128
>

Or, we could queue any extra requests.
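
To make the queueing idea concrete, here is a rough sketch of the bookkeeping only, not code from the patch. It assumes a per-device in-flight counter capped at MAX_EVENTS and a FIFO of parked requests; struct pending_req, struct req_queue and the req_queue_* helpers are names made up for illustration, and the actual io_submit() call is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 128  /* same cap as in the quoted patch */

/* Hypothetical request record; a real one would carry the struct iocb. */
struct pending_req {
    int id;
    struct pending_req *next;
};

/* Hypothetical per-device state: in-flight count plus an overflow FIFO. */
struct req_queue {
    unsigned in_flight;
    unsigned queued;
    struct pending_req *head;
    struct pending_req *tail;
};

static void req_queue_init(struct req_queue *q)
{
    q->in_flight = 0;
    q->queued = 0;
    q->head = NULL;
    q->tail = NULL;
}

/*
 * Returns 1 if the caller may hand the request to the kernel right away,
 * 0 if the context is full and the request was parked on the FIFO.
 */
static int req_queue_submit(struct req_queue *q, struct pending_req *r)
{
    if (q->in_flight < MAX_EVENTS) {
        q->in_flight++;
        return 1;
    }
    r->next = NULL;
    if (q->tail) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
    q->queued++;
    return 0;
}

/*
 * Called once per completed request.  Frees one slot and, if anything is
 * parked, returns the oldest parked request so the caller can submit it
 * into the slot that just opened up.  Returns NULL if nothing is parked.
 */
static struct pending_req *req_queue_complete(struct req_queue *q)
{
    struct pending_req *r = q->head;

    assert(q->in_flight > 0);
    q->in_flight--;
    if (!r) {
        return NULL;
    }
    q->head = r->next;
    if (!q->head) {
        q->tail = NULL;
    }
    r->next = NULL;
    q->queued--;
    q->in_flight++;     /* the freed slot is taken by the parked request */
    return r;
}
```

With something along these lines the guest would see extra latency instead of an I/O error once more than MAX_EVENTS requests are outstanding; whether that trade-off is acceptable is a separate question.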
> +
> +
> +void *laio_init(void)
> +{
> +    struct qemu_laio_state *s;
> +
> +    s = qemu_mallocz(sizeof(*s));
> +    s->efd = eventfd(0, 0);
> +    if (s->efd == -1)
> +        goto out_free_state;
> +    fcntl(s->efd, F_SETFL, O_NONBLOCK);
> +
> +    if (io_setup(MAX_EVENTS, &s->ctx) != 0)
> +        goto out_close_efd;
> +
>

One day we may want a global io context so we can dequeue many events
with one syscall.  Or we may not, if we thread these things.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.