From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LAo8E-0006fY-Eo
	for qemu-devel@nongnu.org; Thu, 11 Dec 2008 11:11:18 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LAo8D-0006eL-Kq
	for qemu-devel@nongnu.org; Thu, 11 Dec 2008 11:11:18 -0500
Received: from [199.232.76.173] (port=42726 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1LAo8D-0006eD-BX
	for qemu-devel@nongnu.org; Thu, 11 Dec 2008 11:11:17 -0500
Received: from mx2.redhat.com ([66.187.237.31]:44107)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from ) id 1LAo8C-00083q-Qg
	for qemu-devel@nongnu.org; Thu, 11 Dec 2008 11:11:17 -0500
Message-ID: <49413B9C.3030703@redhat.com>
Date: Thu, 11 Dec 2008 17:11:08 +0100
From: Gerd Hoffmann
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
References: <1228512061-25398-1-git-send-email-aliguori@us.ibm.com>
	<493E941D.4000608@redhat.com> <493E965E.5050701@us.ibm.com>
	<20081210164401.GF18814@random.random>
	<493FFAB6.2000106@codemonkey.ws> <493FFC8E.9080802@redhat.com>
	<49400F69.8080707@codemonkey.ws>
	<20081210190810.GG18814@random.random>
	<20081211131222.GA14908@random.random>
	<494130B5.2080800@redhat.com>
	<20081211155335.GE14908@random.random>
In-Reply-To: <20081211155335.GE14908@random.random>
Content-Type: multipart/mixed; boundary="------------030807010108060801070406"
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Andrea Arcangeli
Cc: kvm-devel , qemu-devel@nongnu.org

This is a multi-part message in MIME format.
--------------030807010108060801070406
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Andrea Arcangeli wrote:
>>   * It can't handle block allocation.  Kernel handles that by doing
>>     such writes synchronously via VFS layer (instead of the separate
>>     aio code paths).  Leads to horrible performance and bug reports
>>     such as "installs on sparse files are very slow".
>
> I think here you mean O_DIRECT regardless of aio/sync API,

Yes.  But kernel aio requires O_DIRECT, so aio users are affected
nevertheless.

> So in kernels that don't support IOCB_CMD_READV/WRITEV, we've simply
> to an array of iocb through io_submit (i.e. to convert the iov into a
> vector of iocb, instead of a single iocb pointing to the
> iov). Internally to io_submit a single dma command should be generated
> and the same sg list should be built the same as if we used
> READV/WRITEV. In theory READV/WRITEV should be just a cpu saving
> feature, it shouldn't influence disk bandwidth. If it does, it means
> the bio layer is broken and needs fixing.

Haven't tested that.  Could be it isn't a big problem, extra code size
for the two modes aside.

> > > ahem: http://www.daemon-systems.org/man/preadv.2.html
> >
> > Too bad nobody implemented it yet...

Kernel side looks easy, attached patch + syscall table windup in all
archs ...
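
For the fallback Andrea describes above (kernels without
IOCB_CMD_READV/WRITEV), the iov-to-iocb conversion could look roughly
like the sketch below.  It is untested against a real io_submit() and
only fills in the control blocks.  The helper name iov_to_iocbs() and
its parameters are made up for illustration, and it assumes the iov
segments map to consecutive file offsets.

```c
#include <linux/aio_abi.h>   /* struct iocb, IOCB_CMD_PREAD */
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>         /* struct iovec */

/*
 * Hypothetical helper (not QEMU code): expand one iovec array into one
 * IOCB_CMD_PREAD control block per segment, at consecutive file offsets.
 * Returns the number of iocbs filled in, or -1 if cbs[] is too small.
 * The caller would still have to build an array of pointers to these
 * iocbs and hand it to io_submit(2).
 */
static int iov_to_iocbs(int fd, const struct iovec *iov, int iovcnt,
                        long long offset, struct iocb *cbs, int max_cbs)
{
    int i;

    if (iovcnt > max_cbs)
        return -1;

    for (i = 0; i < iovcnt; i++) {
        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_fildes     = fd;
        cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
        cbs[i].aio_buf        = (uint64_t)(uintptr_t)iov[i].iov_base;
        cbs[i].aio_nbytes     = iov[i].iov_len;
        cbs[i].aio_offset     = offset;
        offset += iov[i].iov_len;   /* next segment follows on disk */
    }
    return iovcnt;
}
```

Whether the block layer then merges these into a single request is
exactly the open question above; the sketch says nothing about that.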

cheers,
  Gerd

--------------030807010108060801070406
Content-Type: text/plain; name="preadv.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="preadv.diff"

diff --git a/fs/read_write.c b/fs/read_write.c
index 969a6d9..d1ea2fd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -701,6 +701,54 @@ sys_writev(unsigned long fd, const struct iovec __user *vec, unsigned long vlen)
 	return ret;
 }
 
+asmlinkage ssize_t sys_preadv(unsigned int fd, const struct iovec __user *vec,
+			      unsigned long vlen, loff_t pos)
+{
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PREAD)
+			ret = vfs_readv(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_rchar(current, ret);
+	inc_syscr(current);
+	return ret;
+}
+
+asmlinkage ssize_t sys_pwritev(unsigned int fd, const struct iovec __user *vec,
+			       unsigned long vlen, loff_t pos)
+{
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PWRITE)
+			ret = vfs_writev(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_wchar(current, ret);
+	inc_syscw(current);
+	return ret;
+}
+
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 			   size_t count, loff_t max)
 {

--------------030807010108060801070406--
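
From userspace, the call the patch above is aiming at would be used
roughly as in the sketch below.  It assumes a libc that exposes
preadv() with the BSD signature from the man page linked earlier
(fd, iov, iovcnt, offset), as later glibc releases do.  The helper
read_hdr_and_payload() is a made-up name for illustration only.

```c
#define _DEFAULT_SOURCE      /* for preadv() in glibc */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a header and a payload that sit back to
 * back at file offset pos into two separate buffers with one call.
 * Unlike lseek() + readv(), this does not move the file position, so
 * several threads can share one fd.  Returns the number of bytes read,
 * or -1 with errno set (ESPIPE on pipes, matching the FMODE_PREAD
 * check in the patch).
 */
static ssize_t read_hdr_and_payload(int fd, off_t pos,
                                    void *hdr, size_t hdr_len,
                                    void *payload, size_t payload_len)
{
    struct iovec iov[2];

    iov[0].iov_base = hdr;
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = payload;
    iov[1].iov_len  = payload_len;

    return preadv(fd, iov, 2, pos);
}
```

A short read is possible, as with readv(), so a real caller would have
to check the return value against hdr_len + payload_len.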