From: Gerd Hoffmann <kraxel@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: kvm-devel <kvm@vger.kernel.org>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
Date: Thu, 11 Dec 2008 17:11:08 +0100
Message-ID: <49413B9C.3030703@redhat.com>
In-Reply-To: <20081211155335.GE14908@random.random>
[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]
Andrea Arcangeli wrote:
>> * It can't handle block allocation. Kernel handles that by doing
>> such writes synchronously via VFS layer (instead of the separate
>> aio code paths). Leads to horrible performance and bug reports
>> such as "installs on sparse files are very slow".
>
> I think here you mean O_DIRECT regardless of aio/sync API,
Yes. But kernel aio requires O_DIRECT, so aio users are affected
nevertheless.
> So on kernels that don't support IOCB_CMD_READV/WRITEV, we simply have
> to submit an array of iocbs through io_submit (i.e. convert the iov into a
> vector of iocbs, instead of a single iocb pointing to the
> iov). Internally, io_submit should generate a single DMA command
> and build the same sg list as if we had used
> READV/WRITEV. In theory READV/WRITEV should be just a CPU-saving
> feature; it shouldn't influence disk bandwidth. If it does, it means
> the bio layer is broken and needs fixing.
Haven't tested that. It could be that it isn't a big problem, apart from
the extra code size for supporting the two modes.
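A rough illustration of Andrea's fallback, assuming the userspace ABI header `<linux/aio_abi.h>` (where the vectored opcodes are spelled `IOCB_CMD_PREADV`/`IOCB_CMD_PWRITEV`): each iovec element becomes its own `IOCB_CMD_PREAD`/`IOCB_CMD_PWRITE` request, with the file offsets laid end to end. The helper name `iov_to_iocbs` is hypothetical, and the sketch only builds the requests; it does not call `io_submit` itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>
#include <linux/aio_abi.h>

/*
 * Hypothetical helper: describe one vectored request as cnt plain
 * IOCB_CMD_PREAD/IOCB_CMD_PWRITE iocbs, one per iovec element, for
 * kernels without IOCB_CMD_PREADV/IOCB_CMD_PWRITEV. Element i starts
 * at offset plus the lengths of elements 0..i-1, so together the
 * requests cover one contiguous file range. Returns the total bytes.
 */
static size_t iov_to_iocbs(int fd, int is_write, const struct iovec *iov,
                           int cnt, int64_t offset, struct iocb *out)
{
    size_t total = 0;

    assert(fd >= 0 && cnt >= 0);
    for (int i = 0; i < cnt; i++) {
        memset(&out[i], 0, sizeof(out[i]));  /* reserved fields must be zero */
        out[i].aio_lio_opcode = is_write ? IOCB_CMD_PWRITE : IOCB_CMD_PREAD;
        out[i].aio_fildes = (uint32_t)fd;
        out[i].aio_buf = (uint64_t)(uintptr_t)iov[i].iov_base;
        out[i].aio_nbytes = iov[i].iov_len;
        out[i].aio_offset = offset + (int64_t)total;
        total += iov[i].iov_len;
    }
    return total;
}
```

A caller would then pass an array of pointers to these iocbs to `io_submit` (via libaio or `syscall(SYS_io_submit, ...)`) and must reap `cnt` completion events instead of one, which is part of the extra bookkeeping a second code path brings.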
>>> ahem: http://www.daemon-systems.org/man/preadv.2.html
>>
>
> Too bad nobody implemented it yet...
The kernel side looks easy: see the attached patch, plus the syscall table
wire-up in all archs ...
cheers,
Gerd
[-- Attachment #2: preadv.diff --]
[-- Type: text/plain, Size: 1390 bytes --]
diff --git a/fs/read_write.c b/fs/read_write.c
index 969a6d9..d1ea2fd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -701,6 +701,54 @@ sys_writev(unsigned long fd, const struct iovec __user *vec, unsigned long vlen)
 	return ret;
 }
 
+asmlinkage ssize_t sys_preadv(unsigned int fd, const struct iovec __user *vec,
+			      unsigned long vlen, loff_t pos)
+{
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PREAD)
+			ret = vfs_readv(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_rchar(current, ret);
+	inc_syscr(current);
+	return ret;
+}
+
+asmlinkage ssize_t sys_pwritev(unsigned int fd, const struct iovec __user *vec,
+			       unsigned long vlen, loff_t pos)
+{
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PWRITE)
+			ret = vfs_writev(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_wchar(current, ret);
+	inc_syscw(current);
+	return ret;
+}
+
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 			   size_t count, loff_t max)
 {
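Until a kernel provides these syscalls, userspace has to emulate them. One possible stand-in, not part of the patch: the hypothetical `preadv_bounce`/`pwritev_bounce` below linearize the iovec into a single bounce buffer so that each request is still one `pread`/`pwrite` call, and they mirror the patch's `-EINVAL` check for a negative offset.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Total number of bytes described by an iovec array. */
static size_t iov_size(const struct iovec *iov, int cnt)
{
    size_t len = 0;

    assert(cnt >= 0);
    for (int i = 0; i < cnt; i++)
        len += iov[i].iov_len;
    return len;
}

/* Hypothetical preadv() stand-in: one pread() into a bounce buffer,
 * then scatter the bytes actually read across the iovecs. */
static ssize_t preadv_bounce(int fd, const struct iovec *iov, int cnt,
                             off_t pos)
{
    size_t len = iov_size(iov, cnt), done = 0;
    ssize_t ret;
    int saved_errno;
    char *buf;

    if (pos < 0) {              /* mirrors the -EINVAL check in sys_preadv */
        errno = EINVAL;
        return -1;
    }
    buf = malloc(len ? len : 1);
    if (!buf)
        return -1;              /* malloc sets errno to ENOMEM */
    ret = pread(fd, buf, len, pos);
    for (int i = 0; ret > 0 && i < cnt && done < (size_t)ret; i++) {
        size_t n = iov[i].iov_len;

        if (n > (size_t)ret - done)
            n = (size_t)ret - done;  /* short read: fill a prefix only */
        if (n)
            memcpy(iov[i].iov_base, buf + done, n);
        done += n;
    }
    saved_errno = errno;        /* keep pread()'s errno across free() */
    free(buf);
    errno = saved_errno;
    return ret;
}

/* Hypothetical pwritev() stand-in: gather the iovecs into a bounce
 * buffer, then write them with a single pwrite(). */
static ssize_t pwritev_bounce(int fd, const struct iovec *iov, int cnt,
                              off_t pos)
{
    size_t len = iov_size(iov, cnt), done = 0;
    ssize_t ret;
    int saved_errno;
    char *buf;

    if (pos < 0) {              /* mirrors the -EINVAL check in sys_pwritev */
        errno = EINVAL;
        return -1;
    }
    buf = malloc(len ? len : 1);
    if (!buf)
        return -1;
    for (int i = 0; i < cnt; i++) {
        if (iov[i].iov_len)
            memcpy(buf + done, iov[i].iov_base, iov[i].iov_len);
        done += iov[i].iov_len;
    }
    ret = pwrite(fd, buf, len, pos);
    saved_errno = errno;
    free(buf);
    errno = saved_errno;
    return ret;
}
```

The bounce buffer costs an extra copy of the whole request; the alternative of one `pread`/`pwrite` per iovec element avoids the copy but issues `cnt` syscalls and is no longer a single positioned transfer. A native `preadv`/`pwritev` avoids both costs.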