From: Anthony Liguori <anthony@codemonkey.ws>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>,
kvm-devel <kvm@vger.kernel.org>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
Date: Fri, 12 Dec 2008 11:25:55 -0600
Message-ID: <49429EA3.8070008@codemonkey.ws>
In-Reply-To: <20081212170916.GO6809@random.random>
Andrea Arcangeli wrote:
> On Fri, Dec 12, 2008 at 10:49:45AM -0600, Anthony Liguori wrote:
>
>> I meant, if you wanted to pass a file descriptor as a raw device. So:
>>
>> qemu -hda raw:fd=4
>>
>> Or something like that. We don't support this today.
>>
>
> ah ok.
>
>
>> I think bouncing the iov and just using pread/pwrite may be our best bet.
>> It means memory allocation but we can cap it. Since we're using threads,
>>
>
> It's already capped. However currently it generates an iovec, but
> we've simply to check the iovcnt to be 1, if it's 1 we pread from
> iov.iov_base, iov.iov_len. The dma api will take care to enforce
> iovcnt to be 1 for the iovec if preadv/pwritev isn't detected at
> compile time.
>
Hrm, that's more complex than I was expecting. I was thinking the bdrv
aio infrastructure would always take an iovec. Any details about the
underlying host's ability to handle the iovec would be insulated.
>> we can just force a thread to sleep until memory becomes available so it's
>> actually pretty straightforward.
>>
>
> There's no way to detect that and wait for memory,
If we artificially cap at say 50MB, then you do something like:
    while (buffer == NULL) {
        buffer = try_to_bounce(offset, iov, iovcnt, &size);
        if (buffer == NULL && errno == ENOMEM) {
            /* sleep until bounce_free() signals that memory was released */
            pthread_cond_wait(&more_memory, &lock);
        }
    }
try_to_bounce allocs with malloc() but if you exceed 50MB, then you fail
with an error of ENOMEM. In your bounce_free() function, you do a
pthread_cond_broadcast() to wake up any threads potentially waiting to
allocate memory.
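To make the scheme above concrete, here is a minimal sketch of a capped bounce allocator. The names bounce_try_alloc, bounce_alloc, BOUNCE_CAP, bounce_lock and bounce_cond are hypothetical and do not come from any posted patch. The one thing it does differently from the loop above is that the cap check and the wait happen under the same mutex, so a wakeup from bounce_free() cannot be missed between the failed allocation and the wait.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical cap, matching the "say 50MB" figure in the text. */
#define BOUNCE_CAP ((size_t)50 * 1024 * 1024)

static pthread_mutex_t bounce_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t bounce_cond = PTHREAD_COND_INITIALIZER;
static size_t bounce_used;          /* bytes currently handed out */

/* Non-blocking: returns NULL with errno == ENOMEM once the cap is hit. */
void *bounce_try_alloc(size_t size)
{
    void *buf = NULL;

    pthread_mutex_lock(&bounce_lock);
    if (size <= BOUNCE_CAP - bounce_used) {
        buf = malloc(size);
        if (buf != NULL) {
            bounce_used += size;
        }
    }
    pthread_mutex_unlock(&bounce_lock);

    if (buf == NULL) {
        errno = ENOMEM;
    }
    return buf;
}

/* Blocking: sleeps until enough of the budget has been released. */
void *bounce_alloc(size_t size)
{
    void *buf;

    if (size > BOUNCE_CAP) {        /* could never fit, do not wait forever */
        errno = EINVAL;
        return NULL;
    }
    pthread_mutex_lock(&bounce_lock);
    while (size > BOUNCE_CAP - bounce_used) {
        pthread_cond_wait(&bounce_cond, &bounce_lock);
    }
    buf = malloc(size);
    if (buf != NULL) {
        bounce_used += size;
    }
    pthread_mutex_unlock(&bounce_lock);
    return buf;
}

/* Return a buffer to the budget and wake every waiting thread. */
void bounce_free(void *buf, size_t size)
{
    free(buf);
    pthread_mutex_lock(&bounce_lock);
    bounce_used -= size;
    pthread_cond_broadcast(&bounce_cond);
    pthread_mutex_unlock(&bounce_lock);
}
```

The broadcast (rather than a signal) matters because waiters may want different sizes; each one rechecks the budget when it wakes.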
This lets us expose a preadv/pwritev function that actually works. The
expectation is that bouncing will outperform just doing pread/pwrite of
each vector. Of course, you could get smart and if try_to_bounce fails,
fall back to pread/pwrite each vector. Likewise, you can fast-path the
case of a single iovec to avoid bouncing entirely.
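As a rough illustration of the read side, the sketch below emulates preadv with a bounce buffer, including the single-iovec fast path and the per-vector fallback. The function name bounce_preadv is hypothetical, and plain malloc() stands in for the capped try_to_bounce allocator described above, so treat it as a sketch under those assumptions rather than the actual implementation.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Emulate preadv() using only pread(). Hypothetical helper. */
ssize_t bounce_preadv(int fd, const struct iovec *iov, int iovcnt,
                      off_t offset)
{
    size_t total = 0;
    ssize_t ret;
    char *buf;
    int i;

    if (iovcnt <= 0) {
        errno = EINVAL;
        return -1;
    }

    /* Fast path: a single vector needs no bouncing at all. */
    if (iovcnt == 1) {
        return pread(fd, iov[0].iov_base, iov[0].iov_len, offset);
    }

    for (i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }

    buf = malloc(total);
    if (buf == NULL) {
        /* Fallback: no bounce buffer, so issue one pread per vector. */
        ssize_t done = 0;

        for (i = 0; i < iovcnt; i++) {
            ret = pread(fd, iov[i].iov_base, iov[i].iov_len, offset + done);
            if (ret < 0) {
                return done > 0 ? done : -1;
            }
            done += ret;
            if ((size_t)ret < iov[i].iov_len) {
                break;              /* short read, stop here */
            }
        }
        return done;
    }

    /* Bounce: one linear read, then scatter into the caller's vectors. */
    ret = pread(fd, buf, total, offset);
    if (ret > 0) {
        size_t left = (size_t)ret;
        size_t pos = 0;

        for (i = 0; i < iovcnt && left > 0; i++) {
            size_t n = iov[i].iov_len < left ? iov[i].iov_len : left;

            memcpy(iov[i].iov_base, buf + pos, n);
            pos += n;
            left -= n;
        }
    }
    free(buf);
    return ret;
}
```

The write side would be the mirror image: gather the vectors into the bounce buffer first, then issue a single pwrite.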
Regards,
Anthony Liguori
> it'd sigkill before
> you can check... at least with the default overcommit. The way the dma
> api works is that it doesn't send a mega large writev, but sends it in
> pieces capped by the max buffer size, with many iovecs with iovcnt = 1.
>
>
>> We can use libaio on older Linux's to simulate preadv/pwritev. Use the
>> proper syscalls on newer kernels, on BSDs, and bounce everything else.
>>
>
> Given READV/WRITEV aren't available in not very recent kernels and
> given that without O_DIRECT each iocb will become synchronous, we
> can't use the libaio. Also once they fix linux-aio, if we do that, the
> iocb logic would need to be largely refactored. So I'm not sure if it's
> worth it as it can't handle 2.6.16-18 when O_DIRECT is disabled (when
> O_DIRECT is enabled we could just build an array of linear iocb).
>