From: Anthony Liguori <anthony@codemonkey.ws>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>,
	kvm-devel <kvm@vger.kernel.org>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC] Replace posix-aio with custom thread pool
Date: Fri, 12 Dec 2008 11:25:55 -0600
Message-ID: <49429EA3.8070008@codemonkey.ws>
In-Reply-To: <20081212170916.GO6809@random.random>

Andrea Arcangeli wrote:
> On Fri, Dec 12, 2008 at 10:49:45AM -0600, Anthony Liguori wrote:
>   
>> I meant, if you wanted to pass a file descriptor as a raw device.  So:
>>
>> qemu -hda raw:fd=4
>>
>> Or something like that.  We don't support this today.
>>     
>
> ah ok.
>
>   
>> I think bouncing the iov and just using pread/pwrite may be our best bet.  
>> It means memory allocation but we can cap it.  Since we're using threads, 
>>     
>
> It's already capped. However currently it generates an iovec, but
> we've simply to check the iovcnt to be 1, if it's 1 we pread from
> iov.iov_base, iov.iov_len. The dma api will take care to enforce
> iovcnt to be 1 for the iovec if preadv/pwritev isn't detected at
> compile time.
>   

Hrm, that's more complex than I was expecting.  I was thinking the bdrv
aio infrastructure would always take an iovec.  Any details about whether
the underlying host can handle the iovec natively would be hidden from
callers.

>> we can just force a thread to sleep until memory becomes available so it's 
>> actually pretty straightforward.
>>     
>
> There's no way to detect that and wait for memory,

If we artificially cap bounce memory at, say, 50MB, then you can do
something like:

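/* more_memory / bounce_lock: placeholder condvar and mutex names */
pthread_mutex_lock(&bounce_lock);
while (buffer == NULL) {
   buffer = try_to_bounce(offset, iov, iovcnt, &size);
   if (buffer == NULL && errno == ENOMEM) {
      /* sleep until bounce_free() broadcasts on more_memory */
      pthread_cond_wait(&more_memory, &bounce_lock);
   }
}
pthread_mutex_unlock(&bounce_lock);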

try_to_bounce allocates with malloc(), but once the outstanding bounce
memory would exceed 50MB it fails with errno set to ENOMEM.  In your
bounce_free() function, you do a pthread_cond_broadcast() to wake up any
threads that may be waiting to allocate memory.
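
To make the accounting concrete, here is a minimal sketch of a capped
allocator along those lines.  It is not code from any patch in this thread.
The names bounce_alloc, bounce_lock, bounce_cond, bounce_used and BOUNCE_CAP
are made up for illustration, and this variant folds the wait loop into the
allocator itself so the cap check and the sleep happen under one mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; the 50MB figure is the cap from the text. */
#define BOUNCE_CAP ((size_t)50 * 1024 * 1024)

static pthread_mutex_t bounce_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t bounce_cond = PTHREAD_COND_INITIALIZER;
static size_t bounce_used;   /* bytes currently reserved, never above the cap */

/* Give back a reservation and wake every thread waiting for room. */
static void bounce_release(size_t size)
{
    pthread_mutex_lock(&bounce_lock);
    assert(bounce_used >= size);
    bounce_used -= size;
    pthread_cond_broadcast(&bounce_cond);
    pthread_mutex_unlock(&bounce_lock);
}

/* Sleep until `size` bytes fit under the cap, then allocate them. */
static void *bounce_alloc(size_t size)
{
    void *buf;

    if (size > BOUNCE_CAP) {
        errno = ENOMEM;      /* could never fit, so do not wait forever */
        return NULL;
    }

    pthread_mutex_lock(&bounce_lock);
    while (size > BOUNCE_CAP - bounce_used) {
        pthread_cond_wait(&bounce_cond, &bounce_lock);
    }
    bounce_used += size;
    pthread_mutex_unlock(&bounce_lock);

    buf = malloc(size);
    if (buf == NULL) {
        bounce_release(size);   /* undo the reservation on a real failure */
        errno = ENOMEM;
    }
    return buf;
}

/* Free a bounce buffer obtained from bounce_alloc(). */
static void bounce_free(void *buf, size_t size)
{
    free(buf);
    bounce_release(size);
}
```

Doing the check and the pthread_cond_wait() under the same mutex is what
avoids a lost wakeup: a bounce_free() cannot slip in between "no room" and
"go to sleep".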

This lets us expose a preadv/pwritev function that actually works.  The
expectation is that bouncing will outperform doing a separate pread/pwrite
for each vector.  Of course, you could get smart and, if try_to_bounce
fails, fall back to a pread/pwrite of each vector.  Likewise, you can
fast-path the case of a single iovec to avoid bouncing entirely.
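
As a rough illustration of those three paths on the read side, here is a
sketch assuming a host with pread() but no preadv().  The function name
emulated_preadv is hypothetical, and a plain malloc() stands in for the
capped bounce allocation, so treat it as an outline rather than a proposed
patch.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: emulate preadv() using only pread(). */
static ssize_t emulated_preadv(int fd, const struct iovec *iov, int iovcnt,
                               off_t offset)
{
    size_t total = 0;
    char *bounce;
    ssize_t ret;
    int i;

    assert(iovcnt > 0);

    /* Fast path: a single vector needs no bounce buffer at all. */
    if (iovcnt == 1) {
        return pread(fd, iov[0].iov_base, iov[0].iov_len, offset);
    }

    for (i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }

    bounce = malloc(total);   /* stand-in for the capped bounce allocator */
    if (bounce == NULL) {
        /* Fallback: no bounce memory, so issue one pread() per vector. */
        ssize_t done = 0;
        for (i = 0; i < iovcnt; i++) {
            ret = pread(fd, iov[i].iov_base, iov[i].iov_len, offset + done);
            if (ret < 0) {
                return done > 0 ? done : ret;
            }
            done += ret;
            if ((size_t)ret < iov[i].iov_len) {
                break;        /* short read or end of file */
            }
        }
        return done;
    }

    /* Bounce path: one large pread(), then scatter into the vectors. */
    ret = pread(fd, bounce, total, offset);
    if (ret > 0) {
        size_t left = (size_t)ret;
        size_t pos = 0;
        for (i = 0; i < iovcnt && left > 0; i++) {
            size_t n = iov[i].iov_len < left ? iov[i].iov_len : left;
            memcpy(iov[i].iov_base, bounce + pos, n);
            pos += n;
            left -= n;
        }
    }
    free(bounce);
    return ret;
}
```

The write side would mirror this: gather the vectors into the bounce buffer
first, then issue a single pwrite().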

Regards,

Anthony Liguori

>  it'd sigkill before
> you can check... at least with the default overcommit. The way the dma
> api works is that it doesn't send a mega large writev, but sends it in
> pieces capped by the max buffer size, with many iovecs with iovcnt = 1.
>
>   
>> We can use libaio on older Linux's to simulate preadv/pwritev.  Use the 
>> proper syscalls on newer kernels, on BSDs, and bounce everything else.
>>     
>
> Given READV/WRITEV aren't available in not very recent kernels and
> given that without O_DIRECT each iocb will become synchronous, we
> can't use libaio. Also once they fix linux-aio, if we do that, the
> iocb logic would need to be largely refactored. So I'm not sure if it's
> worth it as it can't handle 2.6.16-18 when O_DIRECT is disabled (when
> O_DIRECT is enabled we could just build an array of linear iocb).
>   


