From: Anthony Liguori <anthony@codemonkey.ws>
To: Christoph Hellwig <hch@infradead.org>
Cc: Avi Kivity <avi@redhat.com>, qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
Date: Mon, 23 Mar 2009 13:10:30 -0500
Message-ID: <49C7D096.3000302@codemonkey.ws>
In-Reply-To: <20090323172928.GB29449@infradead.org>

Christoph Hellwig wrote:
> On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
>   
>> I'd like to see the O_DIRECT bounce buffering removed in favor of the  
>> DMA API bouncing.  Once that happens, raw_read and raw_pread can  
>> disappear.  block-raw-posix becomes much simpler.
>>     
>
> See my vectored I/O patches for doing the bounce buffering at the
> optimal place for the aio path. Note that from my reading of the
> qcow/qcow2 code they might send down unaligned requests, which is
> something the dma api would not help with.
>   

I was going to look today at applying those.

> For the buffered I/O path we will always have to do some sort of buffering
> due to all the partition header reading / etc.  And given how that part
> isn't performance critical my preference would be to keep doing it in
> bdrv_pread/write and guarantee the lowlevel drivers proper alignment.
>   

I really dislike having so many APIs.  I'd rather have an aio API that
took byte accesses, or have pread/pwrite always be emulated with a full
sector read/write.

>> We would drop the signaling stuff and have the thread pool use an fd to  
>> signal.  The big problem with that right now is that it'll cause a  
>> performance regression for certain platforms until we have the IO thread  
>> in place.
>>     
>
> Talking about signaling, does anyone remember why the Linux signalfd/
> eventfd support is only in kvm but not in upstream qemu?
>   

Because upstream QEMU doesn't yet have an IO thread.

TCG chains together TBs and if you have a tight loop in a VCPU, then the 
only way to break out of the loop is to receive a signal.  The signal 
handler will call cpu_interrupt() which will unchain TBs allowing TCG 
execution to break once you return from the signal handler.
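
To illustrate the mechanism described above, here is a minimal sketch in
Python rather than QEMU's C. It is only an analogy, not QEMU code: the
tight loop stands in for chained TBs, the SIGALRM timer stands in for an
IO completion signal, and the names exit_request, on_signal and
run_tight_loop are hypothetical. It assumes a Unix host and that it runs
in the main thread.

```python
import signal

# Hypothetical flag; plays the role of the request that cpu_interrupt()
# would raise so that execution can leave the loop.
exit_request = False


def on_signal(signum, frame):
    """Signal handler: ask the spinning loop to stop."""
    global exit_request
    exit_request = True


def run_tight_loop(timeout_s=0.05):
    """Spin until a signal arrives; return how many iterations ran."""
    global exit_request
    exit_request = False
    old_handler = signal.signal(signal.SIGALRM, on_signal)
    # The timer is a stand-in for an asynchronous IO completion signal.
    signal.setitimer(signal.ITIMER_REAL, timeout_s)
    iterations = 0
    try:
        # Nothing inside the loop ever polls for IO; only the signal
        # handler can make the condition false.
        while not exit_request:
            iterations += 1
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
    return iterations
```

Without the signal, nothing in the loop would ever notice that an event
is pending, which is the situation a tight guest loop creates.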

An IO thread solves this in a different way by letting select() always 
run in parallel to TCG VCPU execution.  When select() returns you can 
send a signal to the TCG VCPU thread to break it out of chained TBs.
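
The IO thread arrangement can be sketched the same way. Again this is a
Python analogy under stated assumptions, not QEMU code: a pipe stands in
for the fd that a completion would be reported on, the main thread plays
the TCG VCPU thread, and kick_requested, on_kick, io_thread and run_demo
are hypothetical names. It assumes a Unix host and must be called from
the main thread, since that is where Python runs signal handlers.

```python
import os
import select
import signal
import threading

kick_requested = False  # hypothetical flag set when the VCPU is kicked


def on_kick(signum, frame):
    """Runs in the main ("VCPU") thread when the IO thread kicks it."""
    global kick_requested
    kick_requested = True


def io_thread(read_fd, vcpu_thread_id):
    """Block in select() while the VCPU spins, then kick the VCPU."""
    select.select([read_fd], [], [])
    os.read(read_fd, 1)  # consume the completion notification
    signal.pthread_kill(vcpu_thread_id, signal.SIGUSR1)


def run_demo(delay_s=0.05):
    """Spin like a VCPU until the IO thread signals; return iterations."""
    global kick_requested
    kick_requested = False
    old_handler = signal.signal(signal.SIGUSR1, on_kick)
    read_fd, write_fd = os.pipe()
    io = threading.Thread(target=io_thread,
                          args=(read_fd, threading.get_ident()))
    io.start()
    # Simulated IO completion: a byte written to the pipe after a delay.
    completion = threading.Timer(delay_s, os.write, args=(write_fd, b"x"))
    completion.start()
    iterations = 0
    try:
        while not kick_requested:  # never polls the fd itself
            iterations += 1
    finally:
        completion.join()
        io.join()
        signal.signal(signal.SIGUSR1, old_handler)
        os.close(read_fd)
        os.close(write_fd)
    return iterations
```

The point of the split is that the spinning thread never has to poll:
select() sits in its own thread, and the signal is used only to
interrupt the loop once something is actually ready.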

Not all IO in qemu generates a signal, so this is a potential problem, but in
practice, if we don't generate a signal for disk IO completion, a number
of real-world guests break (mostly non-x86 boards).

Regards,

Anthony Liguori
