From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
Date: Mon, 23 Mar 2009 13:10:30 -0500
Message-ID: <49C7D096.3000302@codemonkey.ws>
References: <1237823124-6417-1-git-send-email-aliguori@us.ibm.com> <49C7B620.8030203@redhat.com> <49C7C392.3030001@codemonkey.ws> <20090323172928.GB29449@infradead.org>
In-Reply-To: <20090323172928.GB29449@infradead.org>
To: Christoph Hellwig
Cc: Avi Kivity, qemu-devel@nongnu.org, kvm@vger.kernel.org

Christoph Hellwig wrote:
> On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
>> I'd like to see the O_DIRECT bounce buffering removed in favor of the
>> DMA API bouncing.  Once that happens, raw_read and raw_pread can
>> disappear.  block-raw-posix becomes much simpler.
>
> See my vectored I/O patches for doing the bounce buffering at the
> optimal place for the aio path.  Note that from my reading of the
> qcow/qcow2 code they might send down unaligned requests, which is
> something the dma api would not help with.

I was going to look today at applying those.

> For the buffered I/O path we will always have to do some sort of
> buffering due to all the partition header reading / etc.  And given how
> that part isn't performance critical my preference would be to keep
> doing it in bdrv_pread/write and guarantee the lowlevel drivers proper
> alignment.

I really dislike having so many APIs.
I'd rather have an aio API that takes byte accesses, or have pread/pwrite
always be emulated with a full sector read/write.

>> We would drop the signaling stuff and have the thread pool use an fd
>> to signal.  The big problem with that right now is that it'll cause a
>> performance regression for certain platforms until we have the IO
>> thread in place.
>
> Talking about signaling, does anyone remember why the Linux signalfd/
> eventfd support is only in kvm but not in upstream qemu?

Because upstream QEMU doesn't yet have an IO thread.  TCG chains
together TBs, and if you have a tight loop in a VCPU, the only way to
break out of the loop is to receive a signal.  The signal handler calls
cpu_interrupt(), which unchains the TBs, allowing TCG execution to break
once you return from the signal handler.

An IO thread solves this in a different way by letting select() always
run in parallel with TCG VCPU execution.  When select() returns, you can
send a signal to the TCG VCPU thread to break it out of chained TBs.
Not all IO in qemu generates a signal, so this is a potential problem,
but in practice, if we don't generate a signal for disk IO completion, a
number of real-world guests break (mostly non-x86 boards).

Regards,

Anthony Liguori