From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH][RFC] Linux AIO support when using O_DIRECT
Date: Mon, 23 Mar 2009 13:10:30 -0500
Message-ID: <49C7D096.3000302@codemonkey.ws>
References: <1237823124-6417-1-git-send-email-aliguori@us.ibm.com> <49C7B620.8030203@redhat.com> <49C7C392.3030001@codemonkey.ws> <20090323172928.GB29449@infradead.org>
In-Reply-To: <20090323172928.GB29449@infradead.org>
To: Christoph Hellwig
Cc: Avi Kivity, qemu-devel@nongnu.org, kvm@vger.kernel.org

Christoph Hellwig wrote:
> On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
>> I'd like to see the O_DIRECT bounce buffering removed in favor of the
>> DMA API bouncing.  Once that happens, raw_read and raw_pread can
>> disappear.  block-raw-posix becomes much simpler.
>
> See my vectored I/O patches for doing the bounce buffering at the
> optimal place for the aio path.  Note that from my reading of the
> qcow/qcow2 code they might send down unaligned requests, which is
> something the dma api would not help with.

I was going to look today at applying those.

> For the buffered I/O path we will always have to do some sort of
> buffering due to all the partition header reading / etc.  And given how
> that part isn't performance critical my preference would be to keep
> doing it in bdrv_pread/write and guarantee the lowlevel drivers proper
> alignment.

I really dislike having so many APIs.
I'd rather have an aio API that takes byte accesses, or have pread/pwrite
always be emulated with a full sector read/write.

>> We would drop the signaling stuff and have the thread pool use an fd
>> to signal.  The big problem with that right now is that it'll cause a
>> performance regression for certain platforms until we have the IO
>> thread in place.
>
> Talking about signaling, does anyone remember why the Linux signalfd/
> eventfd support is only in kvm but not in upstream qemu?

Because upstream QEMU doesn't yet have an IO thread.  TCG chains
together TBs, and if you have a tight loop in a VCPU, the only way to
break out of the loop is to receive a signal.  The signal handler calls
cpu_interrupt(), which unchains the TBs, allowing TCG execution to break
once you return from the signal handler.

An IO thread solves this in a different way by letting select() always
run in parallel with TCG VCPU execution.  When select() returns, you can
send a signal to the TCG VCPU thread to break it out of chained TBs.
Not all IO in qemu generates a signal, so this is a potential problem,
but in practice, if we don't generate a signal for disk IO completion, a
number of real-world guests break (mostly non-x86 boards).

Regards,

Anthony Liguori