From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:36994) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RJ2O1-0003Cr-Qt for qemu-devel@nongnu.org; Wed, 26 Oct 2011 08:15:18 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RJ2Nv-0005cC-V6 for qemu-devel@nongnu.org; Wed, 26 Oct 2011 08:15:13 -0400
Received: from mx1.redhat.com ([209.132.183.28]:58702) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RJ2Nv-0005bo-O7 for qemu-devel@nongnu.org; Wed, 26 Oct 2011 08:15:07 -0400
Message-ID: <4EA7FA83.8070000@redhat.com>
Date: Wed, 26 Oct 2011 14:18:11 +0200
From: Kevin Wolf
MIME-Version: 1.0
References: <4EA6ACFE.6090109@redhat.com> <4EA6B41B.3000903@codemonkey.ws> <4EA6C00B.3030701@redhat.com> <4EA6C25C.8000502@codemonkey.ws> <4EA7C1B8.9000903@redhat.com> <20111026095709.GF29496@redhat.com> <4EA7ED99.1020700@redhat.com> <20111026113932.GJ29496@redhat.com>
In-Reply-To: <20111026113932.GJ29496@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] KVM call agenda for October 25
To: "Daniel P. Berrange"
Cc: Paolo Bonzini , Markus Armbruster , kvm@vger.kernel.org, qemu-devel@nongnu.org

Am 26.10.2011 13:39, schrieb Daniel P. Berrange:
> On Wed, Oct 26, 2011 at 01:23:05PM +0200, Kevin Wolf wrote:
>> Am 26.10.2011 11:57, schrieb Daniel P. Berrange:
>>> On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
>>>> Kevin Wolf writes:
>>>>
>>>>> Am 25.10.2011 16:06, schrieb Anthony Liguori:
>>>>>> On 10/25/2011 08:56 AM, Kevin Wolf wrote:
>>>>>>> Am 25.10.2011 15:05, schrieb Anthony Liguori:
>>>>>>>> I'd be much more open to changing the default mode to cache=none
>>>>>>>> FWIW since the risk of data loss there is much, much lower.
>>>>>>>
>>>>>>> I think people said that they'd rather not have cache=none as default
>>>>>>> because O_DIRECT doesn't work everywhere.
>>>>>>
>>>>>> Where doesn't it work these days? I know it doesn't work on tmpfs. I
>>>>>> know it works on ext[234], btrfs, nfs.
>>>>>
>>>>> Besides file systems (and probably OSes) that don't support O_DIRECT,
>>>>> there's another case: Our defaults don't work on 4k sector disks today.
>>>>> You need to explicitly specify the logical_block_size qdev property for
>>>>> cache=none to work on them.
>>>>>
>>>>> And changing this default isn't trivial as the right value doesn't only
>>>>> depend on the host disk, but it's also guest visible. The only way out
>>>>> would be bounce buffers, but I'm not sure that doing that silently is a
>>>>> good idea...
>>>>
>>>> Sector size is a device property.
>>>>
>>>> If the user asks for a 4K sector disk, and the backend can't support
>>>> that, we need to reject the configuration. Just like we reject
>>>> read-only backends for read/write disks.
>>>
>>> I don't see why we need to reject a guest disk with 4k sectors,
>>> just because the host disk only has 512 byte sectors. A guest
>>> sector size that's a larger multiple of host sector size should
>>> work just fine. It just means any guest sector write will update
>>> 8 host sectors at a time. We only have problems if guest sector
>>> size is not a multiple of host sector size, in which case bounce
>>> buffers are the only option (other than rejecting the config
>>> which is not too nice).
>>>
>>> IIUC, current QEMU behaviour is
>>>
>>>                Guest 512      Guest 4k
>>>   Host 512     * OK           OK
>>>   Host 4k      * I/O Err      OK
>>>
>>> '*' marks defaults
>>>
>>> IMHO, QEMU needs to work without I/O errors in all of these
>>> combinations, even if this means having to use bounce buffers
>>> in some of them. That said, IMHO the default should be for
>>> QEMU to avoid bounce buffers, which implies it should either
>>> choose guest sector size to match host sector size, or it
>>> should unconditionally use a 4k guest sector size. IMHO we
>>> need the former:
>>>
>>>                Guest 512      Guest 4k
>>>   Host 512     * OK           OK
>>>   Host 4k      OK             * OK
>>
>> I'm not sure if a 4k host should imply a 4k guest by default. This means
>> that some guests wouldn't be able to run on a 4k host. On the other
>> hand, for those guests that can do 4k, it would be the much better option.
>>
>> So I think this decision is the hard thing about it.
>
> I guess it somewhat depends whether we want to strive for
>
>   1. Give the user the fastest working config by default
>   2. Give the user a working config by default
>   3. Give the user the fastest (possibly broken) config by default
>
> IMHO 3 is not a serious option, but I could see 2 as a reasonable
> tradeoff to avoid complexity in choosing QEMU defaults. The user
> would have a working config with 512 sectors, but sub-optimal perf
> on 4k hosts due to bounce buffering. Ideally libvirt or other
> higher-level apps would be setting the best block size that a guest
> can support by default, so bounce buffers would rarely be needed.
> So only people using QEMU directly without setting a block size
> would ordinarily suffer the bounce buffer perf hit on a 4k host.

Yes, I'm currently tending towards this, plus a warning on stderr if
bounce buffering is used.

Or, coming back to the original subject of this discussion, we can
default to cache=writeback and forget about alignment. If you specify
cache=none, you have to take care to explicitly specify a block size >
512 bytes, too.

Maybe the best is actually to do both: default to cache=writeback,
completely avoiding bounce buffers. If the user specifies cache=none,
but doesn't change the sector size of the virtual disk, print a warning
and enable bounce buffers.

Kevin
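
As a concrete illustration of the logical_block_size workaround mentioned
above: with current QEMU, making cache=none work on a 4k-sector host means
pinning the guest-visible block size via the qdev property. The image name,
the choice of virtio-blk, and the added physical_block_size property below
are only an example, not a recommendation:

  -drive file=disk.img,if=none,id=disk0,format=raw,cache=none \
  -device virtio-blk-pci,drive=disk0,logical_block_size=4096,physical_block_size=4096

And a rough sketch, not QEMU code, of what the bounce-buffer fallback
discussed above boils down to: a 512-byte guest write against an O_DIRECT
file descriptor on a 4k host is widened to a read-modify-write of the
containing, properly aligned host sectors. HOST_SECTOR, bounce_write()
and the minimal error handling are made up for the example:

  #define _GNU_SOURCE
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define HOST_SECTOR 4096

  /* Write 'len' bytes at byte offset 'offset' of 'fd' (opened with O_DIRECT
   * on a 4k-sector device), even if offset/len are only 512-byte aligned. */
  int bounce_write(int fd, const void *buf, size_t len, off_t offset)
  {
      off_t start = offset & ~(off_t)(HOST_SECTOR - 1);   /* round down */
      off_t end = (offset + len + HOST_SECTOR - 1) & ~(off_t)(HOST_SECTOR - 1);
      size_t span = end - start;
      void *bounce;

      /* O_DIRECT needs an aligned buffer, offset and length. */
      if (posix_memalign(&bounce, HOST_SECTOR, span)) {
          return -1;
      }

      /* Read-modify-write: fetch the surrounding host sectors first ... */
      if (pread(fd, bounce, span, start) != (ssize_t)span) {
          free(bounce);
          return -1;
      }

      /* ... patch in the guest data ... */
      memcpy((char *)bounce + (offset - start), buf, len);

      /* ... and write the aligned span back in one aligned request. */
      if (pwrite(fd, bounce, span, start) != (ssize_t)span) {
          free(bounce);
          return -1;
      }

      free(bounce);
      return 0;
  }

The extra pread per sub-sector write is exactly the perf hit (and the extra
copy) that the thread above wants to avoid by default.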