From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4EA7ED99.1020700@redhat.com>
Date: Wed, 26 Oct 2011 13:23:05 +0200
From: Kevin Wolf
MIME-Version: 1.0
References: <4EA6ACFE.6090109@redhat.com> <4EA6B41B.3000903@codemonkey.ws> <4EA6C00B.3030701@redhat.com> <4EA6C25C.8000502@codemonkey.ws> <4EA7C1B8.9000903@redhat.com> <20111026095709.GF29496@redhat.com>
In-Reply-To: <20111026095709.GF29496@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] KVM call agenda for October 25
To: "Daniel P. Berrange"
Cc: Paolo Bonzini, Markus Armbruster, kvm@vger.kernel.org, qemu-devel@nongnu.org

On 26.10.2011 11:57, Daniel P. Berrange wrote:
> On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
>> Kevin Wolf writes:
>>
>>> On 25.10.2011 16:06, Anthony Liguori wrote:
>>>> On 10/25/2011 08:56 AM, Kevin Wolf wrote:
>>>>> On 25.10.2011 15:05, Anthony Liguori wrote:
>>>>>> I'd be much more open to changing the default mode to cache=none
>>>>>> FWIW since the risk of data loss there is much, much lower.
>>>>>
>>>>> I think people said that they'd rather not have cache=none as default
>>>>> because O_DIRECT doesn't work everywhere.
>>>>
>>>> Where doesn't it work these days? I know it doesn't work on tmpfs. I
>>>> know it works on ext[234], btrfs, nfs.
>>>
>>> Besides file systems (and probably OSes) that don't support O_DIRECT,
>>> there's another case: Our defaults don't work on 4k sector disks today.
>>> You need to explicitly specify the logical_block_size qdev property for
>>> cache=none to work on them.
>>>
>>> And changing this default isn't trivial as the right value doesn't only
>>> depend on the host disk, but it's also guest visible. The only way out
>>> would be bounce buffers, but I'm not sure that doing that silently is a
>>> good idea...
>>
>> Sector size is a device property.
>>
>> If the user asks for a 4K sector disk, and the backend can't support
>> that, we need to reject the configuration. Just like we reject
>> read-only backends for read/write disks.
>
> I don't see why we need to reject a guest disk with 4k sectors,
> just because the host disk only has 512 byte sectors. A guest
> sector size that's a larger multiple of host sector size should
> work just fine. It just means any guest sector write will update
> 8 host sectors at a time. We only have problems if guest sector
> size is not a multiple of host sector size, in which case bounce
> buffers are the only option (other than rejecting the config
> which is not too nice).
>
> IIUC, current QEMU behaviour is
>
>              Guest 512      Guest 4k
>   Host 512   * OK           OK
>   Host 4k    * I/O Err      OK
>
>   '*' marks defaults
>
> IMHO, QEMU needs to work without I/O errors in all of these
> combinations, even if this means having to use bounce buffers
> in some of them. That said, IMHO the default should be for
> QEMU to avoid bounce buffers, which implies it should either
> choose guest sector size to match host sector size, or it
> should unconditionally use 4k guest. IMHO we need the former
>
>              Guest 512      Guest 4k
>   Host 512   *OK            OK
>   Host 4k    OK             *OK

I'm not sure if a 4k host should imply a 4k guest by default. This
means that some guests wouldn't be able to run on a 4k host. On the
other hand, for those guests that can do 4k, it would be the much
better option. So I think this decision is the hard thing about it.
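To make the host-side constraint concrete, here is a minimal standalone
sketch (illustration only, not QEMU code; the device path is just a
placeholder). With cache=none the image is opened with O_DIRECT, which
can fail outright on some file systems (tmpfs, for example), and on a
block device the logical sector size reported by BLKSSZGET is the
minimum alignment that O_DIRECT requests have to respect. That
alignment requirement is where the "I/O Err" for a 512-byte guest on a
4k host comes from.

/* Probe O_DIRECT support and the host logical sector size (sketch). */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>           /* BLKSSZGET */

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/dev/sdb"; /* placeholder */
    int fd = open(path, O_RDONLY | O_DIRECT);
    struct stat st;

    if (fd < 0) {
        /* e.g. tmpfs rejects O_DIRECT at open time */
        fprintf(stderr, "open with O_DIRECT failed: %s\n", strerror(errno));
        return 1;
    }

    if (fstat(fd, &st) == 0 && S_ISBLK(st.st_mode)) {
        int ssz = 0;
        if (ioctl(fd, BLKSSZGET, &ssz) == 0) {
            /* O_DIRECT offsets, lengths and buffers must be aligned to this */
            printf("host logical sector size: %d bytes\n", ssz);
        }
    } else {
        printf("regular file: alignment depends on the underlying fs/device\n");
    }

    close(fd);
    return 0;
}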
> Yes, I know there are other weird sector sizes besides 512
> and 4k, but the same general principles apply of either one
> being a multiple of the other, or needing to use bounce
> buffers.
>
>> If the backend can only support it by using bounce buffers, I'd say
>> reject it unless the user explicitly permits bounce buffers. But that's
>> debatable.
>
> I don't think it really adds value for QEMU to force the user to specify
> some extra magic flag in order to make the user's requested config
> actually be honoured.

The user's requested config is often enough something like "-hda
foo.img". Give me a working disk, I don't care how you do it. (And of
course I don't tell you what sector sizes my guest can cope with.)

> If a config needs bounce buffers, QEMU should just
> do it, without needing 'use-bounce-buffers=1'. A higher level mgmt app
> is in a better position to inform users about the consequences.

A higher level management app doesn't exist in the general case.

Kevin
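For completeness, this is roughly the read-modify-write that a bounce
buffer implies for a single 512-byte guest write landing in a 4k host
sector under O_DIRECT. It is a simplified sketch with made-up names,
not what QEMU actually does; note that the read-modify-write is
neither free nor atomic, which is one reason doing it silently is
debatable.

/* Bounce-buffered write of one 512-byte guest sector (sketch). */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

#define HOST_SECTOR   4096
#define GUEST_SECTOR  512

/* Write one 512-byte guest sector at byte offset guest_off. */
static int bounce_write(int fd, uint64_t guest_off, const void *buf)
{
    uint64_t aligned_off = guest_off & ~(uint64_t)(HOST_SECTOR - 1);
    void *bounce;

    /* O_DIRECT also requires the buffer itself to be sector aligned. */
    if (posix_memalign(&bounce, HOST_SECTOR, HOST_SECTOR)) {
        return -ENOMEM;
    }

    /* Read the full host sector that contains the guest sector... */
    if (pread(fd, bounce, HOST_SECTOR, aligned_off) != HOST_SECTOR) {
        free(bounce);
        return -EIO;
    }

    /* ...patch in the 512 bytes the guest actually wrote... */
    memcpy((char *)bounce + (guest_off - aligned_off), buf, GUEST_SECTOR);

    /* ...and write the whole host sector back. */
    if (pwrite(fd, bounce, HOST_SECTOR, aligned_off) != HOST_SECTOR) {
        free(bounce);
        return -EIO;
    }

    free(bounce);
    return 0;
}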