From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4EA7ED99.1020700@redhat.com>
Date: Wed, 26 Oct 2011 13:23:05 +0200
From: Kevin Wolf
MIME-Version: 1.0
References: <4EA6ACFE.6090109@redhat.com> <4EA6B41B.3000903@codemonkey.ws> <4EA6C00B.3030701@redhat.com> <4EA6C25C.8000502@codemonkey.ws> <4EA7C1B8.9000903@redhat.com> <20111026095709.GF29496@redhat.com>
In-Reply-To: <20111026095709.GF29496@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] KVM call agenda for October 25
To: "Daniel P. Berrange"
Cc: Paolo Bonzini, Markus Armbruster, kvm@vger.kernel.org, qemu-devel@nongnu.org

On 26.10.2011 11:57, Daniel P. Berrange wrote:
> On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
>> Kevin Wolf writes:
>>
>>> On 25.10.2011 16:06, Anthony Liguori wrote:
>>>> On 10/25/2011 08:56 AM, Kevin Wolf wrote:
>>>>> On 25.10.2011 15:05, Anthony Liguori wrote:
>>>>>> I'd be much more open to changing the default mode to cache=none
>>>>>> FWIW since the risk of data loss there is much, much lower.
>>>>>
>>>>> I think people said that they'd rather not have cache=none as default
>>>>> because O_DIRECT doesn't work everywhere.
>>>>
>>>> Where doesn't it work these days? I know it doesn't work on tmpfs. I
>>>> know it works on ext[234], btrfs, nfs.
>>>
>>> Besides file systems (and probably OSes) that don't support O_DIRECT,
>>> there's another case: Our defaults don't work on 4k sector disks today.
>>> You need to explicitly specify the logical_block_size qdev property for
>>> cache=none to work on them.
>>>
>>> And changing this default isn't trivial as the right value doesn't only
>>> depend on the host disk, but it's also guest visible. The only way out
>>> would be bounce buffers, but I'm not sure that doing that silently is a
>>> good idea...
>>
>> Sector size is a device property.
>>
>> If the user asks for a 4K sector disk, and the backend can't support
>> that, we need to reject the configuration. Just like we reject
>> read-only backends for read/write disks.
>
> I don't see why we need to reject a guest disk with 4k sectors,
> just because the host disk only has 512 byte sectors. A guest
> sector size that's a larger multiple of host sector size should
> work just fine. It just means any guest sector write will update
> 8 host sectors at a time. We only have problems if guest sector
> size is not a multiple of host sector size, in which case bounce
> buffers are the only option (other than rejecting the config
> which is not too nice).
>
> IIUC, current QEMU behaviour is
>
>              Guest 512      Guest 4k
>   Host 512   * OK           OK
>   Host 4k    * I/O Err      OK
>
>   '*' marks defaults
>
> IMHO, QEMU needs to work without I/O errors in all of these
> combinations, even if this means having to use bounce buffers
> in some of them. That said, IMHO the default should be for
> QEMU to avoid bounce buffers, which implies it should either
> choose guest sector size to match host sector size, or it
> should unconditionally use 4k guest. IMHO we need the former
>
>              Guest 512      Guest 4k
>   Host 512   *OK            OK
>   Host 4k    OK             *OK

I'm not sure if a 4k host should imply a 4k guest by default. This
means that some guests wouldn't be able to run on a 4k host. On the
other hand, for those guests that can do 4k, it would be the much
better option. So I think this decision is the hard thing about it.
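To make the host-side constraint concrete, here is a minimal standalone
sketch (illustration only, not QEMU code; the device path is just a
placeholder). With cache=none the image is opened with O_DIRECT, which
can fail outright on some file systems (tmpfs, for example), and on a
block device the logical sector size reported by BLKSSZGET is the
minimum alignment that O_DIRECT requests have to respect. That
alignment requirement is where the "I/O Err" for a 512-byte guest on a
4k host comes from.

/* Probe O_DIRECT support and the host logical sector size (sketch). */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>           /* BLKSSZGET */

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/dev/sdb"; /* placeholder */
    int fd = open(path, O_RDONLY | O_DIRECT);
    struct stat st;

    if (fd < 0) {
        /* e.g. tmpfs rejects O_DIRECT at open time */
        fprintf(stderr, "open with O_DIRECT failed: %s\n", strerror(errno));
        return 1;
    }

    if (fstat(fd, &st) == 0 && S_ISBLK(st.st_mode)) {
        int ssz = 0;
        if (ioctl(fd, BLKSSZGET, &ssz) == 0) {
            /* O_DIRECT offsets, lengths and buffers must be aligned to this */
            printf("host logical sector size: %d bytes\n", ssz);
        }
    } else {
        printf("regular file: alignment depends on the underlying fs/device\n");
    }

    close(fd);
    return 0;
}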
> Yes, I know there are other weird sector sizes besides 512
> and 4k, but the same general principles apply of either one
> being a multiple of the other, or needing to use bounce
> buffers.
>
>> If the backend can only support it by using bounce buffers, I'd say
>> reject it unless the user explicitly permits bounce buffers. But that's
>> debatable.
>
> I don't think it really adds value for QEMU to force the user to specify
> some extra magic flag in order to make the user's requested config
> actually be honoured.

The user's requested config is often enough something like "-hda
foo.img". Give me a working disk, I don't care how you do it. (And of
course I don't tell you what sector sizes my guest can cope with.)

> If a config needs bounce buffers, QEMU should just
> do it, without needing 'use-bounce-buffers=1'. A higher level mgmt app
> is in a better position to inform users about the consequences.

A higher level management app doesn't exist in the general case.

Kevin
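For completeness, this is roughly the read-modify-write that a bounce
buffer implies for a single 512-byte guest write landing in a 4k host
sector under O_DIRECT. It is a simplified sketch with made-up names,
not what QEMU actually does; note that the read-modify-write is
neither free nor atomic, which is one reason doing it silently is
debatable.

/* Bounce-buffered write of one 512-byte guest sector (sketch). */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

#define HOST_SECTOR   4096
#define GUEST_SECTOR  512

/* Write one 512-byte guest sector at byte offset guest_off. */
static int bounce_write(int fd, uint64_t guest_off, const void *buf)
{
    uint64_t aligned_off = guest_off & ~(uint64_t)(HOST_SECTOR - 1);
    void *bounce;

    /* O_DIRECT also requires the buffer itself to be sector aligned. */
    if (posix_memalign(&bounce, HOST_SECTOR, HOST_SECTOR)) {
        return -ENOMEM;
    }

    /* Read the full host sector that contains the guest sector... */
    if (pread(fd, bounce, HOST_SECTOR, aligned_off) != HOST_SECTOR) {
        free(bounce);
        return -EIO;
    }

    /* ...patch in the 512 bytes the guest actually wrote... */
    memcpy((char *)bounce + (guest_off - aligned_off), buf, GUEST_SECTOR);

    /* ...and write the whole host sector back. */
    if (pwrite(fd, bounce, HOST_SECTOR, aligned_off) != HOST_SECTOR) {
        free(bounce);
        return -EIO;
    }

    free(bounce);
    return 0;
}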