qemu-devel.nongnu.org archive mirror
From: Kevin Wolf <kwolf@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] KVM call agenda for October 25
Date: Wed, 26 Oct 2011 14:18:11 +0200
Message-ID: <4EA7FA83.8070000@redhat.com>
In-Reply-To: <20111026113932.GJ29496@redhat.com>

On 26.10.2011 13:39, Daniel P. Berrange wrote:
> On Wed, Oct 26, 2011 at 01:23:05PM +0200, Kevin Wolf wrote:
>> On 26.10.2011 11:57, Daniel P. Berrange wrote:
>>> On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
>>>> Kevin Wolf <kwolf@redhat.com> writes:
>>>>
>>>>> On 25.10.2011 16:06, Anthony Liguori wrote:
>>>>>> On 10/25/2011 08:56 AM, Kevin Wolf wrote:
>>>>>>> On 25.10.2011 15:05, Anthony Liguori wrote:
>>>>>>>> I'd be much more open to changing the default mode to cache=none FWIW since the
>>>>>>>> risk of data loss there is much, much lower.
>>>>>>>
>>>>>>> I think people said that they'd rather not have cache=none as default
>>>>>>> because O_DIRECT doesn't work everywhere.
>>>>>>
>>>>>> Where doesn't it work these days?  I know it doesn't work on tmpfs.  I know it 
>>>>>> works on ext[234], btrfs, nfs.
>>>>>
>>>>> Besides file systems (and probably OSes) that don't support O_DIRECT,
>>>>> there's another case: Our defaults don't work on 4k sector disks today.
>>>>> You need to explicitly specify the logical_block_size qdev property for
>>>>> cache=none to work on them.
>>>>>
>>>>> And changing this default isn't trivial as the right value doesn't only
>>>>> depend on the host disk, but it's also guest visible. The only way out
>>>>> would be bounce buffers, but I'm not sure that doing that silently is a
>>>>> good idea...
>>>>
>>>> Sector size is a device property.
>>>>
>>>> If the user asks for a 4K sector disk, and the backend can't support
>>>> that, we need to reject the configuration.  Just like we reject
>>>> read-only backends for read/write disks.
>>>
>>> I don't see why we need to reject a guest disk with 4k sectors,
>>> just because the host disk only has 512 byte sectors. A guest
>>> sector size that's a larger multiple of host sector size should
>>> work just fine. It just means any guest sector write will update
>>> 8 host sectors at a time. We only have problems if guest sector
>>> size is not a multiple of host sector size, in which case bounce
>>> buffers are the only option (other than rejecting the config
>>> which is not too nice).
>>>
>>> IIUC, current QEMU behaviour is
>>>
>>>            Guest 512    Guest 4k
>>>  Host 512   * OK          OK
>>>  Host 4k    * I/O Err     OK
>>>
>>> '*' marks defaults
>>>
>>> IMHO, QEMU needs to work without I/O errors in all of these
>>> combinations, even if this means having to use bounce buffers
>>> in some of them. That said, IMHO the default should be for
>>> QEMU to avoid bounce buffers, which implies it should either
>>> choose guest sector size to match host sector size, or it
>>> should unconditionally use 4k guest. IMHO we need the former
>>>
>>>            Guest 512  Guest 4k
>>>  Host 512   *OK         OK
>>>  Host 4k     OK        *OK
>>
>> I'm not sure if a 4k host should imply a 4k guest by default. This means
>> that some guests wouldn't be able to run on a 4k host. On the other
>> hand, for those guests that can do 4k, it would be the much better option.
>>
>> So I think this decision is the hard thing about it.
> 
> I guess it somewhat depends whether we want to strive for
> 
>  1. Give the user the fastest working config by default
>  2. Give the user a working config by default
>  3. Give the user the fastest (possibly broken) config by default
> 
> IMHO 3 is not a serious option, but I could see 2 as a reasonable
> tradeoff to avoid complexity in choosing QEMU defaults. The user
> would have a working config with 512 sectors, but sub-optimal perf
> on 4k hosts due to bounce buffering. Ideally libvirt or other
> higher-level app would set the best block size that a guest
> can support by default, so bounce buffers would rarely be needed.
> So only people using QEMU directly without setting a block size
> would ordinarily suffer the bounce buffer perf hit on a 4k host.

Yes, I'm currently tending towards this (option 2) plus a warning on
stderr if bounce buffering is used.
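
A minimal sketch of the alignment rule from the tables quoted above,
assuming (as Daniel describes) that a guest sector size which is a whole
multiple of the host sector size needs no bouncing. The function names
are hypothetical and this is not QEMU code; it only illustrates when a
warning on stderr would fire.

```python
import sys


def needs_bounce_buffer(guest_sector: int, host_sector: int) -> bool:
    """Return True if guest I/O may be misaligned for the host disk.

    Assumption: a guest sector size that is a whole multiple of the host
    sector size is always aligned; anything else needs bouncing.
    """
    return guest_sector % host_sector != 0


def check_alignment(guest_sector: int, host_sector: int, out=None) -> bool:
    """Warn on `out` (stderr by default) when bounce buffering would be used."""
    out = sys.stderr if out is None else out
    bounced = needs_bounce_buffer(guest_sector, host_sector)
    if bounced:
        print(
            f"warning: guest sector size {guest_sector} is not a multiple of "
            f"host sector size {host_sector}; using bounce buffers",
            file=out,
        )
    return bounced


if __name__ == "__main__":
    # Walk the same 2x2 matrix as the quoted tables.
    for host in (512, 4096):
        for guest in (512, 4096):
            state = "bounce" if needs_bounce_buffer(guest, host) else "direct"
            print(f"host {host:>4}  guest {guest:>4}  {state}")
```

Only the "host 4k, guest 512" cell comes out as needing bounce buffers,
which is the cell marked "I/O Err" in the first table.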

Or, coming back to the original subject of this discussion, we can
default to cache=writeback and forget about alignment. If you specify
cache=none, you have to take care to explicitly specify a block size
larger than 512 bytes, too.

Maybe the best is actually to do both: Default to cache=writeback,
completely avoiding bounce buffers. If the user specifies cache=none,
but doesn't change the sector size of the virtual disk, print a warning
and enable bounce buffers.
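
The combined policy could be sketched roughly as follows. This is only
an illustration under stated assumptions: `resolve_drive`, `DriveSetup`
and `DEFAULT_GUEST_SECTOR` are hypothetical names, not QEMU interfaces,
and the sketch enables bounce buffers only when the host sector size
actually makes them necessary, which goes slightly beyond what is
spelled out above.

```python
import sys
from dataclasses import dataclass
from typing import Optional

DEFAULT_GUEST_SECTOR = 512  # assumed default sector size of the virtual disk


@dataclass
class DriveSetup:
    cache: str
    guest_sector: int
    bounce_buffers: bool


def resolve_drive(
    host_sector: int,
    cache: Optional[str] = None,
    guest_sector: Optional[int] = None,
    out=None,
) -> DriveSetup:
    """Pick effective drive settings from what the user specified."""
    out = sys.stderr if out is None else out
    # Default to writeback: I/O goes through the host page cache, so the
    # host sector size imposes no alignment requirement on guest requests.
    cache = cache if cache is not None else "writeback"
    guest = guest_sector if guest_sector is not None else DEFAULT_GUEST_SECTOR

    bounce = False
    if cache == "none" and guest % host_sector != 0:
        # cache=none implies O_DIRECT, which needs host-aligned requests.
        bounce = True
        print(
            f"warning: cache=none with {guest}-byte guest sectors on a "
            f"{host_sector}-byte host disk; enabling bounce buffers",
            file=out,
        )
    return DriveSetup(cache=cache, guest_sector=guest, bounce_buffers=bounce)


if __name__ == "__main__":
    print(resolve_drive(4096))                                  # defaults
    print(resolve_drive(4096, cache="none"))                    # warns
    print(resolve_drive(4096, cache="none", guest_sector=4096)) # aligned
```

With the defaults no warning is printed; only the explicit cache=none
case with an unchanged 512-byte virtual disk on a 4k host takes the
bounce buffer path.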

Kevin
