qemu-devel.nongnu.org archive mirror
From: Paolo Bonzini <pbonzini@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>,
	qemu-devel@nongnu.org,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 00/17] Support mismatched host and guest logical block sizes
Date: Wed, 14 Dec 2011 13:40:22 +0100
Message-ID: <4EE89936.5030007@redhat.com>
In-Reply-To: <4EE890FF.1050004@redhat.com>

On 12/14/2011 01:05 PM, Kevin Wolf wrote:
> Am 14.12.2011 12:47, schrieb Paolo Bonzini:
>> On 12/14/2011 12:13 PM, Kevin Wolf wrote:
>>> As we discussed before, the really interesting point here is defaults,
>>> and whatever you choose to do is wrong in some respect.
>>>
>>> So it looks like you chose to make the virtual device default to the
>>> host block size.
>>
>> ... wait wait, I default to 512. :)
>>
>> Here is the rationale.  512-over-4k may be slow but is safe (and it is
>> not slow if you align partitions properly).  4k-over-512 is unsafe.  So,
>> defaulting to 512 seemed the right thing after all.
>
> Which means bounce buffers by default on 4k hosts.

In practice it doesn't (quite surprisingly).  The patches do the following:

- if the initial and ending sectors are aligned, submit directly to paio.
  Otherwise, bounce only the extra host sectors (up to 2) required to
  align the operation: the bulk of the request will reuse the guest's
  data buffer.

- if the buffer is not 4k-aligned, paio will linearize the request with
  a bounce buffer.  This will almost always happen if the initial sector
  is misaligned, but not if only the ending sector is misaligned.
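A minimal Python model of the two rules above, assuming a 4096-byte host
block size and a 4096-byte buffer alignment requirement on the direct I/O
path.  The function names are hypothetical and this is not the patch code.
It splits a guest byte range into the host blocks that need bouncing (the
partial head and tail) and the aligned middle that is submitted straight
from the guest buffer, then checks whether that middle starts at an aligned
buffer address.

```python
# Hypothetical model of the request-splitting rules; not QEMU code.

def round_down(x: int, align: int) -> int:
    return x - x % align


def round_up(x: int, align: int) -> int:
    return round_down(x + align - 1, align)


def plan_request(offset: int, length: int, buf_addr: int,
                 host_bs: int = 4096, mem_align: int = 4096) -> dict:
    """Plan a guest request of `length` bytes at byte `offset`.

    Returns the start offsets of host blocks needing a bounce buffer
    (at most two: the partial head and tail), the aligned middle range
    submitted directly from the guest buffer, and whether that middle
    starts at a misaligned buffer address (so paio would linearize it).
    """
    start, end = offset, offset + length
    bounce = []
    if start % host_bs:
        bounce.append(round_down(start, host_bs))    # partial head block
    if end % host_bs:
        tail = round_down(end, host_bs)              # partial tail block
        if tail not in bounce:                       # head and tail may coincide
            bounce.append(tail)

    direct = None
    linearize = False
    direct_start, direct_end = round_up(start, host_bs), round_down(end, host_bs)
    if direct_end > direct_start:
        direct = (direct_start, direct_end)
        # The direct part begins this many bytes into the guest buffer.
        direct_addr = buf_addr + (direct_start - start)
        linearize = direct_addr % mem_align != 0
    return {"bounce": bounce, "direct": direct, "linearize": linearize}


if __name__ == "__main__":
    buf = 0x10000  # page-aligned guest buffer (assumed)
    print(plan_request(8192, 16384, buf))   # fully aligned request
    print(plan_request(0, 4608, buf))       # only the tail is misaligned
    print(plan_request(512, 7680, buf))     # the head is misaligned
```

With a page-aligned guest buffer, a request misaligned only at the tail
keeps its direct part at the start of the buffer, so it stays aligned; a
misaligned head shifts the direct part to an unaligned address inside the
buffer, which is the case where paio falls back to linearizing.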

If the partitions are aligned, the OS will always issue aligned 
requests, because file system blocks are already 4k.  And kernel buffers 
will usually be page-aligned rather than block-aligned, so paio will 
also let the request through.  You'll see perhaps half a dozen 
misaligned requests in the whole guest lifetime, for example to read the 
partition table.  So for aligned partitions, the performance difference 
on iozone was well within statistical noise.
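The alignment argument is plain arithmetic.  A sketch assuming 512-byte
guest sectors, 4 KiB file system blocks and a 4 KiB host block size:
because file system blocks are 4096 bytes apart, whether all of them are
host-aligned depends only on where the partition starts.  The two start
sectors used below (63, the legacy DOS layout, and 2048, the 1 MiB start
commonly used by current partitioning tools) are illustrative.

```python
SECTOR_SIZE = 512     # guest logical sector size (assumed)
FS_BLOCK_SIZE = 4096  # file system block size (assumed)
HOST_BS = 4096        # host logical block size (assumed)


def fs_block_offset(part_start_lba: int, block_no: int) -> int:
    """Byte offset on the disk of file system block `block_no`."""
    return part_start_lba * SECTOR_SIZE + block_no * FS_BLOCK_SIZE


def partition_is_aligned(part_start_lba: int) -> bool:
    # Block offsets differ by multiples of FS_BLOCK_SIZE, itself a multiple
    # of HOST_BS here, so every block shares the partition start's offset.
    return (part_start_lba * SECTOR_SIZE) % HOST_BS == 0


for lba in (63, 2048):
    misaligned = sum(fs_block_offset(lba, n) % HOST_BS != 0 for n in range(1000))
    print(f"start sector {lba}: aligned={partition_is_aligned(lba)}, "
          f"misaligned blocks out of 1000: {misaligned}")
```

For a start sector of 63 every sampled block sits 3584 bytes past a host
block boundary, so each 4k file system access straddles two host blocks;
for 2048 no block is misaligned.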

> Is this going to
> become our next cache=writethrough? At some point 4k disks will be in
> wide use, but we'll still be stuck with a slow default of 512.

Unless you switch to EFI, the boot disk has to remain on 512-byte
blocks anyway.

> No matter what we decide here, I think it might really be a good idea to
> save the block size in the image and use that as the default if nothing
> else is specified on the command line.

Yeah, that's sensible to do (though it can be a follow-up).
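One possible precedence for that follow-up, sketched in Python: an explicit
command-line value wins, then a block size recorded in the image, then the
512-byte default.  The function and parameter names, and the validation
rule (a power of two of at least 512 bytes), are assumptions for
illustration; no image format field for this is defined in this thread.

```python
from typing import Optional

DEFAULT_BLOCK_SIZE = 512  # the default proposed in this series


def is_valid_block_size(bs: int) -> bool:
    # Assumed rule: a power of two no smaller than 512 bytes.
    return bs >= 512 and (bs & (bs - 1)) == 0


def choose_block_size(cmdline_bs: Optional[int], image_bs: Optional[int]) -> int:
    """Pick the guest logical block size: command line, then image, then default."""
    for bs in (cmdline_bs, image_bs):
        if bs is not None:
            if not is_valid_block_size(bs):
                raise ValueError(f"invalid block size: {bs}")
            return bs
    return DEFAULT_BLOCK_SIZE


print(choose_block_size(None, None))   # nothing recorded: use the default
print(choose_block_size(None, 4096))   # the image remembers 4096
print(choose_block_size(512, 4096))    # the command line overrides the image
```

Keeping the command line first means an existing image never silently
overrides an explicit choice, while new 4k images stop depending on the
512-byte fallback.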

Paolo

Thread overview: 25+ messages
2011-12-13 12:37 [Qemu-devel] [PATCH 00/17] Support mismatched host and guest logical block sizes Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 01/17] block: do not rely on open_flags for bdrv_is_snapshot Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 02/17] block: store actual flags in bs->open_flags Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 03/17] block: pass protocol flags up to the format Paolo Bonzini
2011-12-15  4:10   ` Zhi Yong Wu
2011-12-13 12:37 ` [Qemu-devel] [PATCH 04/17] block: non-raw protocols never cache Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 05/17] block: remove enable_write_cache Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 06/17] block: move flag bits together Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 07/17] raw: remove the aligned_buf Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 08/17] block: rename buffer_alignment to guest_block_size Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 09/17] block: add host_block_size Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 10/17] raw: probe host_block_size Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 11/17] iscsi: save host block size Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 12/17] block: allow waiting only for overlapping writes Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 13/17] block: allow waiting at arbitrary granularity Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 14/17] block: protect against "torn reads" for guest_block_size > host_block_size Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 15/17] block: align and serialize I/O when guest_block_size < host_block_size Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 16/17] block: default physical block size to host block size Paolo Bonzini
2011-12-13 12:37 ` [Qemu-devel] [PATCH 17/17] qemu-io: add blocksize argument to open Paolo Bonzini
2011-12-14 11:13 ` [Qemu-devel] [PATCH 00/17] Support mismatched host and guest logical block sizes Kevin Wolf
2011-12-14 11:47   ` Paolo Bonzini
2011-12-14 12:05     ` Kevin Wolf
2011-12-14 12:40       ` Paolo Bonzini [this message]
2011-12-21 16:55         ` Christoph Hellwig
2011-12-21 17:00           ` Paolo Bonzini
