qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC PATCH 00/19] block: Support for 512b-on-4k emulation
@ 2013-12-06 17:22 Kevin Wolf
From: Kevin Wolf @ 2013-12-06 17:22 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, armbru, stefanha

This patch series adds code to the block layer that allows performing
I/O requests in smaller granularities than required by the host backend
(most importantly, O_DIRECT restrictions). It achieves this for reads
by rounding the request to host-side block boundaries, and for writes by
performing a read-modify-write cycle (and serialising requests
touching the same block so that the RMW doesn't write back stale data).
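The read rounding and write RMW described above can be sketched in C as
follows. This is a minimal, single-threaded model over an in-memory
"disk", assuming a power-of-two alignment. The names (AlignedRange,
align_request, aligned_read, rmw_write) are hypothetical and are not the
functions added by this series. Unlike a real implementation, it reads
the whole aligned range instead of only the head and tail blocks, and it
does not model serialisation of overlapping requests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type for illustration; not part of QEMU's block layer. */
typedef struct {
    uint64_t offset; /* start, rounded down to the alignment */
    uint64_t bytes;  /* length, covering whole aligned blocks */
} AlignedRange;

/* Round [offset, offset + bytes) outwards to multiples of align. */
static AlignedRange align_request(uint64_t offset, uint64_t bytes,
                                  uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    AlignedRange r;
    uint64_t end = (offset + bytes + align - 1) & ~(align - 1);
    r.offset = offset & ~(align - 1);
    r.bytes = end - r.offset;
    return r;
}

/* Unaligned read: read the covering aligned blocks into a bounce buffer,
 * then copy out only the bytes the caller asked for. */
static void aligned_read(const uint8_t *disk, uint64_t offset, uint8_t *buf,
                         uint64_t bytes, uint64_t align, uint8_t *bounce)
{
    AlignedRange r = align_request(offset, bytes, align);
    memcpy(bounce, disk + r.offset, r.bytes);         /* aligned read */
    memcpy(buf, bounce + (offset - r.offset), bytes); /* extract */
}

/* Unaligned write: read-modify-write of the covering aligned blocks.
 * bounce must hold at least the aligned length. */
static void rmw_write(uint8_t *disk, uint64_t offset, const uint8_t *buf,
                      uint64_t bytes, uint64_t align, uint8_t *bounce)
{
    AlignedRange r = align_request(offset, bytes, align);
    memcpy(bounce, disk + r.offset, r.bytes);         /* read */
    memcpy(bounce + (offset - r.offset), buf, bytes); /* modify */
    memcpy(disk + r.offset, bounce, r.bytes);         /* write back */
}
```

Because another request could change the same block between the read and
the write-back, two RMW cycles on one block must not interleave, which is
why requests touching the same block are serialised.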

Originally I intended to reuse a lot of code from Paolo's previous
patch series. However, as I tried to integrate pread/pwrite, which
already do a very similar thing (except that they don't consider
concurrency), and because I wanted to implement zero-copy, most of this
series ended up being new code.

Zero-copy is possible in a common case: while XFS defaults to a 4k
sector size, and therefore a 4k on-disk O_DIRECT alignment, for 512e
disks, it still only has a 512 byte memory alignment requirement.
(Unfortunately, the XFS_IOC_DIOINFO ioctl reports 4k even for the memory
alignment, but we know that this value is wrong and can probe the real
requirement instead.)
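Probing the memory alignment, as mentioned above, can be done by trying
O_DIRECT reads into buffers of increasing alignment and taking the first
one that succeeds. The C sketch below shows only the search loop. The
callback type try_read_fn and the function simulated_try_read are
hypothetical stand-ins for a real pread() on the image file, so that the
logic can be exercised without an O_DIRECT file descriptor; this is an
illustration, not the code in raw-posix.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe callback: returns true if an O_DIRECT read into a
 * buffer aligned to exactly `align` bytes succeeds. A real probe could,
 * for example, use base + align where base is aligned to 2 * max_align,
 * so the address is aligned to align but not to 2 * align. */
typedef bool (*try_read_fn)(size_t align, void *opaque);

/* Return the smallest power-of-two alignment in [min_align, max_align]
 * accepted by the probe, or 0 if none is. */
static size_t probe_alignment(size_t min_align, size_t max_align,
                              try_read_fn try_read, void *opaque)
{
    assert(min_align != 0 && (min_align & (min_align - 1)) == 0);
    for (size_t align = min_align; align != 0 && align <= max_align;
         align <<= 1) {
        if (try_read(align, opaque)) {
            return align;
        }
    }
    return 0;
}

/* Hypothetical simulated device for demonstration: accepts any buffer
 * alignment that is a multiple of the requirement stored in *opaque. */
static bool simulated_try_read(size_t align, void *opaque)
{
    size_t required = *(const size_t *)opaque;
    return align % required == 0;
}
```

With a simulated 512 byte requirement, the probe returns 512 even if an
ioctl had reported 4096, which is the point of probing instead of
trusting XFS_IOC_DIOINFO for memory alignment.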


This series does not cover 4k guests on a 512 byte host, and I'm not
sure yet what to do with this case. Paolo's series contained a patch to
protect against "torn reads" (i.e. reads that run in parallel with
writes and return old data for one half of a sector and new data for
the other half) by serialising requests if the guest block size was
greater than the host block size.
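The serialisation rule from Paolo's patch, as summarised above, can be
sketched as an overlap check on requests widened to whole guest blocks.
This C sketch uses hypothetical names (round_down, round_up,
must_serialise) and assumes power-of-two sizes. It deliberately takes a
single host_bs argument, which is exactly the single-host-block-size
assumption this approach relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round down/up to a power-of-two alignment. */
static uint64_t round_down(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

static uint64_t round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical check: must request A wait for in-flight request B?
 * Serialisation only applies when the guest block size exceeds the
 * (single, assumed) host block size. Both ranges are widened to whole
 * guest blocks before testing for overlap. */
static bool must_serialise(uint64_t a_off, uint64_t a_bytes,
                           uint64_t b_off, uint64_t b_bytes,
                           uint64_t guest_bs, uint64_t host_bs)
{
    assert(guest_bs != 0 && (guest_bs & (guest_bs - 1)) == 0);
    if (guest_bs <= host_bs) {
        return false; /* assumed: no torn reads at guest granularity */
    }
    uint64_t a_start = round_down(a_off, guest_bs);
    uint64_t a_end = round_up(a_off + a_bytes, guest_bs);
    uint64_t b_start = round_down(b_off, guest_bs);
    uint64_t b_end = round_up(b_off + b_bytes, guest_bs);
    return a_start < b_end && b_start < a_end;
}
```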

One problem with this approach is that it assumes that a single host
block size even exists and can be compared against at the top level.
However, different backing files can be stored on different storage,
with different block sizes.

Another problem is that block drivers can split requests internally
(imagine a qcow2 image with 512 byte clusters), which would have to be
detected as well.

Finally, it's unclear what to do with cache modes that use the kernel
page cache. Technically, these have a required alignment of 1 byte,
which is always smaller than the guest alignment. We always have to
expect short writes, so we can't say "it's always the granularity of
the request". However, serialising _every_ request certainly doesn't
seem reasonable; we've never done it, and we've never received any bug
reports.

Other non-file protocols may have the same problem.

(And all of this ignores that with multiple users of the block device,
e.g. the guest device, an NBD server or block jobs, there isn't even a
single guest block size; if done properly, it must be passed per
request.)


Anyway, I'm hoping for a review of this series in order to get
512b-on-4k merged soon, and for some help/discussion on the 4k-on-512
case.

Kevin Wolf (17):
  qemu_memalign: Allow small alignments
  block: Detect unaligned length in bdrv_qiov_is_aligned()
  block: Don't use guest sector size for qemu_blockalign()
  block: Introduce bdrv_aligned_preadv()
  block: Introduce bdrv_co_do_preadv()
  block: Introduce bdrv_aligned_pwritev()
  block: write: Handle COR dependency after I/O throttling
  block: Introduce bdrv_co_do_pwritev()
  block: Switch BdrvTrackedRequest to byte granularity
  block: Allow waiting for overlapping requests between begin/end
  block: Make zero-after-EOF work with larger alignment
  block: Generalise and optimise COR serialisation
  block: Make overlap range for serialisation dynamic
  block: Align requests in bdrv_co_do_pwritev()
  block: Change coroutine wrapper to byte granularity
  block: Make bdrv_pread() a bdrv_prwv_co() wrapper
  block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper

Paolo Bonzini (2):
  block: rename buffer_alignment to guest_block_size
  raw: Probe required direct I/O alignment

 block.c                   | 572 ++++++++++++++++++++++++++++++----------------
 block/backup.c            |   7 +-
 block/raw-posix.c         | 102 +++++++--
 block/raw-win32.c         |  41 ++++
 hw/block/virtio-blk.c     |   2 +-
 hw/ide/core.c             |   2 +-
 hw/scsi/scsi-disk.c       |   2 +-
 hw/scsi/scsi-generic.c    |   2 +-
 include/block/block.h     |   3 +-
 include/block/block_int.h |  24 +-
 util/oslib-posix.c        |   5 +
 11 files changed, 539 insertions(+), 223 deletions(-)

-- 
1.8.1.4



Thread overview: 32+ messages
2013-12-06 17:22 [Qemu-devel] [RFC PATCH 00/19] block: Support for 512b-on-4k emulation Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 01/19] qemu_memalign: Allow small alignments Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 02/19] block: Detect unaligned length in bdrv_qiov_is_aligned() Kevin Wolf
2013-12-06 19:12   ` Eric Blake
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 03/19] block: Don't use guest sector size for qemu_blockalign() Kevin Wolf
2013-12-10  3:18   ` Wenchao Xia
2013-12-10  9:42     ` Kevin Wolf
2013-12-11  2:43       ` Wenchao Xia
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 04/19] block: rename buffer_alignment to guest_block_size Kevin Wolf
2013-12-10  3:25   ` Wenchao Xia
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 05/19] raw: Probe required direct I/O alignment Kevin Wolf
2013-12-06 17:53   ` Paolo Bonzini
2013-12-09 12:58     ` Kevin Wolf
2013-12-09 13:40       ` Paolo Bonzini
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 06/19] block: Introduce bdrv_aligned_preadv() Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 07/19] block: Introduce bdrv_co_do_preadv() Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 08/19] block: Introduce bdrv_aligned_pwritev() Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 09/19] block: write: Handle COR dependency after I/O throttling Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 10/19] block: Introduce bdrv_co_do_pwritev() Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 11/19] block: Switch BdrvTrackedRequest to byte granularity Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 12/19] block: Allow waiting for overlapping requests between begin/end Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 13/19] block: Make zero-after-EOF work with larger alignment Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 14/19] block: Generalise and optimise COR serialisation Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 15/19] block: Make overlap range for serialisation dynamic Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 16/19] block: Align requests in bdrv_co_do_pwritev() Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 17/19] block: Change coroutine wrapper to byte granularity Kevin Wolf
2013-12-06 17:22 ` [Qemu-devel] [RFC PATCH 18/19] block: Make bdrv_pread() a bdrv_prwv_co() wrapper Kevin Wolf
2013-12-06 17:23 ` [Qemu-devel] [RFC PATCH 19/19] block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper Kevin Wolf
2013-12-06 17:55 ` [Qemu-devel] [RFC PATCH 00/19] block: Support for 512b-on-4k emulation Paolo Bonzini
2013-12-09 11:16   ` Kevin Wolf
2013-12-09 12:51 ` Stefan Hajnoczi
2013-12-09 13:02   ` Kevin Wolf
