[Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices


From: Maxim Levitsky <mlevitsk@redhat.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
	qemu-block@nongnu.org, Maxim Levitsky <mlevitsk@redhat.com>,
	John Ferlan <jferlan@redhat.com>, Max Reitz <mreitz@redhat.com>
Subject: [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices
Date: Thu,  4 Jul 2019 15:43:41 +0300
Message-ID: <20190704124342.7753-1-mlevitsk@redhat.com>

Linux block devices, even in O_DIRECT mode, don't expose any user-visible
limit on the transfer size or the number of segments, even though the
underlying kernel block device can have such limits.
The kernel block layer takes care of enforcing these limits by splitting the bios.
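
For reference, the limits in question are the ones the kernel exposes in
sysfs for each request queue. Below is a minimal sketch for looking at them,
assuming the usual /sys/block/<disk>/queue layout. The function name and the
optional second argument (an alternative sysfs root) are hypothetical and are
there only so the snippet can be tried against a fake directory tree:

```shell
#!/bin/sh
# Print the kernel's request queue limits for a whole-disk device name
# (e.g. nvme0n1). The second argument is a hypothetical override of the
# sysfs root, used only so the function can be tried on a fake tree.
show_queue_limits() {
    dev=$1
    root=${2:-/sys/block}
    q="$root/$dev/queue"
    if [ ! -d "$q" ]; then
        echo "no queue directory for $dev" >&2
        return 1
    fi
    # max_sectors_kb: largest request the block layer will issue, in KiB
    # max_segments:   largest number of segments per request
    for f in max_sectors_kb max_segments; do
        printf '%s=%s\n' "$f" "$(cat "$q/$f")"
    done
}
```

Usage would be something like `show_queue_limits nvme0n1`. Note that the
queue directory belongs to the whole disk, not to a partition.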

By obeying these limits in qemu, we force qemu to do the splitting itself, which
introduces various overheads.
This is especially visible in the nbd server, where the low max transfer size of the
underlying device forces us to advertise this limit over NBD, thus increasing the
traffic overhead in the case of image conversion, which benefits from large blocks.
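
To give a feel for the splitting overhead, here is a rough back-of-the-envelope
sketch. It only counts requests and says nothing about the cost of each one.
The 128k value is what the kernel reports for the device used in the tests;
the 32M value is an assumption picked purely for comparison, not something
this patch sets, and the function name is hypothetical:

```shell
#!/bin/sh
# Number of requests needed to move total bytes when each request is
# capped at max bytes (ceiling division).
requests_needed() {
    total=$1
    max=$2
    echo $(( (total + max - 1) / max ))
}

GiB=$((1024 * 1024 * 1024))
# 128k: the limit reported for the device used in the benchmark.
requests_needed "$GiB" $((128 * 1024))
# 32M: an arbitrary larger request size, for comparison only.
requests_needed "$GiB" $((32 * 1024 * 1024))
```

For 1G of data this gives 8192 requests at 128k versus 32 requests at 32M.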

More information can be found here:
https://bugzilla.redhat.com/show_bug.cgi?id=1647104

I tested this with qemu-img convert, both over nbd and natively, and to my surprise
even native IO performance improved a bit.

(The device on which this was tested is an Intel Optane DC P4800X,
which has a 128k max transfer size as reported by the kernel.)

The benchmark:

Images were created using:

Sparse image:  qemu-img create -f qcow2 /dev/nvme0n1p3 1G / 10G / 100G
Allocated image: qemu-img create -f qcow2 /dev/nvme0n1p3 -o preallocation=metadata  1G / 10G / 100G

The test was:

 echo "convert native:"
 rm -rf /dev/shm/disk.img
 time qemu-img convert -p -f qcow2 -O raw -T none $FILE /dev/shm/disk.img > /dev/zero

 echo "convert via nbd:"
 qemu-nbd -k /tmp/nbd.sock -v  -f qcow2 $FILE -x export --cache=none --aio=native --fork
 rm -rf /dev/shm/disk.img
 time qemu-img convert -p -f raw -O raw nbd:unix:/tmp/nbd.sock:exportname=export /dev/shm/disk.img > /dev/zero

The results:

=========================================
1G sparse image:
 native:
	before: 0.027s
	after: 0.027s
 nbd:
	before: 0.287s
	after: 0.035s

=========================================
100G sparse image:
 native:
	before: 0.028s
	after: 0.028s
 nbd:
	before: 23.796s
	after: 0.109s

=========================================
1G preallocated image:
 native:
	before: 0.454s
	after: 0.427s
 nbd:
	before: 0.649s
	after: 0.546s
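
To put the nbd numbers above in relative terms, here is a small sketch that
derives the before/after ratio from the measured times. It adds no new
measurements; the helper name is hypothetical:

```shell
#!/bin/sh
# Ratio of the "before" time to the "after" time, to one decimal place.
speedup() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

speedup 0.287 0.035    # 1G sparse image over nbd
speedup 23.796 0.109   # 100G sparse image over nbd
speedup 0.649 0.546    # 1G preallocated image over nbd
```

That is roughly 8x for the 1G sparse image, about 218x for the 100G sparse
image, and about 1.2x for the 1G preallocated image.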

The block limits of max transfer size / max segments are retained
for SCSI passthrough, because in this case the kernel passes the userspace request
directly to the kernel SCSI driver, bypassing the block layer, and thus there is no
code to split such requests.

Fam, since you were the original author of the code that added
these limits, could you share your opinion on this?
Was there any reason for them besides SCSI passthrough?

V2:

*  Manually tested that this doesn't break SCSI passthrough with a nested VM
*  As Eric suggested, refactored the area around the fstat call.
*  Spelling/grammar fixes

Best regards,
	Maxim Levitsky

Maxim Levitsky (1):
  raw-posix.c - use max transfer length / max segement count only for
    SCSI passthrough

 block/file-posix.c | 54 ++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 26 deletions(-)

-- 
2.17.2
