qemu-devel.nongnu.org archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"Denis V. Lunev" <den@openvz.org>
Subject: [Qemu-devel] [PULL 34/38] block: align bounce buffers to page
Date: Fri, 22 May 2015 10:02:06 +0100
Message-ID: <1432285330-13994-35-git-send-email-stefanha@redhat.com>
In-Reply-To: <1432285330-13994-1-git-send-email-stefanha@redhat.com>

From: "Denis V. Lunev" <den@openvz.org>

The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
            write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes.
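A minimal C sketch of that benchmark, assuming Linux (O_DIRECT needs _GNU_SOURCE) and using posix_memalign() to obtain the page-aligned buffer. The helper names alloc_aligned_buf() and direct_write_loop() are hypothetical and not part of QEMU:

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: allocate len zeroed bytes aligned to align.
 * align must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release with free(). */
void *alloc_aligned_buf(size_t align, size_t len)
{
    void *buf = NULL;

    assert(align != 0 && (align & (align - 1)) == 0);
    if (posix_memalign(&buf, align, len) != 0) {
        return NULL;
    }
    memset(buf, 0, len);
    return buf;
}

/* Hypothetical helper repeating the loop above: iters writes of len bytes
 * through O_DIRECT. Returns 0 on success, -1 on any error, including a
 * short write or an open() failure. */
int direct_write_loop(const char *path, const void *buf, size_t len, int iters)
{
    int fd = open(path, O_RDWR | O_CREAT | O_DIRECT, 0644);

    if (fd < 0) {
        return -1;
    }
    for (int i = 0; i < iters; i++) {
        if (write(fd, buf, len) != (ssize_t)len) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}
```

To compare the two cases, time direct_write_loop(path, buf, 4096, 100000) once with buf = alloc_aligned_buf(4096, 4096) and once with a buffer that is only 512-byte aligned, such as (char *)alloc_aligned_buf(4096, 8192) + 512. On a filesystem that does not support O_DIRECT, open() fails with EINVAL and the helper returns -1.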

The difference is reliably reproducible.

On the other hand, we do not want, for now, to force bounce buffering
when a guest request is aligned to 512 bytes.
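The distinction here is between a minimum alignment, below which a request has to be copied through a bounce buffer, and an optimal alignment for buffers QEMU allocates itself. As far as the diff shows, the patch keeps min_mem_alignment at 512 in bdrv_refresh_limits() and raises only opt_mem_alignment. Below is a hypothetical single-buffer check in the spirit of QEMU's bdrv_qiov_is_aligned(); it is a simplified assumption, not QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: a buffer can be handed directly to O_DIRECT I/O only
 * if both its address and its length are multiples of min_align (a power
 * of two). Otherwise the data must go through a bounce buffer. */
bool needs_bounce(const void *base, size_t len, size_t min_align)
{
    assert(min_align != 0 && (min_align & (min_align - 1)) == 0);
    return ((uintptr_t)base & (min_align - 1)) != 0 ||
           (len & (min_align - 1)) != 0;
}
```

With min_align = 512, a buffer at an address such as 0x10200 is used as-is even though it is not page aligned, which is the behaviour the commit message wants to keep. Checking the same buffer against 4096 would force a copy.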

The patch changes the default optimal alignment of bounce buffers to
MAX(page size, 4k). 4k is chosen because it is the largest sector size
known on real HDDs.
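The fallback in block.c and the new limit in raw-posix.c reduce to the same formula, MAX(floor, page size). A small sketch, using sysconf(_SC_PAGESIZE) in place of getpagesize() (which later POSIX revisions dropped); page_rounded_align() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper returning MAX(floor_align, host page size).
 * The patch open-codes this as MAX(4096, getpagesize()) for the
 * bdrv_opt_mem_align()/bdrv_min_mem_align() fallback and as
 * MAX(s->buf_align, getpagesize()) in raw_refresh_limits(). */
size_t page_rounded_align(size_t floor_align)
{
    long page = sysconf(_SC_PAGESIZE); /* POSIX equivalent of getpagesize() */

    assert(page > 0);
    return floor_align > (size_t)page ? floor_align : (size_t)page;
}
```

On x86-64 with 4 KiB pages, page_rounded_align(4096) is 4096; on a host with 64 KiB pages it becomes 65536.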

The reason for the performance improvement is quite interesting.
From the kernel's point of view, each request to the disk was split
in two. This can be seen in blktrace output like this:
  9,0   11  1     0.000000000 11151  Q  WS 312737792 + 1023 [qemu-img]
  9,0   11  2     0.000007938 11151  Q  WS 312738815 + 8 [qemu-img]
  9,0   11  3     0.000030735 11151  Q  WS 312738823 + 1016 [qemu-img]
  9,0   11  4     0.000032482 11151  Q  WS 312739839 + 8 [qemu-img]
  9,0   11  5     0.000041379 11151  Q  WS 312739847 + 1016 [qemu-img]
  9,0   11  6     0.000042818 11151  Q  WS 312740863 + 8 [qemu-img]
  9,0   11  7     0.000051236 11151  Q  WS 312740871 + 1017 [qemu-img]
  9,0    5  1     0.169071519 11151  Q  WS 312741888 + 1023 [qemu-img]
After the patch the pattern becomes normal:
  9,0    6  1     0.000000000 12422  Q  WS 314834944 + 1024 [qemu-img]
  9,0    6  2     0.000038527 12422  Q  WS 314835968 + 1024 [qemu-img]
  9,0    6  3     0.000072849 12422  Q  WS 314836992 + 1024 [qemu-img]
  9,0    6  4     0.000106276 12422  Q  WS 314838016 + 1024 [qemu-img]
and the number of requests sent to the disk (which can be calculated by
counting the lines of blktrace output) is roughly halved.
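The count can be checked against the traces above. The lengths of the first seven queued requests before the patch (1023 + 8 + 1016 + 8 + 1016 + 8 + 1017) sum to 4096 sectors, i.e. 2 MiB of contiguous data starting at sector 312737792; the eighth line starts the next chunk at 312741888. After the patch the same 4096 sectors go out as four 1024-sector requests, so 7 requests become 4. A minimal sketch that tallies "Q" events, assuming the blkparse column layout shown above (device, CPU, sequence, timestamp, PID, action, RWBS, start + length); struct q_stats and count_q_line() are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Running totals over the queue ("Q") events of a blkparse text dump. */
struct q_stats {
    unsigned requests;          /* number of Q lines seen */
    unsigned long long sectors; /* sum of the "+ N" lengths (512-byte sectors) */
};

/* Parse one line such as
 *   "  9,0   11  1     0.000000000 11151  Q  WS 312737792 + 1023 [qemu-img]"
 * and add it to *st if it is a Q event. Returns 1 if counted, 0 otherwise. */
int count_q_line(const char *line, struct q_stats *st)
{
    unsigned maj, min, cpu, seq, pid;
    double ts;
    char action[8], rwbs[8];
    unsigned long long start, len;

    assert(line != NULL && st != NULL);
    int n = sscanf(line, " %u,%u %u %u %lf %u %7s %7s %llu + %llu",
                   &maj, &min, &cpu, &seq, &ts, &pid, action, rwbs,
                   &start, &len);
    if (n != 10 || strcmp(action, "Q") != 0) {
        return 0; /* not a queue event, or not in the expected layout */
    }
    st->requests++;
    st->sectors += len;
    return 1;
}
```

Feeding the seven "before" lines gives 7 requests and 4096 sectors; the four "after" lines give 4 requests and the same 4096 sectors.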

Both qemu-img and qemu-io are affected, while qemu-kvm is not: the guest
does its job well, and real guest requests arrive properly (page) aligned.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1431441056-26198-3-git-send-email-den@openvz.org
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c           |  8 ++++----
 block/io.c        |  2 +-
 block/raw-posix.c | 13 +++++++------
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index e293907..325f727 100644
--- a/block.c
+++ b/block.c
@@ -106,8 +106,8 @@ int is_windows_drive(const char *filename)
 size_t bdrv_opt_mem_align(BlockDriverState *bs)
 {
     if (!bs || !bs->drv) {
-        /* 4k should be on the safe side */
-        return 4096;
+        /* page size or 4k (hdd sector size) should be on the safe side */
+        return MAX(4096, getpagesize());
     }
 
     return bs->bl.opt_mem_alignment;
@@ -116,8 +116,8 @@ size_t bdrv_opt_mem_align(BlockDriverState *bs)
 size_t bdrv_min_mem_align(BlockDriverState *bs)
 {
     if (!bs || !bs->drv) {
-        /* 4k should be on the safe side */
-        return 4096;
+        /* page size or 4k (hdd sector size) should be on the safe side */
+        return MAX(4096, getpagesize());
     }
 
     return bs->bl.min_mem_alignment;
diff --git a/block/io.c b/block/io.c
index e6c3639..b3ea95c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -205,7 +205,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
     } else {
         bs->bl.min_mem_alignment = 512;
-        bs->bl.opt_mem_alignment = 512;
+        bs->bl.opt_mem_alignment = getpagesize();
     }
 
     if (bs->backing_hd) {
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7083924..2990e95 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -301,6 +301,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
     char *buf;
+    size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
 
     /* For /dev/sg devices the alignment is not really used.
        With buffered I/O, we don't have any restrictions. */
@@ -330,9 +331,9 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     /* If we could not get the sizes so far, we can only guess them */
     if (!s->buf_align) {
         size_t align;
-        buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
-            if (raw_is_io_aligned(fd, buf + align, MAX_BLOCKSIZE)) {
+        buf = qemu_memalign(max_align, 2 * max_align);
+        for (align = 512; align <= max_align; align <<= 1) {
+            if (raw_is_io_aligned(fd, buf + align, max_align)) {
                 s->buf_align = align;
                 break;
             }
@@ -342,8 +343,8 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 
     if (!bs->request_alignment) {
         size_t align;
-        buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+        buf = qemu_memalign(s->buf_align, max_align);
+        for (align = 512; align <= max_align; align <<= 1) {
             if (raw_is_io_aligned(fd, buf, align)) {
                 bs->request_alignment = align;
                 break;
@@ -726,7 +727,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 
     raw_probe_alignment(bs, s->fd, errp);
     bs->bl.min_mem_alignment = s->buf_align;
-    bs->bl.opt_mem_alignment = s->buf_align;
+    bs->bl.opt_mem_alignment = MAX(s->buf_align, getpagesize());
 }
 
 static int check_for_dasd(int fd)
-- 
2.1.0

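In raw_probe_alignment() both probing loops walk power-of-two alignments from 512 up to an upper bound, which this patch raises from MAX_BLOCKSIZE to max_align = MAX(MAX_BLOCKSIZE, getpagesize()); on a host with 64 KiB pages the bound becomes at least 65536. A standalone sketch of that search follows. QEMU's real probe, raw_is_io_aligned(), performs I/O on the file descriptor; here it is replaced by a hypothetical callback and a simulated device so the loop can run anywhere:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe callback standing in for raw_is_io_aligned():
 * returns nonzero if I/O with the given alignment succeeds. */
typedef int (*align_probe_fn)(size_t align, void *opaque);

/* Simulated device for illustration: opaque points at the device's real
 * alignment requirement, and any multiple of it is accepted. */
int fake_device_probe(size_t align, void *opaque)
{
    size_t required = *(const size_t *)opaque;
    return align % required == 0;
}

/* Smallest power-of-two alignment in [512, max_align] accepted by probe,
 * or 0 if none is (the "if (!s->buf_align)" case in the patch). */
size_t probe_min_alignment(size_t max_align, align_probe_fn probe, void *opaque)
{
    assert(max_align >= 512);
    for (size_t align = 512; align <= max_align; align <<= 1) {
        if (probe(align, opaque)) {
            return align;
        }
    }
    return 0;
}
```

For a simulated device that needs 65536-byte alignment, probe_min_alignment(4096, ...) finds nothing and returns 0, while probe_min_alignment(65536, ...) returns 65536. That difference is what raising the bound to max_align changes.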
Thread overview: 40+ messages
2015-05-22  9:01 [Qemu-devel] [PULL 00/38] Block patches Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 01/38] iotests, parallels: quote TEST_IMG in 076 test to be path-safe Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 02/38] block/parallels: rename parallels_header to ParallelsHeader Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 03/38] block/parallels: switch to bdrv_read Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 04/38] block/parallels: read up to cluster end in one go Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 05/38] block/parallels: add get_block_status Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 06/38] block/parallels: provide _co_readv routine for parallels format driver Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 07/38] block/parallels: replace magic constants 4, 64 with proper sizeofs Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 08/38] block/parallels: mark parallels format driver as zero inited Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 09/38] block/parallels: _co_writev callback for Parallels format Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 10/38] iotests, parallels: test for write into Parallels image Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 11/38] block/parallels: support parallels image creation Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 12/38] iotests, parallels: test for newly created parallels image via qemu-img Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 13/38] parallels: change copyright information in the image header Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 14/38] block/parallels: rename catalog_ names to bat_ Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 15/38] block/parallels: create bat2sect helper Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 16/38] block/parallels: keep BAT bitmap data in little endian in memory Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 17/38] block/parallels: read parallels image header and BAT into single buffer Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 18/38] block/parallels: move parallels_open/probe to the very end of the file Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 19/38] block/parallels: implement parallels_check method of block driver Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 20/38] block/parallels: implement incorrect close detection Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 21/38] iotests, parallels: check for incorrectly closed image in tests Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 22/38] block/parallels: improve image reading performance Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 23/38] block/parallels: create bat_entry_off helper Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 24/38] block/parallels: delay writing to BAT till bdrv_co_flush_to_os Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 25/38] block/parallels: add prealloc-mode and prealloc-size open paramemets Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 26/38] block/parallels: optimize linear image expansion Stefan Hajnoczi
2015-05-22  9:01 ` [Qemu-devel] [PULL 27/38] block/parallels: improve image writing performance further Stefan Hajnoczi
2015-05-22  9:02 ` [Qemu-devel] [PULL 28/38] configure: handle clang -nopie argument warning Stefan Hajnoczi
2015-05-22  9:02 ` [Qemu-devel] [PULL 29/38] configure: factor out supported flag check Stefan Hajnoczi
2015-05-22  9:02 ` [Qemu-devel] [PULL 30/38] configure: silence glib unknown attribute __alloc_size__ Stefan Hajnoczi
2015-05-22  9:02 ` [Qemu-devel] [PULL 31/38] configure: Add workaround for ccache and clang Stefan Hajnoczi
2015-05-22  9:02 ` [Qemu-devel] [PULL 32/38] block: return EPERM on writes or discards to read-only devices Stefan Hajnoczi
2015-05-22  9:02 ` [Qemu-devel] [PULL 33/38] block: minimal bounce buffer alignment Stefan Hajnoczi
2015-05-22  9:02 ` Stefan Hajnoczi [this message]
2015-05-22  9:02 ` [Qemu-devel] [PULL 35/38] Revert "block: Fix unaligned zero write" Stefan Hajnoczi
2015-05-22  9:02 ` [Qemu-devel] [PULL 36/38] block: Fix NULL deference for unaligned write if qiov is NULL Stefan Hajnoczi
2015-05-22  9:02 ` [Qemu-devel] [PULL 37/38] qemu-iotests: Test unaligned sub-block zero write Stefan Hajnoczi
2015-05-22  9:02 ` [Qemu-devel] [PULL 38/38] block: get_block_status: use "else" when testing the opposite condition Stefan Hajnoczi
2015-05-22 14:48 ` [Qemu-devel] [PULL 00/38] Block patches Peter Maydell