[Qemu-devel] [PATCH v7 0/2] block: enforce minimal 4096 alignment in qemu_blockalign


From: "Denis V. Lunev" <den@openvz.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Denis V. Lunev" <den@openvz.org>
Subject: [Qemu-devel] [PATCH v7 0/2] block: enforce minimal 4096 alignment in qemu_blockalign
Date: Tue, 12 May 2015 16:40:58 +0300
Message-ID: <1431438060-23324-1-git-send-email-den@openvz.org>

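I have used the following program for testing:
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <malloc.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int fd, i = 0, align;
    void *buf;

    if (argc < 3) {
        fprintf(stderr, "usage: %s <file> <align>\n", argv[0]);
        return 1;
    }
    align = atoi(argv[2]);

    fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* For align < 4096, keep allocating (and intentionally leaking) until
     * memalign() returns a buffer that is NOT page-aligned, so the test
     * uses a buffer aligned to 'align' but not to 4096. */
    do {
        buf = memalign(align, 4096);
        if (buf == NULL) {
            perror("memalign");
            return 1;
        }
        if (align >= 4096)
            break;
        if ((unsigned long)buf & 4095)
            break;
        i++;
    } while (1);
    printf("%d %p\n", i, buf);

    memset(buf, 0x11, 4096);

    /* Sequential 4 KiB O_DIRECT writes; write() advances the offset. */
    for (i = 0; i < 100000; i++) {
        if (write(fd, buf, 4096) != 4096) {
            perror("write");
            return 1;
        }
    }

    close(fd);
    return 0;
}
and ran it 30 times for each alignment value ALIGN (e.g. 512 or 4096):
time for i in `seq 1 30` ; do ./a.out aa $ALIGN ; done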

The file was placed on an 8 GB partition of the HDD listed below to avoid
speed variations caused by different offsets on the disk. The results are
reliable:
- 189 vs 180 seconds on Linux 3.16

The following setups have been tested:
1) ext4 with a block size of 1024 over a 512/512 physical/logical
   sector size SSD
2) ext4 with a block size of 4096 over a 512/512 physical/logical
   sector size SSD
3) ext4 with a block size of 4096 over a 512/4096 physical/logical
   sector size rotational disk (WDC WD20EZRX)
4) xfs with a block size of 4096 over a 512/512 physical/logical
   sector size SSD

The difference is reliable and is the same 5% on all of them. Writing
a qcow2 image with
  qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
is 1% faster.

qemu-img is also affected. The sequence
  qemu-img create -f qcow2 1.img 64G
  qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
  time for i in `seq 1 30` ; do qemu-img convert 1.img -t none -O raw 2.img ; rm -rf 2.img ; done
takes around 126 vs 119 seconds.

The explanation of the performance improvement is quite interesting.
From the kernel's point of view, each request to the disk was split
in two. This can be seen with blktrace:
  9,0   11  1     0.000000000 11151  Q  WS 312737792 + 1023 [qemu-img]
  9,0   11  2     0.000007938 11151  Q  WS 312738815 + 8 [qemu-img]
  9,0   11  3     0.000030735 11151  Q  WS 312738823 + 1016 [qemu-img]
  9,0   11  4     0.000032482 11151  Q  WS 312739839 + 8 [qemu-img]
  9,0   11  5     0.000041379 11151  Q  WS 312739847 + 1016 [qemu-img]
  9,0   11  6     0.000042818 11151  Q  WS 312740863 + 8 [qemu-img]
  9,0   11  7     0.000051236 11151  Q  WS 312740871 + 1017 [qemu-img]
  9,0    5  1     0.169071519 11151  Q  WS 312741888 + 1023 [qemu-img]
After the patch the pattern becomes normal:
  9,0    6  1     0.000000000 12422  Q  WS 314834944 + 1024 [qemu-img]
  9,0    6  2     0.000038527 12422  Q  WS 314835968 + 1024 [qemu-img]
  9,0    6  3     0.000072849 12422  Q  WS 314836992 + 1024 [qemu-img]
  9,0    6  4     0.000106276 12422  Q  WS 314838016 + 1024 [qemu-img]
and the number of requests sent to the disk (which can be calculated by
counting the lines of blktrace output) is reduced by about half.

Both qemu-img and qemu-io are affected, while qemu-kvm is not: the guest
does its job well, and real requests arrive properly aligned (to a page).

Changes from v6:
- explicitly assign opt_mem_alignment in raw-posix.c with
  MAX(s->buf_align, getpagesize()) (Kevin)

Changes from v5:
- found an explanation from the kernel's point of view
- fixed checkpatch warnings in patch 2

Changes from v4:
- patches reordered
- dropped conversion from 512 to BDRV_SECTOR_SIZE
- getpagesize() is replaced with MAX(4096, getpagesize()) as suggested by
  Kevin

Changes from v3:
- a portable way to calculate the system page size is used
- 512/4096 values are replaced with proper macros/values

Changes from v2:
- opt_mem_alignment is split into opt_mem_alignment, used for bounce
  buffering, and min_mem_alignment, used to check buffers coming from
  the guest.

Changes from v1:
- enforce 4096 alignment in qemu_(try_)blockalign; leave the
  bdrv_qiov_is_aligned path untouched so as not to force additional
  bounce buffering, as suggested by Paolo
- reduced the claimed gain from 10% to 5% in the patch description to
  better fit the 180 vs 189 difference

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>


Thread overview: 7+ messages
2015-05-12 13:40 Denis V. Lunev [this message]
2015-05-12 13:40 ` [Qemu-devel] [PATCH 1/2] block: minimal bounce buffer alignment Denis V. Lunev
2015-05-12 13:41 ` [Qemu-devel] [PATCH 2/2] block: align bounce buffers to page Denis V. Lunev
2015-05-12 14:08   ` Kevin Wolf
2015-05-12 14:20     ` Denis V. Lunev
2015-05-12 14:26       ` Kevin Wolf
2015-05-12 14:26         ` Denis V. Lunev
