From: "Denis V. Lunev" <den@openvz.org>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] block: enforce minimal 4096 alignment in qemu_blockalign
Date: Thu, 29 Jan 2015 16:49:46 +0300
Message-ID: <54CA3A7A.8090208@openvz.org>
In-Reply-To: <20150129131848.GA3950@noname.redhat.com>
On 29/01/15 16:18, Kevin Wolf wrote:
> On 29.01.2015 at 11:50, Denis V. Lunev wrote:
>> The following sequence
>>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>>     for (i = 0; i < 100000; i++)
>>         write(fd, buf, 4096);
>> performs 5% better if buf is aligned to 4096 bytes rather than to
>> 512 bytes on an HDD with 512/4096 logical/physical sector size.
>>
>> The difference is quite reliable.
>>
>> On the other hand, we do not want at the moment to enforce bounce
>> buffering if the guest request is aligned to 512 bytes. This patch
>> forces page alignment when we are really forced to perform memory
>> allocation.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Paolo Bonzini <pbonzini@redhat.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>> block.c | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/block.c b/block.c
>> index d45e4dd..38cf73f 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -5293,7 +5293,11 @@ void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
>>
>>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>>  {
>> -    return qemu_memalign(bdrv_opt_mem_align(bs), size);
>> +    size_t align = bdrv_opt_mem_align(bs);
>> +    if (align < 4096) {
>> +        align = 4096;
>> +    }
>> +    return qemu_memalign(align, size);
>>  }
>>
>>  void *qemu_blockalign0(BlockDriverState *bs, size_t size)
>> @@ -5307,6 +5311,9 @@ void *qemu_try_blockalign(BlockDriverState *bs, size_t size)
>>
>>      /* Ensure that NULL is never returned on success */
>>      assert(align > 0);
>> +    if (align < 4096) {
>> +        align = 4096;
>> +    }
>>      if (size == 0) {
>>          size = align;
>>      }
> This is the wrong place to make this change. First you're duplicating
> logic in the callers of bdrv_opt_mem_align() instead of making it return
> the right thing in the first place.
This was actually done in the first iteration. bdrv_opt_mem_align is
called in three places:
    qemu_blockalign
    qemu_try_blockalign
    bdrv_qiov_is_aligned
Paolo says that he does not want bdrv_qiov_is_aligned to be affected,
to avoid extra bounce buffering.
From my point of view this extra bounce buffering is better than passing
an unaligned pointer when writing to the disk, as disks with 512/4096
logical/physical sector sizes are mainstream now. I don't want to argue
this point specifically, though. Normal guest operations result in
page-aligned requests, so this is not a problem at all. The number of
512-byte-aligned requests from the guest side is negligible.
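To make the bounce-buffering trade-off concrete, here is a simplified sketch of an alignment check over an I/O vector. It is not QEMU's actual bdrv_qiov_is_aligned; the name iov_is_aligned is hypothetical and it works on a plain struct iovec array. A guest buffer that is only 512-byte aligned passes with align = 512 but fails with align = 4096, and a failed check is what would send the request down the bounce-buffer path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical helper, not QEMU code: return true only if every
 * element's base address and length are multiples of `align`. */
static bool iov_is_aligned(const struct iovec *iov, int niov, size_t align)
{
    assert(align > 0);

    for (int i = 0; i < niov; i++) {
        if ((uintptr_t)iov[i].iov_base % align != 0) {
            return false;   /* buffer address is not aligned */
        }
        if (iov[i].iov_len % align != 0) {
            return false;   /* length is not a multiple of align */
        }
    }
    return true;
}
```

Raising the alignment used by such a check from 512 to 4096 is what Paolo wants to avoid, since it turns every 512-aligned guest request into a copy.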
> Second, you're arguing with numbers
> from a simple test case for O_DIRECT on Linux, but you're changing the
> alignment for everyone instead of just the raw-posix driver which is
> responsible for accessing Linux files.
This should not be a real problem. We are allocating memory for the
buffer. A slightly stricter alignment is not a big cost for any libc
implementation, so this kludge will not produce any significant overhead.
> Also, what's the real reason for the performance improvement? Having
> page alignment? If so, actually querying the page size instead of
> assuming 4k might be worth a thought.
>
> Kevin
Most likely the problem comes from a read-modify-write pattern
either in the kernel or in the disk. In my experience it is a
bad idea to supply a 512-byte-aligned buffer for O_DIRECT I/O.
The ABI technically allows this, but in general it is much less tested.
Yes, this synthetic test shows some difference here. With qemu-io
the result is also visible, but smaller:
    qemu-img create -f qcow2 ./1.img 64G
    qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
performs 1% better.
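For anyone who wants to repeat the synthetic test, the two cases differ only in how buf is obtained. The fragment below is a minimal sketch under assumptions: the helper names alloc_512_only and write_blocks are hypothetical, timing is left to the caller (for example with time(1)), and the O_DIRECT fallback define only exists so the fragment compiles on hosts without that flag, where the comparison is meaningless.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* needed for O_DIRECT with glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0          /* assumption: compile-only fallback */
#endif

/* Hypothetical helper: return a pointer aligned to 512 bytes but
 * deliberately not to 4096. *base receives the pointer to free(). */
static void *alloc_512_only(size_t size, void **base)
{
    void *p = NULL;

    if (posix_memalign(&p, 4096, size + 512) != 0) {
        return NULL;
    }
    *base = p;
    return (char *)p + 512;  /* 4096-aligned start plus 512 */
}

/* Hypothetical helper: the write loop from the patch description,
 * with error checks. Returns 0 on success, -1 on any failure. */
static int write_blocks(const char *path, const void *buf, int count)
{
    int fd = open(path, O_RDWR | O_CREAT | O_DIRECT, 0644);

    if (fd < 0) {
        return -1;
    }
    for (int i = 0; i < count; i++) {
        if (write(fd, buf, 4096) != 4096) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}
```

The 4096-aligned case uses posix_memalign(&buf, 4096, 4096) directly; the 512-aligned case uses alloc_512_only(4096, &base). Both buffers are then passed to write_blocks(path, buf, 100000).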
There is also a similar kludge here:
    size_t bdrv_opt_mem_align(BlockDriverState *bs)
    {
        if (!bs || !bs->drv) {
            /* 4k should be on the safe side */
            return 4096;
        }
        return bs->bl.opt_mem_alignment;
    }
which just uses the constant 4096.
Yes, I agree that querying the page size could be a good idea, but
I do not know at the moment how to do that. Can you please share your
opinion if you have one?
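For illustration only: on a POSIX host the page size can be read at run time with sysconf(_SC_PAGESIZE). The sketch below applies the same clamp as the patch, with the queried value in place of the constant. The helper name min_io_align is hypothetical and is not QEMU code, and how this would fit QEMU's non-POSIX hosts is not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of QEMU: raise `align` to at least the
 * host page size, falling back to 4096 if the query fails. */
static size_t min_io_align(size_t align)
{
    long page = sysconf(_SC_PAGESIZE);   /* POSIX; returns -1 on error */
    size_t min_align = page > 0 ? (size_t)page : 4096;

    /* Alignments are expected to be non-zero powers of two. */
    assert(align > 0 && (align & (align - 1)) == 0);

    return align < min_align ? min_align : align;
}
```

With this, qemu_blockalign could in principle call min_io_align(bdrv_opt_mem_align(bs)) instead of comparing against 4096, leaving bdrv_qiov_is_aligned untouched.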
Regards,
Den