Date: Thu, 29 Jan 2015 16:49:46 +0300
From: "Denis V. Lunev"
To: Kevin Wolf
Cc: Paolo Bonzini, qemu-devel@nongnu.org, Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 1/1] block: enforce minimal 4096 alignment in qemu_blockalign
Message-ID: <54CA3A7A.8090208@openvz.org>
In-Reply-To: <20150129131848.GA3950@noname.redhat.com>
References: <1422528659-3121-1-git-send-email-den@openvz.org>
 <1422528659-3121-2-git-send-email-den@openvz.org>
 <20150129131848.GA3950@noname.redhat.com>

On 29/01/15 16:18, Kevin Wolf wrote:
> Am 29.01.2015 um 11:50 hat Denis V. Lunev geschrieben:
>> The following sequence
>>
>>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>>     for (i = 0; i < 100000; i++)
>>         write(fd, buf, 4096);
>>
>> performs 5% better if buf is aligned to 4096 bytes rather than to
>> 512 bytes on an HDD with 512/4096 logical/physical sector size.
>>
>> The difference is quite reliable.
>>
>> On the other hand, we do not want at the moment to enforce bounce
>> buffering if the guest request is aligned to 512 bytes. This patch
>> forces page alignment only when we are really forced to perform
>> memory allocation.
>>
>> Signed-off-by: Denis V. Lunev
>> CC: Paolo Bonzini
>> CC: Kevin Wolf
>> CC: Stefan Hajnoczi
>> ---
>>  block.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/block.c b/block.c
>> index d45e4dd..38cf73f 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -5293,7 +5293,11 @@ void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
>>
>>  void *qemu_blockalign(BlockDriverState *bs, size_t size)
>>  {
>> -    return qemu_memalign(bdrv_opt_mem_align(bs), size);
>> +    size_t align = bdrv_opt_mem_align(bs);
>> +    if (align < 4096) {
>> +        align = 4096;
>> +    }
>> +    return qemu_memalign(align, size);
>>  }
>>
>>  void *qemu_blockalign0(BlockDriverState *bs, size_t size)
>> @@ -5307,6 +5311,9 @@ void *qemu_try_blockalign(BlockDriverState *bs, size_t size)
>>
>>      /* Ensure that NULL is never returned on success */
>>      assert(align > 0);
>> +    if (align < 4096) {
>> +        align = 4096;
>> +    }
>>      if (size == 0) {
>>          size = align;
>>      }
>
> This is the wrong place to make this change. First you're duplicating
> logic in the callers of bdrv_opt_mem_align() instead of making it return
> the right thing in the first place.

This was actually done in the first iteration. bdrv_opt_mem_align() is
called in three places:

    qemu_blockalign
    qemu_try_blockalign
    bdrv_qiov_is_aligned

Paolo says that he does not want bdrv_qiov_is_aligned() affected, to
avoid extra bounce buffering. From my point of view this extra bounce
buffering is better than an unaligned pointer during a write to the
disk, as disks with 512/4096 logical/physical sector size are
mainstream now. Though I don't want to specially argue the point here.
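For reference, the first iteration did it roughly like this inside
bdrv_opt_mem_align() itself (a reconstructed sketch, not the exact
patch that was posted):

    size_t bdrv_opt_mem_align(BlockDriverState *bs)
    {
        size_t align = 4096;    /* 4k should be on the safe side */

        if (bs && bs->drv && bs->bl.opt_mem_alignment > align) {
            align = bs->bl.opt_mem_alignment;
        }
        return align;
    }

Raising the minimum there changes the value seen by
bdrv_qiov_is_aligned() as well, and that is exactly the extra bounce
buffering Paolo wants to avoid.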
Normal guest operations result in page-aligned requests, so this is
not a problem at all. The number of 512-byte-aligned requests coming
from the guest side is negligible.

> Second, you're arguing with numbers
> from a simple test case for O_DIRECT on Linux, but you're changing the
> alignment for everyone instead of just the raw-posix driver which is
> responsible for accessing Linux files.

This should not be a real problem. We are allocating memory for the
buffer here. A slightly stricter alignment is not a big overhead for
any libc implementation, so this kludge will not produce any
significant overhead.

> Also, what's the real reason for the performance improvement? Having
> page alignment? If so, actually querying the page size instead of
> assuming 4k might be worth a thought.
>
> Kevin

Most likely the problem comes from a read-modify-write pattern, either
in the kernel or in the disk. In my experience it is a bad idea to
supply a 512-byte-aligned buffer for O_DIRECT I/O: the ABI technically
allows it, but it is much less tested in practice.

Yes, this synthetic test shows some difference. In terms of qemu-io
the result is also visible, but smaller:

    qemu-img create -f qcow2 ./1.img 64G
    qemu-io -n -c 'write -P 0xaa 0 1G' 1.img

performs 1% better.

There is also a similar kludge here:

    size_t bdrv_opt_mem_align(BlockDriverState *bs)
    {
        if (!bs || !bs->drv) {
            /* 4k should be on the safe side */
            return 4096;
        }

        return bs->bl.opt_mem_alignment;
    }

which just uses the 4096 constant.

Yes, I could agree that querying the page size would be a good idea,
but I do not know at the moment how to do that. Can you please share
your opinion if you have one?

Regards,
    Den
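P.S. For the page size question: would plain POSIX sysconf() be
acceptable in this path? An untested sketch on top of the current
patch, using sysconf(_SC_PAGESIZE) to query the host page size:

    #include <unistd.h>

    void *qemu_blockalign(BlockDriverState *bs, size_t size)
    {
        size_t align = bdrv_opt_mem_align(bs);
        long page_size = sysconf(_SC_PAGESIZE);  /* host page size */

        /* fall back to the current 4096 constant if sysconf fails */
        if (page_size <= 0) {
            page_size = 4096;
        }
        if (align < (size_t)page_size) {
            align = page_size;
        }
        return qemu_memalign(align, size);
    }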