From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <55538001.90207@odin.com>
Date: Wed, 13 May 2015 19:46:57 +0300
From: "Denis V. Lunev"
MIME-Version: 1.0
References: <1430746944-27347-1-git-send-email-den@openvz.org> <20150511150817.GK16270@stefanha-thinkpad.redhat.com> <5550D3B5.2050703@openvz.org> <5550DD2D.8000407@odin.com> <20150512100155.GB11497@stefanha-thinkpad.redhat.com> <5551D39E.1020902@odin.com> <5551DA21.7020105@redhat.com> <20150513154342.GB24352@stefanha-thinkpad.redhat.com>
In-Reply-To: <20150513154342.GB24352@stefanha-thinkpad.redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH v5 0/2] block: enforce minimal 4096 alignment in qemu_blockalign
To: Stefan Hajnoczi, Paolo Bonzini
Cc: Dmitry Monakhov, qemu-block@nongnu.org, Stefan Hajnoczi, qemu-devel@nongnu.org

On 13/05/15 18:43, Stefan Hajnoczi wrote:
> On Tue, May 12, 2015 at 12:46:57PM +0200, Paolo Bonzini wrote:
>>
>> On 12/05/2015 12:19, Denis V. Lunev wrote:
>>>
>>> hades /vol $ strace -f -e pwrite -e raw=write,pwrite qemu-io -n -c
>>> "write -P 0x11 0 64M" ./1.img
>>> Process 19326 attached
>>> [pid 19326] pwrite(0x6, 0x7fac07fff200, 0x4000000, 0x50000) = 0x4000000
>>> <---- 1 GB Write from userspace
>> FWIW this is 64 MB (as expected).
>>
>>> wrote 67108864/67108864 bytes at offset 0
>>> 64 MiB, 1 ops; 0.2964 sec (215.863 MiB/sec and 3.3729 ops/sec)
>>> [pid 19326] +++ exited with 0 +++
>>> +++ exited with 0 +++
>>> hades /vol $
>>> 9,0 1 266 74.030359772 19326 Q WS 473095 + 1016 [(null)]
>>> 9,0 1 267 74.030361546 19326 Q WS 474111 + 8 [(null)]
>>> 9,0 1 268 74.030395522 19326 Q WS 474119 + 1016 [(null)]
>>> 9,0 1 269 74.030397509 19326 Q WS 475135 + 8 [(null)]
>>>
>>> This means, yes, kernel is INEFFECTIVE performing direct IO with
>>> not aligned address. For example, without direct IO the pattern is
>>> much better.
>> I think this means that the kernel is DMAing at most 128 pages at a
>> time. If the buffer is misaligned, you need 129 pages and the kernel
>> then splits the request into a 128 page and a 1 page part.
>>
>> This looks like a hardware limit, and the kernel probably cannot really
>> do anything about it because we requested O_DIRECT. So your patch makes
>> sense.
> A 64 MB buffer was given in the pwrite() call.
>
> The first and the last 128-page write requests may have partial pages,
> but why should the rest not use fully aligned 1024 sector writes?
>
> Maybe the buffer is split by the max sectors per request before the
> alignment requirements are considered. It would be more efficient to
> first split off the unaligned parts.
>
> So I think the kernel is still doing something suboptimal here.
>
> Stefan

I agree with this. The kernel developers are aware of the problem, and
maybe we will have a fix after a while...

I have heard (not tested) that the performance loss on a multi-queue SSD
is around 30%.

Den
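To make the "128 pages vs. 129 pages" arithmetic from the quoted discussion
concrete, here is a minimal sketch that counts how many pages a buffer
touches. The helper name pages_spanned() is hypothetical (it is not a QEMU
or kernel function), and the 4096-byte page size and 512-byte sector size
are assumptions; only the buffer address comes from the strace output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of pages touched by the byte range
 * [addr, addr + len). page_size must be a non-zero power of two.
 */
static size_t pages_spanned(uint64_t addr, uint64_t len, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    if (len == 0) {
        return 0;
    }

    /* Index of the first and of the last page that hold any byte. */
    uint64_t first = addr / page_size;
    uint64_t last = (addr + len - 1) / page_size;

    return (size_t)(last - first + 1);
}
```

With the address from the trace (0x7fac07fff200, i.e. 0x200 bytes into a
page), a 512 KiB chunk (1024 sectors of 512 bytes, assumed) touches 129
pages, while the same chunk starting on a page boundary touches 128. This
only illustrates the page count; it does not model how the kernel actually
splits the request.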