From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51379)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <den@odin.com>) id 1YsaD2-00018S-1C
	for qemu-devel@nongnu.org; Wed, 13 May 2015 13:12:45 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <den@odin.com>) id 1YsaD0-0003Qq-NB
	for qemu-devel@nongnu.org; Wed, 13 May 2015 13:12:39 -0400
Message-ID: <55538001.90207@odin.com>
Date: Wed, 13 May 2015 19:46:57 +0300
From: "Denis V. Lunev" <den@odin.com>
MIME-Version: 1.0
References: <1430746944-27347-1-git-send-email-den@openvz.org>
	<20150511150817.GK16270@stefanha-thinkpad.redhat.com>
	<5550D3B5.2050703@openvz.org> <5550DD2D.8000407@odin.com>
	<20150512100155.GB11497@stefanha-thinkpad.redhat.com>
	<5551D39E.1020902@odin.com> <5551DA21.7020105@redhat.com>
	<20150513154342.GB24352@stefanha-thinkpad.redhat.com>
In-Reply-To: <20150513154342.GB24352@stefanha-thinkpad.redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH v5 0/2] block: enforce minimal
 4096 alignment in qemu_blockalign
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: Dmitry Monakhov <dmonakhov@odin.com>, qemu-block@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>, qemu-devel@nongnu.org

On 13/05/15 18:43, Stefan Hajnoczi wrote:
> On Tue, May 12, 2015 at 12:46:57PM +0200, Paolo Bonzini wrote:
>>
>> On 12/05/2015 12:19, Denis V. Lunev wrote:
>>>
>>> hades /vol $ strace -f -e pwrite -e raw=write,pwrite  qemu-io -n -c
>>> "write -P 0x11 0 64M" ./1.img
>>> Process 19326 attached
>>> [pid 19326] pwrite(0x6, 0x7fac07fff200, 0x4000000, 0x50000) = 0x4000000
>>> <---- 1 GB Write from userspace
>> FWIW this is 64 MB (as expected).
>>
>>> wrote 67108864/67108864 bytes at offset 0
>>> 64 MiB, 1 ops; 0.2964 sec (215.863 MiB/sec and 3.3729 ops/sec)
>>> [pid 19326] +++ exited with 0 +++
>>> +++ exited with 0 +++
>>> hades /vol $
>>>    9,0    1      266    74.030359772 19326  Q  WS 473095 + 1016 [(null)]
>>>    9,0    1      267    74.030361546 19326  Q  WS 474111 + 8 [(null)]
>>>    9,0    1      268    74.030395522 19326  Q  WS 474119 + 1016 [(null)]
>>>    9,0    1      269    74.030397509 19326  Q  WS 475135 + 8 [(null)]
>>>
>>> This means, yes, kernel is INEFFECTIVE performing direct IO with
>>> not aligned address. For example, without direct IO the pattern is
>>> much better.
>> I think this means that the kernel is DMAing at most 128 pages at a
>> time.  If the buffer is misaligned, you need 129 pages and the kernel
>> then splits the request into a 128 page and a 1 page part.
>>
>> This looks like a hardware limit, and the kernel probably cannot really
>> do anything about it because we requested O_DIRECT.  So your patch makes
>> sense.
> A 64 MB buffer was given in the pwrite() call.
>
> The first and the last 128-page write requests may have partial pages,
> but why should the rest not use fully aligned 1024 sector writes?
>
> Maybe the buffer is split by the max sectors per request before the
> alignment requirements are considered.  It would be more efficient to
> first split off the unaligned parts.
>
> So I think the kernel is still doing something suboptimal here.
>
> Stefan
I agree with this. The kernel developers are aware of the problem, and
maybe we will have a fix after a while... I have heard (but not tested)
that the performance loss on a multi-queue SSD is around 30%.
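
For illustration only, here is a minimal C sketch of the page arithmetic
Paolo describes above: a 512 KiB (1024 sector) chunk that starts 0x200
bytes into a page touches 129 pages, while the same chunk on a 4096-byte
boundary touches 128. The names pages_spanned(), alloc_io_buffer() and
IO_ALIGN are hypothetical and are not taken from the patch. The 4 KiB page
size is an assumption, and the sketch does not model how the kernel
actually splits requests.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 4 KiB pages, matching the 4096 alignment in the subject. */
#define IO_ALIGN 4096u

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t pages_spanned(uint64_t addr, size_t len)
{
    if (len == 0) {
        return 0;
    }
    uint64_t first = addr / IO_ALIGN;
    uint64_t last = (addr + len - 1) / IO_ALIGN;
    return (size_t)(last - first + 1);
}

/*
 * Hypothetical helper: allocate an I/O buffer aligned to IO_ALIGN.
 * Returns NULL on failure; release the buffer with free().
 */
static void *alloc_io_buffer(size_t len)
{
    void *p = NULL;

    if (posix_memalign(&p, IO_ALIGN, len) != 0) {
        return NULL;
    }
    /* Sanity check: the returned pointer sits on an IO_ALIGN boundary. */
    assert(((uintptr_t)p % IO_ALIGN) == 0);
    return p;
}
```

With the buffer address from the strace output, 0x7fac07fff200,
pages_spanned() gives 129 for a 512 KiB chunk. A buffer obtained from
alloc_io_buffer() starts on a page boundary, so the same chunk fits in
128 pages.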

Den