From: "Denis V. Lunev" <den@openvz.org>
To: Peter Lieven <pl@kamp.de>
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/7] block: fix maximum length sent to bdrv_co_do_write_zeroes callback in bs
Date: Fri, 26 Dec 2014 22:15:02 +0300 [thread overview]
Message-ID: <549DB3B6.1070800@openvz.org> (raw)
In-Reply-To: <549D6351.8050904@parallels.com>
On 26/12/14 16:32, Denis V. Lunev wrote:
> On 26/12/14 16:13, Peter Lieven wrote:
>> Am 26.12.2014 um 13:35 schrieb Denis V. Lunev:
>>> The check for maximum length was added by
>>> commit c31cb70728d2c0c8900b35a66784baa446fd5147
>>> Author: Peter Lieven <pl@kamp.de>
>>> Date: Thu Oct 24 12:06:58 2013 +0200
>>> block: honour BlockLimits in bdrv_co_do_write_zeroes
>>>
>>> but actually if driver provides .bdrv_co_write_zeroes callback,
>>> there is
>>> no need to limit the size to 32 MB. Callback should provide effective
>>> implementation which normally should not write any zeroes in comparable
>>> amount.
>>
>> NACK.
>>
>> First there is no guarantee that bdrv_co_do_write_zeroes is a fast
>> operation.
>> This heaviliy depends on several circumstances that the block layer
>> is not aware of.
>> If a specific protocol knows it is very fast in writing zeroes under
>> any circumstance
>> it should provide INT_MAX in bs->bl.max_write_zeroes. It is then
>> still allowed to
>> return -ENOTSUP if the request size or alignment doesn't fit.
>
> the idea is that (from my point of view) if .bdrv_co_do_write_zeroes is
> specified, the cost is almost the same for any amount of zeroes
> written. This is true for fallocate from my point of view. The amount
> of actually written data will be in several orders less than specified
> except slow path, which honors 32 MB limit.
>
> If the operation is complex in realization, then it will be rate-limited
> below, in actual implementation.
>
>> There are known backends e.g. anything that deals with SCSI that have
>> a known
>> limitation of the maximum number of zeroes they can write fast in a
>> single request.
>> This number MUST NOT be exceeded. The below patch would break all
>> those backends.
>
> could you pls point me this backends. Actually, from my point of
> view, they should properly setup max_write_zeroes for themselves.
> This is done at least in block/iscsi.c and it would be consistent
> way of doing so.
>
>>
>> What issue are you trying to fix with this patch? Maybe there is a
>> better way to fix
>> it at another point in the code.
>>
>
> I am trying to minimize amount of metadata updates for a file.
> This provides some speedup even on ext4 and this will provide
> even more speedup with a distributed filesystem like CEPH
> where size updates of the files and block allocation are
> costly.
>
> Regards,
> Den
First of all, the patch is really wrong :) It was written using
wrong assumptions.
OK. I have spent some time reading your original patchset and
and did not found any useful justification for default limit
for both discard and write zero.
Yes, there are drivers which requires block level to call
.bdrv_co_do_write_zeroes with alignment and with upper limit.
But in this case driver setups max_write_zeroes. All buggy
drivers should do that not to affect not buggy ones from
my opinion.
This is the only purpose of the original patches for limits.
I have wrongly interpret BlockLimits as something connected
with time of the operation. Sorry for that.
Therefore there is no good reason for limiting the amount of
data sent to drv->bdrv_co_writev with any data size. The only
thing is that it would be good not to allocate too many memory
at once. We could do something like
base = qemu_try_blockalign(bs, MIN(2048, num) * BDRV_SECTOR_SIZE);
added = 0;
for (added = 0; added < num; add += MIN(2048, num)) {
qemu_iovec_add(qiov, base, MIN(2048, num));
}
to avoid really big allocations here even if .max_write_zeroes is
very high. Do you think that this might be useful?
As for .bdrv_co_do_write_zeroes itself, can we still drop
default 32 Mb limit? If there are some buggy drivers, they
should have .max_write_zeroes specified.
The same applies to .max_discard
Regards,
Den
next prev parent reply other threads:[~2014-12-26 19:15 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-26 12:35 [Qemu-devel] [PATCH v2 0/7] eliminate data write in bdrv_write_zeroes on Linux Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 1/7] block: fix maximum length sent to bdrv_co_do_write_zeroes callback in bs Denis V. Lunev
2014-12-26 13:13 ` Peter Lieven
2014-12-26 13:32 ` Denis V. Lunev
2014-12-26 19:15 ` Denis V. Lunev [this message]
2014-12-27 14:52 ` Peter Lieven
2014-12-27 17:42 ` Denis V. Lunev
2014-12-27 20:01 ` Peter Lieven
2014-12-27 20:14 ` Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 2/7] block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 3/7] block/raw-posix: create do_fallocate helper Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 4/7] block/raw-posix: create translate_err helper to merge errno values Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 5/7] block/raw-posix: refactor handle_aiocb_write_zeroes a bit Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 6/7] block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes Denis V. Lunev
2014-12-26 12:35 ` [Qemu-devel] [PATCH 7/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes Denis V. Lunev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=549DB3B6.1070800@openvz.org \
--to=den@openvz.org \
--cc=kwolf@redhat.com \
--cc=pl@kamp.de \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).