From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57787) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xg9Fi-0007pE-20 for qemu-devel@nongnu.org; Mon, 20 Oct 2014 05:27:51 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Xg9Fc-00056j-Qh for qemu-devel@nongnu.org; Mon, 20 Oct 2014 05:27:46 -0400 Received: from mx1.redhat.com ([209.132.183.28]:38596) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xg9Fc-00056U-Ij for qemu-devel@nongnu.org; Mon, 20 Oct 2014 05:27:40 -0400 Message-ID: <5444D57C.6080409@redhat.com> Date: Mon, 20 Oct 2014 11:27:24 +0200 From: Max Reitz MIME-Version: 1.0 References: <1413785643-20336-1-git-send-email-pl@kamp.de> <5444CED9.8070402@redhat.com> <5444D28F.4010005@kamp.de> In-Reply-To: <5444D28F.4010005@kamp.de> Content-Type: text/plain; charset=iso-8859-15; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH] block: add a knob to disable multiwrite_merge List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Lieven , qemu-devel@nongnu.org Cc: kwolf@redhat.com, famz@redhat.com, armbru@redhat.com, stefanha@redhat.com On 2014-10-20 at 11:14, Peter Lieven wrote: > On 20.10.2014 10:59, Max Reitz wrote: >> On 2014-10-20 at 08:14, Peter Lieven wrote: >>> the block layer silently merges write requests since >> >> s/^t/T/ >> >>> commit 40b4f539. This patch adds a knob to disable >>> this feature as there has been some discussion lately >>> if multiwrite is a good idea at all and as it falsifies >>> benchmarks. >>> >>> Signed-off-by: Peter Lieven >>> --- >>> block.c | 4 ++++ >>> block/qapi.c | 1 + >>> blockdev.c | 7 +++++++ >>> hmp.c | 4 ++++ >>> include/block/block_int.h | 1 + >>> qapi/block-core.json | 10 +++++++++- >>> qemu-options.hx | 1 + >>> qmp-commands.hx | 2 ++ >>> 8 files changed, 29 insertions(+), 1 deletion(-) >>> >>> diff --git a/block.c b/block.c >>> index 27533f3..1658a72 100644 >>> --- a/block.c >>> +++ b/block.c >>> @@ -4531,6 +4531,10 @@ static int multiwrite_merge(BlockDriverState >>> *bs, BlockRequest *reqs, >>> { >>> int i, outidx; >>> + if (!bs->write_merging) { >>> + return num_reqs; >>> + } >>> + >>> // Sort requests by start sector >>> qsort(reqs, num_reqs, sizeof(*reqs), &multiwrite_req_compare); >>> diff --git a/block/qapi.c b/block/qapi.c >>> index 9733ebd..02251dd 100644 >>> --- a/block/qapi.c >>> +++ b/block/qapi.c >>> @@ -58,6 +58,7 @@ BlockDeviceInfo >>> *bdrv_block_device_info(BlockDriverState *bs) >>> info->backing_file_depth = bdrv_get_backing_file_depth(bs); >>> info->detect_zeroes = bs->detect_zeroes; >>> + info->write_merging = bs->write_merging; >>> if (bs->io_limits_enabled) { >>> ThrottleConfig cfg; >>> diff --git a/blockdev.c b/blockdev.c >>> index e595910..13e47b8 100644 >>> --- a/blockdev.c >>> +++ b/blockdev.c >>> @@ -378,6 +378,7 @@ static DriveInfo *blockdev_init(const char >>> *file, QDict *bs_opts, >>> const char *id; >>> bool has_driver_specific_opts; >>> BlockdevDetectZeroesOptions detect_zeroes; >>> + bool write_merging; >>> BlockDriver *drv = NULL; >>> /* Check common options by copying from bs_opts to opts, all >>> other options >>> @@ -405,6 +406,7 @@ static DriveInfo *blockdev_init(const char >>> *file, QDict *bs_opts, >>> snapshot = qemu_opt_get_bool(opts, "snapshot", 0); >>> ro = qemu_opt_get_bool(opts, "read-only", 0); >>> copy_on_read = qemu_opt_get_bool(opts, "copy-on-read", false); >>> + write_merging = qemu_opt_get_bool(opts, "write-merging", true); >> >> Using this option in blockdev_init() means that you can only enable >> or disable merging for the top layer (the root BDS). Furthermore, >> since you don't set bs->write_merging in bdrv_new() (or at least >> bdrv_open()), it actually defaults to false and only for the top >> layer it defaults to true. >> >> Therefore, if after this patch a format block driver issues a >> multiwrite to its file, the write will not be merged and the user can >> do nothing about it. I don't suppose this is intentional...? > > I am not sure if a block driver actually can do this at all? The only > way to enter multiwrite is from virtio_blk_handle_request in > virtio-blk.c. Well, there's also qemu-io -c multiwrite (which only accesses the root BDS as well). But other than that, yes, you're right. So, in practice it shouldn't matter. > >> >> I propose evaluating the option in bdrv_open() and setting >> bs->write_merging there. > > I wasn't aware actually. I remember that someone asked me to implement > discard_zeroes in blockdev_init. I think it was something related to > QMP. So we still might > need to check parameters at 2 positions? It is quite confusing which > paramter has to be parsed where. As for me, I don't know why some options are parsed in blockdev_init() at all. I guess all the options currently parsed in blockdev_init() should later be moved to the BlockBackend, at least that would be the idea. In practice, we cannot do that: Things like caching will stay in the BlockDriverState. I think it's just broken. IMHO, everything related to the BB should be in blockdev_init() and everything related to the BDS should be in bdrv_open(). So the question is now whether you want write_merging to be in the BDS or in the BB. Considering BB is in Kevin's block branch as of last Friday, you might actually want to work on that branch and move the field into the BB if you decide that that's the place it should be in. Max > Peter > >> >>> if ((buf = qemu_opt_get(opts, "discard")) != NULL) { >>> if (bdrv_parse_discard_flags(buf, &bdrv_flags) != 0) { >>> @@ -530,6 +532,7 @@ static DriveInfo *blockdev_init(const char >>> *file, QDict *bs_opts, >>> bs->open_flags = snapshot ? BDRV_O_SNAPSHOT : 0; >>> bs->read_only = ro; >>> bs->detect_zeroes = detect_zeroes; >>> + bs->write_merging = write_merging; >>> bdrv_set_on_error(bs, on_read_error, on_write_error); >>> @@ -2746,6 +2749,10 @@ QemuOptsList qemu_common_drive_opts = { >>> .name = "detect-zeroes", >>> .type = QEMU_OPT_STRING, >>> .help = "try to optimize zero writes (off, on, unmap)", >>> + },{ >>> + .name = "write-merging", >>> + .type = QEMU_OPT_BOOL, >>> + .help = "enable write merging (default: true)", >>> }, >>> { /* end of list */ } >>> }, >>> diff --git a/hmp.c b/hmp.c >>> index 63d7686..8d6ad0b 100644 >>> --- a/hmp.c >>> +++ b/hmp.c >>> @@ -348,6 +348,10 @@ void hmp_info_block(Monitor *mon, const QDict >>> *qdict) >>> BlockdevDetectZeroesOptions_lookup[info->value->inserted->detect_zeroes]); >>> >>> } >>> + if (!info->value->inserted->write_merging) { >>> + monitor_printf(mon, " Write Merging: off\n"); >>> + } >>> + >>> if (info->value->inserted->bps >>> || info->value->inserted->bps_rd >>> || info->value->inserted->bps_wr >>> diff --git a/include/block/block_int.h b/include/block/block_int.h >>> index 8d86a6c..39bbde2 100644 >>> --- a/include/block/block_int.h >>> +++ b/include/block/block_int.h >>> @@ -407,6 +407,7 @@ struct BlockDriverState { >>> QDict *options; >>> BlockdevDetectZeroesOptions detect_zeroes; >>> + bool write_merging; >>> /* The error object in use for blocking operations on >>> backing_hd */ >>> Error *backing_blocker; >>> diff --git a/qapi/block-core.json b/qapi/block-core.json >>> index 8f7089e..4931bd9 100644 >>> --- a/qapi/block-core.json >>> +++ b/qapi/block-core.json >>> @@ -214,6 +214,8 @@ >>> # >>> # @detect_zeroes: detect and optimize zero writes (Since 2.1) >>> # >>> +# @write_merging: true if write merging is enabled (Since 2.2) >>> +# >>> # @bps: total throughput limit in bytes per second is specified >>> # >>> # @bps_rd: read throughput limit in bytes per second is specified >>> @@ -250,6 +252,7 @@ >>> '*backing_file': 'str', 'backing_file_depth': 'int', >>> 'encrypted': 'bool', 'encryption_key_missing': 'bool', >>> 'detect_zeroes': 'BlockdevDetectZeroesOptions', >>> + 'write_merging': 'bool', >>> 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int', >>> 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int', >>> 'image': 'ImageInfo', >>> @@ -1180,6 +1183,10 @@ >>> # (default: false) >>> # @detect-zeroes: #optional detect and optimize zero writes (Since >>> 2.1) >>> # (default: off) >>> +# @write-merging: #optional enable the merging of write requests >>> +# also known as multiwrite_merge (Since 2.2) >>> +# (default: true, but this might change in the future >>> +# depending on format/protocol/features used) >>> # >>> # Since: 1.7 >>> ## >>> @@ -1193,7 +1200,8 @@ >>> '*rerror': 'BlockdevOnError', >>> '*werror': 'BlockdevOnError', >>> '*read-only': 'bool', >>> - '*detect-zeroes': 'BlockdevDetectZeroesOptions' } } >>> + '*detect-zeroes': 'BlockdevDetectZeroesOptions', >>> + '*write-merging': 'bool' } } >>> ## >>> # @BlockdevOptionsFile >>> diff --git a/qemu-options.hx b/qemu-options.hx >>> index 22cf3b9..d2f756f 100644 >>> --- a/qemu-options.hx >>> +++ b/qemu-options.hx >>> @@ -432,6 +432,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive, >>> " >>> [,werror=ignore|stop|report|enospc][,id=name][,aio=threads|native]\n" >>> " [,readonly=on|off][,copy-on-read=on|off]\n" >>> " [,discard=ignore|unmap][,detect-zeroes=on|off|unmap]\n" >>> + " [,write-merging=on|off]\n" >>> " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]]\n" >>> " [[,iops=i]|[[,iops_rd=r][,iops_wr=w]]]\n" >>> " [[,bps_max=bm]|[[,bps_rd_max=rm][,bps_wr_max=wm]]]\n" >>> diff --git a/qmp-commands.hx b/qmp-commands.hx >>> index 1abd619..529a563 100644 >>> --- a/qmp-commands.hx >>> +++ b/qmp-commands.hx >>> @@ -2104,6 +2104,7 @@ Each json-object contain the following: >>> - "iops_size": I/O size when limiting by iops (json-int) >>> - "detect_zeroes": detect and optimize zero writing >>> (json-string) >>> - Possible values: "off", "on", "unmap" >>> + - "write_merging": enable multiwrite_merge feature >>> (json-bool) >>> - "image": the detail of the image, it is a json-object >>> containing >>> the following: >>> - "filename": image file name (json-string) >>> @@ -2181,6 +2182,7 @@ Example: >>> "iops_wr_max": 0, >>> "iops_size": 0, >>> "detect_zeroes": "on", >>> + "write_merging": "true", >>> "image":{ >>> "filename":"disks/test.qcow2", >>> "format":"qcow2", >> >> Hm, thanks for reminding me of that file. There are some things which >> I forgot to update (ImageInfoSpecific at least)... >> >> Max > >