From: Francesco Romani <fromani@redhat.com>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Francesco Romani <fromani@redhat.com>,
mdroth@linux.vnet.ibm.com, stefanha@redhat.com,
lcapitulino@redhat.com
Subject: [Qemu-devel] [PATCH] block: add watermark event
Date: Tue, 8 Jul 2014 16:49:24 +0200
Message-ID: <1404830964-10733-2-git-send-email-fromani@redhat.com>
In-Reply-To: <1404830964-10733-1-git-send-email-fromani@redhat.com>
Management applications such as oVirt (http://www.ovirt.org) make extensive
use of thin-provisioned disk images.
To keep the guest running without being paused unnecessarily, oVirt sets
a watermark expressed as a percentage of the device's advertised size,
and automatically resizes the image once the watermark is reached or
exceeded.

To detect the crossing, the management application has no choice but to
poll the QEMU monitor aggressively using the query-blockstats command.
This leads to unnecessary system load, which gets worse at scale:
scenarios with hundreds of VMs are no longer unusual.

To fix this, this patch adds:
* A new monitor command to set a watermark for a given block device.
* A new event to report when a block device's usage reaches or exceeds
  the watermark.

This will allow the management application to drop the polling
altogether and just wait for a watermark crossing event.
Signed-off-by: Francesco Romani <fromani@redhat.com>
---
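For anyone who wants to check the arithmetic in isolation, here is a
minimal standalone sketch of the threshold test performed by
watermark_exceeded() in the diff below. It is not part of the patch:
SketchDevice and the sketch_* names are hypothetical stand-ins for
BlockDriverState and the real helpers, and only the integer math is taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two BlockDriverState fields the check reads. */
typedef struct {
    int64_t total_sectors;  /* advertised device size, in sectors */
    int wr_watermark_perc;  /* 0 means "no watermark set" */
} SketchDevice;

/* First sector number that counts as reaching the watermark.
 * Same expression as the patch: divide first, then multiply. */
static int64_t sketch_watermark_sector(const SketchDevice *dev)
{
    assert(dev->wr_watermark_perc >= 0 && dev->wr_watermark_perc <= 100);
    return dev->total_sectors / 100 * dev->wr_watermark_perc;
}

/* Mirrors the patch: only the starting sector of the write is compared. */
static bool sketch_watermark_exceeded(const SketchDevice *dev,
                                      int64_t sector_num)
{
    if (dev->wr_watermark_perc <= 0) {
        return false;  /* watermark disabled */
    }
    return sector_num >= sketch_watermark_sector(dev);
}
```

With total_sectors = 1000 and a 75% watermark, the first sector that
triggers is 750. Because the division happens before the multiplication,
a device smaller than 100 sectors gets a computed mark of 0, so every
write would trigger.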
block.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++
blockdev.c | 21 ++++++++++++++++++
include/block/block.h | 2 ++
include/block/block_int.h | 3 +++
qapi/block-core.json | 33 ++++++++++++++++++++++++++++
qmp-commands.hx | 24 ++++++++++++++++++++
6 files changed, 139 insertions(+)
diff --git a/block.c b/block.c
index 8800a6b..cf34b7f 100644
--- a/block.c
+++ b/block.c
@@ -1983,6 +1983,8 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
bs_dest->device_list = bs_src->device_list;
memcpy(bs_dest->op_blockers, bs_src->op_blockers,
sizeof(bs_dest->op_blockers));
+
+ bs_dest->wr_watermark_perc = bs_src->wr_watermark_perc;
}
/*
@@ -5813,3 +5815,57 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
bdrv_flush_io_queue(bs->file);
}
}
+
+static bool watermark_exceeded(BlockDriverState *bs,
+ int64_t sector_num,
+ int nb_sectors)
+{
+
+ if (bs->wr_watermark_perc > 0) {
+ int64_t watermark = (bs->total_sectors) / 100 * bs->wr_watermark_perc;
+ if (sector_num >= watermark) {
+ return true;
+ }
+ }
+ return false;
+}
+
+static int coroutine_fn watermark_before_write_notify(NotifierWithReturn *notifier,
+ void *opaque)
+{
+ BdrvTrackedRequest *req = opaque;
+ int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
+ int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
+
+/* FIXME: needed? */
+ assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+ assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+
+ if (watermark_exceeded(req->bs, sector_num, nb_sectors)) {
+ BlockDriverState *bs = req->bs;
+ qapi_event_send_block_watermark(
+ bdrv_get_device_name(bs),
+ sector_num,
+ bs->wr_highest_sector,
+ &error_abort);
+ }
+
+ return 0; /* should always let other notifiers run */
+}
+
+void bdrv_set_watermark_perc(BlockDriverState *bs,
+ int watermark_perc)
+{
+ NotifierWithReturn before_write = {
+ .notify = watermark_before_write_notify,
+ };
+
+ if (watermark_perc <= 0) {
+ return;
+ }
+
+ if (bs->wr_watermark_perc == 0) {
+ bdrv_add_before_write_notifier(bs, &before_write);
+ }
+ bs->wr_watermark_perc = watermark_perc;
+}
diff --git a/blockdev.c b/blockdev.c
index 48bd9a3..ede21d9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2546,6 +2546,27 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
return dummy.next;
}
+void qmp_block_set_watermark(const char *device, int64_t watermark,
+ Error **errp)
+{
+ BlockDriverState *bs;
+ AioContext *aio_context;
+
+ bs = bdrv_find(device);
+ if (!bs) {
+ error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+ return;
+ }
+
+ aio_context = bdrv_get_aio_context(bs);
+ aio_context_acquire(aio_context);
+
+ bdrv_set_watermark_perc(bs, watermark);
+
+ aio_context_release(aio_context);
+}
+
+
QemuOptsList qemu_common_drive_opts = {
.name = "drive",
.head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),
diff --git a/include/block/block.h b/include/block/block.h
index 32d3676..ff92ef9 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -588,4 +588,6 @@ void bdrv_io_plug(BlockDriverState *bs);
void bdrv_io_unplug(BlockDriverState *bs);
void bdrv_flush_io_queue(BlockDriverState *bs);
+void bdrv_set_watermark_perc(BlockDriverState *bs, int watermark_perc);
+
#endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f6c3bef..666ea1d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -393,6 +393,9 @@ struct BlockDriverState {
/* The error object in use for blocking operations on backing_hd */
Error *backing_blocker;
+
+ /* watermark limit for writes, percentage of sectors */
+ int wr_watermark_perc;
};
int get_tmp_filename(char *filename, int size);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index e378653..58e3b05 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1643,3 +1643,36 @@
'len' : 'int',
'offset': 'int',
'speed' : 'int' } }
+
+##
+# @BLOCK_WATERMARK
+#
+# Emitted when a block device reaches or exceeds the watermark limit.
+#
+# @device: device name
+#
+# @sector-num: number of the sector exceeding the threshold
+#
+# @sector-highest: highest sector number written so far
+#
+# Since: 2.1
+##
+{ 'event': 'BLOCK_WATERMARK',
+ 'data': { 'device': 'str', 'sector-num': 'int', 'sector-highest': 'int' } }
+
+##
+# @block_set_watermark
+#
+# Change watermark percentage for a block drive.
+#
+# @device: The name of the device
+#
+# @watermark: high water mark, percentage
+#
+# Returns: Nothing on success
+# If @device is not a valid block device, DeviceNotFound
+#
+# Since: 2.1
+##
+{ 'command': 'block_set_watermark',
+ 'data': { 'device': 'str', 'watermark': 'int' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4be4765..89fee40 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3755,3 +3755,27 @@ Example:
<- { "return": {} }
EQMP
+
+ {
+ .name = "block_set_watermark",
+ .args_type = "device:B,watermark:l",
+ .mhandler.cmd_new = qmp_marshal_input_block_set_watermark,
+ },
+
+SQMP
+block_set_watermark
+-------------------
+
+Change the high water mark for a block drive.
+
+Arguments:
+
+- "device": device name (json-string)
+- "watermark": the high water mark in percentage (json-int)
+
+Example:
+
+-> { "execute": "block_set_watermark", "arguments": { "device": "virtio0", "watermark": 75 } }
+<- { "return": {} }
+
+EQMP
--
1.9.3