From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55312) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XvYCF-00060w-Dd for qemu-devel@nongnu.org; Mon, 01 Dec 2014 16:07:55 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XvYC7-0001YM-De for qemu-devel@nongnu.org; Mon, 01 Dec 2014 16:07:51 -0500 Received: from mx1.redhat.com ([209.132.183.28]:56567) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XvYC7-0001XB-7C for qemu-devel@nongnu.org; Mon, 01 Dec 2014 16:07:43 -0500 Message-ID: <547CD89A.2090903@redhat.com> Date: Mon, 01 Dec 2014 14:07:38 -0700 From: Eric Blake MIME-Version: 1.0 References: <1417177867-27918-1-git-send-email-fromani@redhat.com> <1417177867-27918-2-git-send-email-fromani@redhat.com> In-Reply-To: <1417177867-27918-2-git-send-email-fromani@redhat.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="095GuJai2LpIWED1dx2sSHGAVj97m12li" Subject: Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds threshold List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Francesco Romani , qemu-devel@nongnu.org Cc: kwolf@redhat.com, mdroth@linux.vnet.ibm.com, stefanha@redhat.com, lcapitulino@redhat.com This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --095GuJai2LpIWED1dx2sSHGAVj97m12li Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 11/28/2014 05:31 AM, Francesco Romani wrote: > Managing applications, like oVirt (http://www.ovirt.org), make extensiv= e > use of thin-provisioned disk images. > To let the guest run smoothly and be not unnecessarily paused, oVirt se= ts > a disk usage threshold (so called 'high water mark') based on the occup= ation > of the device, and automatically extends the image once the threshold > is reached or exceeded. >=20 > In order to detect the crossing of the threshold, oVirt has no choice b= ut > aggressively polling the QEMU monitor using the query-blockstats comman= d. > This lead to unnecessary system load, and is made even worse under scal= e: > deployments with hundreds of VMs are no longer rare. >=20 > To fix this, this patch adds: > * A new monitor command to set a mark for a given block device. > * A new event to report if a block device usage exceeds the threshold. >=20 > This will allow the managing application to drop the polling > altogether and just wait for a watermark crossing event. I like the idea! Question - what happens if management misses the event (for example, if libvirtd is restarted)? Does the existing 'query-blockstats' and/or 'query-named-block-nodes' still work to query the current threshold and whether it has been exceeded, as a poll-once command executed when reconnecting to the monitor? >=20 > Signed-off-by: Francesco Romani > --- No need for a 0/1 cover letter on a 1-patch series; you have the option of just putting the side-band information here and sending it as a single mail. But the cover letter approach doesn't hurt either, and I can see how it can be easier for some workflows to always send a cover letter than to special-case a 1-patch series. > +static int coroutine_fn before_write_notify(NotifierWithReturn *notifi= er, > + void *opaque) > +{ > + BdrvTrackedRequest *req =3D opaque; > + BlockDriverState *bs =3D req->bs; > + uint64_t amount =3D 0; > + > + amount =3D bdrv_usage_threshold_exceeded(bs, req); > + if (amount > 0) { > + qapi_event_send_block_usage_threshold( > + bs->node_name, > + amount, > + bs->write_threshold_offset, > + &error_abort); > + > + /* autodisable to avoid to flood the monitor */ s/to flood/flooding/ > +/* > + * bdrv_usage_threshold_is_set > + * > + * Tell if an usage threshold is set for a given BDS. s/an usage/a usage/ (in English, the difference between "a" and "an" is whether the leading sound of the next word is pronounced or not; in this case, "usage" is pronounced with a hard "yoo-sage". It may help to remember "an umbrella for a unicorn") > +++ b/qapi/block-core.json > @@ -239,6 +239,9 @@ > # > # @iops_size: #optional an I/O size in bytes (Since 1.7) > # > +# @write-threshold: configured write threshold for the device. > +# 0 if disabled. (Since 2.3) > +# > # Since: 0.14.0 > # > ## > @@ -253,7 +256,7 @@ > '*bps_max': 'int', '*bps_rd_max': 'int', > '*bps_wr_max': 'int', '*iops_max': 'int', > '*iops_rd_max': 'int', '*iops_wr_max': 'int', > - '*iops_size': 'int' } } > + '*iops_size': 'int', 'write-threshold': 'uint64' } } In QMP specs, 'uint64' and 'int' are practically synonyms. I can live with either spelling, although 'int' is more common. Bikeshed on naming: Although we prefer '-' over '_' in new interfaces, we also favor consistency, and BlockDeviceInfo is one of those dinosaur commands that uses _ everywhere until your addition. So naming this field 'write_threshold' would be more consistent. > +## > +# @BLOCK_USAGE_THRESHOLD > +# > +# Emitted when writes on block device reaches or exceeds the > +# configured threshold. For thin-provisioned devices, this > +# means the device should be extended to avoid pausing for > +# disk exaustion. s/exaustion/exhaustion/ > +# > +# @node-name: graph node name on which the threshold was exceeded. > +# > +# @amount-exceeded: amount of data which exceeded the threshold, in by= tes. > +# > +# @offset-threshold: last configured threshold, in bytes. > +# Might want to mention that this event is one-shot; after it triggers, a user must re-register a threshold to get the event again. > +# Since: 2.3 > +## > +{ 'event': 'BLOCK_USAGE_THRESHOLD', > + 'data': { 'node-name': 'str', > + 'amount-exceeded': 'uint64', TAB damage. Please use spaces. ./scripts/checkpatch.pl will catch some offenders (although I didn't test if it will catch this one). However, here you are correct in using '-' for naming :) > + 'threshold': 'uint64' } } > + > +## > +# @block-set-threshold > +# > +# Change usage threshold for a block drive. An event will be delivered= > +# if a write to this block drive crosses the configured threshold. > +# This is useful to transparently resize thin-provisioned drives witho= ut > +# the guest OS noticing. > +# > +# @node-name: graph node name on which the threshold must be set. > +# > +# @write-threshold: configured threshold for the block device, bytes. > +# Use 0 to disable the threshold. > +# > +# Returns: Nothing on success > +# If @node name is not found on the block device graph, > +# DeviceNotFound > +# > +# Since: 2.3 > +## > +{ 'command': 'block-set-threshold', > + 'data': { 'node-name': 'str', 'threshold': 'uint64' } } Again, 'int' instead of 'uint64' is more typical, but shouldn't hurt with either spelling. > +SQMP > +block-set-threshold > +------------ > + > +Change the write threshold for a block drive. The threshold is an offs= et, > +thus must be non-negative. Default is not write threshold. s/not/no/ > +To set the threshold to zero disables it. s/To set/Setting/ Missing the change to the 'query-block' and 'query-named-block-nodes' examples to show the new always-present output field. --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --095GuJai2LpIWED1dx2sSHGAVj97m12li Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAEBCAAGBQJUfNiaAAoJEKeha0olJ0NqC6AH/2MAcHoJ6aRvck+C99wIcTgs KK0V9TUr+h7hRa3mCp3r1qSSPENMmNGqfu0ZhoktLiqe1F+2QdDJmKhwFA3WOc1S 0Xj2CGaG+VVY/PVTg3l8dZ3DdK90Ble6kIi51z4U7mCHSE9oIEpiUuVN1QZPp73D rm3H3VqZEzaDLuxFJTEuzwcs2/kRDwkhe0flOv93RYa2Cpbw80oXaWi+Kamo6OS7 8ptCXzPFGv6MAUsK2ok9T2LTLH+RabR0k1edxbKNmqSetfg28jHN9qVeiRMEokgE T5hRRdTj8A2fAdja0GThoGq7dt/zWOD/ZEueEfi8X1VKAxZ5O51Mqf9Wla+hKE8= =aVUu -----END PGP SIGNATURE----- --095GuJai2LpIWED1dx2sSHGAVj97m12li--