On 07/08/2014 08:49 AM, Francesco Romani wrote: > Managing applications, like oVirt (http://www.ovirt.org), make extensive > use of thin-provisioned disk images. > In order to let the guest run flawlessly and be not unnecessarily > paused, oVirt sets a watermark based on the percentage occupation of the > device against the advertised size, and automatically resizes the image > once the watermark is reached or exceeded. > > In order to detect the mark crossing, the managing application has no > choice than aggressively polling the QEMU monitor using the > query-blockstats command. This lead to unnecessary system > load, and is made even worse under scale: scenarios > with hundreds of VM are becoming not unusual. > > To fix this, this patch adds: > * A new monitor command to set a mark for a given block device. > * A new event to report if a block device usage exceeds the threshold. > > This will allow the managing application to drop the polling > alltogether and just wait for a watermark crossing event. s/alltogether/altogether/ > > Signed-off-by: Francesco Romani > --- > block.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++ > blockdev.c | 21 ++++++++++++++++++ > include/block/block.h | 2 ++ > include/block/block_int.h | 3 +++ > qapi/block-core.json | 33 ++++++++++++++++++++++++++++ > qmp-commands.hx | 24 ++++++++++++++++++++ > 6 files changed, 139 insertions(+) > > +void bdrv_set_watermark_perc(BlockDriverState *bs, > + int watermark_perc) > +{ > + NotifierWithReturn before_write = { > + .notify = watermark_before_write_notify, > + }; > + > + if (watermark_perc <= 0) { > + return; > + } Shouldn't you uninstall the write_notifier if someone is changing the watermark_perc from non-zero back to zero? > +++ b/include/block/block_int.h > @@ -393,6 +393,9 @@ struct BlockDriverState { > > /* The error object in use for blocking operations on backing_hd */ > Error *backing_blocker; > + > + /* watermark limit for writes, percentage of sectors */ > + int wr_watermark_perc; I didn't see any code that ensures that this limit is between 0 and 100; since it is under user control, your code probably misbehaves for values such as 101. > + > +## > +# @BLOCK_WATERMARK > +# > +# Emitted when a block device reaches or exceeds the watermark limit. > +# > +# @device: device name > +# > +# @sector-num: number of the sector exceeding the threshold > +# > +# @sector-highest: number of the last highest written sector > +# > +# Since: 2.1 2.2. You've missed hard freeze for 2.1. > +## > +{ 'event': 'BLOCK_WATERMARK', > + 'data': { 'device': 'str', 'sector-num': 'int', 'sector-highest': 'int' } } > + > +## > +# @block_set_watermark s/_/-/2 - new commands should use - in their name. > +# > +# Change watermark percentage for a block drive. > +# > +# @device: The name of the device > +# > +# @watermark: high water mark, percentage Is percentage the right unit? On a 10G disk, that means you only have a granularity down to 100M. Should the watermark limit instead be expressed as a byte value instead of a percentage? > +# > +# Returns: Nothing on success > +# If @device is not a valid block device, DeviceNotFound > +# > +# Since: 2.1 2.2. > +## > +{ 'command': 'block_set_watermark', > + 'data': { 'device': 'str', 'watermark': 'int' } } I hate write-only commands. Which query-* command are you modifying to output the current watermark level? -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org