Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold reporting for block devices

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Stefan Hajnoczi <stefanha@gmail.com>
To: Francesco Romani <fromani@redhat.com>
Cc: kwolf@redhat.com, lcapitulino@redhat.com, qemu-devel@nongnu.org,
	stefanha@redhat.com, mdroth@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold reporting for block devices
Date: Mon, 17 Nov 2014 16:49:36 +0000	[thread overview]
Message-ID: <20141117164936.GL16192@stefanha-thinkpad.redhat.com> (raw)
In-Reply-To: <1415365933-3727-2-git-send-email-fromani@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3457 bytes --]

On Fri, Nov 07, 2014 at 02:12:13PM +0100, Francesco Romani wrote:

Sorry for the long review delay.  Looks pretty good, just one real issue
to think about at the bottom.

> +static void usage_threshold_disable(BlockDriverState *bs)
> +{

It would be safest to make this idempotent:

if (!usage_threshold_is_set(bs)) {
    return;
}

That way it can be called multiple times.

> +    notifier_with_return_remove(&bs->wr_usage_threshold_notifier);
> +    bs->wr_offset_threshold = 0;
> +}
> +
> +static int usage_threshold_is_set(const BlockDriverState *bs)
> +{
> +    return !!(bs->wr_offset_threshold > 0);
> +}

Please use the bool type instead of an int return value.

> +
> +static int64_t usage_threshold_exceeded(const BlockDriverState *bs,
> +                                        const BdrvTrackedRequest *req)
> +{
> +    if (usage_threshold_is_set(bs)) {
> +        int64_t amount = req->offset + req->bytes - bs->wr_offset_threshold;
> +        if (amount > 0) {
> +            return amount;
> +        }
> +    }
> +    return 0;
> +}
> +
> +static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
> +                                            void *opaque)
> +{
> +    BdrvTrackedRequest *req = opaque;
> +    BlockDriverState *bs = req->bs;
> +    int64_t amount = 0;
> +
> +    assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> +    assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);

Does the code still make these assumptions or are the asserts left over
from previous versions of the patch?

> +
> +    amount = usage_threshold_exceeded(bs, req);
> +    if (amount > 0) {
> +        qapi_event_send_block_usage_threshold(
> +            bdrv_get_device_name(bs), /* FIXME: this does not work */
> +            amount,
> +            bs->wr_offset_threshold,
> +            &error_abort);
> +
> +        /* autodisable to avoid to flood the monitor */
> +        usage_threshold_disable(bs);
> +    }
> +
> +    return 0; /* should always let other notifiers run */
> +}
> +
> +static void usage_threshold_register_notifier(BlockDriverState *bs)
> +{
> +    bs->wr_usage_threshold_notifier.notify = before_write_notify;
> +    notifier_with_return_list_add(&bs->before_write_notifiers,
> +                                  &bs->wr_usage_threshold_notifier);
> +}
> +
> +void bdrv_set_usage_threshold(BlockDriverState *bs, int64_t threshold_bytes)
> +{
> +    BlockDriverState *target_bs = bs;
> +    if (bs->file) {
> +        target_bs = bs->file;
> +    }

Hmm...I think now I understand why you are trying to use bs->file.  This
is an attempt to make image formats work with the threshold.

Unfortunately the BlockDriverState topology can be more complicated than
just 1 level.

If we hardcode a strategy to traverse bs->file then it will work in most
cases:

  while (bs->file) {
      bs = bs->file;
  }

But there are cases like VMDK extent files where a BlockDriverState
actually has multiple children.

One way to solve this is to require that the management tool tells QEMU
which exact BlockDriverState node the threshold applies to.  Then QEMU
doesn't need any hardcoded policy.  But I'm not sure how realistic that
it at the moment (whether management tools are uses node names for each
node yet), so it may be best to hardcode the bs->file traversal that
I've suggested.

Kevin: Do you agree?

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

next prev parent reply	other threads:[~2014-11-17 16:49 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-07 13:12 [Qemu-devel] [RFC][PATCH v2] add write threshold reporting for block devices Francesco Romani
2014-11-07 13:12 ` [Qemu-devel] [RFC][PATCH v2] block: " Francesco Romani
2014-11-17 16:49   ` Stefan Hajnoczi [this message]
2014-11-18  8:12     ` Francesco Romani
2014-11-19 15:52       ` Stefan Hajnoczi
2014-11-20  8:23         ` Francesco Romani
2014-11-20 10:30     ` Kevin Wolf
2014-11-20 11:04       ` Stefan Hajnoczi
2014-11-20 11:34         ` Kevin Wolf
2014-11-21  8:43           ` Francesco Romani
2014-11-21 10:11             ` Kevin Wolf
2014-11-21 15:32               ` Francesco Romani
2014-11-21 16:24             ` Eric Blake
2014-11-21  0:10         ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141117164936.GL16192@stefanha-thinkpad.redhat.com \
    --to=stefanha@gmail.com \
    --cc=fromani@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lcapitulino@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).