From: Stefan Hajnoczi <stefanha@redhat.com>
To: Francesco Romani <fromani@redhat.com>
Cc: kwolf@redhat.com, Stefan Hajnoczi <stefanha@gmail.com>,
mdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,
lcapitulino@redhat.com
Subject: Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold reporting for block devices
Date: Wed, 19 Nov 2014 15:52:51 +0000 [thread overview]
Message-ID: <20141119155251.GA16593@stefanha-thinkpad.redhat.com> (raw)
In-Reply-To: <1011194982.313492.1416298332568.JavaMail.zimbra@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 4556 bytes --]
On Tue, Nov 18, 2014 at 03:12:12AM -0500, Francesco Romani wrote:
> > > +static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
> > > + void *opaque)
> > > +{
> > > + BdrvTrackedRequest *req = opaque;
> > > + BlockDriverState *bs = req->bs;
> > > + int64_t amount = 0;
> > > +
> > > + assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> > > + assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
> >
> > Does the code still make these assumptions or are the asserts left over
> > from previous versions of the patch?
>
> It's a leftover.
> I understood they don't hurt and add a bit of safety, but if they are confusing
> I'll remove them.
Yes, it made me wonder why. Probably best to remove them.
> > > +void bdrv_set_usage_threshold(BlockDriverState *bs, int64_t
> > > threshold_bytes)
> > > +{
> > > + BlockDriverState *target_bs = bs;
> > > + if (bs->file) {
> > > + target_bs = bs->file;
> > > + }
> >
> > Hmm...I think now I understand why you are trying to use bs->file. This
> > is an attempt to make image formats work with the threshold.
> >
> > Unfortunately the BlockDriverState topology can be more complicated than
> > just 1 level.
>
> I thought so but I can't reproduce yet more complex topologies.
> Disclosure: I'm testing against the topology I know to be used on oVirt, lacking
> immediate availability of others: suggestions welcome.
>
> At risk of being redundant, in oVirt the usecase is QCOW2 over lvm block device,
> and we'd like to be notified about the allocation of the lvm block device, which (IIUC)
> is the last bs->file.
>
> This is a simple topology (unless I'm missing something), and that's
> the reason why I go down just one level.
>
> Of course I want a general solution for this change, so...
There is a block driver for error injection called "blkdebug" (see
docs/blkdebug.txt). Here is an example of the following topology:
raw_bsd (drive0) -> blkdebug -> raw-posix (test.img)
qemu-system-x86_64 -drive if=virtio,format=raw,file.driver=blkdebug,file.image.filename=test.img
The blkdebug driver is interposing between the raw_bsd (drive0) root and
the raw-posix leaf node.
> > If we hardcode a strategy to traverse bs->file then it will work in most
> > cases:
> >
> > while (bs->file) {
> > bs = bs->file;
> > }
> >
> > But there are cases like VMDK extent files where a BlockDriverState
> > actually has multiple children.
> >
> > One way to solve this is to require that the management tool tells QEMU
> > which exact BlockDriverState node the threshold applies to. Then QEMU
> > doesn't need any hardcoded policy. But I'm not sure how realistic that
> > it at the moment (whether management tools are uses node names for each
> > node yet), so it may be best to hardcode the bs->file traversal that
> > I've suggested.
>
> oVirt relies on libvirt[1], and uses device node (e.g. 'vda').
>
> BTW, I haven't found a way to inspect from the outside (e.g. monitor command) the BlockDriverState
> topology, there is a way I'm missing?
You can get the BlockDriverState and ->backing_hd chain using the
query-block QMP command. I'm not aware of a command that returns the
full graph of BlockDriverState nodes.
The management tool should not need to inspect the graph because the
graph can change at runtime (e.g. imagine I/O throttling is implemented
as a BlockDriverState node then it could appear/disapper when the
feature is activated/deactivated). Instead the management tool should
name the nodes it knows about and then use those node names.
> Another issue I don't yet have a proper solution is related to this one.
>
> The management app will have deal with VM with more than one block device disk, so we need
> a way to map the notification with the corresponding device.
>
> If we descend the bs->file chain, AFAIU the easiest mapping, being the device name,
> is easily lost because only the outermost BlockDriverState has a device name attached, so when the
> notification trigger
> bdrv_get_device_name() will return NULL
In the worst case a name string can be passed in along with the
threshold values.
> I believe we don't necessarily need a device name in the notification, for example something
> like an opaque cookie set together with the threshold and returned back with the notification
> will suffice. Of course the device name is nicer :)
Agreed.
[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]
next prev parent reply other threads:[~2014-11-19 15:53 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-07 13:12 [Qemu-devel] [RFC][PATCH v2] add write threshold reporting for block devices Francesco Romani
2014-11-07 13:12 ` [Qemu-devel] [RFC][PATCH v2] block: " Francesco Romani
2014-11-17 16:49 ` Stefan Hajnoczi
2014-11-18 8:12 ` Francesco Romani
2014-11-19 15:52 ` Stefan Hajnoczi [this message]
2014-11-20 8:23 ` Francesco Romani
2014-11-20 10:30 ` Kevin Wolf
2014-11-20 11:04 ` Stefan Hajnoczi
2014-11-20 11:34 ` Kevin Wolf
2014-11-21 8:43 ` Francesco Romani
2014-11-21 10:11 ` Kevin Wolf
2014-11-21 15:32 ` Francesco Romani
2014-11-21 16:24 ` Eric Blake
2014-11-21 0:10 ` Eric Blake
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141119155251.GA16593@stefanha-thinkpad.redhat.com \
--to=stefanha@redhat.com \
--cc=fromani@redhat.com \
--cc=kwolf@redhat.com \
--cc=lcapitulino@redhat.com \
--cc=mdroth@linux.vnet.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).