From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56575) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xr7Z2-0005G7-94 for qemu-devel@nongnu.org; Wed, 19 Nov 2014 10:53:10 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Xr7Yw-00050n-0U for qemu-devel@nongnu.org; Wed, 19 Nov 2014 10:53:04 -0500 Received: from mx1.redhat.com ([209.132.183.28]:52150) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xr7Yv-00050e-Ol for qemu-devel@nongnu.org; Wed, 19 Nov 2014 10:52:57 -0500 Date: Wed, 19 Nov 2014 15:52:51 +0000 From: Stefan Hajnoczi Message-ID: <20141119155251.GA16593@stefanha-thinkpad.redhat.com> References: <1415365933-3727-1-git-send-email-fromani@redhat.com> <1415365933-3727-2-git-send-email-fromani@redhat.com> <20141117164936.GL16192@stefanha-thinkpad.redhat.com> <1011194982.313492.1416298332568.JavaMail.zimbra@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="jRHKVT23PllUwdXP" Content-Disposition: inline In-Reply-To: <1011194982.313492.1416298332568.JavaMail.zimbra@redhat.com> Subject: Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold reporting for block devices List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Francesco Romani Cc: kwolf@redhat.com, Stefan Hajnoczi , mdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org, lcapitulino@redhat.com --jRHKVT23PllUwdXP Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Nov 18, 2014 at 03:12:12AM -0500, Francesco Romani wrote: > > > +static int coroutine_fn before_write_notify(NotifierWithReturn *noti= fier, > > > + void *opaque) > > > +{ > > > + BdrvTrackedRequest *req =3D opaque; > > > + BlockDriverState *bs =3D req->bs; > > > + int64_t amount =3D 0; > > > + > > > + assert((req->offset & (BDRV_SECTOR_SIZE - 1)) =3D=3D 0); > > > + assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) =3D=3D 0); > >=20 > > Does the code still make these assumptions or are the asserts left over > > from previous versions of the patch? >=20 > It's a leftover. > I understood they don't hurt and add a bit of safety, but if they are con= fusing > I'll remove them. Yes, it made me wonder why. Probably best to remove them. > > > +void bdrv_set_usage_threshold(BlockDriverState *bs, int64_t > > > threshold_bytes) > > > +{ > > > + BlockDriverState *target_bs =3D bs; > > > + if (bs->file) { > > > + target_bs =3D bs->file; > > > + } > >=20 > > Hmm...I think now I understand why you are trying to use bs->file. This > > is an attempt to make image formats work with the threshold. > >=20 > > Unfortunately the BlockDriverState topology can be more complicated than > > just 1 level. >=20 > I thought so but I can't reproduce yet more complex topologies. > Disclosure: I'm testing against the topology I know to be used on oVirt, = lacking > immediate availability of others: suggestions welcome. >=20 > At risk of being redundant, in oVirt the usecase is QCOW2 over lvm block = device, > and we'd like to be notified about the allocation of the lvm block device= , which (IIUC) > is the last bs->file. >=20 > This is a simple topology (unless I'm missing something), and that's > the reason why I go down just one level. >=20 > Of course I want a general solution for this change, so... There is a block driver for error injection called "blkdebug" (see docs/blkdebug.txt). Here is an example of the following topology: raw_bsd (drive0) -> blkdebug -> raw-posix (test.img) qemu-system-x86_64 -drive if=3Dvirtio,format=3Draw,file.driver=3Dblkdebug,f= ile.image.filename=3Dtest.img The blkdebug driver is interposing between the raw_bsd (drive0) root and the raw-posix leaf node. > > If we hardcode a strategy to traverse bs->file then it will work in most > > cases: > >=20 > > while (bs->file) { > > bs =3D bs->file; > > } > >=20 > > But there are cases like VMDK extent files where a BlockDriverState > > actually has multiple children. > >=20 > > One way to solve this is to require that the management tool tells QEMU > > which exact BlockDriverState node the threshold applies to. Then QEMU > > doesn't need any hardcoded policy. But I'm not sure how realistic that > > it at the moment (whether management tools are uses node names for each > > node yet), so it may be best to hardcode the bs->file traversal that > > I've suggested. >=20 > oVirt relies on libvirt[1], and uses device node (e.g. 'vda'). >=20 > BTW, I haven't found a way to inspect from the outside (e.g. monitor comm= and) the BlockDriverState > topology, there is a way I'm missing? You can get the BlockDriverState and ->backing_hd chain using the query-block QMP command. I'm not aware of a command that returns the full graph of BlockDriverState nodes. The management tool should not need to inspect the graph because the graph can change at runtime (e.g. imagine I/O throttling is implemented as a BlockDriverState node then it could appear/disapper when the feature is activated/deactivated). Instead the management tool should name the nodes it knows about and then use those node names. > Another issue I don't yet have a proper solution is related to this one. >=20 > The management app will have deal with VM with more than one block device= disk, so we need > a way to map the notification with the corresponding device. >=20 > If we descend the bs->file chain, AFAIU the easiest mapping, being the de= vice name, > is easily lost because only the outermost BlockDriverState has a device n= ame attached, so when the > notification trigger > bdrv_get_device_name() will return NULL In the worst case a name string can be passed in along with the threshold values. > I believe we don't necessarily need a device name in the notification, fo= r example something > like an opaque cookie set together with the threshold and returned back w= ith the notification > will suffice. Of course the device name is nicer :) Agreed. --jRHKVT23PllUwdXP Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJUbLzTAAoJEJykq7OBq3PIciUH/3MwMxWHUXh0yzVW1qFizaYe UyjLclR0rXB/D244StZVmPoS5hD8VmUrA31tzS2XYbG7MR8yFDjI58osXCc0Hv8o rAY1Aznt5Mo8DI3hszRATNQ0Jy4tdVbK4wv+N8SlabDuOYfCAK8TSZchHkew6+C4 rqUDXc8Nzncm/tLD/aqz5MJU49GiDoCSAS+POOw89A4S9nckHOR9xCwTgmaeD9Lf g8gNU6B9QYzcaDQI6+gqXCD2CAwtyFBCjZDbRbjWGpJm4sf7RaMM/6YjP6narRSX llp5oqSRKJ6Y+Boxh7JVFXMI4ZByc6Dn1Dtfvk1vE3yWJdnKRMQXKS9mrT0baIE= =MUbG -----END PGP SIGNATURE----- --jRHKVT23PllUwdXP--