* [Qemu-devel] n ways block filters
@ 2014-03-14 15:57 Benoît Canet
  2014-03-17  3:12 ` Fam Zheng
  0 siblings, 1 reply; 12+ messages in thread

From: Benoît Canet @ 2014-03-14 15:57 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, stefanha, mreitz

Hello list,

I plan to convert throttling to a block filter and write n-way throttling
support.

I discussed a bit with Stefan on the list and we came to the conclusion that
the block filter API needs group support.

filter group:
-------------

My current plan to implement this is to add the following fields to the
BlockDriver structure.

int bdrv_add_filter_group(const char *name, QDict options);
int bdrv_reconfigure_filter_group(const char *name, QDict options);
int bdrv_destroy_filter_group(const char *name);

These three extra methods would allow creating, reconfiguring or destroying a
block filter group. A block filter group contains the shared or non-shared
state of the block filter. For throttling it would contain the ThrottleState
structure.

Each block filter driver would contain a linked list of linked lists where the
BDSes are registered, grouped by filter group state.

The regular bdrv_open callback would be used to instantiate a block filter and
add it to a filter group. This method would also take a new-node-name for the
new filter. This node-name would become the name of the new filter.
bdrv_close would clean up and deregister from a filter group.

An extra filter-group field in the option dict would allow the bdrv_open
method to register the newly opened block filter in its filter group.

The BDS structure would have a direct pointer to its filter group state.

Utility methods to do the bdrv_add_filter_group, bdrv_open, then bdrv_swap to
install a new filter can be provided by block.c. The same can be done for
filter close and uninstallation.

Legacy throttling QMP API
-------------------------

The legacy throttling API would create throttling filter groups containing
only one BDS.

By default, for every 1-way block filter, block.c would create a filter group
using the BDS id or node-name as group name. This allows for easy filter
removal with the BDS reference.

Group throttling API
--------------------

Commands would be added to create throttling filter groups, reconfigure them
and remove them.

Two additional commands would be added to create and insert a block filter in
a given group, or close and remove it.

Before I start implementing something, what are your thoughts on this?

Best regards

Benoît

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [Qemu-devel] n ways block filters 2014-03-14 15:57 [Qemu-devel] n ways block filters Benoît Canet @ 2014-03-17 3:12 ` Fam Zheng 2014-03-17 13:52 ` Benoît Canet 2014-03-17 16:02 ` Stefan Hajnoczi 0 siblings, 2 replies; 12+ messages in thread From: Fam Zheng @ 2014-03-17 3:12 UTC (permalink / raw) To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha, mreitz On Fri, 03/14 16:57, Benoît Canet wrote: > > Hello list, > > I plan to convert throttling to a block filter and write n way throttling > support. > > I discussed a bit with Stefan on the list and we came to the conclusion that the > block filter API need group support. > > filter group: > ------------- > > My current plan to implement this is to add the following fields to the BlockDriver > structure. > > int bdrv_add_filter_group(const char *name, QDict options); > int bdrv_reconfigure_filter_group(const char *name, QDict options); > int bdrv_destroy_filter_group(const char *name); > > These three extra method would allow to create, reconfigure or destroy a block > filter group. A block filter group contain the shared or non shared state of the > blockfilter. For throttling it would contains the ThrottleState structure. > > Each block filter driver would contains a linked list of linked list where the > BDS are registered grouped by filter groups state. Sorry I don't fully understand this. Does a filter group contain multiple block filters, and every block filter has effect on multiple BDSes? Could you give an example? > > The regular bdrv_open callback would be used to instantiate a block filter and > add it to a filter group. This method would also take a new-node-name for the new > filter. This node-name would become the name of the new filter. > bdrv_close would cleanup and deregister from a filter group. > > An extra filter-group field in the option dict would allow the bdrv_open method > to register the newly opened block filter in it's filter group. 
> The BDS structure would have a direct pointer to it's filter group state.
>
> Utility methods to do the bdrv_add_filter_group bdrv_open then bdrv_swap to
> install a new filter can be provided by block.c. The same can be done for filter
> close and desinstallation.

So you are defining block filter as a new kind of block driver. Is a filter
always on top, above everything else, by definition?

But I am afraid BlockDriverState is already taking too many responsibilities
here (backend, protocol driver, format driver, filter...). I was wondering if
it is clearer to rather introduce bs->filter_list to point to a list of
BlockFilter (a new structure, tailored for a block filter), and don't bother
with bdrv_swap.

Thanks,
Fam

>
> Legacy throttling QMP API
> -------------------------
>
> The legacy throttling API would create throttling filters groups containing only
> one BDS.
>
> By default for every 1 way block filter block.c would create a filter group
> using the BDS id or node-name as group name. This allow for easy filer removal
> with the bds reference.
>
> Group throttling API
> --------------------
>
> Commands would be added to create throttling filter groups reconfigure and remove
> them.
>
> Two additional commands would be added to create and insert a block filter in a
> given group or close and remove it.
>
> Before I start implementing something what are your thougths on this ?
>
> Best regards
>
> Benoît
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [Qemu-devel] n ways block filters 2014-03-17 3:12 ` Fam Zheng @ 2014-03-17 13:52 ` Benoît Canet 2014-03-17 16:02 ` Stefan Hajnoczi 1 sibling, 0 replies; 12+ messages in thread From: Benoît Canet @ 2014-03-17 13:52 UTC (permalink / raw) To: Fam Zheng; +Cc: Benoît Canet, kwolf, qemu-devel, stefanha, mreitz The Monday 17 Mar 2014 à 11:12:31 (+0800), Fam Zheng wrote : > On Fri, 03/14 16:57, Benoît Canet wrote: > > > > Hello list, > > > > I plan to convert throttling to a block filter and write n way throttling > > support. > > > > I discussed a bit with Stefan on the list and we came to the conclusion that the > > block filter API need group support. > > > > filter group: > > ------------- > > > > My current plan to implement this is to add the following fields to the BlockDriver > > structure. > > > > int bdrv_add_filter_group(const char *name, QDict options); > > int bdrv_reconfigure_filter_group(const char *name, QDict options); > > int bdrv_destroy_filter_group(const char *name); > > > > These three extra method would allow to create, reconfigure or destroy a block > > filter group. A block filter group contain the shared or non shared state of the > > blockfilter. For throttling it would contains the ThrottleState structure. > > > > Each block filter driver would contains a linked list of linked list where the > > BDS are registered grouped by filter groups state. > > Sorry I don't fully understand this. Does a filter group contain multiple block > filters, and every block filter has effect on multiple BDSes? Could you give an > example? The driver module says block/throttle.c would contains a linked list of ThrottleGroup in a static variable. Each ThrottleGroup would have a name, a ThrottleState (as in utils/throttle.c) and a linked list of members BDS. 
The s variable of the bdrv_* methods of throttle.c would have a pointer to the
ThrottleGroup this BDS is a member of, hence the driver can access the
ThrottleState.

So each BDS filter would have only one BDS parent and one BDS ->file child,
but would also be a member of a ThrottleGroup. This would allow throttling on
a group of devices.

Throttling only one device would create a ThrottleGroup of name
bds->device_name containing only the throttled BDS.

> >
> > The regular bdrv_open callback would be used to instantiate a block filter and
> > add it to a filter group. This method would also take a new-node-name for the new
> > filter. This node-name would become the name of the new filter.
> > bdrv_close would cleanup and deregister from a filter group.
> >
> > An extra filter-group field in the option dict would allow the bdrv_open method
> > to register the newly opened block filter in it's filter group.
> > The BDS structure would have a direct pointer to it's filter group state.
> >
> > Utility methods to do the bdrv_add_filter_group bdrv_open then bdrv_swap to
> > install a new filter can be provided by block.c. The same can be done for filter
> > close and desinstallation.
>
> So you are defining block filter as a new kind of block driver. Is a filter
> always on top above everything else by definition?

Yes, a filter is instantiated on top of another BDS. Filters can be stacked on
top of one another to form a chain.

> But I am afraid BlockDriverState is already taking too many responsibilities
> here (backend, protocol driver, format driver, filter...). I was wondering if
> it is clearer to rather introduce bs->filter_list to point to a list of
> BlockFilter (a new sturcture, tailored for a block filter), and don't bother
> with bdrv_swap.

Hmm, blkverify and quorum are already block filters reusing the BlockDriver
interface. The good thing about this is that it would allow moving some code
from block.c to block/filter_name*.c: block throttling, for example.
Crypto would be another use case. Best regards Benoît > > Thanks, > Fam > > > > > Legacy throttling QMP API > > ------------------------- > > > > The legacy throttling API would create throttling filters groups containing only > > one BDS. > > > > By default for every 1 way block filter block.c would create a filter group > > using the BDS id or node-name as group name. This allow for easy filer removal > > with the bds reference. > > > > Group throttling API > > -------------------- > > > > Commands would be added to create throttling filter groups reconfigure and remove > > them. > > > > Two additional commands would be added to create and insert a block filter in a > > given group or close and remove it. > > > > Before I start implementing something what are your thougths on this ? > > > > Best regards > > > > Benoît > > > > > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] n ways block filters
  2014-03-17  3:12 ` Fam Zheng
  2014-03-17 13:52   ` Benoît Canet
@ 2014-03-17 16:02   ` Stefan Hajnoczi
  2014-03-18 13:27     ` Kevin Wolf
  1 sibling, 1 reply; 12+ messages in thread

From: Stefan Hajnoczi @ 2014-03-17 16:02 UTC (permalink / raw)
To: Fam Zheng
Cc: Benoît Canet, Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <famz@redhat.com> wrote:
> On Fri, 03/14 16:57, Benoît Canet wrote:
>> I discussed a bit with Stefan on the list and we came to the conclusion that the
>> block filter API need group support.
>>
>> filter group:
>> -------------
>>
>> My current plan to implement this is to add the following fields to the BlockDriver
>> structure.
>>
>> int bdrv_add_filter_group(const char *name, QDict options);
>> int bdrv_reconfigure_filter_group(const char *name, QDict options);
>> int bdrv_destroy_filter_group(const char *name);
>>
>> These three extra method would allow to create, reconfigure or destroy a block
>> filter group. A block filter group contain the shared or non shared state of the
>> blockfilter. For throttling it would contains the ThrottleState structure.
>>
>> Each block filter driver would contains a linked list of linked list where the
>> BDS are registered grouped by filter groups state.
>
> Sorry I don't fully understand this. Does a filter group contain multiple block
> filters, and every block filter has effect on multiple BDSes? Could you give an
> example?

Just to explain why a "group" mechanism is useful:

You want to impose a 2000 IOPS limit for the entire VM. Currently
this is not possible because each drive has its own throttling state.

We need a way to say certain drives are part of a group. All drives
in a group share the same throttling state and therefore a 2000 IOPS
limit is shared amongst them.

Whether this concept is common enough to warrant a generic "filter
group" API, I'm not sure. But we at least need this for I/O
throttling.
Stefan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] n ways block filters 2014-03-17 16:02 ` Stefan Hajnoczi @ 2014-03-18 13:27 ` Kevin Wolf 2014-03-20 14:05 ` Benoît Canet 0 siblings, 1 reply; 12+ messages in thread From: Kevin Wolf @ 2014-03-18 13:27 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Benoît Canet, Fam Zheng, qemu-devel, Stefan Hajnoczi, Max Reitz Am 17.03.2014 um 17:02 hat Stefan Hajnoczi geschrieben: > On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <famz@redhat.com> wrote: > > On Fri, 03/14 16:57, Benoît Canet wrote: > >> I discussed a bit with Stefan on the list and we came to the conclusion that the > >> block filter API need group support. > >> > >> filter group: > >> ------------- > >> > >> My current plan to implement this is to add the following fields to the BlockDriver > >> structure. > >> > >> int bdrv_add_filter_group(const char *name, QDict options); > >> int bdrv_reconfigure_filter_group(const char *name, QDict options); > >> int bdrv_destroy_filter_group(const char *name); Benoît, your mail left me puzzled. You didn't really describe the problem that you're solving, nor what the QDict options actually contains or what a filter group even is. > >> These three extra method would allow to create, reconfigure or destroy a block > >> filter group. A block filter group contain the shared or non shared state of the > >> blockfilter. For throttling it would contains the ThrottleState structure. > >> > >> Each block filter driver would contains a linked list of linked list where the > >> BDS are registered grouped by filter groups state. > > > > Sorry I don't fully understand this. Does a filter group contain multiple block > > filters, and every block filter has effect on multiple BDSes? Could you give an > > example? > > Just to why a "group" mechanism is useful: > > You want to impose a 2000 IOPS limit for the entire VM. Currently > this is not possible because each drive has its own throttling state. > > We need a way to say certain drives are part of a group. 
All drives > in a group share the same throttling state and therefore a 2000 IOPS > limit is shared amongst them. Now at least I have an idea what you're all talking about, but it's still not obvious to me how the three functions from above solve your problem or how they work in detail. The obvious solution, using often discussed blockdev-add concepts, is: ______________ virtio-blk_A --> | | --> qcow2_A --> raw-posix_A | throttling | virtio_blk_B --> |____________| --> qcow2_B --> nbd_B That is, the I/O throttling BDS is referenced by two devices instead of just one and it associates one 'input' with one 'output'. Once we have BlockBackend, we would have two BBs, but still only one throttling BDS. The new thing that you get there is that the throttling driver has not only multiple parents (that part exists today), but it behaves differently depending on who called it. So we need to provide some way for one BDS to expose multiple slots or whatever you want to call them that users can attach to. This is, by the way, the very same thing as would be required for exposing qcow2 internal snapshots (read-only) while the VM is running. Kevin ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] n ways block filters 2014-03-18 13:27 ` Kevin Wolf @ 2014-03-20 14:05 ` Benoît Canet 2014-03-20 15:12 ` Kevin Wolf 0 siblings, 1 reply; 12+ messages in thread From: Benoît Canet @ 2014-03-20 14:05 UTC (permalink / raw) To: Kevin Wolf Cc: Benoît Canet, Fam Zheng, Stefan Hajnoczi, qemu-devel, Max Reitz, Stefan Hajnoczi The Tuesday 18 Mar 2014 à 14:27:47 (+0100), Kevin Wolf wrote : > Am 17.03.2014 um 17:02 hat Stefan Hajnoczi geschrieben: > > On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <famz@redhat.com> wrote: > > > On Fri, 03/14 16:57, Benoît Canet wrote: > > >> I discussed a bit with Stefan on the list and we came to the conclusion that the > > >> block filter API need group support. > > >> > > >> filter group: > > >> ------------- > > >> > > >> My current plan to implement this is to add the following fields to the BlockDriver > > >> structure. > > >> > > >> int bdrv_add_filter_group(const char *name, QDict options); > > >> int bdrv_reconfigure_filter_group(const char *name, QDict options); > > >> int bdrv_destroy_filter_group(const char *name); > > Benoît, your mail left me puzzled. You didn't really describe the > problem that you're solving, nor what the QDict options actually > contains or what a filter group even is. > > > >> These three extra method would allow to create, reconfigure or destroy a block > > >> filter group. A block filter group contain the shared or non shared state of the > > >> blockfilter. For throttling it would contains the ThrottleState structure. > > >> > > >> Each block filter driver would contains a linked list of linked list where the > > >> BDS are registered grouped by filter groups state. > > > > > > Sorry I don't fully understand this. Does a filter group contain multiple block > > > filters, and every block filter has effect on multiple BDSes? Could you give an > > > example? > > > > Just to why a "group" mechanism is useful: > > > > You want to impose a 2000 IOPS limit for the entire VM. 
Currently
> > this is not possible because each drive has its own throttling state.
> >
> > We need a way to say certain drives are part of a group. All drives
> > in a group share the same throttling state and therefore a 2000 IOPS
> > limit is shared amongst them.
>
> Now at least I have an idea what you're all talking about, but it's
> still not obvious to me how the three functions from above solve your
> problem or how they work in detail.
>
> The obvious solution, using often discussed blockdev-add concepts, is:
>                  ______________
> virtio-blk_A --> |            | --> qcow2_A --> raw-posix_A
>                  | throttling |
> virtio_blk_B --> |____________| --> qcow2_B --> nbd_B

My proposal would be:
                 ______________
virtio-blk_A --> |   BDS 1    | --> qcow2_A --> raw-posix_A
                 |____________|
                       |
                 ______|_______
                 |            |  The shared state is the state of a BDS group.
                 |   Shared   |  It's stored in a static linked list of the
                 |   State    |  block/throttle.c module. It has a name and
                 |____________|  contains a throttle state structure.
                       |
                 ______|_______
                 |   BDS 2    |
virtio_blk_B --> |____________| --> qcow2_B --> nbd_B

The name of the shared state is the throttle group name.
The three added methods are used to add, configure and destroy such shared
states.

The benefit of this approach is that we don't need to add a special slot
mechanism and that removing BDS 2 would be easy.
Your approach doesn't deal with the fact that the throttling group membership
can be changed dynamically while the VM is running: for example, adding
qcow2_C and removing qcow2_B should be made easy.

Best regards

Benoît

> That is, the I/O throttling BDS is referenced by two devices instead of
> just one and it associates one 'input' with one 'output'. Once we have
> BlockBackend, we would have two BBs, but still only one throttling
> BDS.
>
> The new thing that you get there is that the throttling driver has
> not only multiple parents (that part exists today), but it behaves
> differently depending on who called it.
So we need to provide some way > for one BDS to expose multiple slots or whatever you want to call them > that users can attach to. > > This is, by the way, the very same thing as would be required for > exposing qcow2 internal snapshots (read-only) while the VM is running. > > Kevin > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] n ways block filters 2014-03-20 14:05 ` Benoît Canet @ 2014-03-20 15:12 ` Kevin Wolf 2014-03-20 15:47 ` Benoît Canet 2014-03-20 16:06 ` Benoît Canet 0 siblings, 2 replies; 12+ messages in thread From: Kevin Wolf @ 2014-03-20 15:12 UTC (permalink / raw) To: Benoît Canet Cc: Fam Zheng, Stefan Hajnoczi, armbru, qemu-devel, Stefan Hajnoczi, Max Reitz Am 20.03.2014 um 15:05 hat Benoît Canet geschrieben: > The Tuesday 18 Mar 2014 à 14:27:47 (+0100), Kevin Wolf wrote : > > Am 17.03.2014 um 17:02 hat Stefan Hajnoczi geschrieben: > > > On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <famz@redhat.com> wrote: > > > > On Fri, 03/14 16:57, Benoît Canet wrote: > > > >> I discussed a bit with Stefan on the list and we came to the conclusion that the > > > >> block filter API need group support. > > > >> > > > >> filter group: > > > >> ------------- > > > >> > > > >> My current plan to implement this is to add the following fields to the BlockDriver > > > >> structure. > > > >> > > > >> int bdrv_add_filter_group(const char *name, QDict options); > > > >> int bdrv_reconfigure_filter_group(const char *name, QDict options); > > > >> int bdrv_destroy_filter_group(const char *name); > > > > Benoît, your mail left me puzzled. You didn't really describe the > > problem that you're solving, nor what the QDict options actually > > contains or what a filter group even is. > > > > > >> These three extra method would allow to create, reconfigure or destroy a block > > > >> filter group. A block filter group contain the shared or non shared state of the > > > >> blockfilter. For throttling it would contains the ThrottleState structure. > > > >> > > > >> Each block filter driver would contains a linked list of linked list where the > > > >> BDS are registered grouped by filter groups state. > > > > > > > > Sorry I don't fully understand this. Does a filter group contain multiple block > > > > filters, and every block filter has effect on multiple BDSes? 
Could you give an > > > > example? > > > > > > Just to why a "group" mechanism is useful: > > > > > > You want to impose a 2000 IOPS limit for the entire VM. Currently > > > this is not possible because each drive has its own throttling state. > > > > > > We need a way to say certain drives are part of a group. All drives > > > in a group share the same throttling state and therefore a 2000 IOPS > > > limit is shared amongst them. > > > > Now at least I have an idea what you're all talking about, but it's > > still not obvious to me how the three functions from above solve your > > problem or how they work in detail. > > > > The obvious solution, using often discussed blockdev-add concepts, is: > > ______________ > > virtio-blk_A --> | | --> qcow2_A --> raw-posix_A > > | throttling | > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > My proposal would be: > ______________ > virtio-blk_A --> | BDS 1 | --> qcow2_A --> raw-posix_A > |____________| > | > _____|________ > | | The shared state is the state of a BDS group > | Shared | It's stored in a static linked list of the > | State | block/throttle.c module. It has a name and contains a > |____________| throttle state structure. > | > _____|________ > | BDS 2 | > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B Okay. I think your proposal might be easier to implement in the short run, but it introduces an additional type of nodes to the graph (so far we have only one type, BlockDriverStates) with their own set of functions, and I assume monitor commands, for management. This makes the whole graph less uniform and consistent. There may be cases where this is necessary or at least tolerable because the fully generic alternativ isn't doable. I'm not convinced yet that this is the case here. 
In contrast, my approach would require considerable infrastructure work (you
somehow seem to attract that kind of thing ;-)), but it's merely a
generalisation of what we already have and as such fits nicely in the graph.

We already have multiple children of BDS nodes. And we take it for granted
that they don't refer to the same data, but that bs->file and bs->backing_hd
have actually different semantics.

We have recently introduced refcounts for BDSes so that one BDS can now have
multiple parents, too, as a first step towards symmetry. The logical extension
is that these parents get different semantics, just like the children have
different semantics.

Doing the abstraction in one model right instead of adding hacks that don't
really fit in but are easy to implement has paid off in the past. I'm pretty
sure that extending the infrastructure this way will find more users than just
I/O throttling, and that having different parents in different roles is
universally useful. With qcow2 exposing the snapshots, too, I already named a
second potential user of the infrastructure.

> The name of the shared state is the throttle group name.
> The three added methods are used to add, configure and destroy such shared
> states.
>
> The benefit of this aproach is that we don't need to add a special slot mechanism
> and that removing BDS 2 would be easy.
> Your approach don't deal with the fact that the throttling group membership can
> be changed dynamically while the vm is running: for example adding qcow2_C and
> removing qcow2_B should be made easy.

Yes, this is right. But then, the nice thing about it is that I stayed fully
within the one uniform graph. We just need a way to modify the edges in this
graph (and we already need that to insert/delete filters) and you get this
special case and many others for free.

So, I vote for investing into a uniform infrastructure here instead of adding
new one-off node types.
Kevin > > That is, the I/O throttling BDS is referenced by two devices instead of > > just one and it associates one 'input' with one 'output'. Once we have > > BlockBackend, we would have two BBs, but still only one throttling > > BDS. > > > > The new thing that you get there is that the throttling driver has > > not only multiple parents (that part exists today), but it behaves > > differently depending on who called it. So we need to provide some way > > for one BDS to expose multiple slots or whatever you want to call them > > that users can attach to. > > > > This is, by the way, the very same thing as would be required for > > exposing qcow2 internal snapshots (read-only) while the VM is running. > > > > Kevin > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] n ways block filters 2014-03-20 15:12 ` Kevin Wolf @ 2014-03-20 15:47 ` Benoît Canet 2014-03-20 16:06 ` Benoît Canet 1 sibling, 0 replies; 12+ messages in thread From: Benoît Canet @ 2014-03-20 15:47 UTC (permalink / raw) To: Kevin Wolf Cc: Benoît Canet, Fam Zheng, Stefan Hajnoczi, armbru, qemu-devel, Stefan Hajnoczi, Max Reitz The Thursday 20 Mar 2014 à 16:12:34 (+0100), Kevin Wolf wrote : > Am 20.03.2014 um 15:05 hat Benoît Canet geschrieben: > > The Tuesday 18 Mar 2014 à 14:27:47 (+0100), Kevin Wolf wrote : > > > Am 17.03.2014 um 17:02 hat Stefan Hajnoczi geschrieben: > > > > On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <famz@redhat.com> wrote: > > > > > On Fri, 03/14 16:57, Benoît Canet wrote: > > > > >> I discussed a bit with Stefan on the list and we came to the conclusion that the > > > > >> block filter API need group support. > > > > >> > > > > >> filter group: > > > > >> ------------- > > > > >> > > > > >> My current plan to implement this is to add the following fields to the BlockDriver > > > > >> structure. > > > > >> > > > > >> int bdrv_add_filter_group(const char *name, QDict options); > > > > >> int bdrv_reconfigure_filter_group(const char *name, QDict options); > > > > >> int bdrv_destroy_filter_group(const char *name); > > > > > > Benoît, your mail left me puzzled. You didn't really describe the > > > problem that you're solving, nor what the QDict options actually > > > contains or what a filter group even is. > > > > > > > >> These three extra method would allow to create, reconfigure or destroy a block > > > > >> filter group. A block filter group contain the shared or non shared state of the > > > > >> blockfilter. For throttling it would contains the ThrottleState structure. > > > > >> > > > > >> Each block filter driver would contains a linked list of linked list where the > > > > >> BDS are registered grouped by filter groups state. > > > > > > > > > > Sorry I don't fully understand this. 
Does a filter group contain multiple block > > > > > filters, and every block filter has effect on multiple BDSes? Could you give an > > > > > example? > > > > > > > > Just to why a "group" mechanism is useful: > > > > > > > > You want to impose a 2000 IOPS limit for the entire VM. Currently > > > > this is not possible because each drive has its own throttling state. > > > > > > > > We need a way to say certain drives are part of a group. All drives > > > > in a group share the same throttling state and therefore a 2000 IOPS > > > > limit is shared amongst them. > > > > > > Now at least I have an idea what you're all talking about, but it's > > > still not obvious to me how the three functions from above solve your > > > problem or how they work in detail. > > > > > > The obvious solution, using often discussed blockdev-add concepts, is: > > > ______________ > > > virtio-blk_A --> | | --> qcow2_A --> raw-posix_A > > > | throttling | > > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > > > My proposal would be: > > ______________ > > virtio-blk_A --> | BDS 1 | --> qcow2_A --> raw-posix_A > > |____________| > > | > > _____|________ > > | | The shared state is the state of a BDS group > > | Shared | It's stored in a static linked list of the > > | State | block/throttle.c module. It has a name and contains a > > |____________| throttle state structure. > > | > > _____|________ > > | BDS 2 | > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > Okay. I think your proposal might be easier to implement in the short > run, but it introduces an additional type of nodes to the graph (so far > we have only one type, BlockDriverStates) with their own set of > functions, and I assume monitor commands, for management. BDS1 and BDS2 would be regular BDS. Their associated s structure would contain a pointer to the shared state. The shared state would be stored in a static list of the blockfilter. 
The shared state is not seen from outside the block filter module.
The BlockDriver structure would have only the 3 methods to define, configure
and destroy groups; that's the only change required.
block.c would manipulate regular BDSes, no special case involved. Only
blockdev.c would fiddle with the 3 extra BlockDriver methods.

>
> This makes the whole graph less uniform and consistent. There may be
> cases where this is necessary or at least tolerable because the fully
> generic alternativ isn't doable. I'm not convinced yet that this is the
> case here.
>
> In contrast, my approach would require considerable infrastructure work
> (you somehow seem to attract that kind of things ;-)), but it's merely a
> generalisation of what we already have and as such fits nicely in the
> graph.
>
> We already have multiple children of BDS nodes. And we take it for
> granted that they don't refer to the same data, but that bs->file and
> bs->backing_hd have actually different semantics.
>
> We have recently introduced refcounts for BDSes so that one BDS can now
> have multiple parents, too, as a first step towards symmetry. The
> logical extension is that these parent get different semantics, just
> like the children have different semantics.
>
> Doing the abstraction in one model right instead of adding hacks that
> don't really fit in but are easy to implement has paid off in the past.
> I'm pretty sure that extending the infrastructure this way will find
> more users than just I/O throttling, and that having different parents
> in different roles is universally useful. With qcow2 exposing the
> snapshots, too, I already named a second potential user of the
> infrastructure.
>
> > The name of the shared state is the throttle group name.
> > The three added methods are used to add, configure and destroy such shared
> > states.
> >
> > The benefit of this aproach is that we don't need to add a special slot mechanism
> > and that removing BDS 2 would be easy.
> > Your approach don't deal with the fact that the throttling group membership can > > be changed dynamically while the vm is running: for example adding qcow2_C and > > removing qcow2_B should be made easy. > > Yes, this is right. But then, the nice thing about it is that I stayed > fully within the one uniform graph. We just need a way to modify the > edges in this graph (and we already need that to insert/delete filters) > and you get this special case and many others for free. > > So, I vote for investing into a uniform infrastructure here instead of > adding new one-off node types. > > Kevin > > > > That is, the I/O throttling BDS is referenced by two devices instead of > > > just one and it associates one 'input' with one 'output'. Once we have > > > BlockBackend, we would have two BBs, but still only one throttling > > > BDS. > > > > > > The new thing that you get there is that the throttling driver has > > > not only multiple parents (that part exists today), but it behaves > > > differently depending on who called it. So we need to provide some way > > > for one BDS to expose multiple slots or whatever you want to call them > > > that users can attach to. > > > > > > This is, by the way, the very same thing as would be required for > > > exposing qcow2 internal snapshots (read-only) while the VM is running. > > > > > > Kevin > > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
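The shared-state registry that Benoît describes (a static list in block/throttle.c, each group named and holding one ThrottleState shared by all member BDSes) could be sketched roughly as follows. All names and fields here are illustrative assumptions, not the actual QEMU API; real QEMU would use the QLIST macros and proper refcounting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: one named group per shared throttle state,
 * kept in a static list private to the block filter module. */

typedef struct ThrottleState {
    long iops_limit;            /* budget shared by the whole group */
} ThrottleState;

typedef struct FilterGroup {
    char name[32];
    ThrottleState ts;           /* state shared by all member BDSes */
    int refcount;               /* number of BDSes in the group */
    struct FilterGroup *next;
} FilterGroup;

static FilterGroup *groups;     /* static list, not visible outside */

/* Join a group by name, creating it on first use. */
FilterGroup *filter_group_incref(const char *name, long iops_limit)
{
    FilterGroup *g;
    for (g = groups; g; g = g->next) {
        if (!strcmp(g->name, name)) {
            g->refcount++;
            return g;           /* existing shared state, limit kept */
        }
    }
    g = calloc(1, sizeof(*g));
    strncpy(g->name, name, sizeof(g->name) - 1);
    g->ts.iops_limit = iops_limit;
    g->refcount = 1;
    g->next = groups;
    groups = g;
    return g;
}

/* Leave a group; destroy it when the last member leaves. */
void filter_group_decref(FilterGroup *g)
{
    if (--g->refcount > 0) {
        return;
    }
    FilterGroup **p = &groups;
    while (*p != g) {
        p = &(*p)->next;
    }
    *p = g->next;
    free(g);
}
```

Two BDSes joining the same group name would then see a single ThrottleState, which is exactly the "2000 IOPS for the whole VM" use case from earlier in the thread.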
* Re: [Qemu-devel] n ways block filters 2014-03-20 15:12 ` Kevin Wolf 2014-03-20 15:47 ` Benoît Canet @ 2014-03-20 16:06 ` Benoît Canet 2014-03-24 14:53 ` Benoît Canet 1 sibling, 1 reply; 12+ messages in thread From: Benoît Canet @ 2014-03-20 16:06 UTC (permalink / raw) To: Kevin Wolf Cc: Benoît Canet, Fam Zheng, Stefan Hajnoczi, qemu-devel, Max Reitz, Stefan Hajnoczi, armbru The Thursday 20 Mar 2014 à 16:12:34 (+0100), Kevin Wolf wrote : > Am 20.03.2014 um 15:05 hat Benoît Canet geschrieben: > > The Tuesday 18 Mar 2014 à 14:27:47 (+0100), Kevin Wolf wrote : > > > Am 17.03.2014 um 17:02 hat Stefan Hajnoczi geschrieben: > > > > On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <famz@redhat.com> wrote: > > > > > On Fri, 03/14 16:57, Benoît Canet wrote: > > > > >> I discussed a bit with Stefan on the list and we came to the conclusion that the > > > > >> block filter API need group support. > > > > >> > > > > >> filter group: > > > > >> ------------- > > > > >> > > > > >> My current plan to implement this is to add the following fields to the BlockDriver > > > > >> structure. > > > > >> > > > > >> int bdrv_add_filter_group(const char *name, QDict options); > > > > >> int bdrv_reconfigure_filter_group(const char *name, QDict options); > > > > >> int bdrv_destroy_filter_group(const char *name); > > > > > > Benoît, your mail left me puzzled. You didn't really describe the > > > problem that you're solving, nor what the QDict options actually > > > contains or what a filter group even is. > > > > > > > >> These three extra method would allow to create, reconfigure or destroy a block > > > > >> filter group. A block filter group contain the shared or non shared state of the > > > > >> blockfilter. For throttling it would contains the ThrottleState structure. > > > > >> > > > > >> Each block filter driver would contains a linked list of linked list where the > > > > >> BDS are registered grouped by filter groups state. > > > > > > > > > > Sorry I don't fully understand this. 
Does a filter group contain multiple block > > > > > filters, and every block filter has effect on multiple BDSes? Could you give an > > > > > example? > > > > > > > > Just to why a "group" mechanism is useful: > > > > > > > > You want to impose a 2000 IOPS limit for the entire VM. Currently > > > > this is not possible because each drive has its own throttling state. > > > > > > > > We need a way to say certain drives are part of a group. All drives > > > > in a group share the same throttling state and therefore a 2000 IOPS > > > > limit is shared amongst them. > > > > > > Now at least I have an idea what you're all talking about, but it's > > > still not obvious to me how the three functions from above solve your > > > problem or how they work in detail. > > > > > > The obvious solution, using often discussed blockdev-add concepts, is: > > > ______________ > > > virtio-blk_A --> | | --> qcow2_A --> raw-posix_A > > > | throttling | > > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > > > My proposal would be: > > ______________ > > virtio-blk_A --> | BDS 1 | --> qcow2_A --> raw-posix_A > > |____________| > > | > > _____|________ > > | | The shared state is the state of a BDS group > > | Shared | It's stored in a static linked list of the > > | State | block/throttle.c module. It has a name and contains a > > |____________| throttle state structure. > > | > > _____|________ > > | BDS 2 | > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > Okay. I think your proposal might be easier to implement in the short > run, but it introduces an additional type of nodes to the graph (so far > we have only one type, BlockDriverStates) with their own set of > functions, and I assume monitor commands, for management. > > This makes the whole graph less uniform and consistent. There may be > cases where this is necessary or at least tolerable because the fully > generic alternativ isn't doable. I'm not convinced yet that this is the > case here. 
> > In contrast, my approach would require considerable infrastructure work > (you somehow seem to attract that kind of things ;-)), but it's merely a > generalisation of what we already have and as such fits nicely in the > graph. > > We already have multiple children of BDS nodes. And we take it for > granted that they don't refer to the same data, but that bs->file and > bs->backing_hd have actually different semantics. > > We have recently introduced refcounts for BDSes so that one BDS can now > have multiple parents, too, as a first step towards symmetry. The > logical extension is that these parent get different semantics, just > like the children have different semantics. > > Doing the abstraction in one model right instead of adding hacks that > don't really fit in but are easy to implement has paid off in the past. > I'm pretty sure that extending the infrastructure this way will find > more users than just I/O throttling, and that having different parents > in different roles is universally useful. With qcow2 exposing the > snapshots, too, I already named a second potential user of the > infrastructure. > > > The name of the shared state is the throttle group name. > > The three added methods are used to add, configure and destroy such shared > > states. > > > > The benefit of this aproach is that we don't need to add a special slot mechanism > > and that removing BDS 2 would be easy. > > Your approach don't deal with the fact that the throttling group membership can > > be changed dynamically while the vm is running: for example adding qcow2_C and > > removing qcow2_B should be made easy. > > Yes, this is right. But then, the nice thing about it is that I stayed > fully within the one uniform graph. We just need a way to modify the > edges in this graph (and we already need that to insert/delete filters) > and you get this special case and many others for free. 
> > So, I vote for investing into a uniform infrastructure here instead of > adding new one-off node types. Maybe parent BDSes could use a generic block function to get a cookie when they start to use a child BDS. The parent would do:

    bs->file_cookie = bdrv_get_cookie(file);
    bs->file = file;

when choosing to use file as bs's file. The get cookie method would be:

    uint64_t bdrv_get_cookie(BlockDriverState *bs)
    {
        bs->cookie = gen_uuid(bs);
        return bs->cookie;
    }

gen_uuid would combine a random 64-bit number with a registry to prevent identical cookie generation. After this step every BlockDriver method would receive the cookie as a second parameter, for example bdrv_read(bs, cookie, ...). So it's easy for a block driver to discriminate based on the cookie, and even to look up which of its own children is associated with this cookie. Best regards Benoît > > Kevin > > > > That is, the I/O throttling BDS is referenced by two devices instead of > > > just one and it associates one 'input' with one 'output'. Once we have > > > BlockBackend, we would have two BBs, but still only one throttling > > > BDS. > > > > > > The new thing that you get there is that the throttling driver has > > > not only multiple parents (that part exists today), but it behaves > > > differently depending on who called it. So we need to provide some way > > > for one BDS to expose multiple slots or whatever you want to call them > > > that users can attach to. > > > > > > This is, by the way, the very same thing as would be required for > > > exposing qcow2 internal snapshots (read-only) while the VM is running. > > > > > > Kevin > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
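Benoît's cookie proposal above can be fleshed out into a small self-contained sketch. gen_uuid() here is a toy stand-in (the mail only says it combines a random 64-bit number with a registry to rule out duplicates), and the struct fields are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of the per-edge cookie idea: each parent that
 * attaches to a child BDS gets a unique cookie, so a driver receiving
 * bdrv_read(bs, cookie, ...) can tell which parent issued the call. */

typedef struct BlockDriverState BlockDriverState;
struct BlockDriverState {
    uint64_t cookie;            /* cookie handed out for the latest edge */
    BlockDriverState *file;
    uint64_t file_cookie;       /* parent's record of its edge to file */
};

static uint64_t used_cookies[64];   /* toy registry of issued cookies */
static int n_used;

static int cookie_in_use(uint64_t c)
{
    for (int i = 0; i < n_used; i++) {
        if (used_cookies[i] == c) {
            return 1;
        }
    }
    return 0;
}

/* Draw random values until one is unused, then record it. */
static uint64_t gen_uuid(void)
{
    uint64_t c;
    do {
        c = ((uint64_t)rand() << 32) | (uint64_t)rand();
    } while (c == 0 || cookie_in_use(c));
    used_cookies[n_used++] = c;
    return c;
}

uint64_t bdrv_get_cookie(BlockDriverState *bs)
{
    bs->cookie = gen_uuid();
    return bs->cookie;
}

/* A parent attaching a child, exactly as in the mail's example. */
void attach_file(BlockDriverState *parent, BlockDriverState *file)
{
    parent->file_cookie = bdrv_get_cookie(file);
    parent->file = file;
}
```

The point of the registry is visible when two parents attach to the same child: each edge gets a distinct cookie, which is what lets a throttling filter behave differently per caller.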
* Re: [Qemu-devel] n ways block filters 2014-03-20 16:06 ` Benoît Canet @ 2014-03-24 14:53 ` Benoît Canet 2014-03-24 15:58 ` Kevin Wolf 0 siblings, 1 reply; 12+ messages in thread From: Benoît Canet @ 2014-03-24 14:53 UTC (permalink / raw) To: Benoît Canet Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, qemu-devel, Max Reitz, Stefan Hajnoczi, armbru The Thursday 20 Mar 2014 à 17:06:26 (+0100), Benoît Canet wrote : > The Thursday 20 Mar 2014 à 16:12:34 (+0100), Kevin Wolf wrote : > > Am 20.03.2014 um 15:05 hat Benoît Canet geschrieben: > > > The Tuesday 18 Mar 2014 à 14:27:47 (+0100), Kevin Wolf wrote : > > > > Am 17.03.2014 um 17:02 hat Stefan Hajnoczi geschrieben: > > > > > On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <famz@redhat.com> wrote: > > > > > > On Fri, 03/14 16:57, Benoît Canet wrote: > > > > > >> I discussed a bit with Stefan on the list and we came to the conclusion that the > > > > > >> block filter API need group support. > > > > > >> > > > > > >> filter group: > > > > > >> ------------- > > > > > >> > > > > > >> My current plan to implement this is to add the following fields to the BlockDriver > > > > > >> structure. > > > > > >> > > > > > >> int bdrv_add_filter_group(const char *name, QDict options); > > > > > >> int bdrv_reconfigure_filter_group(const char *name, QDict options); > > > > > >> int bdrv_destroy_filter_group(const char *name); > > > > > > > > Benoît, your mail left me puzzled. You didn't really describe the > > > > problem that you're solving, nor what the QDict options actually > > > > contains or what a filter group even is. > > > > > > > > > >> These three extra method would allow to create, reconfigure or destroy a block > > > > > >> filter group. A block filter group contain the shared or non shared state of the > > > > > >> blockfilter. For throttling it would contains the ThrottleState structure. 
> > > > > >> > > > > > >> Each block filter driver would contains a linked list of linked list where the > > > > > >> BDS are registered grouped by filter groups state. > > > > > > > > > > > > Sorry I don't fully understand this. Does a filter group contain multiple block > > > > > > filters, and every block filter has effect on multiple BDSes? Could you give an > > > > > > example? > > > > > > > > > > Just to why a "group" mechanism is useful: > > > > > > > > > > You want to impose a 2000 IOPS limit for the entire VM. Currently > > > > > this is not possible because each drive has its own throttling state. > > > > > > > > > > We need a way to say certain drives are part of a group. All drives > > > > > in a group share the same throttling state and therefore a 2000 IOPS > > > > > limit is shared amongst them. > > > > > > > > Now at least I have an idea what you're all talking about, but it's > > > > still not obvious to me how the three functions from above solve your > > > > problem or how they work in detail. > > > > > > > > The obvious solution, using often discussed blockdev-add concepts, is: > > > > ______________ > > > > virtio-blk_A --> | | --> qcow2_A --> raw-posix_A > > > > | throttling | > > > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > > > > > My proposal would be: > > > ______________ > > > virtio-blk_A --> | BDS 1 | --> qcow2_A --> raw-posix_A > > > |____________| > > > | > > > _____|________ > > > | | The shared state is the state of a BDS group > > > | Shared | It's stored in a static linked list of the > > > | State | block/throttle.c module. It has a name and contains a > > > |____________| throttle state structure. > > > | > > > _____|________ > > > | BDS 2 | > > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > > > Okay. 
I think your proposal might be easier to implement in the short > > run, but it introduces an additional type of nodes to the graph (so far > > we have only one type, BlockDriverStates) with their own set of > > functions, and I assume monitor commands, for management. > > > > This makes the whole graph less uniform and consistent. There may be > > cases where this is necessary or at least tolerable because the fully > > generic alternativ isn't doable. I'm not convinced yet that this is the > > case here. > > > > In contrast, my approach would require considerable infrastructure work > > (you somehow seem to attract that kind of things ;-)), but it's merely a > > generalisation of what we already have and as such fits nicely in the > > graph. > > > > We already have multiple children of BDS nodes. And we take it for > > granted that they don't refer to the same data, but that bs->file and > > bs->backing_hd have actually different semantics. > > > > We have recently introduced refcounts for BDSes so that one BDS can now > > have multiple parents, too, as a first step towards symmetry. The > > logical extension is that these parent get different semantics, just > > like the children have different semantics. > > > > Doing the abstraction in one model right instead of adding hacks that > > don't really fit in but are easy to implement has paid off in the past. > > I'm pretty sure that extending the infrastructure this way will find > > more users than just I/O throttling, and that having different parents > > in different roles is universally useful. With qcow2 exposing the > > snapshots, too, I already named a second potential user of the > > infrastructure. > > > > > The name of the shared state is the throttle group name. > > > The three added methods are used to add, configure and destroy such shared > > > states. > > > > > > The benefit of this aproach is that we don't need to add a special slot mechanism > > > and that removing BDS 2 would be easy. 
> > > Your approach don't deal with the fact that the throttling group membership can > > > be changed dynamically while the vm is running: for example adding qcow2_C and > > > removing qcow2_B should be made easy. > > > > Yes, this is right. But then, the nice thing about it is that I stayed > > fully within the one uniform graph. We just need a way to modify the > > edges in this graph (and we already need that to insert/delete filters) > > and you get this special case and many others for free. > > > > So, I vote for investing into a uniform infrastructure here instead of > > adding new one-off node types. > > Maybe parents BDS could use a generic block function to get a cookie when they > start to use a children BDS. > > The parent would to > > bs->file_cookie = bdrv_get_cookie(file); > bs->file = file; > > when choosing to use file as bs file. > > The get cookie method would be > > uint64_t bdrv_get_cookie(bs) { > bs->cookie = gen_uuid(bs); > return bs->cookie; > } > > gen_uuid would combine a random 64 bit number with a registry to prevent > identical cookie generation. > > After this step every BlockDriver method would receive the cookie as second > parameter. > > For example bdrv_read(bs, cookie, ...) > > So it's easy for a block driver to discriminate based on the cookie and even to > look up which of his own child is associated to this cookie. > > Best regards > > Benoît > > > > > Kevin Kevin: what do you think of this cookie idea ? It seems something doable with reasonable small steps. Best regards Benoît > > > > > > That is, the I/O throttling BDS is referenced by two devices instead of > > > > just one and it associates one 'input' with one 'output'. Once we have > > > > BlockBackend, we would have two BBs, but still only one throttling > > > > BDS. > > > > > > > > The new thing that you get there is that the throttling driver has > > > > not only multiple parents (that part exists today), but it behaves > > > > differently depending on who called it. 
So we need to provide some way > > > > for one BDS to expose multiple slots or whatever you want to call them > > > > that users can attach to. > > > > > > > > This is, by the way, the very same thing as would be required for > > > > exposing qcow2 internal snapshots (read-only) while the VM is running. > > > > > > > > Kevin > > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] n ways block filters 2014-03-24 14:53 ` Benoît Canet @ 2014-03-24 15:58 ` Kevin Wolf 2014-03-27 15:49 ` Benoît Canet 0 siblings, 1 reply; 12+ messages in thread From: Kevin Wolf @ 2014-03-24 15:58 UTC (permalink / raw) To: Benoît Canet Cc: Fam Zheng, Stefan Hajnoczi, armbru, qemu-devel, Stefan Hajnoczi, Max Reitz Am 24.03.2014 um 15:53 hat Benoît Canet geschrieben: > The Thursday 20 Mar 2014 à 17:06:26 (+0100), Benoît Canet wrote : > > The Thursday 20 Mar 2014 à 16:12:34 (+0100), Kevin Wolf wrote : > > > Am 20.03.2014 um 15:05 hat Benoît Canet geschrieben: > > > > The Tuesday 18 Mar 2014 à 14:27:47 (+0100), Kevin Wolf wrote : > > > > > Am 17.03.2014 um 17:02 hat Stefan Hajnoczi geschrieben: > > > > > > On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <famz@redhat.com> wrote: > > > > > > > On Fri, 03/14 16:57, Benoît Canet wrote: > > > > > > >> I discussed a bit with Stefan on the list and we came to the conclusion that the > > > > > > >> block filter API need group support. > > > > > > >> > > > > > > >> filter group: > > > > > > >> ------------- > > > > > > >> > > > > > > >> My current plan to implement this is to add the following fields to the BlockDriver > > > > > > >> structure. > > > > > > >> > > > > > > >> int bdrv_add_filter_group(const char *name, QDict options); > > > > > > >> int bdrv_reconfigure_filter_group(const char *name, QDict options); > > > > > > >> int bdrv_destroy_filter_group(const char *name); > > > > > > > > > > Benoît, your mail left me puzzled. You didn't really describe the > > > > > problem that you're solving, nor what the QDict options actually > > > > > contains or what a filter group even is. > > > > > > > > > > > >> These three extra method would allow to create, reconfigure or destroy a block > > > > > > >> filter group. A block filter group contain the shared or non shared state of the > > > > > > >> blockfilter. For throttling it would contains the ThrottleState structure. 
> > > > > > >> > > > > > > >> Each block filter driver would contains a linked list of linked list where the > > > > > > >> BDS are registered grouped by filter groups state. > > > > > > > > > > > > > > Sorry I don't fully understand this. Does a filter group contain multiple block > > > > > > > filters, and every block filter has effect on multiple BDSes? Could you give an > > > > > > > example? > > > > > > > > > > > > Just to why a "group" mechanism is useful: > > > > > > > > > > > > You want to impose a 2000 IOPS limit for the entire VM. Currently > > > > > > this is not possible because each drive has its own throttling state. > > > > > > > > > > > > We need a way to say certain drives are part of a group. All drives > > > > > > in a group share the same throttling state and therefore a 2000 IOPS > > > > > > limit is shared amongst them. > > > > > > > > > > Now at least I have an idea what you're all talking about, but it's > > > > > still not obvious to me how the three functions from above solve your > > > > > problem or how they work in detail. > > > > > > > > > > The obvious solution, using often discussed blockdev-add concepts, is: > > > > > ______________ > > > > > virtio-blk_A --> | | --> qcow2_A --> raw-posix_A > > > > > | throttling | > > > > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > > > > > > > My proposal would be: > > > > ______________ > > > > virtio-blk_A --> | BDS 1 | --> qcow2_A --> raw-posix_A > > > > |____________| > > > > | > > > > _____|________ > > > > | | The shared state is the state of a BDS group > > > > | Shared | It's stored in a static linked list of the > > > > | State | block/throttle.c module. It has a name and contains a > > > > |____________| throttle state structure. > > > > | > > > > _____|________ > > > > | BDS 2 | > > > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > > > > > Okay. 
I think your proposal might be easier to implement in the short > > > run, but it introduces an additional type of nodes to the graph (so far > > > we have only one type, BlockDriverStates) with their own set of > > > functions, and I assume monitor commands, for management. > > > > > > This makes the whole graph less uniform and consistent. There may be > > > cases where this is necessary or at least tolerable because the fully > > > generic alternativ isn't doable. I'm not convinced yet that this is the > > > case here. > > > > > > In contrast, my approach would require considerable infrastructure work > > > (you somehow seem to attract that kind of things ;-)), but it's merely a > > > generalisation of what we already have and as such fits nicely in the > > > graph. > > > > > > We already have multiple children of BDS nodes. And we take it for > > > granted that they don't refer to the same data, but that bs->file and > > > bs->backing_hd have actually different semantics. > > > > > > We have recently introduced refcounts for BDSes so that one BDS can now > > > have multiple parents, too, as a first step towards symmetry. The > > > logical extension is that these parent get different semantics, just > > > like the children have different semantics. > > > > > > Doing the abstraction in one model right instead of adding hacks that > > > don't really fit in but are easy to implement has paid off in the past. > > > I'm pretty sure that extending the infrastructure this way will find > > > more users than just I/O throttling, and that having different parents > > > in different roles is universally useful. With qcow2 exposing the > > > snapshots, too, I already named a second potential user of the > > > infrastructure. > > > > > > > The name of the shared state is the throttle group name. > > > > The three added methods are used to add, configure and destroy such shared > > > > states. 
> > > > > > > > The benefit of this aproach is that we don't need to add a special slot mechanism > > > > and that removing BDS 2 would be easy. > > > > Your approach don't deal with the fact that the throttling group membership can > > > > be changed dynamically while the vm is running: for example adding qcow2_C and > > > > removing qcow2_B should be made easy. > > > > > > Yes, this is right. But then, the nice thing about it is that I stayed > > > fully within the one uniform graph. We just need a way to modify the > > > edges in this graph (and we already need that to insert/delete filters) > > > and you get this special case and many others for free. > > > > > > So, I vote for investing into a uniform infrastructure here instead of > > > adding new one-off node types. > > > > Maybe parents BDS could use a generic block function to get a cookie when they > > start to use a children BDS. > > > > The parent would to > > > > bs->file_cookie = bdrv_get_cookie(file); > > bs->file = file; > > > > when choosing to use file as bs file. > > > > The get cookie method would be > > > > uint64_t bdrv_get_cookie(bs) { > > bs->cookie = gen_uuid(bs); > > return bs->cookie; > > } > > > > gen_uuid would combine a random 64 bit number with a registry to prevent > > identical cookie generation. > > > > After this step every BlockDriver method would receive the cookie as second > > parameter. > > > > For example bdrv_read(bs, cookie, ...) > > > > So it's easy for a block driver to discriminate based on the cookie and even to > > look up which of his own child is associated to this cookie. > > > > Best regards > > > > Benoît > > > > > > > > Kevin > > Kevin: what do you think of this cookie idea ? > It seems something doable with reasonable small steps. Sorry, I was going to reply with some more detailed description of what things should look like, but got preempted once again. So, no, this cookies thing is not directly the right thing to do. 
The idea that the information must be passed is alright, but not as an additional int parameter. The first thing is that you can simply use another opaque pointer instead of the integer, so that the driver doesn't have to look it up, but can directly use it. The second thing is that there's no need to have two parameters when one of them implies the other one. So what you end up with is a new type of structure, and you'll split today's BlockDriverState into three parts:

- BlockBackend (the thing that each guest device has)
- BlockView (a qcow2 snapshot or a "slot" for I/O throttling filters)
- BlockDriverState (deals with the image file and provides views)

I'm not completely happy with these names, but I have to use something for this discussion, so I'll just use them until someone comes up with something better. In the end, we should have something like:

    typedef struct BlockDriverState {
        /* Like today, except without the fields covered elsewhere */
    } BlockDriverState;

    typedef struct BlockView {
        BlockDriverState *bs;
        const char *view_name;
        uint64_t total_bytes;
        ...
        /* more common per-view data */
    } BlockView;

    typedef struct Qcow2View {
        BlockView common;
        uint64_t *l1_table;
        ...
        /* more per-snapshot data */
    } Qcow2View;

Then you have the different block layer functions, and some of them refer to the whole BlockDriverState (like bdrv_open(), which opens the images and creates all of the views) and others operate on a given BlockView (like bdrv_co_preadv()). Kevin ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] n ways block filters 2014-03-24 15:58 ` Kevin Wolf @ 2014-03-27 15:49 ` Benoît Canet 0 siblings, 0 replies; 12+ messages in thread From: Benoît Canet @ 2014-03-27 15:49 UTC (permalink / raw) To: Kevin Wolf Cc: Benoît Canet, Fam Zheng, Stefan Hajnoczi, armbru, qemu-devel, Stefan Hajnoczi, Max Reitz The Monday 24 Mar 2014 à 16:58:55 (+0100), Kevin Wolf wrote : > Am 24.03.2014 um 15:53 hat Benoît Canet geschrieben: > > The Thursday 20 Mar 2014 à 17:06:26 (+0100), Benoît Canet wrote : > > > The Thursday 20 Mar 2014 à 16:12:34 (+0100), Kevin Wolf wrote : > > > > Am 20.03.2014 um 15:05 hat Benoît Canet geschrieben: > > > > > The Tuesday 18 Mar 2014 à 14:27:47 (+0100), Kevin Wolf wrote : > > > > > > Am 17.03.2014 um 17:02 hat Stefan Hajnoczi geschrieben: > > > > > > > On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <famz@redhat.com> wrote: > > > > > > > > On Fri, 03/14 16:57, Benoît Canet wrote: > > > > > > > >> I discussed a bit with Stefan on the list and we came to the conclusion that the > > > > > > > >> block filter API need group support. > > > > > > > >> > > > > > > > >> filter group: > > > > > > > >> ------------- > > > > > > > >> > > > > > > > >> My current plan to implement this is to add the following fields to the BlockDriver > > > > > > > >> structure. > > > > > > > >> > > > > > > > >> int bdrv_add_filter_group(const char *name, QDict options); > > > > > > > >> int bdrv_reconfigure_filter_group(const char *name, QDict options); > > > > > > > >> int bdrv_destroy_filter_group(const char *name); > > > > > > > > > > > > Benoît, your mail left me puzzled. You didn't really describe the > > > > > > problem that you're solving, nor what the QDict options actually > > > > > > contains or what a filter group even is. > > > > > > > > > > > > > >> These three extra method would allow to create, reconfigure or destroy a block > > > > > > > >> filter group. A block filter group contain the shared or non shared state of the > > > > > > > >> blockfilter. 
For throttling it would contains the ThrottleState structure. > > > > > > > >> > > > > > > > >> Each block filter driver would contains a linked list of linked list where the > > > > > > > >> BDS are registered grouped by filter groups state. > > > > > > > > > > > > > > > > Sorry I don't fully understand this. Does a filter group contain multiple block > > > > > > > > filters, and every block filter has effect on multiple BDSes? Could you give an > > > > > > > > example? > > > > > > > > > > > > > > Just to why a "group" mechanism is useful: > > > > > > > > > > > > > > You want to impose a 2000 IOPS limit for the entire VM. Currently > > > > > > > this is not possible because each drive has its own throttling state. > > > > > > > > > > > > > > We need a way to say certain drives are part of a group. All drives > > > > > > > in a group share the same throttling state and therefore a 2000 IOPS > > > > > > > limit is shared amongst them. > > > > > > > > > > > > Now at least I have an idea what you're all talking about, but it's > > > > > > still not obvious to me how the three functions from above solve your > > > > > > problem or how they work in detail. > > > > > > > > > > > > The obvious solution, using often discussed blockdev-add concepts, is: > > > > > > ______________ > > > > > > virtio-blk_A --> | | --> qcow2_A --> raw-posix_A > > > > > > | throttling | > > > > > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > > > > > > > > > My proposal would be: > > > > > ______________ > > > > > virtio-blk_A --> | BDS 1 | --> qcow2_A --> raw-posix_A > > > > > |____________| > > > > > | > > > > > _____|________ > > > > > | | The shared state is the state of a BDS group > > > > > | Shared | It's stored in a static linked list of the > > > > > | State | block/throttle.c module. It has a name and contains a > > > > > |____________| throttle state structure. 
> > > > > | > > > > > _____|________ > > > > > | BDS 2 | > > > > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B > > > > > > > > Okay. I think your proposal might be easier to implement in the short > > > > run, but it introduces an additional type of nodes to the graph (so far > > > > we have only one type, BlockDriverStates) with their own set of > > > > functions, and I assume monitor commands, for management. > > > > > > > > This makes the whole graph less uniform and consistent. There may be > > > > cases where this is necessary or at least tolerable because the fully > > > > generic alternativ isn't doable. I'm not convinced yet that this is the > > > > case here. > > > > > > > > In contrast, my approach would require considerable infrastructure work > > > > (you somehow seem to attract that kind of things ;-)), but it's merely a > > > > generalisation of what we already have and as such fits nicely in the > > > > graph. > > > > > > > > We already have multiple children of BDS nodes. And we take it for > > > > granted that they don't refer to the same data, but that bs->file and > > > > bs->backing_hd have actually different semantics. > > > > > > > > We have recently introduced refcounts for BDSes so that one BDS can now > > > > have multiple parents, too, as a first step towards symmetry. The > > > > logical extension is that these parent get different semantics, just > > > > like the children have different semantics. > > > > > > > > Doing the abstraction in one model right instead of adding hacks that > > > > don't really fit in but are easy to implement has paid off in the past. > > > > I'm pretty sure that extending the infrastructure this way will find > > > > more users than just I/O throttling, and that having different parents > > > > in different roles is universally useful. With qcow2 exposing the > > > > snapshots, too, I already named a second potential user of the > > > > infrastructure. 
> > > > >
> > > > > The name of the shared state is the throttle group name.
> > > > > The three added methods are used to add, configure and destroy such
> > > > > shared states.
> > > > >
> > > > > The benefit of this approach is that we don't need to add a special
> > > > > slot mechanism and that removing BDS 2 would be easy.
> > > > > Your approach doesn't deal with the fact that throttling group
> > > > > membership can be changed dynamically while the VM is running: for
> > > > > example, adding qcow2_C and removing qcow2_B should be made easy.
> > > >
> > > > Yes, this is right. But then, the nice thing about it is that I stayed
> > > > fully within the one uniform graph. We just need a way to modify the
> > > > edges in this graph (and we already need that to insert/delete filters)
> > > > and you get this special case and many others for free.
> > > >
> > > > So, I vote for investing in a uniform infrastructure here instead of
> > > > adding new one-off node types.
> > >
> > > Maybe parent BDSes could use a generic block function to get a cookie
> > > when they start to use a child BDS.
> > >
> > > The parent would do
> > >
> > >     bs->file_cookie = bdrv_get_cookie(file);
> > >     bs->file = file;
> > >
> > > when choosing to use file as bs's file.
> > >
> > > The get-cookie method would be
> > >
> > >     uint64_t bdrv_get_cookie(BlockDriverState *bs) {
> > >         bs->cookie = gen_uuid(bs);
> > >         return bs->cookie;
> > >     }
> > >
> > > gen_uuid would combine a random 64-bit number with a registry to
> > > prevent generating identical cookies.
> > >
> > > After this step every BlockDriver method would receive the cookie as a
> > > second parameter.
> > >
> > > For example: bdrv_read(bs, cookie, ...)
> > >
> > > So it's easy for a block driver to discriminate based on the cookie and
> > > even to look up which of its own children is associated with this
> > > cookie.
> > >
> > > Best regards
> > >
> > > Benoît
> > >
> > > > Kevin
> >
> > Kevin: what do you think of this cookie idea?
> > It seems something doable in reasonably small steps.
>
> Sorry, I was going to reply with some more detailed description of what
> things should look like, but got preempted once again.
>
> So, no, this cookie thing is not directly the right thing to do. The
> idea that the information must be passed is alright, but not as an
> additional int parameter. The first thing is that you can simply use
> another opaque pointer instead of the integer so that the driver doesn't
> have to look it up, but can use it directly. The second thing is that
> there's no need to have two parameters when one of them implies the
> other one.
>
> So what you end up with is a new type of structure, and you'll split
> today's BlockDriverState into three parts:
>
> - BlockBackend (the thing that each guest device has)
> - BlockView (a qcow2 snapshot or a "slot" for I/O throttling filters)
> - BlockDriverState (deals with the image file and provides views)
>
> I'm not completely happy with these names, but I have to use something
> for this discussion, so I'll just use them until someone comes up with
> something better.
>
> In the end, we should have something like:
>
>     typedef struct BlockDriverState {
>         /* Like today, except without the fields covered elsewhere */
>     } BlockDriverState;
>
>     typedef struct BlockView {
>         BlockDriverState *bs;
>         const char *view_name;
>         uint64_t total_bytes;
>         ...
>         /* more common per-view data */
>     } BlockView;
>
>     typedef struct Qcow2View {
>         BlockView common;
>         uint64_t *l1_table;
>         ...
>         /* more per-snapshot data */
>     } Qcow2View;
>
> Then you have the different block layer functions, and some of them
> refer to the whole BlockDriverState (like bdrv_open(), which opens the
> images and creates all of the views) and others operate on a given
> BlockView (like bdrv_co_preadv()).

Do BDS nodes contain pointers to their views? What is the relationship
between views and the node graph?
Best regards

Benoît

> Kevin

^ permalink raw reply	[flat|nested] 12+ messages in thread
end of thread, other threads: [~2014-03-27 15:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)

2014-03-14 15:57 [Qemu-devel] n ways block filters Benoît Canet
2014-03-17  3:12 ` Fam Zheng
2014-03-17 13:52   ` Benoît Canet
2014-03-17 16:02     ` Stefan Hajnoczi
2014-03-18 13:27       ` Kevin Wolf
2014-03-20 14:05         ` Benoît Canet
2014-03-20 15:12           ` Kevin Wolf
2014-03-20 15:47             ` Benoît Canet
2014-03-20 16:06               ` Benoît Canet
2014-03-24 14:53                 ` Benoît Canet
2014-03-24 15:58                   ` Kevin Wolf
2014-03-27 15:49                     ` Benoît Canet