* [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-14 17:05 UTC
To: QEMU Developers, Michael S. Tsirkin, Anthony Liguori, Markus Armbruster

Hello!

Over the last few days, primarily Michael Tsirkin and I have discussed the
design of the 'blobstore' via IRC (#virtualization).

The intention of the blobstore is to provide storage for persisting blobs
that devices create. Along with these blobs, possibly some metadata should
be storable in this blobstore. An initial client for the blobstore would be
the TPM emulation. The TPM's persistent state needs to be stored once it
changes so it can be restored at any point in time later on, i.e., after a
cold reboot of the VM. In effect, the blobstore simulates the NVRAM of a
device, where such persistent data would typically be stored.

One design point of the blobstore is that it has to work with QEMU's block
layer, i.e., it has to use images for storing the blobs and, with that, use
the bdrv_* functions to write its data into these images. The reason for
this is primarily QEMU's snapshot feature, where snapshots of the VM can be
taken assuming the QCoW2 image format is being used. If one chooses to
provide a QCoW2 image as the storage medium for the blobstore, it would
enable QEMU's snapshotting feature automatically. I believe there is no
other image format that would work, and simply using plain files would in
effect destroy the snapshot feature. Using a raw image file, for example,
would prevent snapshotting.

One property of the blobstore is that it has a certain required size for
accommodating all blobs of the devices that want to store their blobs in
it. The assumption is that the size of these blobs is known a priori to the
writer of the device code, and all devices can register their space
requirements with the blobstore during device initialization. Gathering all
the registered blobs' sizes, plus knowing the overhead of the layout of the
data on the disk, then lets QEMU calculate the total required (minimum)
size that the image has to have to accommodate all blobs in a particular
blobstore.

So what I would like to discuss in this message for now is the design of
the command line options for the blobstore, in order to determine how to
access a blobstore. For experimenting, I introduced a 'blobstore' command
line option for QEMU with the following possible parameters:

- name=: the name of the blobstore
- drive=: the id of the drive used as image file, i.e.,
  -drive id=my-blobs,format=raw,file=/tmp/blobstore.raw,if=none
- showsize: show the size requirement for the image file
- create: the image file is created (if found to be of size zero) and then
  formatted
- format: assuming the image file is there, format it before starting the
  VM; the device will always start with a clean state
- formatifbad: format the image file if an attempt to read its content
  fails upon first read

Monitor commands with similar functionality would follow later.

The intention behind the parameter 'create' is to make it as easy as
possible for the user to start QEMU with a usable image file, letting QEMU
do the equivalent of 'qemu-img create -f <format> <image file> <size>'.
This works fine and lets one start QEMU in one step as long as:

- the user passed an empty image file via -drive ...,file=/tmp/blobstore.raw
- the format to use is raw

For the QCoW2 format, for example, this doesn't work, since the QCoW2 file
passed via -drive ...,file=/tmp/blobstore.qcow2 cannot be of zero size. In
this case the user would have to use the 'showsize' option to learn what
size the drive has to be, then invoke 'qemu-img' with the size parameter,
and then subsequently start QEMU with the image. To find the size, the user
would have to use a command line like

qemu ... \
  -blobstore name=my-blobstore,drive=tpm-bs,showsize \
  -drive if=none,id=tpm-bs \
  -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
  -device tpm-tis,tpmdev=tpm0

which would result in QEMU printing to stdout:

Blobstore tpm-store on drive with ID tpm-bs requires 83kb.

Once a QCoW2 image file has been created using

qemu-img create -f qcow2 /tmp/blobstore.qcow2 83k

QEMU can then subsequently be used with the following command line options:

qemu ... \
  -drive if=none,id=tpm-bs,file=/tmp/blobstore.qcow2 \
  -blobstore name=my-blobstore,drive=tpm-bs,formatifbad \
  -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
  -device tpm-tis,tpmdev=tpm0

This would format the blank QCoW2 image only the very first time, via the
'formatifbad' parameter.

Using a 'raw' image for the blobstore, one could do the following to start
QEMU in the first step:

touch /tmp/blobstore.raw

qemu ... \
  -blobstore name=my-blobstore,drive=tpm-bs,create \
  -drive if=none,id=tpm-bs,format=raw,file=/tmp/blobstore.raw \
  -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
  -device tpm-tis,tpmdev=tpm0

This would make QEMU create the appropriately sized image and start the VM
in one step.

Going a layer up into libvirt: to support SELinux labeling (sVirt), libvirt
could use the above steps as shown for QCoW2, labeling the file before
starting QEMU.

A note at the end: if we were to drop the -drive option and support a file
option for the image file in -blobstore, we could have more control over
the creation of the image file in any wanted format, but that would mean
replicating some of the -drive options in the -blobstore option. QCoW2
files could also be created if the passed file didn't even exist yet.

Looking forward to your comments.

Regards,
   Stefan
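The register-then-sum scheme described above can be sketched as follows. This is a minimal illustration with invented names and invented overhead constants (`Blobstore`, `register_blob`, the per-blob and header overheads are all assumptions, not QEMU's actual code, which would do this in C during device initialization):

```python
# Sketch of the blobstore size-registration idea. All names and the
# overhead constants are hypothetical; QEMU's real implementation is in C.

SECTOR_SIZE = 512          # assumed on-disk allocation unit
PER_BLOB_OVERHEAD = 32     # assumed per-blob directory entry, in bytes
HEADER_OVERHEAD = 512      # assumed fixed header for the store layout


def round_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


class Blobstore:
    def __init__(self):
        self.blobs = {}    # blob name -> registered maximum size

    def register_blob(self, name, size):
        """Called by a device at init time with its a-priori known size."""
        self.blobs[name] = size

    def required_image_size(self):
        """Total (minimum) image size: header plus every blob with its
        overhead, each rounded up to the allocation unit."""
        total = HEADER_OVERHEAD
        for size in self.blobs.values():
            total += round_up(size + PER_BLOB_OVERHEAD, SECTOR_SIZE)
        return total


store = Blobstore()
store.register_blob("tpm-permanent", 60 * 1024)
store.register_blob("tpm-volatile", 20 * 1024)
print(store.required_image_size())   # 83456
```

This is the number 'showsize' would report (and that a later QMP query would return), computed before any image I/O happens.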
* Re: [Qemu-devel] Design of the blobstore
From: Michael S. Tsirkin @ 2011-09-14 17:40 UTC
To: Stefan Berger
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> qemu ... \
>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>   -drive if=none,id=tpm-bs \
>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>   -device tpm-tis,tpmdev=tpm0
>
> which would result in QEMU printing to stdout:
>
> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.

So you envision tools parsing this freetext then? Seems like a step back,
we are trying to move to QMP ...

> Once a QCoW2 image file has been created using
>
> qemu-img create -f qcow2 /tmp/blobstore.qcow2 83k
>
> QEMU can then subsequently be used with the following command line options:
>
> qemu ... \
>   -drive if=none,id=tpm-bs,file=/tmp/blobstore.qcow2 \
>   -blobstore name=my-blobstore,drive=tpm-bs,formatifbad \
>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>   -device tpm-tis,tpmdev=tpm0
>
> This would format the blank QCoW2 image only the very first time, via the
> 'formatifbad' parameter.

This formatifbad option is a bad mistake (pun intended). It mixes the
formatting of the image (a one-time operation) with the running of the VM
(a repeated operation). We also saw how this does not play well, e.g., with
migration. It loses information! Would you like your OS to format the hard
disk if it cannot boot? Right ... Instead, just failing if the image is not
well formatted will be much easier to debug.

> Using a 'raw' image for the blobstore, one could do the following to
> start QEMU in the first step:
>
> touch /tmp/blobstore.raw
>
> qemu ... \
>   -blobstore name=my-blobstore,drive=tpm-bs,create \
>   -drive if=none,id=tpm-bs,format=raw,file=/tmp/blobstore.raw \
>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>   -device tpm-tis,tpmdev=tpm0
>
> This would make QEMU create the appropriately sized image and start the
> VM in one step.
>
> Going a layer up into libvirt: to support SELinux labeling (sVirt),
> libvirt could use the above steps as shown for QCoW2, labeling the file
> before starting QEMU.
>
> A note at the end: if we were to drop the -drive option and support a
> file option for the image file in -blobstore, we could have more control
> over the creation of the image file in any wanted format, but that would
> mean replicating some of the -drive options in the -blobstore option.
> QCoW2 files could also be created if the passed file didn't even exist
> yet.
>
> Looking forward to your comments.
>
> Regards,
>    Stefan

So with the above, the raw case, which we don't expect to be used often, is
easy to use, but qcow, which we expect to be the main case, is close to
impossible, involving manual cut and paste of the image size.

Formatting images seems a rare enough occasion that I think only using a
monitor command for that would be a better idea than a ton of new command
line options. On top of that, let's write a script that runs qemu, queries
the image size, creates a qcow2 file, and runs qemu again to format, all
this using QMP.

WRT 'format and run in one go', I strongly disagree with it. It's just too
easy to shoot oneself in the foot.

-- 
MST
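The provisioning flow Michael proposes could be sketched roughly as below. This is purely illustrative: it assumes a 'query-blobstore'-style QMP command exists (the thread is still debating its name and shape), and the helper functions are invented, not an existing tool:

```python
# Sketch of a provisioning script: query the required size over QMP,
# then create a suitably sized qcow2 image with qemu-img.
# Hypothetical throughout -- 'query-blobstore' is a proposed command.
import json


def build_qmp_command(name, **arguments):
    """Serialize a QMP command as a single JSON line."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)


def qemu_img_create_argv(path, size_bytes):
    """Build the qemu-img invocation for a qcow2 image of that size."""
    return ["qemu-img", "create", "-f", "qcow2", path, str(size_bytes)]


# The flow would be:
#   1. start qemu paused (-S) with a QMP socket and the tpm/drive options
#   2. send build_qmp_command("qmp_capabilities"), then
#      build_qmp_command("query-blobstore"), and read the size from the reply
#   3. run qemu_img_create_argv("/tmp/blobstore.qcow2", size) via subprocess
#   4. start qemu again with -drive file=/tmp/blobstore.qcow2 attached
#      and format the store via a monitor command
```

The point of the sketch is that every step is machine-driven over QMP and qemu-img, with no cut and paste of a size printed to stdout.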
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-14 17:49 UTC
To: Michael S. Tsirkin
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>> qemu ... \
>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>   -drive if=none,id=tpm-bs \
>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>   -device tpm-tis,tpmdev=tpm0
>>
>> which would result in QEMU printing to stdout:
>>
>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
> So you envision tools parsing this freetext then?
> Seems like a step back, we are trying to move to QMP ...

I extended it first for the way I typically interact with QEMU. I do not
use the monitor much.

> So with the above, the raw case, which we don't expect to be used often,
> is easy to use, but qcow, which we expect to be the main case, is close
> to impossible, involving manual cut and paste of the image size.
>
> Formatting images seems a rare enough occasion that I think only using a
> monitor command for that would be a better idea than a ton of new
> command line options. On top of that, let's write a script that runs
> qemu, queries the image size, creates a qcow2 file, and runs qemu again
> to format, all this using QMP.

Creates the qcow2 using 'qemu-img', I suppose.

   Stefan

> WRT 'format and run in one go', I strongly disagree with it.
> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
From: Michael S. Tsirkin @ 2011-09-14 17:56 UTC
To: Stefan Berger
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>> qemu ... \
>>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>   -drive if=none,id=tpm-bs \
>>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>   -device tpm-tis,tpmdev=tpm0
>>>
>>> which would result in QEMU printing to stdout:
>>>
>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>> So you envision tools parsing this freetext then?
>> Seems like a step back, we are trying to move to QMP ...
> I extended it first for the way I typically interact with QEMU. I do not
> use the monitor much.

It will work even better if there's a tool to do the job instead of
cutting and pasting stuff, won't it? And for that, we need monitor
commands.

>> So with the above, the raw case, which we don't expect to be used often,
>> is easy to use, but qcow, which we expect to be the main case, is close
>> to impossible, involving manual cut and paste of the image size.
>>
>> Formatting images seems a rare enough occasion that I think only using
>> a monitor command for that would be a better idea than a ton of new
>> command line options. On top of that, let's write a script that runs
>> qemu, queries the image size, creates a qcow2 file, and runs qemu again
>> to format, all this using QMP.
> Creates the qcow2 using 'qemu-img', I suppose.
>
>    Stefan

Sure.

>> WRT 'format and run in one go', I strongly disagree with it.
>> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-14 21:12 UTC
To: Michael S. Tsirkin
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
>> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>> qemu ... \
>>>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>>   -drive if=none,id=tpm-bs \
>>>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>>   -device tpm-tis,tpmdev=tpm0
>>>>
>>>> which would result in QEMU printing to stdout:
>>>>
>>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>>> So you envision tools parsing this freetext then?
>>> Seems like a step back, we are trying to move to QMP ...
>> I extended it first for the way I typically interact with QEMU. I do
>> not use the monitor much.
> It will work even better if there's a tool to do the job instead of
> cutting and pasting stuff, won't it? And for that, we need monitor
> commands.

I am not so sure about the design of the QMP commands and how to break
things up into individual calls. So does this sequence here and the
'query-blobstore' output look ok?

{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "query-blobstore" }
{"return": [{"size": 84480, "id": "tpm-bs"}]}

The corresponding command line parameters are:

  -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
  -drive if=none,id=tpm-bs,file=$TPMSTATE \

Regards,
   Stefan

>>> So with the above, the raw case, which we don't expect to be used
>>> often, is easy to use, but qcow, which we expect to be the main case,
>>> is close to impossible, involving manual cut and paste of the image
>>> size.
>>>
>>> Formatting images seems a rare enough occasion that I think only using
>>> a monitor command for that would be a better idea than a ton of new
>>> command line options. On top of that, let's write a script that runs
>>> qemu, queries the image size, creates a qcow2 file, and runs qemu
>>> again to format, all this using QMP.
>> Creates the qcow2 using 'qemu-img', I suppose.
>>
>>    Stefan
> Sure.
>
>>> WRT 'format and run in one go', I strongly disagree with it.
>>> It's just too easy to shoot oneself in the foot.
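The proposed reply format is machine-parseable, which addresses the freetext objection. A client would consume the exact JSON shown above along these lines (the helper name is hypothetical):

```python
# Parse the proposed 'query-blobstore' reply, which lists every
# blobstore with its backing drive ID and required size in bytes.
import json


def parse_blobstore_reply(line):
    """Map each blobstore's drive ID to its required size in bytes."""
    reply = json.loads(line)
    return {entry["id"]: entry["size"] for entry in reply["return"]}


sizes = parse_blobstore_reply('{"return": [{"size": 84480, "id": "tpm-bs"}]}')
print(sizes["tpm-bs"])   # 84480 bytes, i.e. 165 sectors of 512 bytes
```

Because the return value is a list keyed by drive ID, the same reply can describe several blobstores at once, which is what the discussion below converges on.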
* Re: [Qemu-devel] Design of the blobstore
From: Michael S. Tsirkin @ 2011-09-15 6:57 UTC
To: Stefan Berger
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
> On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
>> On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
>>> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>>> qemu ... \
>>>>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>>>   -drive if=none,id=tpm-bs \
>>>>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>>>   -device tpm-tis,tpmdev=tpm0
>>>>>
>>>>> which would result in QEMU printing to stdout:
>>>>>
>>>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>>>> So you envision tools parsing this freetext then?
>>>> Seems like a step back, we are trying to move to QMP ...
>>> I extended it first for the way I typically interact with QEMU. I do
>>> not use the monitor much.
>> It will work even better if there's a tool to do the job instead of
>> cutting and pasting stuff, won't it? And for that, we need monitor
>> commands.
> I am not so sure about the design of the QMP commands and how to break
> things up into individual calls. So does this sequence here and the
> 'query-blobstore' output look ok?
>
> { "execute": "qmp_capabilities" }
> {"return": {}}
> { "execute": "query-blobstore" }
> {"return": [{"size": 84480, "id": "tpm-bs"}]}

I'll let some QMP experts comment.

We don't strictly need the id here, right? It is passed to the command.

BTW, is it [] or {}? It's the total size, right? Should it be

{"return": {"size": 84480}}

?

> The corresponding command line parameters are:
>
>   -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>   -drive if=none,id=tpm-bs,file=$TPMSTATE \
>
> Regards,
>    Stefan

>>>> So with the above, the raw case, which we don't expect to be used
>>>> often, is easy to use, but qcow, which we expect to be the main case,
>>>> is close to impossible, involving manual cut and paste of the image
>>>> size.
>>>>
>>>> Formatting images seems a rare enough occasion that I think only
>>>> using a monitor command for that would be a better idea than a ton of
>>>> new command line options. On top of that, let's write a script that
>>>> runs qemu, queries the image size, creates a qcow2 file, and runs
>>>> qemu again to format, all this using QMP.
>>> Creates the qcow2 using 'qemu-img', I suppose.
>>>
>>>    Stefan
>> Sure.
>>
>>>> WRT 'format and run in one go', I strongly disagree with it.
>>>> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-15 10:22 UTC
To: Michael S. Tsirkin
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On 09/15/2011 02:57 AM, Michael S. Tsirkin wrote:
> On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
>> On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
>>>> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>>>> qemu ... \
>>>>>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>>>>   -drive if=none,id=tpm-bs \
>>>>>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>>>>   -device tpm-tis,tpmdev=tpm0
>>>>>>
>>>>>> which would result in QEMU printing to stdout:
>>>>>>
>>>>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>>>>> So you envision tools parsing this freetext then?
>>>>> Seems like a step back, we are trying to move to QMP ...
>>>> I extended it first for the way I typically interact with QEMU. I do
>>>> not use the monitor much.
>>> It will work even better if there's a tool to do the job instead of
>>> cutting and pasting stuff, won't it? And for that, we need monitor
>>> commands.
>> I am not so sure about the design of the QMP commands and how to break
>> things up into individual calls. So does this sequence here and the
>> 'query-blobstore' output look ok?
>>
>> { "execute": "qmp_capabilities" }
>> {"return": {}}
>> { "execute": "query-blobstore" }
>> {"return": [{"size": 84480, "id": "tpm-bs"}]}
> I'll let some QMP experts comment.
>
> We don't strictly need the id here, right? It is passed to the command.
>
> BTW, is it [] or {}? It's the total size, right? Should it be
> {"return": {"size": 84480}}
> ?

The id serves to distinguish one blobstore from another. We'll have any
number of blobstores. Since we are getting rid of the -blobstore option,
they will only be identifiable via the ID of the drive they are using. If
that's not good, please let me know. The examples I had shown yesterday
were using the name of the blobstore, rather than the drive ID, to connect
the device to the blobstore.

Before:

qemu ... \
  -blobstore name=my-blobstore,drive=tpm-bs,showsize \
  -drive if=none,id=tpm-bs \
  -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
  -device tpm-tis,tpmdev=tpm0

Now:

qemu ... \
  -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
  -drive if=none,id=tpm-bs,file=$TPMSTATE \

   Stefan

>> The corresponding command line parameters are:
>>
>>   -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>>   -drive if=none,id=tpm-bs,file=$TPMSTATE \
>>
>> Regards,
>>    Stefan

>>>>> So with the above, the raw case, which we don't expect to be used
>>>>> often, is easy to use, but qcow, which we expect to be the main
>>>>> case, is close to impossible, involving manual cut and paste of the
>>>>> image size.
>>>>>
>>>>> Formatting images seems a rare enough occasion that I think only
>>>>> using a monitor command for that would be a better idea than a ton
>>>>> of new command line options. On top of that, let's write a script
>>>>> that runs qemu, queries the image size, creates a qcow2 file, and
>>>>> runs qemu again to format, all this using QMP.
>>>> Creates the qcow2 using 'qemu-img', I suppose.
>>>>
>>>>    Stefan
>>> Sure.
>>>
>>>>> WRT 'format and run in one go', I strongly disagree with it.
>>>>> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
From: Michael S. Tsirkin @ 2011-09-15 10:51 UTC
To: Stefan Berger
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On Thu, Sep 15, 2011 at 06:22:15AM -0400, Stefan Berger wrote:
> On 09/15/2011 02:57 AM, Michael S. Tsirkin wrote:
>> On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
>>> I am not so sure about the design of the QMP commands and how to break
>>> things up into individual calls. So does this sequence here and the
>>> 'query-blobstore' output look ok?
>>>
>>> { "execute": "qmp_capabilities" }
>>> {"return": {}}
>>> { "execute": "query-blobstore" }
>>> {"return": [{"size": 84480, "id": "tpm-bs"}]}
>> I'll let some QMP experts comment.
>>
>> We don't strictly need the id here, right? It is passed to the command.
>>
>> BTW, is it [] or {}? It's the total size, right? Should it be
>> {"return": {"size": 84480}}
>> ?
> The id serves to distinguish one blobstore from another. We'll have any
> number of blobstores. Since we are getting rid of the -blobstore option,
> they will only be identifiable via the ID of the drive they are using.
> If that's not good, please let me know. The examples I had shown
> yesterday were using the name of the blobstore, rather than the drive
> ID, to connect the device to the blobstore.
>
> Before:
>
> qemu ... \
>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>   -drive if=none,id=tpm-bs \
>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>   -device tpm-tis,tpmdev=tpm0
>
> Now:
>
> qemu ... \
>   -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>   -drive if=none,id=tpm-bs,file=$TPMSTATE \
>
>    Stefan

Ah, I get it. I was confused, thinking this queries a single store.
Instead, this returns info about *all* blobstores. query-blobstores would
be a better name then. Otherwise I think it's fine.

Also, should we rename blobstore to 'nvram' or something else that tells
the user what this does?
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-15 10:55 UTC
To: Michael S. Tsirkin
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On 09/15/2011 06:51 AM, Michael S. Tsirkin wrote:
> On Thu, Sep 15, 2011 at 06:22:15AM -0400, Stefan Berger wrote:
>> The id serves to distinguish one blobstore from another. We'll have any
>> number of blobstores. Since we are getting rid of the -blobstore
>> option, they will only be identifiable via the ID of the drive they are
>> using. If that's not good, please let me know. The examples I had shown
>> yesterday were using the name of the blobstore, rather than the drive
>> ID, to connect the device to the blobstore.
>>
>> Before:
>>
>> qemu ... \
>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>   -drive if=none,id=tpm-bs \
>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>   -device tpm-tis,tpmdev=tpm0
>>
>> Now:
>>
>> qemu ... \
>>   -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>>   -drive if=none,id=tpm-bs,file=$TPMSTATE \
>>
>>    Stefan
> Ah, I get it. I was confused, thinking this queries a single store.
> Instead, this returns info about *all* blobstores. query-blobstores
> would be a better name then. Otherwise I think it's fine.
>
> Also, should we rename blobstore to 'nvram' or something else that tells
> the user what this does?

Fine by me. We ought to talk about the on-disk format then ...

   Stefan
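The on-disk format Stefan alludes to is still undefined at this point in the thread. Purely as an illustration of the kind of layout such an NVRAM store might use (every field here is invented, not a proposal the thread agreed on), a fixed header plus a directory of (name, offset, length) entries could be packed like this:

```python
# Illustrative only: one possible NVRAM/blobstore on-disk layout with a
# fixed header and a directory of (name, offset, length) entries.
# The thread has not defined the real format; these fields are invented.
import struct

MAGIC = b"NVRM"
HDR = struct.Struct("<4sHH")      # magic, version, number of entries
ENT = struct.Struct("<16sII")     # blob name, data offset, data length


def pack_store(blobs):
    """blobs: dict name -> bytes. Returns the serialized image content."""
    header = HDR.pack(MAGIC, 1, len(blobs))
    directory = b""
    data = b""
    offset = HDR.size + len(blobs) * ENT.size   # data begins after directory
    for name, blob in blobs.items():
        directory += ENT.pack(name.encode(), offset, len(blob))
        offset += len(blob)
        data += blob
    return header + directory + data


def unpack_store(image):
    """Inverse of pack_store: recover the name -> bytes mapping."""
    magic, version, count = HDR.unpack_from(image, 0)
    assert magic == MAGIC and version == 1
    blobs = {}
    for i in range(count):
        name, off, length = ENT.unpack_from(image, HDR.size + i * ENT.size)
        blobs[name.rstrip(b"\0").decode()] = image[off:off + length]
    return blobs
```

A design like this makes the required image size computable up front (header plus directory plus the registered blob sizes), which is exactly what the size-registration discussion earlier in the thread needs.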
* Re: [Qemu-devel] Design of the blobstore
From: Gleb Natapov @ 2011-09-15 5:47 UTC
To: Stefan Berger
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin

On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> One property of the blobstore is that it has a certain required size for
> accommodating all blobs of the devices that want to store their blobs in
> it. The assumption is that the size of these blobs is known a priori to
> the writer of the device code, and all devices can register their space
> requirements with the blobstore during device initialization. Gathering
> all the registered blobs' sizes, plus knowing the overhead of the layout
> of the data on the disk, then lets QEMU calculate the total required
> (minimum) size that the image has to have to accommodate all blobs in a
> particular blobstore.

I do not see the point of having one blobstore for all devices. Each
should have its own. We will need permanent storage for UEFI firmware too,
and creating a new UEFI config for each machine configuration is not the
kind of usability we want to have.

--
   Gleb.
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-15 10:18 UTC
To: Gleb Natapov
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin

On 09/15/2011 01:47 AM, Gleb Natapov wrote:
> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>> One property of the blobstore is that it has a certain required size
>> for accommodating all blobs of the devices that want to store their
>> blobs in it. The assumption is that the size of these blobs is known
>> a priori to the writer of the device code, and all devices can register
>> their space requirements with the blobstore during device
>> initialization. Gathering all the registered blobs' sizes, plus knowing
>> the overhead of the layout of the data on the disk, then lets QEMU
>> calculate the total required (minimum) size that the image has to have
>> to accommodate all blobs in a particular blobstore.
>
> I do not see the point of having one blobstore for all devices. Each
> should have its own. We will need permanent storage for UEFI firmware
> too, and creating a new UEFI config for each machine configuration is
> not the kind of usability we want to have.

You will have the possibility of storing all devices' state in one
blobstore, or each device's state in its own, or any combination in
between.

   Stefan
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 10:18 ` Stefan Berger @ 2011-09-15 10:20 ` Gleb Natapov 0 siblings, 0 replies; 27+ messages in thread From: Gleb Natapov @ 2011-09-15 10:20 UTC (permalink / raw) To: Stefan Berger Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On Thu, Sep 15, 2011 at 06:18:35AM -0400, Stefan Berger wrote: > On 09/15/2011 01:47 AM, Gleb Natapov wrote: > >On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote: > >> One property of the blobstore is that it has a certain required > >>size for accommodating all blobs of device that want to store their > >>blobs onto. The assumption is that the size of these blobs is know > >>a-priori to the writer of the device code and all devices can > >>register their space requirements with the blobstore during device > >>initialization. Then gathering all the registered blobs' sizes plus > >>knowing the overhead of the layout of the data on the disk lets QEMU > >>calculate the total required (minimum) size that the image has to > >>have to accommodate all blobs in a particular blobstore. > >> > >I do not see the point of having one blobstore for all devices. Each > >should have its own. We will need permanent storage for UEFI firmware > >too and creating new UEFI config for each machine configuration is not > >the kind of usability we want to have. > > > You will have the possibility of storing all devices' state into one > blobstore or each devices' state in its own or any combination in > between. > Good, thanks. -- Gleb. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger 2011-09-14 17:40 ` Michael S. Tsirkin 2011-09-15 5:47 ` Gleb Natapov @ 2011-09-15 11:17 ` Stefan Hajnoczi 2011-09-15 11:35 ` Daniel P. Berrange ` (2 more replies) 2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange 3 siblings, 3 replies; 27+ messages in thread From: Stefan Hajnoczi @ 2011-09-15 11:17 UTC (permalink / raw) To: Stefan Berger Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger <stefanb@linux.vnet.ibm.com> wrote: > One property of the blobstore is that it has a certain required size for > accommodating all blobs of device that want to store their blobs onto. The > assumption is that the size of these blobs is know a-priori to the writer of > the device code and all devices can register their space requirements with > the blobstore during device initialization. Then gathering all the > registered blobs' sizes plus knowing the overhead of the layout of the data > on the disk lets QEMU calculate the total required (minimum) size that the > image has to have to accommodate all blobs in a particular blobstore. Libraries like tdb or gdbm come to mind. We should be careful not to reinvent cpio/tar or FAT :). What about live migration? If each VM has a LUN assigned on a SAN then these qcow2 files add a new requirement for a shared file system. Perhaps it makes sense to include the blobstore in the VM state data instead? If you take that approach then the blobstore will get snapshotted *into* the existing qcow2 images. Then you don't need a shared file system for migration to work. Can you share your design for the actual QEMU API that the TPM code will use to manipulate the blobstore? Is it designed to work in the event loop while QEMU is running, or is it for rare I/O on startup/shutdown? 
Stefan ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:17 ` Stefan Hajnoczi @ 2011-09-15 11:35 ` Daniel P. Berrange 2011-09-15 11:40 ` Kevin Wolf 2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger 2 siblings, 0 replies; 27+ messages in thread From: Daniel P. Berrange @ 2011-09-15 11:35 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Anthony Liguori, Michael S. Tsirkin, Stefan Berger, QEMU Developers, Markus Armbruster On Thu, Sep 15, 2011 at 12:17:54PM +0100, Stefan Hajnoczi wrote: > On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger > <stefanb@linux.vnet.ibm.com> wrote: > > One property of the blobstore is that it has a certain required size for > > accommodating all blobs of device that want to store their blobs onto. The > > assumption is that the size of these blobs is know a-priori to the writer of > > the device code and all devices can register their space requirements with > > the blobstore during device initialization. Then gathering all the > > registered blobs' sizes plus knowing the overhead of the layout of the data > > on the disk lets QEMU calculate the total required (minimum) size that the > > image has to have to accommodate all blobs in a particular blobstore. > > Libraries like tdb or gdbm come to mind. We should be careful not to > reinvent cpio/tar or FAT :). qcow2 is desirable because it lets us provide encryption of the blobstore, which is important if you don't trust the admin of the NFS server, or the network between the virt host & NFS server. > What about live migration? If each VM has a LUN assigned on a SAN > then these qcow2 files add a new requirement for a shared file system. NB, I'm not necessarily recommending this, but it is possible to format a raw block device to contain a qcow2 image. So it does not actually require a shared filesystem. It would, however, require an additional LUN, or require that the existing LUN be partitioned into two parts.
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:17 ` Stefan Hajnoczi 2011-09-15 11:35 ` Daniel P. Berrange @ 2011-09-15 11:40 ` Kevin Wolf 2011-09-15 11:58 ` Stefan Hajnoczi 2011-09-15 14:19 ` Stefan Berger 2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger 2 siblings, 2 replies; 27+ messages in thread From: Kevin Wolf @ 2011-09-15 11:40 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Markus Armbruster, Anthony Liguori, Michael S. Tsirkin, QEMU Developers, Stefan Berger Am 15.09.2011 13:17, schrieb Stefan Hajnoczi: > On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger > <stefanb@linux.vnet.ibm.com> wrote: >> One property of the blobstore is that it has a certain required size for >> accommodating all blobs of device that want to store their blobs onto. The >> assumption is that the size of these blobs is know a-priori to the writer of >> the device code and all devices can register their space requirements with >> the blobstore during device initialization. Then gathering all the >> registered blobs' sizes plus knowing the overhead of the layout of the data >> on the disk lets QEMU calculate the total required (minimum) size that the >> image has to have to accommodate all blobs in a particular blobstore. > > Libraries like tdb or gdbm come to mind. We should be careful not to > reinvent cpio/tar or FAT :). We could use vvfat if we need a FAT implementation. *duck* > What about live migration? If each VM has a LUN assigned on a SAN > then these qcow2 files add a new requirement for a shared file system. > > Perhaps it makes sense to include the blobstore in the VM state data > instead? If you take that approach then the blobstore will get > snapshotted *into* the existing qcow2 images. Then you don't need a > shared file system for migration to work. But what happens if you don't do fancy things like snapshots or live migration, but just shut the VM down? Nothing will be saved then, so it must already be on disk. 
I think using a BlockDriverState for that makes sense, even though it is some additional work for migration. But you already deal with n disks, doing n+1 disks shouldn't be much harder. The one thing that I didn't understand in the original mail is why you think that raw works with your option but qcow2 doesn't. Where's the difference wrt creating an image? Kevin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:40 ` Kevin Wolf @ 2011-09-15 11:58 ` Stefan Hajnoczi 2011-09-15 12:31 ` Michael S. Tsirkin 2011-09-16 8:46 ` Kevin Wolf 2011-09-15 14:19 ` Stefan Berger 1 sibling, 2 replies; 27+ messages in thread From: Stefan Hajnoczi @ 2011-09-15 11:58 UTC (permalink / raw) To: Kevin Wolf Cc: Markus Armbruster, Anthony Liguori, Michael S. Tsirkin, QEMU Developers, Stefan Berger On Thu, Sep 15, 2011 at 12:40 PM, Kevin Wolf <kwolf@redhat.com> wrote: > Am 15.09.2011 13:17, schrieb Stefan Hajnoczi: >> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger >> <stefanb@linux.vnet.ibm.com> wrote: >>> One property of the blobstore is that it has a certain required size for >>> accommodating all blobs of device that want to store their blobs onto. The >>> assumption is that the size of these blobs is know a-priori to the writer of >>> the device code and all devices can register their space requirements with >>> the blobstore during device initialization. Then gathering all the >>> registered blobs' sizes plus knowing the overhead of the layout of the data >>> on the disk lets QEMU calculate the total required (minimum) size that the >>> image has to have to accommodate all blobs in a particular blobstore. >> >> Libraries like tdb or gdbm come to mind. We should be careful not to >> reinvent cpio/tar or FAT :). > > We could use vvfat if we need a FAT implementation. *duck* > >> What about live migration? If each VM has a LUN assigned on a SAN >> then these qcow2 files add a new requirement for a shared file system. >> >> Perhaps it makes sense to include the blobstore in the VM state data >> instead? If you take that approach then the blobstore will get >> snapshotted *into* the existing qcow2 images. Then you don't need a >> shared file system for migration to work. > > But what happens if you don't do fancy things like snapshots or live > migration, but just shut the VM down? Nothing will be saved then, so it > must already be on disk. 
I think using a BlockDriverState for that makes > sense, even though it is some additional work for migration. But you > already deal with n disks, doing n+1 disks shouldn't be much harder. Sure, you need a file because the data needs to be persistent. I'm not saying to keep it in memory only. My concern is that while QEMU block devices provide a convenient wrapper for snapshot and encryption, we need to write the data layout that goes inside that wrapper from scratch. We'll need to invent our own key-value store when there are plenty of existing ones. I explained that the snapshot feature is actually a misfeature; it would be better to integrate with VM state data so that there is no additional migration requirement. As for encryption, just encrypt the values you put into the key-value store. Stefan ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:58 ` Stefan Hajnoczi @ 2011-09-15 12:31 ` Michael S. Tsirkin 2011-09-16 8:46 ` Kevin Wolf 1 sibling, 0 replies; 27+ messages in thread From: Michael S. Tsirkin @ 2011-09-15 12:31 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers, Stefan Berger > We'll need to invent our > own key-value store when there are plenty of existing ones. Let's not invent our own. So a proposal I sent uses an existing one (BER encoding) for such a store. I actually think we can switch to BER more widely such as for migration format. -- MST ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:58 ` Stefan Hajnoczi 2011-09-15 12:31 ` Michael S. Tsirkin @ 2011-09-16 8:46 ` Kevin Wolf 1 sibling, 0 replies; 27+ messages in thread From: Kevin Wolf @ 2011-09-16 8:46 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Markus Armbruster, Anthony Liguori, Michael S. Tsirkin, QEMU Developers, Stefan Berger Am 15.09.2011 13:58, schrieb Stefan Hajnoczi: >>> What about live migration? If each VM has a LUN assigned on a SAN >>> then these qcow2 files add a new requirement for a shared file system. >>> >>> Perhaps it makes sense to include the blobstore in the VM state data >>> instead? If you take that approach then the blobstore will get >>> snapshotted *into* the existing qcow2 images. Then you don't need a >>> shared file system for migration to work. >> >> But what happens if you don't do fancy things like snapshots or live >> migration, but just shut the VM down? Nothing will be saved then, so it >> must already be on disk. I think using a BlockDriverState for that makes >> sense, even though it is some additional work for migration. But you >> already deal with n disks, doing n+1 disks shouldn't be much harder. > > Sure, you need a file because the data needs to be persistent. I'm > not saying to keep it in memory only. > > My concern is that while QEMU block devices provide a convenient > wrapper for snapshot and encryption, we need to write the data layout > that goes inside that wrapper from scratch. We'll need to invent our > own key-value store when there are plenty of existing ones. I > explained that the snapshot feature is actually a misfeature, it would > be better to integrate with VM state data so that there is no > additional migration requirement. I'm not so sure if being able to integrate it in the VM state is a feature or a bug. There is no other persistent data that is included in VM state data. Kevin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:40 ` Kevin Wolf 2011-09-15 11:58 ` Stefan Hajnoczi @ 2011-09-15 14:19 ` Stefan Berger 2011-09-16 8:12 ` Kevin Wolf 1 sibling, 1 reply; 27+ messages in thread From: Stefan Berger @ 2011-09-15 14:19 UTC (permalink / raw) To: Kevin Wolf Cc: QEMU Developers, Stefan Hajnoczi, Anthony Liguori, Markus Armbruster, Michael S. Tsirkin On 09/15/2011 07:40 AM, Kevin Wolf wrote: > Am 15.09.2011 13:17, schrieb Stefan Hajnoczi: >> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger >> <stefanb@linux.vnet.ibm.com> wrote: >>> One property of the blobstore is that it has a certain required size for >>> accommodating all blobs of device that want to store their blobs onto. The >>> assumption is that the size of these blobs is know a-priori to the writer of >>> the device code and all devices can register their space requirements with >>> the blobstore during device initialization. Then gathering all the >>> registered blobs' sizes plus knowing the overhead of the layout of the data >>> on the disk lets QEMU calculate the total required (minimum) size that the >>> image has to have to accommodate all blobs in a particular blobstore. >> Libraries like tdb or gdbm come to mind. We should be careful not to >> reinvent cpio/tar or FAT :). > We could use vvfat if we need a FAT implementation. *duck* > >> What about live migration? If each VM has a LUN assigned on a SAN >> then these qcow2 files add a new requirement for a shared file system. >> >> Perhaps it makes sense to include the blobstore in the VM state data >> instead? If you take that approach then the blobstore will get >> snapshotted *into* the existing qcow2 images. Then you don't need a >> shared file system for migration to work. > But what happens if you don't do fancy things like snapshots or live > migration, but just shut the VM down? Nothing will be saved then, so it > must already be on disk. 
I think using a BlockDriverState for that makes > sense, even though it is some additional work for migration. But you > already deal with n disks, doing n+1 disks shouldn't be much harder. > > > The one thing that I didn't understand in the original mail is why you > think that raw works with your option but qcow2 doesn't. Where's the > difference wrt creating an image? I guess you are asking me (also 'Stefan'). When I had QEMU create the disk file I had to pass a file parameter to -drive ...,file=... for it to know which file to create. If the file didn't exist, I got an error. So I created an empty file using 'touch' and could at least start. Though an empty file declared with the format qcow2 in -drive ...,file=...,format=qcow2 throws another error since that's not a valid QCoW2. I wanted to use that parameter 'format' to know what the user wanted to create. So in case of 'raw', I could start out with an empty file, have QEMU calculate the size, call the 'truncate' function on the bdrv it was used with, and then had a raw image of the needed size. The VM could start right away... Stefan > Kevin > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 14:19 ` Stefan Berger @ 2011-09-16 8:12 ` Kevin Wolf 0 siblings, 0 replies; 27+ messages in thread From: Kevin Wolf @ 2011-09-16 8:12 UTC (permalink / raw) To: Stefan Berger Cc: QEMU Developers, Stefan Hajnoczi, Anthony Liguori, Markus Armbruster, Michael S. Tsirkin Am 15.09.2011 16:19, schrieb Stefan Berger: > On 09/15/2011 07:40 AM, Kevin Wolf wrote: >> Am 15.09.2011 13:17, schrieb Stefan Hajnoczi: >>> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger >>> <stefanb@linux.vnet.ibm.com> wrote: >>>> One property of the blobstore is that it has a certain required size for >>>> accommodating all blobs of device that want to store their blobs onto. The >>>> assumption is that the size of these blobs is know a-priori to the writer of >>>> the device code and all devices can register their space requirements with >>>> the blobstore during device initialization. Then gathering all the >>>> registered blobs' sizes plus knowing the overhead of the layout of the data >>>> on the disk lets QEMU calculate the total required (minimum) size that the >>>> image has to have to accommodate all blobs in a particular blobstore. >>> Libraries like tdb or gdbm come to mind. We should be careful not to >>> reinvent cpio/tar or FAT :). >> We could use vvfat if we need a FAT implementation. *duck* >> >>> What about live migration? If each VM has a LUN assigned on a SAN >>> then these qcow2 files add a new requirement for a shared file system. >>> >>> Perhaps it makes sense to include the blobstore in the VM state data >>> instead? If you take that approach then the blobstore will get >>> snapshotted *into* the existing qcow2 images. Then you don't need a >>> shared file system for migration to work. >> But what happens if you don't do fancy things like snapshots or live >> migration, but just shut the VM down? Nothing will be saved then, so it >> must already be on disk. 
I think using a BlockDriverState for that makes >> sense, even though it is some additional work for migration. But you >> already deal with n disks, doing n+1 disks shouldn't be much harder. >> >> >> The one thing that I didn't understand in the original mail is why you >> think that raw works with your option but qcow2 doesn't. Where's the >> difference wrt creating an image? > I guess you are asking me (also 'Stefan'). > > When I had QEMU create the disk file I had to pass a file parameter to > -drive ...,file=... for it to know which file to create. If the file > didn't exist, I got an error. So I create an empty file using 'touch' > and could at least start. Though an empty file declared with the format > qcow2 in -drive ...,file=...,format=qcow2 throws another error since > that's not a valid QCoW2. I wanted to use that parameter 'format' to > know what the user wanted to create. So in case of 'raw', I could start > out with an empty file, have QEMU calculate the size, call the > 'truncate' function on the bdrv it was used with and then had a raw > image of the needed size. THe VM could start right away... Oh, so you created the image manually instead of using bdrv_img_create()? That explains it... Kevin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore [API of the NVRAM] 2011-09-15 11:17 ` Stefan Hajnoczi 2011-09-15 11:35 ` Daniel P. Berrange 2011-09-15 11:40 ` Kevin Wolf @ 2011-09-15 12:34 ` Stefan Berger 2011-09-16 10:35 ` Stefan Hajnoczi 2 siblings, 1 reply; 27+ messages in thread From: Stefan Berger @ 2011-09-15 12:34 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote: > On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger > <stefanb@linux.vnet.ibm.com> wrote: > > One property of the blobstore is that it has a certain required size for > > accommodating all blobs of device that want to store their blobs onto. The > > assumption is that the size of these blobs is know a-priori to the writer of > > the device code and all devices can register their space requirements with > > the blobstore during device initialization. Then gathering all the > > registered blobs' sizes plus knowing the overhead of the layout of the data > > on the disk lets QEMU calculate the total required (minimum) size that the > > image has to have to accommodate all blobs in a particular blobstore. > Libraries like tdb or gdbm come to mind. We should be careful not to > reinvent cpio/tar or FAT :). Sure. As long as these dbs allow overriding open(), close(), read(), write(), and seek() with bdrv ops, we could recycle any of these. Maybe we can build something smaller than those... > What about live migration? If each VM has a LUN assigned on a SAN > then these qcow2 files add a new requirement for a shared file system. > Well, one can still block-migrate these. The user has to know of course whether shared storage is set up or not and pass the appropriate flags to libvirt for migration. I know it works (modulo some problems when using encrypted QCoW2) since I've been testing with it.
If you take that approach then the blobstore will get > snapshotted *into* the existing qcow2 images. Then you don't need a > shared file system for migration to work. > It could be an option. However, if the user has a raw image for the VM we still need the NVRAM emulation for the TPM for example. So we need to store the persistent data somewhere, but raw is not prepared for that. Even if snapshotting doesn't work at all, we need to be able to persist devices' data. > Can you share your design for the actual QEMU API that the TPM code > will use to manipulate the blobstore? Is it designed to work in the > event loop while QEMU is running, or is it for rare I/O on > startup/shutdown? > Everything is kind of changing now. But here's what I have right now: tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode); if (!tb->s.tpm_ltpms->nvram) { fprintf(stderr, "Could not find nvram.\n"); return errcode; } nvram_register_blob(tb->s.tpm_ltpms->nvram, NVRAM_ENTRY_PERMSTATE, tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE)); nvram_register_blob(tb->s.tpm_ltpms->nvram, NVRAM_ENTRY_SAVESTATE, tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE)); nvram_register_blob(tb->s.tpm_ltpms->nvram, NVRAM_ENTRY_VOLASTATE, tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE)); rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive); Above first sets up the NVRAM using the drive's id. That is the -tpmdev ...,nvram=my-bs, parameter. This establishes the NVRAM. Subsequently the blobs to be written into the NVRAM are registered. The nvram_start then reconciles the registered NVRAM blobs with those found on disk, and if everything fits together the result is 'rc = 0' and the NVRAM is ready to go. Other devices can then do the same, with the same NVRAM or another NVRAM. (It's 'NVRAM' now, after renaming from 'blobstore'.) Reading from NVRAM in case of the TPM is a rare event.
It happens in the context of QEMU's main thread: if (nvram_read_data(tpm_ltpms->nvram, NVRAM_ENTRY_PERMSTATE, &tpm_ltpms->permanent_state.buffer, &tpm_ltpms->permanent_state.size, 0, NULL, NULL) || nvram_read_data(tpm_ltpms->nvram, NVRAM_ENTRY_SAVESTATE, &tpm_ltpms->save_state.buffer, &tpm_ltpms->save_state.size, 0, NULL, NULL)) { tpm_ltpms->had_fatal_error = true; return; } Above reads the data of 2 blobs synchronously. This happens during startup. Writes depend on what the user does with the TPM. He can trigger lots of updates to persistent state by performing certain operations, e.g., persisting keys inside the TPM. rc = nvram_write_data(tpm_ltpms->nvram, what, tsb->buffer, tsb->size, VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F, NULL, NULL); Above writes a TPM blob into the NVRAM. This is triggered by the TPM thread and notifies the QEMU main thread to write the blob into NVRAM. I do this synchronously at the moment, not using the last two parameters for a callback after completion but the two flags: the first notifies the main thread, the second waits for the completion of the request (using a condition internally). Here are the protos: VNVRAM *nvram_setup(const char *drive_id, int *errcode); int nvram_start(VNVRAM *, bool fail_on_encrypted_drive); int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type, unsigned int maxsize); unsigned int nvram_get_totalsize(VNVRAM *bs); unsigned int nvram_get_totalsize_kb(VNVRAM *bs); typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write, unsigned char **data, unsigned int len); int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type, const unsigned char *data, unsigned int len, int flags, NVRAMRWFinishCB cb, void *opaque); As said, things are changing right now, so this is to give an impression... Stefan > Stefan > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore [API of the NVRAM] 2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger @ 2011-09-16 10:35 ` Stefan Hajnoczi 2011-09-16 11:36 ` Stefan Berger 0 siblings, 1 reply; 27+ messages in thread From: Stefan Hajnoczi @ 2011-09-16 10:35 UTC (permalink / raw) To: Stefan Berger Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On Thu, Sep 15, 2011 at 08:34:55AM -0400, Stefan Berger wrote: > On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote: > >On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger > ><stefanb@linux.vnet.ibm.com> wrote: > >> One property of the blobstore is that it has a certain required size for > >>accommodating all blobs of device that want to store their blobs onto. The > >>assumption is that the size of these blobs is know a-priori to the writer of > >>the device code and all devices can register their space requirements with > >>the blobstore during device initialization. Then gathering all the > >>registered blobs' sizes plus knowing the overhead of the layout of the data > >>on the disk lets QEMU calculate the total required (minimum) size that the > >>image has to have to accommodate all blobs in a particular blobstore. > >Libraries like tdb or gdbm come to mind. We should be careful not to > >reinvent cpio/tar or FAT :). > Sure. As long as these dbs allow to over-ride open(), close(), > read(), write() and seek() with bdrv ops we could recycle any of > these. Maybe we can build something smaller than those... > >What about live migration? If each VM has a LUN assigned on a SAN > >then these qcow2 files add a new requirement for a shared file system. > > > Well, one can still block-migrate these. The user has to know of > course whether shared storage is setup or not and pass the > appropriate flags to libvirt for migration. I know it works (modulo > some problems when using encrypted QCoW2) since I've been testing > with it. 
> > >Perhaps it makes sense to include the blobstore in the VM state data > >instead? If you take that approach then the blobstore will get > >snapshotted *into* the existing qcow2 images. Then you don't need a > >shared file system for migration to work. > > > It could be an option. However, if the user has a raw image for the > VM we still need the NVRAM emulation for the TPM for example. So we > need to store the persistent data somewhere but raw is not prepared > for that. Even if snapshotting doesn't work at all we need to be > able to persist devices' data. > > > >Can you share your design for the actual QEMU API that the TPM code > >will use to manipulate the blobstore? Is it designed to work in the > >event loop while QEMU is running, or is it for rare I/O on > >startup/shutdown? > > > Everything is kind of changing now. But here's what I have right now: > > tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode); > if (!tb->s.tpm_ltpms->nvram) { > fprintf(stderr, "Could not find nvram.\n"); > return errcode; > } > > nvram_register_blob(tb->s.tpm_ltpms->nvram, > NVRAM_ENTRY_PERMSTATE, > tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE)); > nvram_register_blob(tb->s.tpm_ltpms->nvram, > NVRAM_ENTRY_SAVESTATE, > tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE)); > nvram_register_blob(tb->s.tpm_ltpms->nvram, > NVRAM_ENTRY_VOLASTATE, > tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE)); > > rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive); > > Above first sets up the NVRAM using the drive's id. That is the > -tpmdev ...,nvram=my-bs, parameter. This establishes the NVRAM. > Subsequently the blobs to be written into the NVRAM are registered. > The nvram_start then reconciles the registered NVRAM blobs with > those found on disk and if everything fits together the result is > 'rc = 0' and the NVRAM is ready to go. Other devices can than do the > same also with the same NVRAM or another NVRAM. (NVRAM now after > renaming from blobstore). 
> > Reading from NVRAM in case of the TPM is a rare event. It happens in > the context of QEMU's main thread: > > if (nvram_read_data(tpm_ltpms->nvram, > NVRAM_ENTRY_PERMSTATE, > &tpm_ltpms->permanent_state.buffer, > &tpm_ltpms->permanent_state.size, > 0, NULL, NULL) || > nvram_read_data(tpm_ltpms->nvram, > NVRAM_ENTRY_SAVESTATE, > &tpm_ltpms->save_state.buffer, > &tpm_ltpms->save_state.size, > 0, NULL, NULL)) > { > tpm_ltpms->had_fatal_error = true; > return; > } > > Above reads the data of 2 blobs synchronously. This happens during startup. > > > Writes are depending on what the user does with the TPM. He can > trigger lots of updates to persistent state if he performs certain > operations, i.e., persisting keys inside the TPM. > > rc = nvram_write_data(tpm_ltpms->nvram, > what, tsb->buffer, tsb->size, > VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F, > NULL, NULL); > > Above writes a TPM blob into the NVRAM. This is triggered by the TPM > thread and notifies the QEMU main thread to write the blob into > NVRAM. I do this synchronously at the moment not using the last two > parameters for callback after completion but the two flags. The > first is to notify the main thread the 2nd flag is to wait for the > completion of the request (using a condition internally). > > Here are the protos: > > VNVRAM *nvram_setup(const char *drive_id, int *errcode); > > int nvram_start(VNVRAM *, bool fail_on_encrypted_drive); > > int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type, > unsigned int maxsize); > > unsigned int nvram_get_totalsize(VNVRAM *bs); > unsigned int nvram_get_totalsize_kb(VNVRAM *bs); > > typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write, > unsigned char **data, unsigned int len); > > int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type, > const unsigned char *data, unsigned int len, > int flags, NVRAMRWFinishCB cb, void *opaque); > > > As said, things are changing right now, so this is to give an impression... 
Thanks, these details are interesting. I interpreted the blobstore as a key-value store but these examples show it as a stream. No IDs or offsets are given; the reads are just performed in order and move through the NVRAM. If it stays this simple then bdrv_*() is indeed a natural way to do this - although my migration point remains, since this feature adds a new requirement for shared storage when it would be pretty easy to put this stuff in the vm data stream (IIUC the TPM NVRAM is relatively small?). Stefan ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]
  2011-09-16 10:35   ` Stefan Hajnoczi
@ 2011-09-16 11:36     ` Stefan Berger
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Berger @ 2011-09-16 11:36 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Anthony Liguori, Michael S. Tsirkin,
	Markus Armbruster, QEMU Developers

On 09/16/2011 06:35 AM, Stefan Hajnoczi wrote:
> On Thu, Sep 15, 2011 at 08:34:55AM -0400, Stefan Berger wrote:
>> On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
>> [...]
>> Everything is kind of changing now. But here's what I have right now:
>>
>>     tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode);
>>     if (!tb->s.tpm_ltpms->nvram) {
>>         fprintf(stderr, "Could not find nvram.\n");
>>         return errcode;
>>     }
>>
>>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>>                         NVRAM_ENTRY_PERMSTATE,
>>                         tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
>>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>>                         NVRAM_ENTRY_SAVESTATE,
>>                         tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
>>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>>                         NVRAM_ENTRY_VOLASTATE,
>>                         tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));
>>
>>     rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);
>>
>> The above first sets up the NVRAM using the drive's id, i.e., the
>> -tpmdev ...,nvram=my-bs,... parameter. This establishes the NVRAM.
>> Subsequently the blobs to be written into the NVRAM are registered.
>> nvram_start() then reconciles the registered NVRAM blobs with those
>> found on disk, and if everything fits together the result is 'rc = 0'
>> and the NVRAM is ready to go. Other devices can then do the same, with
>> the same NVRAM or another NVRAM. (It is 'NVRAM' now, after renaming
>> from 'blobstore'.)
>>
>> Reading from NVRAM in case of the TPM is a rare event.
>> [...]
> Thanks, these details are interesting. I interpreted the blobstore as a
> key-value store but these examples show it as a stream. No IDs or

IMO the only stuff we should store there are blobs retrievable via keys
(names) -- no metadata.

> offsets are given, the reads are just performed in order and move
> through the NVRAM. If it stays this simple then bdrv_*() is indeed a

There are no offsets because there's some intelligence in the
blobstore/NVRAM that lays out the data onto the disk. That's why there
is a directory. This in turn allows multiple drivers to share the same
NVRAM without the driver writer having to lay out the blobs
him-/herself.

> natural way to do this - although my migration point remains since this
> feature adds a new requirement for shared storage when it would be
> pretty easy to put this stuff in the vm data stream (IIUC the TPM NVRAM
> is relatively small?).

It's just another image. You have to treat it like the VM's 'main'
image. Block migration works fine on it; it may just be difficult for a
user to handle migration flags if one image is on shared storage and
the other isn't.

   Stefan

> Stefan
>

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore
  2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger
                   ` (2 preceding siblings ...)
  2011-09-15 11:17 ` Stefan Hajnoczi
@ 2011-09-15 13:05 ` Daniel P. Berrange
  2011-09-15 13:13   ` Stefan Berger
  3 siblings, 1 reply; 27+ messages in thread
From: Daniel P. Berrange @ 2011-09-15 13:05 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin

On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> Hello!
>
> Over the last few days primarily Michael Tsirkin and I have
> discussed the design of the 'blobstore' via IRC (#virtualization).
> The intention of the blobstore is to provide storage to persist
> blobs that devices create. Along with these blobs possibly some
> metadata should be storable in this blobstore.
>
> An initial client for the blobstore would be the TPM emulation.
> The TPM's persistent state needs to be stored once it changes so it
> can be restored at any point in time later on, i.e., after a cold
> reboot of the VM. In effect the blobstore simulates the NVRAM of a
> device where it would typically store such persistent data onto.

While I can see the appeal of a general 'blobstore' for NVRAM
tunables related to devices, wrt the TPM emulation, should we
be considering use of something like the PKCS#11 standard for
storing/retrieving crypto data for the TPM?

  https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11

This is an industry standard for interfacing to cryptographic
storage mechanisms, widely supported by all SSL libraries & more
or less all programming languages. IIUC it lets the application
avoid hardcoding a specific storage backend impl, so it can be
made to work with anything from local files, to smartcards, to
HSMs, to remote network services.
Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore
  2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange
@ 2011-09-15 13:13   ` Stefan Berger
  2011-09-15 13:27     ` Daniel P. Berrange
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Berger @ 2011-09-15 13:13 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin

On 09/15/2011 09:05 AM, Daniel P. Berrange wrote:
> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>> Hello!
>>
>> Over the last few days primarily Michael Tsirkin and I have
>> discussed the design of the 'blobstore' via IRC (#virtualization).
>> The intention of the blobstore is to provide storage to persist
>> blobs that devices create. Along with these blobs possibly some
>> metadata should be storable in this blobstore.
>>
>> An initial client for the blobstore would be the TPM emulation.
>> The TPM's persistent state needs to be stored once it changes so it
>> can be restored at any point in time later on, i.e., after a cold
>> reboot of the VM. In effect the blobstore simulates the NVRAM of a
>> device where it would typically store such persistent data onto.
> While I can see the appeal of a general 'blobstore' for NVRAM
> tunables related to device, wrt the TPM emulation, should we
> be considering use of something like the PKCS#11 standard for
> storing/retrieving crypto data for the TPM ?
>
> https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11

We should regard the blobs the TPM produces as crypto data as a
whole, allowing for encryption of each one. QCoW2 encryption is good
for that since it uses per-sector encryption, but we lose all that
in case a raw image is used for NVRAM storage.

FYI: The TPM writes its data in a custom format and produces a blob
that should be stored without knowing the organization of its
content.
This blob doesn't only contain keys but much other data in the 3
different types of blobs that the TPM can produce under certain
circumstances: values of counters, values of the PCRs (20-byte-long
registers), keys, the owner and SRK (storage root key) passwords, the
TPM's NVRAM areas, flags, etc. It produces the following blobs:

- permanent data blob: whenever it writes data to persistent storage
- save state blob: upon an S3 suspend (kicked off by the TPM TIS driver
  sending a command to the TPM)
- volatile data blob: upon migration/suspend; contains the volatile
  data that after a reboot of the VM is typically initialized by the
  TPM but of course needs to be restored on the migration target / on
  resume

   Stefan

> This is an industry standard for interfacing to cryptographic
> storage mechanisms, widely supported by all SSL libraries & more
> or less all programming languages. IIUC it lets the application
> avoid hardcoding a specific storage backend impl, so it can
> be made to work with anything from local files, to smartcards,
> to HSMs, to remote network services.
>
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 13:13 ` Stefan Berger @ 2011-09-15 13:27 ` Daniel P. Berrange 2011-09-15 14:00 ` Stefan Berger 0 siblings, 1 reply; 27+ messages in thread From: Daniel P. Berrange @ 2011-09-15 13:27 UTC (permalink / raw) To: Stefan Berger Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On Thu, Sep 15, 2011 at 09:13:25AM -0400, Stefan Berger wrote: > On 09/15/2011 09:05 AM, Daniel P. Berrange wrote: > >On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote: > >>Hello! > >> > >> Over the last few days primarily Michael Tsirkin and I have > >>discussed the design of the 'blobstore' via IRC (#virtualization). > >>The intention of the blobstore is to provide storage to persist > >>blobs that devices create. Along with these blobs possibly some > >>metadata should be storable in this blobstore. > >> > >> An initial client for the blobstore would be the TPM emulation. > >>The TPM's persistent state needs to be stored once it changes so it > >>can be restored at any point in time later on, i.e., after a cold > >>reboot of the VM. In effect the blobstore simulates the NVRAM of a > >>device where it would typically store such persistent data onto. > >While I can see the appeal of a general 'blobstore' for NVRAM > >tunables related to device, wrt the TPM emulation, should we > >be considering use of something like the PKCS#11 standard for > >storing/retrieving crypto data for the TPM ? > > > > https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11 > We should regard the blobs the TPM produces as crypto data as a > whole, allowing for encryption of each one. QCoW2 encryption is good > for that since it uses per-sector encryption but we loose all that > in case of RAW image being use for NVRAM storage. > > FYI: The TPM writes its data in a custom format and produces a blob > that should be stored without knowing the organization of its > content. 
> This blob doesn't only contain keys but many other data in
> the 3 different types of blobs that the TPM can produce under
> certain circumstances: values of counters, values of the PCRs (20
> byte long registers), keys, owner and SRK (storage root key)
> password, TPM's NVRAM areas, flags etc.

Is this description of storage inherent in the impl of TPMs in general,
or just the way you've chosen to implement the QEMU vTPM ?

IIUC, you are describing a layering like

  +----------------+
  |   Guest App    |
  +----------------+
    ^ ^ ^ ^ ^ ^ ^
    | | | | | | |    Data slots
    V V V V V V V
  +----------------+
  | QEMU vTPM Dev  |
  +----------------+
          ^
          |  Data blob
          V
  +----------------+
  | Storage device |   (File/block dev)
  +----------------+

I was thinking about whether we could delegate the encoding of
data slots -> blobs to outside the vTPM device emulation by
using PKCS#11?

  +----------------+
  |   Guest App    |
  +----------------+
    ^ ^ ^ ^ ^ ^ ^
    | | | | | | |    Data slots
    V V V V V V V
  +----------------+
  | QEMU vTPM Dev  |
  +----------------+
    ^ ^ ^ ^ ^ ^ ^
    | | | | | | |    Data slots
    V V V V V V V
  +----------------+
  | PKCS#11 Driver |
  +----------------+
          ^
          |  Data blob
          V
  +----------------+
  | Storage device |   (File/blockdev/HSM/Smartcard)
  +----------------+

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 13:27 ` Daniel P. Berrange @ 2011-09-15 14:00 ` Stefan Berger 0 siblings, 0 replies; 27+ messages in thread From: Stefan Berger @ 2011-09-15 14:00 UTC (permalink / raw) To: Daniel P. Berrange Cc: Anthony Liguori, Michael S. Tsirkin, Markus Armbruster, QEMU Developers On 09/15/2011 09:27 AM, Daniel P. Berrange wrote: > On Thu, Sep 15, 2011 at 09:13:25AM -0400, Stefan Berger wrote: >> On 09/15/2011 09:05 AM, Daniel P. Berrange wrote: >>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote: >>>> Hello! >>>> >>>> Over the last few days primarily Michael Tsirkin and I have >>>> discussed the design of the 'blobstore' via IRC (#virtualization). >>>> The intention of the blobstore is to provide storage to persist >>>> blobs that devices create. Along with these blobs possibly some >>>> metadata should be storable in this blobstore. >>>> >>>> An initial client for the blobstore would be the TPM emulation. >>>> The TPM's persistent state needs to be stored once it changes so it >>>> can be restored at any point in time later on, i.e., after a cold >>>> reboot of the VM. In effect the blobstore simulates the NVRAM of a >>>> device where it would typically store such persistent data onto. >>> While I can see the appeal of a general 'blobstore' for NVRAM >>> tunables related to device, wrt the TPM emulation, should we >>> be considering use of something like the PKCS#11 standard for >>> storing/retrieving crypto data for the TPM ? >>> >>> https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11 >> We should regard the blobs the TPM produces as crypto data as a >> whole, allowing for encryption of each one. QCoW2 encryption is good >> for that since it uses per-sector encryption but we loose all that >> in case of RAW image being use for NVRAM storage. >> >> FYI: The TPM writes its data in a custom format and produces a blob >> that should be stored without knowing the organization of its >> content. 
>> This blob doesn't only contain keys but many other data in
>> the 3 different types of blobs that the TPM can produce under
>> certain circumstances: values of counters, values of the PCRs (20
>> byte long registers), keys, owner and SRK (storage root key)
>> password, TPM's NVRAM areas, flags etc.
> Is this description of storage inherent in the impl of TPMs in general,
> or just the way you've chosen to implement the QEMU vTPM ?

There's no absolute definition of how a TPM writes all its data into
NVRAM. Some structures are defined and we used them where we could;
others were defined by 'us' -- so they are manufacturer-specific.
Suspend operations, for example, were not envisioned for the hardware
TPM, but we needed to write more data out than what the standard
defines so we could resume properly. What is defined is persistent
storage and S3 suspend (save state), as described in the previous mail.

> IIUC, you are describing a layering like
> [... layering diagrams elided, see the previous message ...]
> I was thinking about whether we could delegate the encoding
> of data slots -> blobs to outside the vTPM device emulation
> by using PKCS#11?

v8 (and before) of my TPM patch postings had something like this --
nicely layered, though -- and I was doing it on a per-blob basis, so
no 'slots'.
The vTPM dev was passing its raw blobs down to the 'NVRAM' layer, and
that NVRAM either had a key for encryption or not. In case it didn't
have a key, it just wrote the data at a certain offset, noting the
actual blob size in a directory in the 1st sector. In case the NVRAM
layer had a key, it encrypted the blob (which enlarges it to the next
16-byte boundary due to AES encryption) and wrote that AES-CBC-encrypted
blob at a certain offset, noting the actual unencrypted blob size in the
directory. The header of the directory contained a flag that all data
were encrypted -- so this flag was a property of every blob on the disk.

Now with Michael's ASN1 encoding and the additional metadata, I think
the encryption should come after encoding the blob and metadata into
ASN1. Again the directory would need a flag for whether all blobs, or
each single blob, is encrypted.

I guess this again goes back to command line parameters as well. Where
do we pass the key? Is it a per-device property (-tpmdev ...,key=...)
where the device registers a key to use for its blobs, or a
per-blobstore/NVRAM property (-nvram drive=...,key=...)?

   Stefan

> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 27+ messages in thread
end of thread, other threads:[~2011-09-16 11:36 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger
2011-09-14 17:40 ` Michael S. Tsirkin
2011-09-14 17:49   ` Stefan Berger
2011-09-14 17:56     ` Michael S. Tsirkin
2011-09-14 21:12       ` Stefan Berger
2011-09-15  6:57         ` Michael S. Tsirkin
2011-09-15 10:22           ` Stefan Berger
2011-09-15 10:51             ` Michael S. Tsirkin
2011-09-15 10:55               ` Stefan Berger
2011-09-15  5:47 ` Gleb Natapov
2011-09-15 10:18   ` Stefan Berger
2011-09-15 10:20     ` Gleb Natapov
2011-09-15 11:17 ` Stefan Hajnoczi
2011-09-15 11:35   ` Daniel P. Berrange
2011-09-15 11:40     ` Kevin Wolf
2011-09-15 11:58       ` Stefan Hajnoczi
2011-09-15 12:31         ` Michael S. Tsirkin
2011-09-16  8:46           ` Kevin Wolf
2011-09-15 14:19       ` Stefan Berger
2011-09-16  8:12         ` Kevin Wolf
2011-09-15 12:34   ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger
2011-09-16 10:35     ` Stefan Hajnoczi
2011-09-16 11:36       ` Stefan Berger
2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange
2011-09-15 13:13   ` Stefan Berger
2011-09-15 13:27     ` Daniel P. Berrange
2011-09-15 14:00       ` Stefan Berger