* [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-14 17:05 UTC
To: QEMU Developers, Michael S. Tsirkin, Anthony Liguori, Markus Armbruster

Hello!

Over the last few days, primarily Michael Tsirkin and I have discussed the
design of the 'blobstore' via IRC (#virtualization).

The intention of the blobstore is to provide storage for persisting blobs
that devices create. Along with these blobs, possibly some metadata should
be storable in this blobstore. An initial client for the blobstore would be
the TPM emulation. The TPM's persistent state needs to be stored once it
changes so it can be restored at any point in time later on, i.e., after a
cold reboot of the VM. In effect, the blobstore simulates the NVRAM of a
device, where such persistent data would typically be stored.

One design point of the blobstore is that it has to work with QEMU's block
layer, i.e., it has to use images for storing the blobs and, with that, use
the bdrv_* functions to write its data into these images. The reason for
this is primarily QEMU's snapshot feature, where snapshots of the VM can be
taken assuming the QCoW2 image format is being used. If one chooses to
provide a QCoW2 image as the storage medium for the blobstore, it would
enable QEMU's snapshotting feature automatically. I believe there is no
other image format that would work, and simply using plain files would in
effect destroy the snapshot feature. Using a raw image file, for example,
would prevent snapshotting.

One property of the blobstore is that it has a certain required size for
accommodating all blobs of the devices that want to store their blobs in
it. The assumption is that the size of these blobs is known a priori to the
writer of the device code, and all devices can register their space
requirements with the blobstore during device initialization. Gathering all
the registered blobs' sizes, plus knowing the overhead of the layout of the
data on the disk, then lets QEMU calculate the total required (minimum)
size that the image has to have to accommodate all blobs in a particular
blobstore.

So what I would like to discuss in this message for now is the design of
the command line options for the blobstore, in order to determine how to
access a blobstore. For experimenting, I introduced a 'blobstore' command
line option for QEMU with the following possible parameters:

- name=: the name of the blobstore
- drive=: the id of the drive used as image file, i.e.,
  -drive id=my-blobs,format=raw,file=/tmp/blobstore.raw,if=none
- showsize: show the size requirement for the image file
- create: the image file is created (if found to be of size zero) and then
  formatted
- format: assuming the image file is there, format it before starting the
  VM; the device will always start with a clean state
- formatifbad: format the image file if an attempt to read its content
  fails upon first read

Monitor commands with similar functionality would follow later.

The intention behind the parameter 'create' is to make it as easy as
possible for the user to start QEMU with a usable image file, letting QEMU
do the equivalent of 'qemu-img create -f <format> <image file> <size>'.
This works fine and lets one start QEMU in one step as long as:

- the user passed an empty image file via -drive ...,file=/tmp/blobstore.raw
- the format to use is raw

For the QCoW2 format, for example, this doesn't work, since the QCoW2 file
passed via -drive ...,file=/tmp/blobstore.qcow2 cannot be of zero size. In
this case the user would have to use the 'showsize' option to learn what
size the drive has to be, then invoke 'qemu-img' with the size parameter,
and then subsequently start QEMU with the image. To find the size, the user
would have to use a command line like

qemu ... \
  -blobstore name=my-blobstore,drive=tpm-bs,showsize \
  -drive if=none,id=tpm-bs \
  -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
  -device tpm-tis,tpmdev=tpm0

which would result in QEMU printing to stdout:

Blobstore tpm-store on drive with ID tpm-bs requires 83kb.

Once a QCoW2 image file has been created using

qemu-img create -f qcow2 /tmp/blobstore.qcow2 83k

QEMU can then subsequently be used with the following command line options:

qemu ... \
  -drive if=none,id=tpm-bs,file=/tmp/blobstore.qcow2 \
  -blobstore name=my-blobstore,drive=tpm-bs,formatifbad \
  -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
  -device tpm-tis,tpmdev=tpm0

This would format the blank QCoW2 image only the very first time, via the
'formatifbad' parameter.

Using a 'raw' image for the blobstore, one could do the following to start
QEMU in the first step:

touch /tmp/blobstore.raw

qemu ... \
  -blobstore name=my-blobstore,drive=tpm-bs,create \
  -drive if=none,id=tpm-bs,format=raw,file=/tmp/blobstore.raw \
  -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
  -device tpm-tis,tpmdev=tpm0

This would make QEMU create the appropriately sized image and start the VM
in one step.

Going a layer up into libvirt: to support SELinux labeling (sVirt), libvirt
could use the above steps as shown for QCoW2, labeling the file before
starting QEMU.

A note at the end: if we were to drop the -drive option and support a file
option for the image file in -blobstore, we could have more control over
the creation of the image file in any wanted format, but that would mean
replicating some of the -drive options in the -blobstore option. QCoW2
files could also be created if the passed file didn't even exist yet.

Looking forward to your comments.

Regards,
   Stefan
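The register-then-sum scheme described above can be sketched as follows. This is a minimal illustration with invented names and invented overhead constants (`Blobstore`, `register_blob`, the per-blob and header overheads are all assumptions, not QEMU's actual code, which would do this in C during device initialization):

```python
# Sketch of the blobstore size-registration idea. All names and the
# overhead constants are hypothetical; QEMU's real implementation is in C.

SECTOR_SIZE = 512          # assumed on-disk allocation unit
PER_BLOB_OVERHEAD = 32     # assumed per-blob directory entry, in bytes
HEADER_OVERHEAD = 512      # assumed fixed header for the store layout


def round_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


class Blobstore:
    def __init__(self):
        self.blobs = {}    # blob name -> registered maximum size

    def register_blob(self, name, size):
        """Called by a device at init time with its a-priori known size."""
        self.blobs[name] = size

    def required_image_size(self):
        """Total (minimum) image size: header plus every blob with its
        overhead, each rounded up to the allocation unit."""
        total = HEADER_OVERHEAD
        for size in self.blobs.values():
            total += round_up(size + PER_BLOB_OVERHEAD, SECTOR_SIZE)
        return total


store = Blobstore()
store.register_blob("tpm-permanent", 60 * 1024)
store.register_blob("tpm-volatile", 20 * 1024)
print(store.required_image_size())   # 83456
```

This is the number 'showsize' would report (and that a later QMP query would return), computed before any image I/O happens.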
* Re: [Qemu-devel] Design of the blobstore
From: Michael S. Tsirkin @ 2011-09-14 17:40 UTC
To: Stefan Berger
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> qemu ... \
>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>   -drive if=none,id=tpm-bs \
>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>   -device tpm-tis,tpmdev=tpm0
>
> which would result in QEMU printing to stdout:
>
> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.

So you envision tools parsing this freetext then? Seems like a step back,
we are trying to move to QMP ...

> Once a QCoW2 image file has been created using
>
> qemu-img create -f qcow2 /tmp/blobstore.qcow2 83k
>
> QEMU can then subsequently be used with the following command line options:
>
> qemu ... \
>   -drive if=none,id=tpm-bs,file=/tmp/blobstore.qcow2 \
>   -blobstore name=my-blobstore,drive=tpm-bs,formatifbad \
>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>   -device tpm-tis,tpmdev=tpm0
>
> This would format the blank QCoW2 image only the very first time, via the
> 'formatifbad' parameter.

This formatifbad option is a bad mistake (pun intended). It mixes the
formatting of the image (a one-time operation) with the running of the VM
(a repeated operation). We also saw how this does not play well, e.g., with
migration. It loses information! Would you like your OS to format the hard
disk if it cannot boot? Right ... Instead, just failing if the image is not
well formatted will be much easier to debug.

> Using a 'raw' image for the blobstore, one could do the following to
> start QEMU in the first step:
>
> touch /tmp/blobstore.raw
>
> qemu ... \
>   -blobstore name=my-blobstore,drive=tpm-bs,create \
>   -drive if=none,id=tpm-bs,format=raw,file=/tmp/blobstore.raw \
>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>   -device tpm-tis,tpmdev=tpm0
>
> This would make QEMU create the appropriately sized image and start the
> VM in one step.
>
> Going a layer up into libvirt: to support SELinux labeling (sVirt),
> libvirt could use the above steps as shown for QCoW2, labeling the file
> before starting QEMU.
>
> A note at the end: if we were to drop the -drive option and support a
> file option for the image file in -blobstore, we could have more control
> over the creation of the image file in any wanted format, but that would
> mean replicating some of the -drive options in the -blobstore option.
> QCoW2 files could also be created if the passed file didn't even exist
> yet.
>
> Looking forward to your comments.
>
> Regards,
>    Stefan

So with the above, the raw case, which we don't expect to be used often, is
easy to use, but qcow, which we expect to be the main case, is close to
impossible, involving manual cut and paste of the image size.

Formatting images seems a rare enough occasion that I think only using a
monitor command for that would be a better idea than a ton of new command
line options. On top of that, let's write a script that runs qemu, queries
the image size, creates a qcow2 file, and runs qemu again to format, all
this using QMP.

WRT 'format and run in one go', I strongly disagree with it. It's just too
easy to shoot oneself in the foot.

-- 
MST
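The provisioning flow Michael proposes could be sketched roughly as below. This is purely illustrative: it assumes a 'query-blobstore'-style QMP command exists (the thread is still debating its name and shape), and the helper functions are invented, not an existing tool:

```python
# Sketch of a provisioning script: query the required size over QMP,
# then create a suitably sized qcow2 image with qemu-img.
# Hypothetical throughout -- 'query-blobstore' is a proposed command.
import json


def build_qmp_command(name, **arguments):
    """Serialize a QMP command as a single JSON line."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)


def qemu_img_create_argv(path, size_bytes):
    """Build the qemu-img invocation for a qcow2 image of that size."""
    return ["qemu-img", "create", "-f", "qcow2", path, str(size_bytes)]


# The flow would be:
#   1. start qemu paused (-S) with a QMP socket and the tpm/drive options
#   2. send build_qmp_command("qmp_capabilities"), then
#      build_qmp_command("query-blobstore"), and read the size from the reply
#   3. run qemu_img_create_argv("/tmp/blobstore.qcow2", size) via subprocess
#   4. start qemu again with -drive file=/tmp/blobstore.qcow2 attached
#      and format the store via a monitor command
```

The point of the sketch is that every step is machine-driven over QMP and qemu-img, with no cut and paste of a size printed to stdout.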
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-14 17:49 UTC
To: Michael S. Tsirkin
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>> qemu ... \
>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>   -drive if=none,id=tpm-bs \
>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>   -device tpm-tis,tpmdev=tpm0
>>
>> which would result in QEMU printing to stdout:
>>
>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
> So you envision tools parsing this freetext then?
> Seems like a step back, we are trying to move to QMP ...

I extended it first for the way I typically interact with QEMU. I do not
use the monitor much.

> So with the above, the raw case, which we don't expect to be used often,
> is easy to use, but qcow, which we expect to be the main case, is close
> to impossible, involving manual cut and paste of the image size.
>
> Formatting images seems a rare enough occasion that I think only using a
> monitor command for that would be a better idea than a ton of new
> command line options. On top of that, let's write a script that runs
> qemu, queries the image size, creates a qcow2 file, and runs qemu again
> to format, all this using QMP.

Creates the qcow2 using 'qemu-img', I suppose.

   Stefan

> WRT 'format and run in one go', I strongly disagree with it.
> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
From: Michael S. Tsirkin @ 2011-09-14 17:56 UTC
To: Stefan Berger
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>> qemu ... \
>>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>   -drive if=none,id=tpm-bs \
>>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>   -device tpm-tis,tpmdev=tpm0
>>>
>>> which would result in QEMU printing to stdout:
>>>
>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>> So you envision tools parsing this freetext then?
>> Seems like a step back, we are trying to move to QMP ...
> I extended it first for the way I typically interact with QEMU. I do not
> use the monitor much.

It will work even better if there's a tool to do the job instead of
cutting and pasting stuff, won't it? And for that, we need monitor
commands.

>> So with the above, the raw case, which we don't expect to be used often,
>> is easy to use, but qcow, which we expect to be the main case, is close
>> to impossible, involving manual cut and paste of the image size.
>>
>> Formatting images seems a rare enough occasion that I think only using
>> a monitor command for that would be a better idea than a ton of new
>> command line options. On top of that, let's write a script that runs
>> qemu, queries the image size, creates a qcow2 file, and runs qemu again
>> to format, all this using QMP.
> Creates the qcow2 using 'qemu-img', I suppose.
>
>    Stefan

Sure.

>> WRT 'format and run in one go', I strongly disagree with it.
>> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-14 21:12 UTC
To: Michael S. Tsirkin
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
>> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>> qemu ... \
>>>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>>   -drive if=none,id=tpm-bs \
>>>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>>   -device tpm-tis,tpmdev=tpm0
>>>>
>>>> which would result in QEMU printing to stdout:
>>>>
>>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>>> So you envision tools parsing this freetext then?
>>> Seems like a step back, we are trying to move to QMP ...
>> I extended it first for the way I typically interact with QEMU. I do
>> not use the monitor much.
> It will work even better if there's a tool to do the job instead of
> cutting and pasting stuff, won't it? And for that, we need monitor
> commands.

I am not so sure about the design of the QMP commands and how to break
things up into individual calls. So does this sequence here and the
'query-blobstore' output look ok?

{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "query-blobstore" }
{"return": [{"size": 84480, "id": "tpm-bs"}]}

The corresponding command line parameters are:

  -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
  -drive if=none,id=tpm-bs,file=$TPMSTATE \

Regards,
   Stefan

>>> So with the above, the raw case, which we don't expect to be used
>>> often, is easy to use, but qcow, which we expect to be the main case,
>>> is close to impossible, involving manual cut and paste of the image
>>> size.
>>>
>>> Formatting images seems a rare enough occasion that I think only using
>>> a monitor command for that would be a better idea than a ton of new
>>> command line options. On top of that, let's write a script that runs
>>> qemu, queries the image size, creates a qcow2 file, and runs qemu
>>> again to format, all this using QMP.
>> Creates the qcow2 using 'qemu-img', I suppose.
>>
>>    Stefan
> Sure.
>
>>> WRT 'format and run in one go', I strongly disagree with it.
>>> It's just too easy to shoot oneself in the foot.
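The proposed reply format is machine-parseable, which addresses the freetext objection. A client would consume the exact JSON shown above along these lines (the helper name is hypothetical):

```python
# Parse the proposed 'query-blobstore' reply, which lists every
# blobstore with its backing drive ID and required size in bytes.
import json


def parse_blobstore_reply(line):
    """Map each blobstore's drive ID to its required size in bytes."""
    reply = json.loads(line)
    return {entry["id"]: entry["size"] for entry in reply["return"]}


sizes = parse_blobstore_reply('{"return": [{"size": 84480, "id": "tpm-bs"}]}')
print(sizes["tpm-bs"])   # 84480 bytes, i.e. 165 sectors of 512 bytes
```

Because the return value is a list keyed by drive ID, the same reply can describe several blobstores at once, which is what the discussion below converges on.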
* Re: [Qemu-devel] Design of the blobstore
From: Michael S. Tsirkin @ 2011-09-15 6:57 UTC
To: Stefan Berger
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
> On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
>> On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
>>> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>>> qemu ... \
>>>>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>>>   -drive if=none,id=tpm-bs \
>>>>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>>>   -device tpm-tis,tpmdev=tpm0
>>>>>
>>>>> which would result in QEMU printing to stdout:
>>>>>
>>>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>>>> So you envision tools parsing this freetext then?
>>>> Seems like a step back, we are trying to move to QMP ...
>>> I extended it first for the way I typically interact with QEMU. I do
>>> not use the monitor much.
>> It will work even better if there's a tool to do the job instead of
>> cutting and pasting stuff, won't it? And for that, we need monitor
>> commands.
> I am not so sure about the design of the QMP commands and how to break
> things up into individual calls. So does this sequence here and the
> 'query-blobstore' output look ok?
>
> { "execute": "qmp_capabilities" }
> {"return": {}}
> { "execute": "query-blobstore" }
> {"return": [{"size": 84480, "id": "tpm-bs"}]}

I'll let some QMP experts comment.

We don't strictly need the id here, right? It is passed to the command.

BTW, is it [] or {}? It's the total size, right? Should it be

{"return": {"size": 84480}}

?

> The corresponding command line parameters are:
>
>   -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>   -drive if=none,id=tpm-bs,file=$TPMSTATE \
>
> Regards,
>    Stefan

>>>> So with the above, the raw case, which we don't expect to be used
>>>> often, is easy to use, but qcow, which we expect to be the main case,
>>>> is close to impossible, involving manual cut and paste of the image
>>>> size.
>>>>
>>>> Formatting images seems a rare enough occasion that I think only
>>>> using a monitor command for that would be a better idea than a ton of
>>>> new command line options. On top of that, let's write a script that
>>>> runs qemu, queries the image size, creates a qcow2 file, and runs
>>>> qemu again to format, all this using QMP.
>>> Creates the qcow2 using 'qemu-img', I suppose.
>>>
>>>    Stefan
>> Sure.
>>
>>>> WRT 'format and run in one go', I strongly disagree with it.
>>>> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-15 10:22 UTC
To: Michael S. Tsirkin
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On 09/15/2011 02:57 AM, Michael S. Tsirkin wrote:
> On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
>> On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
>>>> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>>>> qemu ... \
>>>>>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>>>>   -drive if=none,id=tpm-bs \
>>>>>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>>>>   -device tpm-tis,tpmdev=tpm0
>>>>>>
>>>>>> which would result in QEMU printing to stdout:
>>>>>>
>>>>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>>>>> So you envision tools parsing this freetext then?
>>>>> Seems like a step back, we are trying to move to QMP ...
>>>> I extended it first for the way I typically interact with QEMU. I do
>>>> not use the monitor much.
>>> It will work even better if there's a tool to do the job instead of
>>> cutting and pasting stuff, won't it? And for that, we need monitor
>>> commands.
>> I am not so sure about the design of the QMP commands and how to break
>> things up into individual calls. So does this sequence here and the
>> 'query-blobstore' output look ok?
>>
>> { "execute": "qmp_capabilities" }
>> {"return": {}}
>> { "execute": "query-blobstore" }
>> {"return": [{"size": 84480, "id": "tpm-bs"}]}
> I'll let some QMP experts comment.
>
> We don't strictly need the id here, right? It is passed to the command.
>
> BTW, is it [] or {}? It's the total size, right? Should it be
> {"return": {"size": 84480}}
> ?

The id serves to distinguish one blobstore from another. We'll have any
number of blobstores. Since we are getting rid of the -blobstore option,
they will only be identifiable via the ID of the drive they are using. If
that's not good, please let me know. The examples I had shown yesterday
were using the name of the blobstore, rather than the drive ID, to connect
the device to the blobstore.

Before:

qemu ... \
  -blobstore name=my-blobstore,drive=tpm-bs,showsize \
  -drive if=none,id=tpm-bs \
  -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
  -device tpm-tis,tpmdev=tpm0

Now:

qemu ... \
  -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
  -drive if=none,id=tpm-bs,file=$TPMSTATE \

   Stefan

>> The corresponding command line parameters are:
>>
>>   -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>>   -drive if=none,id=tpm-bs,file=$TPMSTATE \
>>
>> Regards,
>>    Stefan

>>>>> So with the above, the raw case, which we don't expect to be used
>>>>> often, is easy to use, but qcow, which we expect to be the main
>>>>> case, is close to impossible, involving manual cut and paste of the
>>>>> image size.
>>>>>
>>>>> Formatting images seems a rare enough occasion that I think only
>>>>> using a monitor command for that would be a better idea than a ton
>>>>> of new command line options. On top of that, let's write a script
>>>>> that runs qemu, queries the image size, creates a qcow2 file, and
>>>>> runs qemu again to format, all this using QMP.
>>>> Creates the qcow2 using 'qemu-img', I suppose.
>>>>
>>>>    Stefan
>>> Sure.
>>>
>>>>> WRT 'format and run in one go', I strongly disagree with it.
>>>>> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
From: Michael S. Tsirkin @ 2011-09-15 10:51 UTC
To: Stefan Berger
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On Thu, Sep 15, 2011 at 06:22:15AM -0400, Stefan Berger wrote:
> On 09/15/2011 02:57 AM, Michael S. Tsirkin wrote:
>> On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
>>> I am not so sure about the design of the QMP commands and how to break
>>> things up into individual calls. So does this sequence here and the
>>> 'query-blobstore' output look ok?
>>>
>>> { "execute": "qmp_capabilities" }
>>> {"return": {}}
>>> { "execute": "query-blobstore" }
>>> {"return": [{"size": 84480, "id": "tpm-bs"}]}
>> I'll let some QMP experts comment.
>>
>> We don't strictly need the id here, right? It is passed to the command.
>>
>> BTW, is it [] or {}? It's the total size, right? Should it be
>> {"return": {"size": 84480}}
>> ?
> The id serves to distinguish one blobstore from another. We'll have any
> number of blobstores. Since we are getting rid of the -blobstore option,
> they will only be identifiable via the ID of the drive they are using.
> If that's not good, please let me know. The examples I had shown
> yesterday were using the name of the blobstore, rather than the drive
> ID, to connect the device to the blobstore.
>
> Before:
>
> qemu ... \
>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>   -drive if=none,id=tpm-bs \
>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>   -device tpm-tis,tpmdev=tpm0
>
> Now:
>
> qemu ... \
>   -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>   -drive if=none,id=tpm-bs,file=$TPMSTATE \
>
>    Stefan

Ah, I get it. I was confused, thinking this queries a single store.
Instead, this returns info about *all* blobstores. query-blobstores would
be a better name then. Otherwise I think it's fine.

Also, should we rename blobstore to 'nvram' or something else that tells
the user what this does?
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-15 10:55 UTC
To: Michael S. Tsirkin
Cc: Anthony Liguori, QEMU Developers, Markus Armbruster

On 09/15/2011 06:51 AM, Michael S. Tsirkin wrote:
> On Thu, Sep 15, 2011 at 06:22:15AM -0400, Stefan Berger wrote:
>> The id serves to distinguish one blobstore from another. We'll have any
>> number of blobstores. Since we are getting rid of the -blobstore
>> option, they will only be identifiable via the ID of the drive they are
>> using. If that's not good, please let me know. The examples I had shown
>> yesterday were using the name of the blobstore, rather than the drive
>> ID, to connect the device to the blobstore.
>>
>> Before:
>>
>> qemu ... \
>>   -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>   -drive if=none,id=tpm-bs \
>>   -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>   -device tpm-tis,tpmdev=tpm0
>>
>> Now:
>>
>> qemu ... \
>>   -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>>   -drive if=none,id=tpm-bs,file=$TPMSTATE \
>>
>>    Stefan
> Ah, I get it. I was confused, thinking this queries a single store.
> Instead, this returns info about *all* blobstores. query-blobstores
> would be a better name then. Otherwise I think it's fine.
>
> Also, should we rename blobstore to 'nvram' or something else that tells
> the user what this does?

Fine by me. We ought to talk about the on-disk format then ...

   Stefan
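The on-disk format Stefan alludes to is still undefined at this point in the thread. Purely as an illustration of the kind of layout such an NVRAM store might use (every field here is invented, not a proposal the thread agreed on), a fixed header plus a directory of (name, offset, length) entries could be packed like this:

```python
# Illustrative only: one possible NVRAM/blobstore on-disk layout with a
# fixed header and a directory of (name, offset, length) entries.
# The thread has not defined the real format; these fields are invented.
import struct

MAGIC = b"NVRM"
HDR = struct.Struct("<4sHH")      # magic, version, number of entries
ENT = struct.Struct("<16sII")     # blob name, data offset, data length


def pack_store(blobs):
    """blobs: dict name -> bytes. Returns the serialized image content."""
    header = HDR.pack(MAGIC, 1, len(blobs))
    directory = b""
    data = b""
    offset = HDR.size + len(blobs) * ENT.size   # data begins after directory
    for name, blob in blobs.items():
        directory += ENT.pack(name.encode(), offset, len(blob))
        offset += len(blob)
        data += blob
    return header + directory + data


def unpack_store(image):
    """Inverse of pack_store: recover the name -> bytes mapping."""
    magic, version, count = HDR.unpack_from(image, 0)
    assert magic == MAGIC and version == 1
    blobs = {}
    for i in range(count):
        name, off, length = ENT.unpack_from(image, HDR.size + i * ENT.size)
        blobs[name.rstrip(b"\0").decode()] = image[off:off + length]
    return blobs
```

A design like this makes the required image size computable up front (header plus directory plus the registered blob sizes), which is exactly what the size-registration discussion earlier in the thread needs.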
* Re: [Qemu-devel] Design of the blobstore
From: Gleb Natapov @ 2011-09-15 5:47 UTC
To: Stefan Berger
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin

On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> One property of the blobstore is that it has a certain required size for
> accommodating all blobs of the devices that want to store their blobs in
> it. The assumption is that the size of these blobs is known a priori to
> the writer of the device code, and all devices can register their space
> requirements with the blobstore during device initialization. Gathering
> all the registered blobs' sizes, plus knowing the overhead of the layout
> of the data on the disk, then lets QEMU calculate the total required
> (minimum) size that the image has to have to accommodate all blobs in a
> particular blobstore.

I do not see the point of having one blobstore for all devices. Each
should have its own. We will need permanent storage for UEFI firmware too,
and creating a new UEFI config for each machine configuration is not the
kind of usability we want to have.

--
   Gleb.
* Re: [Qemu-devel] Design of the blobstore
From: Stefan Berger @ 2011-09-15 10:18 UTC
To: Gleb Natapov
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin

On 09/15/2011 01:47 AM, Gleb Natapov wrote:
> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>> One property of the blobstore is that it has a certain required size
>> for accommodating all blobs of the devices that want to store their
>> blobs in it. The assumption is that the size of these blobs is known
>> a priori to the writer of the device code, and all devices can register
>> their space requirements with the blobstore during device
>> initialization. Gathering all the registered blobs' sizes, plus knowing
>> the overhead of the layout of the data on the disk, then lets QEMU
>> calculate the total required (minimum) size that the image has to have
>> to accommodate all blobs in a particular blobstore.
>
> I do not see the point of having one blobstore for all devices. Each
> should have its own. We will need permanent storage for UEFI firmware
> too, and creating a new UEFI config for each machine configuration is
> not the kind of usability we want to have.

You will have the possibility of storing all devices' state in one
blobstore, or each device's state in its own, or any combination in
between.

   Stefan
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 10:18 ` Stefan Berger @ 2011-09-15 10:20 ` Gleb Natapov 0 siblings, 0 replies; 27+ messages in thread From: Gleb Natapov @ 2011-09-15 10:20 UTC (permalink / raw) To: Stefan Berger Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On Thu, Sep 15, 2011 at 06:18:35AM -0400, Stefan Berger wrote: > On 09/15/2011 01:47 AM, Gleb Natapov wrote: > >On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote: > >> One property of the blobstore is that it has a certain required > >>size for accommodating all blobs of device that want to store their > >>blobs onto. The assumption is that the size of these blobs is know > >>a-priori to the writer of the device code and all devices can > >>register their space requirements with the blobstore during device > >>initialization. Then gathering all the registered blobs' sizes plus > >>knowing the overhead of the layout of the data on the disk lets QEMU > >>calculate the total required (minimum) size that the image has to > >>have to accommodate all blobs in a particular blobstore. > >> > >I do not see the point of having one blobstore for all devices. Each > >should have its own. We will need permanent storage for UEFI firmware > >too and creating new UEFI config for each machine configuration is not > >the kind of usability we want to have. > > > You will have the possibility of storing all devices' state into one > blobstore or each devices' state in its own or any combination in > between. > Good, thanks. -- Gleb. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger 2011-09-14 17:40 ` Michael S. Tsirkin 2011-09-15 5:47 ` Gleb Natapov @ 2011-09-15 11:17 ` Stefan Hajnoczi 2011-09-15 11:35 ` Daniel P. Berrange ` (2 more replies) 2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange 3 siblings, 3 replies; 27+ messages in thread From: Stefan Hajnoczi @ 2011-09-15 11:17 UTC (permalink / raw) To: Stefan Berger Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger <stefanb@linux.vnet.ibm.com> wrote: > One property of the blobstore is that it has a certain required size for > accommodating all blobs of device that want to store their blobs onto. The > assumption is that the size of these blobs is know a-priori to the writer of > the device code and all devices can register their space requirements with > the blobstore during device initialization. Then gathering all the > registered blobs' sizes plus knowing the overhead of the layout of the data > on the disk lets QEMU calculate the total required (minimum) size that the > image has to have to accommodate all blobs in a particular blobstore. Libraries like tdb or gdbm come to mind. We should be careful not to reinvent cpio/tar or FAT :). What about live migration? If each VM has a LUN assigned on a SAN then these qcow2 files add a new requirement for a shared file system. Perhaps it makes sense to include the blobstore in the VM state data instead? If you take that approach then the blobstore will get snapshotted *into* the existing qcow2 images. Then you don't need a shared file system for migration to work. Can you share your design for the actual QEMU API that the TPM code will use to manipulate the blobstore? Is it designed to work in the event loop while QEMU is running, or is it for rare I/O on startup/shutdown? 
Stefan ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:17 ` Stefan Hajnoczi @ 2011-09-15 11:35 ` Daniel P. Berrange 2011-09-15 11:40 ` Kevin Wolf 2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger 2 siblings, 0 replies; 27+ messages in thread From: Daniel P. Berrange @ 2011-09-15 11:35 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Anthony Liguori, Michael S. Tsirkin, Stefan Berger, QEMU Developers, Markus Armbruster On Thu, Sep 15, 2011 at 12:17:54PM +0100, Stefan Hajnoczi wrote: > On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger > <stefanb@linux.vnet.ibm.com> wrote: > > One property of the blobstore is that it has a certain required size for > > accommodating all blobs of device that want to store their blobs onto. The > > assumption is that the size of these blobs is know a-priori to the writer of > > the device code and all devices can register their space requirements with > > the blobstore during device initialization. Then gathering all the > > registered blobs' sizes plus knowing the overhead of the layout of the data > > on the disk lets QEMU calculate the total required (minimum) size that the > > image has to have to accommodate all blobs in a particular blobstore. > > Libraries like tdb or gdbm come to mind. We should be careful not to > reinvent cpio/tar or FAT :). qcow2 is desirable because it lets us provide encryption of the blobstore, which is important if you don't trust the admin of the NFS server, or the network between the virt host & NFS server. > What about live migration? If each VM has a LUN assigned on a SAN > then these qcow2 files add a new requirement for a shared file system. NB, I'm not necessarily recommending this, but it is possible to format a raw block device to contain a qcow2 image. So it does not actually require a shared filesystem. It would, however, require an additional LUN, or require that the existing LUN be partitioned into two parts.
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:17 ` Stefan Hajnoczi 2011-09-15 11:35 ` Daniel P. Berrange @ 2011-09-15 11:40 ` Kevin Wolf 2011-09-15 11:58 ` Stefan Hajnoczi 2011-09-15 14:19 ` Stefan Berger 2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger 2 siblings, 2 replies; 27+ messages in thread From: Kevin Wolf @ 2011-09-15 11:40 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Markus Armbruster, Anthony Liguori, Michael S. Tsirkin, QEMU Developers, Stefan Berger Am 15.09.2011 13:17, schrieb Stefan Hajnoczi: > On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger > <stefanb@linux.vnet.ibm.com> wrote: >> One property of the blobstore is that it has a certain required size for >> accommodating all blobs of device that want to store their blobs onto. The >> assumption is that the size of these blobs is know a-priori to the writer of >> the device code and all devices can register their space requirements with >> the blobstore during device initialization. Then gathering all the >> registered blobs' sizes plus knowing the overhead of the layout of the data >> on the disk lets QEMU calculate the total required (minimum) size that the >> image has to have to accommodate all blobs in a particular blobstore. > > Libraries like tdb or gdbm come to mind. We should be careful not to > reinvent cpio/tar or FAT :). We could use vvfat if we need a FAT implementation. *duck* > What about live migration? If each VM has a LUN assigned on a SAN > then these qcow2 files add a new requirement for a shared file system. > > Perhaps it makes sense to include the blobstore in the VM state data > instead? If you take that approach then the blobstore will get > snapshotted *into* the existing qcow2 images. Then you don't need a > shared file system for migration to work. But what happens if you don't do fancy things like snapshots or live migration, but just shut the VM down? Nothing will be saved then, so it must already be on disk. 
I think using a BlockDriverState for that makes sense, even though it is some additional work for migration. But you already deal with n disks, doing n+1 disks shouldn't be much harder. The one thing that I didn't understand in the original mail is why you think that raw works with your option but qcow2 doesn't. Where's the difference wrt creating an image? Kevin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:40 ` Kevin Wolf @ 2011-09-15 11:58 ` Stefan Hajnoczi 2011-09-15 12:31 ` Michael S. Tsirkin 2011-09-16 8:46 ` Kevin Wolf 2011-09-15 14:19 ` Stefan Berger 1 sibling, 2 replies; 27+ messages in thread From: Stefan Hajnoczi @ 2011-09-15 11:58 UTC (permalink / raw) To: Kevin Wolf Cc: Markus Armbruster, Anthony Liguori, Michael S. Tsirkin, QEMU Developers, Stefan Berger On Thu, Sep 15, 2011 at 12:40 PM, Kevin Wolf <kwolf@redhat.com> wrote: > Am 15.09.2011 13:17, schrieb Stefan Hajnoczi: >> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger >> <stefanb@linux.vnet.ibm.com> wrote: >>> One property of the blobstore is that it has a certain required size for >>> accommodating all blobs of device that want to store their blobs onto. The >>> assumption is that the size of these blobs is know a-priori to the writer of >>> the device code and all devices can register their space requirements with >>> the blobstore during device initialization. Then gathering all the >>> registered blobs' sizes plus knowing the overhead of the layout of the data >>> on the disk lets QEMU calculate the total required (minimum) size that the >>> image has to have to accommodate all blobs in a particular blobstore. >> >> Libraries like tdb or gdbm come to mind. We should be careful not to >> reinvent cpio/tar or FAT :). > > We could use vvfat if we need a FAT implementation. *duck* > >> What about live migration? If each VM has a LUN assigned on a SAN >> then these qcow2 files add a new requirement for a shared file system. >> >> Perhaps it makes sense to include the blobstore in the VM state data >> instead? If you take that approach then the blobstore will get >> snapshotted *into* the existing qcow2 images. Then you don't need a >> shared file system for migration to work. > > But what happens if you don't do fancy things like snapshots or live > migration, but just shut the VM down? Nothing will be saved then, so it > must already be on disk. 
I think using a BlockDriverState for that makes > sense, even though it is some additional work for migration. But you > already deal with n disks, doing n+1 disks shouldn't be much harder. Sure, you need a file because the data needs to be persistent. I'm not saying to keep it in memory only. My concern is that while QEMU block devices provide a convenient wrapper for snapshot and encryption, we need to write the data layout that goes inside that wrapper from scratch. We'll need to invent our own key-value store when there are plenty of existing ones. I explained that the snapshot feature is actually a misfeature; it would be better to integrate with VM state data so that there is no additional migration requirement. As for encryption, just encrypt the values you put into the key-value store. Stefan ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:58 ` Stefan Hajnoczi @ 2011-09-15 12:31 ` Michael S. Tsirkin 2011-09-16 8:46 ` Kevin Wolf 1 sibling, 0 replies; 27+ messages in thread From: Michael S. Tsirkin @ 2011-09-15 12:31 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers, Stefan Berger > We'll need to invent our > own key-value store when there are plenty of existing ones. Let's not invent our own. So a proposal I sent uses an existing one (BER encoding) for such a store. I actually think we can switch to BER more widely such as for migration format. -- MST ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:58 ` Stefan Hajnoczi 2011-09-15 12:31 ` Michael S. Tsirkin @ 2011-09-16 8:46 ` Kevin Wolf 1 sibling, 0 replies; 27+ messages in thread From: Kevin Wolf @ 2011-09-16 8:46 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Markus Armbruster, Anthony Liguori, Michael S. Tsirkin, QEMU Developers, Stefan Berger Am 15.09.2011 13:58, schrieb Stefan Hajnoczi: >>> What about live migration? If each VM has a LUN assigned on a SAN >>> then these qcow2 files add a new requirement for a shared file system. >>> >>> Perhaps it makes sense to include the blobstore in the VM state data >>> instead? If you take that approach then the blobstore will get >>> snapshotted *into* the existing qcow2 images. Then you don't need a >>> shared file system for migration to work. >> >> But what happens if you don't do fancy things like snapshots or live >> migration, but just shut the VM down? Nothing will be saved then, so it >> must already be on disk. I think using a BlockDriverState for that makes >> sense, even though it is some additional work for migration. But you >> already deal with n disks, doing n+1 disks shouldn't be much harder. > > Sure, you need a file because the data needs to be persistent. I'm > not saying to keep it in memory only. > > My concern is that while QEMU block devices provide a convenient > wrapper for snapshot and encryption, we need to write the data layout > that goes inside that wrapper from scratch. We'll need to invent our > own key-value store when there are plenty of existing ones. I > explained that the snapshot feature is actually a misfeature, it would > be better to integrate with VM state data so that there is no > additional migration requirement. I'm not so sure if being able to integrate it in the VM state is a feature or a bug. There is no other persistent data that is included in VM state data. Kevin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 11:40 ` Kevin Wolf 2011-09-15 11:58 ` Stefan Hajnoczi @ 2011-09-15 14:19 ` Stefan Berger 2011-09-16 8:12 ` Kevin Wolf 1 sibling, 1 reply; 27+ messages in thread From: Stefan Berger @ 2011-09-15 14:19 UTC (permalink / raw) To: Kevin Wolf Cc: QEMU Developers, Stefan Hajnoczi, Anthony Liguori, Markus Armbruster, Michael S. Tsirkin On 09/15/2011 07:40 AM, Kevin Wolf wrote: > Am 15.09.2011 13:17, schrieb Stefan Hajnoczi: >> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger >> <stefanb@linux.vnet.ibm.com> wrote: >>> One property of the blobstore is that it has a certain required size for >>> accommodating all blobs of device that want to store their blobs onto. The >>> assumption is that the size of these blobs is know a-priori to the writer of >>> the device code and all devices can register their space requirements with >>> the blobstore during device initialization. Then gathering all the >>> registered blobs' sizes plus knowing the overhead of the layout of the data >>> on the disk lets QEMU calculate the total required (minimum) size that the >>> image has to have to accommodate all blobs in a particular blobstore. >> Libraries like tdb or gdbm come to mind. We should be careful not to >> reinvent cpio/tar or FAT :). > We could use vvfat if we need a FAT implementation. *duck* > >> What about live migration? If each VM has a LUN assigned on a SAN >> then these qcow2 files add a new requirement for a shared file system. >> >> Perhaps it makes sense to include the blobstore in the VM state data >> instead? If you take that approach then the blobstore will get >> snapshotted *into* the existing qcow2 images. Then you don't need a >> shared file system for migration to work. > But what happens if you don't do fancy things like snapshots or live > migration, but just shut the VM down? Nothing will be saved then, so it > must already be on disk. 
I think using a BlockDriverState for that makes > sense, even though it is some additional work for migration. But you > already deal with n disks, doing n+1 disks shouldn't be much harder. > > > The one thing that I didn't understand in the original mail is why you > think that raw works with your option but qcow2 doesn't. Where's the > difference wrt creating an image? I guess you are asking me (also 'Stefan'). When I had QEMU create the disk file I had to pass a file parameter to -drive ...,file=... for it to know which file to create. If the file didn't exist, I got an error. So I created an empty file using 'touch' and could at least start. Though an empty file declared with the format qcow2 in -drive ...,file=...,format=qcow2 throws another error since that's not a valid QCoW2. I wanted to use that parameter 'format' to know what the user wanted to create. So in case of 'raw', I could start out with an empty file, have QEMU calculate the size, call the 'truncate' function on the bdrv it was used with, and then had a raw image of the needed size. The VM could start right away... Stefan > Kevin > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 14:19 ` Stefan Berger @ 2011-09-16 8:12 ` Kevin Wolf 0 siblings, 0 replies; 27+ messages in thread From: Kevin Wolf @ 2011-09-16 8:12 UTC (permalink / raw) To: Stefan Berger Cc: QEMU Developers, Stefan Hajnoczi, Anthony Liguori, Markus Armbruster, Michael S. Tsirkin Am 15.09.2011 16:19, schrieb Stefan Berger: > On 09/15/2011 07:40 AM, Kevin Wolf wrote: >> Am 15.09.2011 13:17, schrieb Stefan Hajnoczi: >>> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger >>> <stefanb@linux.vnet.ibm.com> wrote: >>>> One property of the blobstore is that it has a certain required size for >>>> accommodating all blobs of device that want to store their blobs onto. The >>>> assumption is that the size of these blobs is know a-priori to the writer of >>>> the device code and all devices can register their space requirements with >>>> the blobstore during device initialization. Then gathering all the >>>> registered blobs' sizes plus knowing the overhead of the layout of the data >>>> on the disk lets QEMU calculate the total required (minimum) size that the >>>> image has to have to accommodate all blobs in a particular blobstore. >>> Libraries like tdb or gdbm come to mind. We should be careful not to >>> reinvent cpio/tar or FAT :). >> We could use vvfat if we need a FAT implementation. *duck* >> >>> What about live migration? If each VM has a LUN assigned on a SAN >>> then these qcow2 files add a new requirement for a shared file system. >>> >>> Perhaps it makes sense to include the blobstore in the VM state data >>> instead? If you take that approach then the blobstore will get >>> snapshotted *into* the existing qcow2 images. Then you don't need a >>> shared file system for migration to work. >> But what happens if you don't do fancy things like snapshots or live >> migration, but just shut the VM down? Nothing will be saved then, so it >> must already be on disk. 
I think using a BlockDriverState for that makes >> sense, even though it is some additional work for migration. But you >> already deal with n disks, doing n+1 disks shouldn't be much harder. >> >> >> The one thing that I didn't understand in the original mail is why you >> think that raw works with your option but qcow2 doesn't. Where's the >> difference wrt creating an image? > I guess you are asking me (also 'Stefan'). > > When I had QEMU create the disk file I had to pass a file parameter to > -drive ...,file=... for it to know which file to create. If the file > didn't exist, I got an error. So I create an empty file using 'touch' > and could at least start. Though an empty file declared with the format > qcow2 in -drive ...,file=...,format=qcow2 throws another error since > that's not a valid QCoW2. I wanted to use that parameter 'format' to > know what the user wanted to create. So in case of 'raw', I could start > out with an empty file, have QEMU calculate the size, call the > 'truncate' function on the bdrv it was used with and then had a raw > image of the needed size. THe VM could start right away... Oh, so you created the image manually instead of using bdrv_img_create()? That explains it... Kevin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore [API of the NVRAM] 2011-09-15 11:17 ` Stefan Hajnoczi 2011-09-15 11:35 ` Daniel P. Berrange 2011-09-15 11:40 ` Kevin Wolf @ 2011-09-15 12:34 ` Stefan Berger 2011-09-16 10:35 ` Stefan Hajnoczi 2 siblings, 1 reply; 27+ messages in thread From: Stefan Berger @ 2011-09-15 12:34 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote: > On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger > <stefanb@linux.vnet.ibm.com> wrote: > > One property of the blobstore is that it has a certain required size for > > accommodating all blobs of device that want to store their blobs onto. The > > assumption is that the size of these blobs is know a-priori to the writer of > > the device code and all devices can register their space requirements with > > the blobstore during device initialization. Then gathering all the > > registered blobs' sizes plus knowing the overhead of the layout of the data > > on the disk lets QEMU calculate the total required (minimum) size that the > > image has to have to accommodate all blobs in a particular blobstore. > Libraries like tdb or gdbm come to mind. We should be careful not to > reinvent cpio/tar or FAT :). Sure. As long as these dbs allow overriding open(), close(), read(), write(), and seek() with bdrv ops, we could recycle any of these. Maybe we can build something smaller than those... > What about live migration? If each VM has a LUN assigned on a SAN > then these qcow2 files add a new requirement for a shared file system. > Well, one can still block-migrate these. The user has to know of course whether shared storage is set up or not and pass the appropriate flags to libvirt for migration. I know it works (modulo some problems when using encrypted QCoW2) since I've been testing with it.
If you take that approach then the blobstore will get > snapshotted *into* the existing qcow2 images. Then you don't need a > shared file system for migration to work. > It could be an option. However, if the user has a raw image for the VM we still need the NVRAM emulation for the TPM for example. So we need to store the persistent data somewhere, but raw is not prepared for that. Even if snapshotting doesn't work at all, we need to be able to persist devices' data. > Can you share your design for the actual QEMU API that the TPM code > will use to manipulate the blobstore? Is it designed to work in the > event loop while QEMU is running, or is it for rare I/O on > startup/shutdown? > Everything is kind of changing now. But here's what I have right now: tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode); if (!tb->s.tpm_ltpms->nvram) { fprintf(stderr, "Could not find nvram.\n"); return errcode; } nvram_register_blob(tb->s.tpm_ltpms->nvram, NVRAM_ENTRY_PERMSTATE, tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE)); nvram_register_blob(tb->s.tpm_ltpms->nvram, NVRAM_ENTRY_SAVESTATE, tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE)); nvram_register_blob(tb->s.tpm_ltpms->nvram, NVRAM_ENTRY_VOLASTATE, tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE)); rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive); Above first sets up the NVRAM using the drive's id. That is the -tpmdev ...,nvram=my-bs, parameter. This establishes the NVRAM. Subsequently the blobs to be written into the NVRAM are registered. The nvram_start then reconciles the registered NVRAM blobs with those found on disk, and if everything fits together the result is 'rc = 0' and the NVRAM is ready to go. Other devices can then do the same, with the same NVRAM or another NVRAM. (It's 'NVRAM' now, after renaming from 'blobstore'.) Reading from NVRAM in case of the TPM is a rare event.
It happens in the context of QEMU's main thread: if (nvram_read_data(tpm_ltpms->nvram, NVRAM_ENTRY_PERMSTATE, &tpm_ltpms->permanent_state.buffer, &tpm_ltpms->permanent_state.size, 0, NULL, NULL) || nvram_read_data(tpm_ltpms->nvram, NVRAM_ENTRY_SAVESTATE, &tpm_ltpms->save_state.buffer, &tpm_ltpms->save_state.size, 0, NULL, NULL)) { tpm_ltpms->had_fatal_error = true; return; } Above reads the data of 2 blobs synchronously. This happens during startup. Writes depend on what the user does with the TPM. He can trigger lots of updates to persistent state by performing certain operations, e.g., persisting keys inside the TPM. rc = nvram_write_data(tpm_ltpms->nvram, what, tsb->buffer, tsb->size, VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F, NULL, NULL); Above writes a TPM blob into the NVRAM. This is triggered by the TPM thread and notifies the QEMU main thread to write the blob into NVRAM. I do this synchronously at the moment, not using the last two parameters for a callback after completion but the two flags: the first notifies the main thread, the second waits for the completion of the request (using a condition internally). Here are the protos: VNVRAM *nvram_setup(const char *drive_id, int *errcode); int nvram_start(VNVRAM *, bool fail_on_encrypted_drive); int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type, unsigned int maxsize); unsigned int nvram_get_totalsize(VNVRAM *bs); unsigned int nvram_get_totalsize_kb(VNVRAM *bs); typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write, unsigned char **data, unsigned int len); int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type, const unsigned char *data, unsigned int len, int flags, NVRAMRWFinishCB cb, void *opaque); As said, things are changing right now, so this is to give an impression... Stefan > Stefan > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore [API of the NVRAM] 2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger @ 2011-09-16 10:35 ` Stefan Hajnoczi 2011-09-16 11:36 ` Stefan Berger 0 siblings, 1 reply; 27+ messages in thread From: Stefan Hajnoczi @ 2011-09-16 10:35 UTC (permalink / raw) To: Stefan Berger Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On Thu, Sep 15, 2011 at 08:34:55AM -0400, Stefan Berger wrote: > On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote: > >On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger > ><stefanb@linux.vnet.ibm.com> wrote: > >> One property of the blobstore is that it has a certain required size for > >>accommodating all blobs of device that want to store their blobs onto. The > >>assumption is that the size of these blobs is know a-priori to the writer of > >>the device code and all devices can register their space requirements with > >>the blobstore during device initialization. Then gathering all the > >>registered blobs' sizes plus knowing the overhead of the layout of the data > >>on the disk lets QEMU calculate the total required (minimum) size that the > >>image has to have to accommodate all blobs in a particular blobstore. > >Libraries like tdb or gdbm come to mind. We should be careful not to > >reinvent cpio/tar or FAT :). > Sure. As long as these dbs allow to over-ride open(), close(), > read(), write() and seek() with bdrv ops we could recycle any of > these. Maybe we can build something smaller than those... > >What about live migration? If each VM has a LUN assigned on a SAN > >then these qcow2 files add a new requirement for a shared file system. > > > Well, one can still block-migrate these. The user has to know of > course whether shared storage is setup or not and pass the > appropriate flags to libvirt for migration. I know it works (modulo > some problems when using encrypted QCoW2) since I've been testing > with it. 
> > >Perhaps it makes sense to include the blobstore in the VM state data > >instead? If you take that approach then the blobstore will get > >snapshotted *into* the existing qcow2 images. Then you don't need a > >shared file system for migration to work. > > > It could be an option. However, if the user has a raw image for the > VM we still need the NVRAM emulation for the TPM for example. So we > need to store the persistent data somewhere but raw is not prepared > for that. Even if snapshotting doesn't work at all we need to be > able to persist devices' data. > > > >Can you share your design for the actual QEMU API that the TPM code > >will use to manipulate the blobstore? Is it designed to work in the > >event loop while QEMU is running, or is it for rare I/O on > >startup/shutdown? > > > Everything is kind of changing now. But here's what I have right now: > > tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode); > if (!tb->s.tpm_ltpms->nvram) { > fprintf(stderr, "Could not find nvram.\n"); > return errcode; > } > > nvram_register_blob(tb->s.tpm_ltpms->nvram, > NVRAM_ENTRY_PERMSTATE, > tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE)); > nvram_register_blob(tb->s.tpm_ltpms->nvram, > NVRAM_ENTRY_SAVESTATE, > tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE)); > nvram_register_blob(tb->s.tpm_ltpms->nvram, > NVRAM_ENTRY_VOLASTATE, > tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE)); > > rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive); > > Above first sets up the NVRAM using the drive's id. That is the > -tpmdev ...,nvram=my-bs, parameter. This establishes the NVRAM. > Subsequently the blobs to be written into the NVRAM are registered. > The nvram_start then reconciles the registered NVRAM blobs with > those found on disk and if everything fits together the result is > 'rc = 0' and the NVRAM is ready to go. Other devices can than do the > same also with the same NVRAM or another NVRAM. (NVRAM now after > renaming from blobstore). 
> > Reading from NVRAM in case of the TPM is a rare event. It happens in > the context of QEMU's main thread: > > if (nvram_read_data(tpm_ltpms->nvram, > NVRAM_ENTRY_PERMSTATE, > &tpm_ltpms->permanent_state.buffer, > &tpm_ltpms->permanent_state.size, > 0, NULL, NULL) || > nvram_read_data(tpm_ltpms->nvram, > NVRAM_ENTRY_SAVESTATE, > &tpm_ltpms->save_state.buffer, > &tpm_ltpms->save_state.size, > 0, NULL, NULL)) > { > tpm_ltpms->had_fatal_error = true; > return; > } > > Above reads the data of 2 blobs synchronously. This happens during startup. > > > Writes are depending on what the user does with the TPM. He can > trigger lots of updates to persistent state if he performs certain > operations, i.e., persisting keys inside the TPM. > > rc = nvram_write_data(tpm_ltpms->nvram, > what, tsb->buffer, tsb->size, > VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F, > NULL, NULL); > > Above writes a TPM blob into the NVRAM. This is triggered by the TPM > thread and notifies the QEMU main thread to write the blob into > NVRAM. I do this synchronously at the moment not using the last two > parameters for callback after completion but the two flags. The > first is to notify the main thread the 2nd flag is to wait for the > completion of the request (using a condition internally). > > Here are the protos: > > VNVRAM *nvram_setup(const char *drive_id, int *errcode); > > int nvram_start(VNVRAM *, bool fail_on_encrypted_drive); > > int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type, > unsigned int maxsize); > > unsigned int nvram_get_totalsize(VNVRAM *bs); > unsigned int nvram_get_totalsize_kb(VNVRAM *bs); > > typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write, > unsigned char **data, unsigned int len); > > int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type, > const unsigned char *data, unsigned int len, > int flags, NVRAMRWFinishCB cb, void *opaque); > > > As said, things are changing right now, so this is to give an impression... 
Thanks, these details are interesting. I interpreted the blobstore as a key-value store but these examples show it as a stream. No IDs or offsets are given; the reads are just performed in order and move through the NVRAM. If it stays this simple then bdrv_*() is indeed a natural way to do this - although my migration point remains, since this feature adds a new requirement for shared storage when it would be pretty easy to put this stuff in the vm data stream (IIUC the TPM NVRAM is relatively small?). Stefan ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]
  2011-09-16 10:35   ` Stefan Hajnoczi
@ 2011-09-16 11:36     ` Stefan Berger
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Berger @ 2011-09-16 11:36 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Anthony Liguori, Michael S. Tsirkin,
	Markus Armbruster, QEMU Developers

On 09/16/2011 06:35 AM, Stefan Hajnoczi wrote:
> On Thu, Sep 15, 2011 at 08:34:55AM -0400, Stefan Berger wrote:
>> On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
>> [...]
>> Everything is kind of changing now. But here's what I have right now:
>>
>>     tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode);
>>     if (!tb->s.tpm_ltpms->nvram) {
>>         fprintf(stderr, "Could not find nvram.\n");
>>         return errcode;
>>     }
>>
>>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>>                         NVRAM_ENTRY_PERMSTATE,
>>                         tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
>>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>>                         NVRAM_ENTRY_SAVESTATE,
>>                         tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
>>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>>                         NVRAM_ENTRY_VOLASTATE,
>>                         tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));
>>
>>     rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);
>>
>> The above first sets up the NVRAM using the drive's id, i.e., the
>> -tpmdev ...,nvram=my-bs,... parameter. This establishes the NVRAM.
>> Subsequently the blobs to be written into the NVRAM are registered.
>> nvram_start() then reconciles the registered NVRAM blobs with those
>> found on disk, and if everything fits together the result is 'rc = 0'
>> and the NVRAM is ready to go. Other devices can then do the same, with
>> the same NVRAM or another NVRAM. (It is 'NVRAM' now, after renaming
>> from 'blobstore'.)
>>
>> Reading from NVRAM in case of the TPM is a rare event.
>> [...]
> Thanks, these details are interesting. I interpreted the blobstore as a
> key-value store but these examples show it as a stream. No IDs or

IMO the only stuff we should store there are blobs retrievable via keys
(names) -- no metadata.

> offsets are given, the reads are just performed in order and move
> through the NVRAM. If it stays this simple then bdrv_*() is indeed a

There are no offsets because there's some intelligence in the
blobstore/NVRAM that lays out the data onto the disk. That's why there
is a directory. This in turn allows multiple drivers to share the same
NVRAM without the driver writer having to lay out the blobs
him-/herself.

> natural way to do this - although my migration point remains since this
> feature adds a new requirement for shared storage when it would be
> pretty easy to put this stuff in the vm data stream (IIUC the TPM NVRAM
> is relatively small?).

It's just another image. You have to treat it like the VM's 'main'
image. Block migration works fine on it; it may just be difficult for a
user to handle migration flags if one image is on shared storage and
the other isn't.

   Stefan

> Stefan
>

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore
  2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger
                   ` (2 preceding siblings ...)
  2011-09-15 11:17 ` Stefan Hajnoczi
@ 2011-09-15 13:05 ` Daniel P. Berrange
  2011-09-15 13:13   ` Stefan Berger
  3 siblings, 1 reply; 27+ messages in thread
From: Daniel P. Berrange @ 2011-09-15 13:05 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin

On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> Hello!
>
> Over the last few days primarily Michael Tsirkin and I have
> discussed the design of the 'blobstore' via IRC (#virtualization).
> The intention of the blobstore is to provide storage to persist
> blobs that devices create. Along with these blobs possibly some
> metadata should be storable in this blobstore.
>
> An initial client for the blobstore would be the TPM emulation.
> The TPM's persistent state needs to be stored once it changes so it
> can be restored at any point in time later on, i.e., after a cold
> reboot of the VM. In effect the blobstore simulates the NVRAM of a
> device where it would typically store such persistent data onto.

While I can see the appeal of a general 'blobstore' for NVRAM
tunables related to devices, wrt the TPM emulation, should we
be considering use of something like the PKCS#11 standard for
storing/retrieving crypto data for the TPM?

  https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11

This is an industry standard for interfacing to cryptographic
storage mechanisms, widely supported by all SSL libraries & more
or less all programming languages. IIUC it lets the application
avoid hardcoding a specific storage backend impl, so it can be
made to work with anything from local files, to smartcards, to
HSMs, to remote network services.
Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore
  2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange
@ 2011-09-15 13:13   ` Stefan Berger
  2011-09-15 13:27     ` Daniel P. Berrange
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Berger @ 2011-09-15 13:13 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin

On 09/15/2011 09:05 AM, Daniel P. Berrange wrote:
> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>> Hello!
>>
>> Over the last few days primarily Michael Tsirkin and I have
>> discussed the design of the 'blobstore' via IRC (#virtualization).
>> The intention of the blobstore is to provide storage to persist
>> blobs that devices create. Along with these blobs possibly some
>> metadata should be storable in this blobstore.
>>
>> An initial client for the blobstore would be the TPM emulation.
>> The TPM's persistent state needs to be stored once it changes so it
>> can be restored at any point in time later on, i.e., after a cold
>> reboot of the VM. In effect the blobstore simulates the NVRAM of a
>> device where it would typically store such persistent data onto.
> While I can see the appeal of a general 'blobstore' for NVRAM
> tunables related to device, wrt the TPM emulation, should we
> be considering use of something like the PKCS#11 standard for
> storing/retrieving crypto data for the TPM ?
>
> https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11

We should regard the blobs the TPM produces as crypto data as a
whole, allowing for encryption of each one. QCoW2 encryption is good
for that since it uses per-sector encryption, but we lose all that
in case a raw image is used for NVRAM storage.

FYI: The TPM writes its data in a custom format and produces a blob
that should be stored without knowing the organization of its
content.
This blob doesn't only contain keys but much other data in the 3
different types of blobs that the TPM can produce under certain
circumstances: values of counters, values of the PCRs (20-byte-long
registers), keys, the owner and SRK (storage root key) passwords, the
TPM's NVRAM areas, flags, etc. It produces the following blobs:

- permanent data blob: whenever it writes data to persistent storage
- save state blob: upon an S3 suspend (kicked off by the TPM TIS driver
  sending a command to the TPM)
- volatile data blob: upon migration/suspend; contains the volatile
  data that after a reboot of the VM is typically initialized by the
  TPM but of course needs to be restored on the migration target / on
  resume

   Stefan

> This is an industry standard for interfacing to cryptographic
> storage mechanisms, widely supported by all SSL libraries & more
> or less all programming languages. IIUC it lets the application
> avoid hardcoding a specific storage backend impl, so it can
> be made to work with anything from local files, to smartcards,
> to HSMs, to remote network services.
>
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 13:13 ` Stefan Berger @ 2011-09-15 13:27 ` Daniel P. Berrange 2011-09-15 14:00 ` Stefan Berger 0 siblings, 1 reply; 27+ messages in thread From: Daniel P. Berrange @ 2011-09-15 13:27 UTC (permalink / raw) To: Stefan Berger Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin On Thu, Sep 15, 2011 at 09:13:25AM -0400, Stefan Berger wrote: > On 09/15/2011 09:05 AM, Daniel P. Berrange wrote: > >On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote: > >>Hello! > >> > >> Over the last few days primarily Michael Tsirkin and I have > >>discussed the design of the 'blobstore' via IRC (#virtualization). > >>The intention of the blobstore is to provide storage to persist > >>blobs that devices create. Along with these blobs possibly some > >>metadata should be storable in this blobstore. > >> > >> An initial client for the blobstore would be the TPM emulation. > >>The TPM's persistent state needs to be stored once it changes so it > >>can be restored at any point in time later on, i.e., after a cold > >>reboot of the VM. In effect the blobstore simulates the NVRAM of a > >>device where it would typically store such persistent data onto. > >While I can see the appeal of a general 'blobstore' for NVRAM > >tunables related to device, wrt the TPM emulation, should we > >be considering use of something like the PKCS#11 standard for > >storing/retrieving crypto data for the TPM ? > > > > https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11 > We should regard the blobs the TPM produces as crypto data as a > whole, allowing for encryption of each one. QCoW2 encryption is good > for that since it uses per-sector encryption but we loose all that > in case of RAW image being use for NVRAM storage. > > FYI: The TPM writes its data in a custom format and produces a blob > that should be stored without knowing the organization of its > content. 
> This blob doesn't only contain keys but many other data in
> the 3 different types of blobs that the TPM can produce under
> certain circumstances: values of counters, values of the PCRs (20
> byte long registers), keys, owner and SRK (storage root key)
> password, TPM's NVRAM areas, flags etc.

Is this description of storage inherent in the impl of TPMs in general,
or just the way you've chosen to implement the QEMU vTPM ?

IIUC, you are describing a layering like

  +----------------+
  |   Guest App    |
  +----------------+
    ^ ^ ^ ^ ^ ^ ^
    | | | | | | |    Data slots
    V V V V V V V
  +----------------+
  | QEMU vTPM Dev  |
  +----------------+
          ^
          |  Data blob
          V
  +----------------+
  | Storage device |   (File/block dev)
  +----------------+

I was thinking about whether we could delegate the encoding of
data slots -> blobs to outside the vTPM device emulation by
using PKCS#11?

  +----------------+
  |   Guest App    |
  +----------------+
    ^ ^ ^ ^ ^ ^ ^
    | | | | | | |    Data slots
    V V V V V V V
  +----------------+
  | QEMU vTPM Dev  |
  +----------------+
    ^ ^ ^ ^ ^ ^ ^
    | | | | | | |    Data slots
    V V V V V V V
  +----------------+
  | PKCS#11 Driver |
  +----------------+
          ^
          |  Data blob
          V
  +----------------+
  | Storage device |   (File/blockdev/HSM/Smartcard)
  +----------------+

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore 2011-09-15 13:27 ` Daniel P. Berrange @ 2011-09-15 14:00 ` Stefan Berger 0 siblings, 0 replies; 27+ messages in thread From: Stefan Berger @ 2011-09-15 14:00 UTC (permalink / raw) To: Daniel P. Berrange Cc: Anthony Liguori, Michael S. Tsirkin, Markus Armbruster, QEMU Developers On 09/15/2011 09:27 AM, Daniel P. Berrange wrote: > On Thu, Sep 15, 2011 at 09:13:25AM -0400, Stefan Berger wrote: >> On 09/15/2011 09:05 AM, Daniel P. Berrange wrote: >>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote: >>>> Hello! >>>> >>>> Over the last few days primarily Michael Tsirkin and I have >>>> discussed the design of the 'blobstore' via IRC (#virtualization). >>>> The intention of the blobstore is to provide storage to persist >>>> blobs that devices create. Along with these blobs possibly some >>>> metadata should be storable in this blobstore. >>>> >>>> An initial client for the blobstore would be the TPM emulation. >>>> The TPM's persistent state needs to be stored once it changes so it >>>> can be restored at any point in time later on, i.e., after a cold >>>> reboot of the VM. In effect the blobstore simulates the NVRAM of a >>>> device where it would typically store such persistent data onto. >>> While I can see the appeal of a general 'blobstore' for NVRAM >>> tunables related to device, wrt the TPM emulation, should we >>> be considering use of something like the PKCS#11 standard for >>> storing/retrieving crypto data for the TPM ? >>> >>> https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11 >> We should regard the blobs the TPM produces as crypto data as a >> whole, allowing for encryption of each one. QCoW2 encryption is good >> for that since it uses per-sector encryption but we loose all that >> in case of RAW image being use for NVRAM storage. >> >> FYI: The TPM writes its data in a custom format and produces a blob >> that should be stored without knowing the organization of its >> content. 
>> This blob doesn't only contain keys but many other data in
>> the 3 different types of blobs that the TPM can produce under
>> certain circumstances: values of counters, values of the PCRs (20
>> byte long registers), keys, owner and SRK (storage root key)
>> password, TPM's NVRAM areas, flags etc.
> Is this description of storage inherent in the impl of TPMs in general,
> or just the way you've chosen to implement the QEMU vTPM ?

There's no absolute definition of how a TPM writes all its data into
NVRAM. Some structures are defined and we used them where we could;
others were defined by 'us' -- so they are manufacturer-specific.
Suspend operations, for example, were not envisioned for the hardware
TPM, but we needed to write more data out than what the standard
defines so we could resume properly. What is defined is persistent
storage and S3 suspend (save state), as described in the previous mail.

> IIUC, you are describing a layering like
> [... layering diagrams elided, see the previous message ...]
> I was thinking about whether we could delegate the encoding
> of data slots -> blobs to outside the vTPM device emulation
> by using PKCS#11?

v8 (and before) of my TPM patch postings had something like this --
nicely layered, though -- and I was doing it on a per-blob basis, so
no 'slots'.
The vTPM dev was passing its raw blobs down to the 'NVRAM' layer, and
that NVRAM either had a key for encryption or not. In case it didn't
have a key, it just wrote the data at a certain offset, noting the
actual blob size in a directory in the 1st sector. In case the NVRAM
layer had a key, it encrypted the blob (which enlarges it to the next
16-byte boundary due to AES encryption) and wrote that AES-CBC-encrypted
blob at a certain offset, noting the actual unencrypted blob size in the
directory. The header of the directory contained a flag that all data
were encrypted -- so this flag was a property of every blob on the disk.

Now with Michael's ASN1 encoding and the additional metadata, I think
the encryption should come after encoding the blob and metadata into
ASN1. Again the directory would need a flag for whether all blobs, or
each single blob, is encrypted.

I guess this again goes back to command line parameters as well. Where
do we pass the key? Is it a per-device property (-tpmdev ...,key=...)
where the device registers a key to use for its blobs, or a
per-blobstore/NVRAM property (-nvram drive=...,key=...)?

   Stefan

> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 27+ messages in thread
end of thread, other threads:[~2011-09-16 11:36 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger
2011-09-14 17:40 ` Michael S. Tsirkin
2011-09-14 17:49   ` Stefan Berger
2011-09-14 17:56     ` Michael S. Tsirkin
2011-09-14 21:12       ` Stefan Berger
2011-09-15  6:57         ` Michael S. Tsirkin
2011-09-15 10:22           ` Stefan Berger
2011-09-15 10:51             ` Michael S. Tsirkin
2011-09-15 10:55               ` Stefan Berger
2011-09-15  5:47 ` Gleb Natapov
2011-09-15 10:18   ` Stefan Berger
2011-09-15 10:20     ` Gleb Natapov
2011-09-15 11:17 ` Stefan Hajnoczi
2011-09-15 11:35   ` Daniel P. Berrange
2011-09-15 11:40     ` Kevin Wolf
2011-09-15 11:58       ` Stefan Hajnoczi
2011-09-15 12:31         ` Michael S. Tsirkin
2011-09-16  8:46           ` Kevin Wolf
2011-09-15 14:19       ` Stefan Berger
2011-09-16  8:12         ` Kevin Wolf
2011-09-15 12:34   ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger
2011-09-16 10:35     ` Stefan Hajnoczi
2011-09-16 11:36       ` Stefan Berger
2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange
2011-09-15 13:13   ` Stefan Berger
2011-09-15 13:27     ` Daniel P. Berrange
2011-09-15 14:00       ` Stefan Berger