From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:42015)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stefanb@linux.vnet.ibm.com>) id 1R1dRy-0005vJ-L9
	for qemu-devel@nongnu.org; Thu, 08 Sep 2011 08:11:26 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stefanb@linux.vnet.ibm.com>) id 1R1dRu-0000WG-K0
	for qemu-devel@nongnu.org; Thu, 08 Sep 2011 08:11:22 -0400
Received: from e39.co.us.ibm.com ([32.97.110.160]:57352)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stefanb@linux.vnet.ibm.com>) id 1R1dRu-0000W6-Cj
	for qemu-devel@nongnu.org; Thu, 08 Sep 2011 08:11:18 -0400
Received: from d03relay05.boulder.ibm.com (d03relay05.boulder.ibm.com
	[9.17.195.107])
	by e39.co.us.ibm.com (8.14.4/8.13.1) with ESMTP id p88Btf1O004426
	for <qemu-devel@nongnu.org>; Thu, 8 Sep 2011 05:55:41 -0600
Received: from d03av05.boulder.ibm.com (d03av05.boulder.ibm.com [9.17.195.85])
	by d03relay05.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP
	id p88CB2Jh146872
	for <qemu-devel@nongnu.org>; Thu, 8 Sep 2011 06:11:04 -0600
Received: from d03av05.boulder.ibm.com (loopback [127.0.0.1])
	by d03av05.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP
	id p88CB1oC000514
	for <qemu-devel@nongnu.org>; Thu, 8 Sep 2011 06:11:02 -0600
Message-ID: <4E68B0D4.9080807@linux.vnet.ibm.com>
Date: Thu, 08 Sep 2011 08:11:00 -0400
From: Stefan Berger <stefanb@linux.vnet.ibm.com>
MIME-Version: 1.0
References: <20110831143551.127339744@linux.vnet.ibm.com>
	<20110831143623.043146580@linux.vnet.ibm.com>
	<20110901192659.GM10989@redhat.com>
	<4E603E37.40901@linux.vnet.ibm.com>
	<20110907185536.GA15276@redhat.com>
	<4E68095B.60802@linux.vnet.ibm.com>
	<20110908103232.GA25263@redhat.com>
In-Reply-To: <20110908103232.GA25263@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH V8 10/14] Encrypt state blobs using AES CBC
	encryption
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: chrisw@redhat.com, anbang.ruan@cs.ox.ac.uk, qemu-devel@nongnu.org, rrelyea@redhat.com, alevy@redhat.com, andreas.niederl@iaik.tugraz.at, serge@hallyn.com

On 09/08/2011 06:32 AM, Michael S. Tsirkin wrote:
> On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote:
>> On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote:
>>> On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote:
>>>>>> An additional 'layer' for reading and writing the blobs to the underlying
>>>>>> block storage is added. This layer encrypts the blobs for writing if a key is
>>>>>> available. Similarly it decrypts the blobs after reading.
>>> So a couple of further thoughts:
>>> 1. Raw storage should work too, and with e.g. NFS migration will be fine, right?
>>>     So I'd say it's worth supporting.
>> NFS via shared storage, yes, but not migration via Qemu's block
>> migration mechanism. If snapshotting was supposed to be a feature to
>> support then that's only possible via block storage (QCoW2 in
>> particular).
> As disk has the same limitation, that sounds fine.
> Let the user decide whether snapshoting is needed,
> same as disk.
>
>> Adding plain file support to the TPM code so it can store its 3
>> blobs into adds quite a bit of complexity to the code. The command
>> line parameter that previously pointed to QCoW2 image file would
>> probably have to point to a directory where files for the 3 blobs
>> can be written into. Besides that, snapshotting would actually have
>> to be prevented maybe through registering a (fake) file of other
>> than QCoW2 type since the plain TPM files won't handle snapshotting
>> correctly, either, and QEMU pretty much would have to be prevented
>> from doing snapshotting at all. Maybe there's an API for this, but I
>> don't know. Though why create this additional complexity? I don't
>> mind relaxing the requirement of using a QCoW2 image and allowing
>> for example RAW images (that then automatically prevent the
>> snapshotting from happening) but the same code I now have would work
>> for writing the blobs into it the single file.
> Right. Write all blobs into a single files at different
> offsets, or something.

That's exactly what I am doing already, except that I am writing to
sectors through QEMU's block storage layer (bdrv) rather than seek()ing
in files. To avoid more complexity I'd rather not introduce additional
code for handling plain files, but rely on the image formats that QEMU
already supports and that provide features like encryption (QCoW2 only),
snapshotting (QCoW2 only) and block migration (presumably all of them).
Plain files offer none of that. Devices that need to write their state
to persistent storage really should do this through QEMU's bdrv layer,
since they will otherwise be the ones that break the snapshot feature.
The TPM certainly doesn't want to be one of them. If the user doesn't
want snapshotting to be supported because his VM image files are not
QCoW2 files, he can just create a raw image file for the TPM's
persistent state and bdrv will automatically prevent snapshotting. The
point is that the TPM code, now that it uses the bdrv layer, already
works with any image format.
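
To illustrate the difference between seek()ing in a file and writing
whole sectors, here is a minimal sketch in Python. It is not the patch
code: SectorImage is a hypothetical in-memory stand-in for a block
device, the 512-byte sector size is an assumption, and write_blob() and
read_blob() are names made up for this example.

```python
SECTOR_SIZE = 512  # assumed sector granularity


def sectors_needed(nbytes):
    """Number of whole sectors required to hold nbytes."""
    return (nbytes + SECTOR_SIZE - 1) // SECTOR_SIZE


class SectorImage:
    """Hypothetical stand-in for a block device: I/O in whole sectors only."""

    def __init__(self, num_sectors):
        self.data = bytearray(num_sectors * SECTOR_SIZE)

    def write_sectors(self, first_sector, buf):
        if len(buf) % SECTOR_SIZE:
            raise ValueError("buffer is not a multiple of the sector size")
        start = first_sector * SECTOR_SIZE
        if start + len(buf) > len(self.data):
            raise ValueError("write past the end of the image")
        self.data[start:start + len(buf)] = buf

    def read_sectors(self, first_sector, count):
        start = first_sector * SECTOR_SIZE
        return bytes(self.data[start:start + count * SECTOR_SIZE])


def write_blob(image, first_sector, blob):
    """Zero-pad the blob to a sector boundary and write it out.

    Returns the number of sectors the blob occupies.
    """
    padded = blob + bytes(-len(blob) % SECTOR_SIZE)
    image.write_sectors(first_sector, padded)
    return sectors_needed(len(blob))


def read_blob(image, first_sector, size):
    """Read back the sectors holding the blob and strip the padding."""
    return image.read_sectors(first_sector, sectors_needed(size))[:size]
```

Since every blob starts on a sector boundary, storing one blob never
requires reading and rewriting part of a neighbouring one.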

>>> 2. File backed nvram is interesting outside tpm.
>>>     For example,vpd and chassis number for pci, eeprom emulation for network cards.
>>>     Using a file per device might be inconvenient though.
>>>     So please think of a format and API that will allow sections
>>>     for use by different devices.
>> Also here 'snapshotting' is the most 'demanding' feature of QEMU I
>> would say. Snapshotting isn't easily supported outside of the block
>> layer from what I understand. Once you are tied to the block layer
>> you end up having to use images and those don't grow quite well. So
>> other devices wanting to use those type of devices would need to
>> know what the worst case sizes are for writing their state into --
>> unless an image format is created that can grow.
>>
>> As for the format: Ideally all devices could write into one file,
>> right? That would at least prevent too many files besides the VM's
>> image file from floating around which presumably makes image
>> management easier. Following the above, you add up all the worst
>> case sizes the individual devices may need for their blobs and
>> create an image with that capacity. Then you need some form of a
>> (primitive?) directory that lets you write blobs into that storage.
>> Assuming there were well defined names for those devices one could
>> say for example store this blobs under the name
>> 'tpm-permanent-state' and later on load it under that name. The
>> possible size of the directory would have to be considered as
>> well... I do something like that for the TPM where I have up to 3
>> such blobs that I store.
>>
>> The bad thing about the above is of course the need to know what the
>> sum of all the worst case sizes is.
> A typical usecase I know about has prepared vpd/eeprom content.
> We'll typically need a tool to get binary blobs and put that into the
> file image.  That tool can do the necessary math.
> We could also integrate this into qemu-img if we like.
>
>> So a growable image format would
>> be quite good to have. I haven't followed the conversations much,
>> but is that something QCoW3 would support?
> I don't follow - does TPM need a growable image format? Why?
> Hardware typically has fixed amount of memory :)
Ideally the user wouldn't have to worry about creating the single file
for the persistent storage of all the devices at all; QEMU could
'somehow' do this.
Assume the user starts the VM with a device that has an EEPROM and
needs 10k of persistent storage. Given the limitation that images don't
grow, an image of at least 10k must have been created a priori. Later
the user adds another device to the same VM that needs 40k of
persistent storage. What now? Dispose of the old image with the EEPROM
data and create a new image of at least 50k to hold the data of both?
Or add another image of just 40k to hold that device's persistent
state? I'd rather have the 10k image grow to 50k and accommodate both
state blobs...
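
The sizing math for the 10k/40k example could look roughly like the
sketch below. It assumes 512-byte sectors, one sector reserved for a
directory, and every blob rounded up to a sector boundary; the function
names and the device names are made up for illustration.

```python
SECTOR_SIZE = 512  # assumed sector granularity of the image


def round_up_to_sector(nbytes):
    """Round a byte count up to a whole number of sectors."""
    return (nbytes + SECTOR_SIZE - 1) // SECTOR_SIZE * SECTOR_SIZE


def image_capacity(worst_case_sizes, directory_bytes=SECTOR_SIZE):
    """Bytes needed for the directory plus one fixed slot per blob.

    worst_case_sizes maps a (hypothetical) device name to the largest
    blob that device may ever write.
    """
    slots = sum(round_up_to_sector(size) for size in worst_case_sizes.values())
    return round_up_to_sector(directory_bytes) + slots
```

With these assumptions, image_capacity({"eeprom": 10 * 1024}) yields
10752 bytes, and adding a second device with 40 * 1024 bytes raises the
requirement to 51712 bytes, so the image created for the first device
alone is too small and cannot simply be reused.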

>>> 3. Home-grown file formats give us enough trouble in migration.
>>>     Could this use one of the variants of ASN.1?
>>>     There are portable libraries to read/write that, even.
>>>
>> I am not sure what 'this' refers to. What I am doing with the TPM is
>> writing 3 independent blobs at certain offset into the QCoW2 block
>> file. A directory in the first sector holds the offsets, sizes and
>> crc32's of these (unencrypted) blobs.
> Right. It's the encoding of the directory that is custom,
> and that bothers me. I'd prefer a format that is self-describing and
> self-delimiting, give a way to inspect the data using external tools.
Nothing would prevent us from defining a data structure for that 
directory as long as that data structure accommodates all use cases of 
today and especially tomorrow :-).

>> I am not that familiar with ASN.1 except that from what I have seen
>> it looks like a fairly terrible format needing an object language to
>> create a parser from etc. not to mention the problems I had with
>> snacc trying to compile the ASN.1 object language of an RFC...
>>
>>     Stefan
> Sorry about the confusion, we don't need the notation, I don't mean that.
> I mean use a subset of the ASN.1 basic encoding
> http://homepages.dcc.ufmg.br/~coelho/nm/asn.1.intro.pdf
>
> So we could have a set of sequences, with an ascii string (a tag)
> followed by an octet string (content).
>
>
I think the data layout in the image should be such that you don't have
to rewrite the whole content of the image when a blob is stored. I
think a directory at the beginning could solve this. To keep it simple
one would probably need to know how big the 'directory' can get;
otherwise one has to get into allocating sectors, so that once the
directory grows beyond 512 bytes one knows where its next data are
written. The same is true for the devices' data blobs. If one knows the
sizes of all the blobs, one can lay them out to start and end at
specific offsets in the image. And knowing the sizes of all the blobs
helps in creating an image of the correct size.
Well, all this is a work-around for not having a 'filesystem'.
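
To make the idea more concrete, here is a rough Python sketch of such a
layout: a one-sector directory followed by fixed slots, so that storing
one blob touches only its slot and its directory entry. This is not the
on-disk format of the patches. The entry layout (32-byte name, offset,
capacity, size, CRC32), the little-endian encoding and all function
names are assumptions made for this example, as is the 'nic-eeprom'
name used in the test data.

```python
import struct
import zlib

SECTOR_SIZE = 512
# Hypothetical directory entry: name, offset, capacity, current size, CRC32.
ENTRY = struct.Struct("<32sIIII")
MAX_ENTRIES = SECTOR_SIZE // ENTRY.size  # entries that fit into one sector


def round_up(nbytes):
    """Round a byte count up to a whole number of sectors."""
    return -(-nbytes // SECTOR_SIZE) * SECTOR_SIZE


def create_image(worst_case_sizes):
    """Lay out fixed slots after the directory sector; blobs start empty."""
    if len(worst_case_sizes) > MAX_ENTRIES:
        raise ValueError("directory does not fit into one sector")
    directory = bytearray(SECTOR_SIZE)
    offset = SECTOR_SIZE
    for i, (name, capacity) in enumerate(worst_case_sizes.items()):
        ENTRY.pack_into(directory, i * ENTRY.size,
                        name.encode("ascii"), offset, capacity, 0, 0)
        offset += round_up(capacity)
    return directory + bytearray(offset - SECTOR_SIZE)


def find_entry(image, name):
    """Return (index, offset, capacity, size, crc) of the named blob."""
    key = name.encode("ascii")
    for i in range(MAX_ENTRIES):
        raw, offset, capacity, size, crc = ENTRY.unpack_from(image, i * ENTRY.size)
        if raw.rstrip(b"\0") == key:
            return i, offset, capacity, size, crc
    raise KeyError(name)


def store_blob(image, name, blob):
    """Overwrite one slot and its directory entry; other blobs stay untouched."""
    i, offset, capacity, _, _ = find_entry(image, name)
    if len(blob) > capacity:
        raise ValueError("blob exceeds the worst-case size of its slot")
    image[offset:offset + len(blob)] = blob
    ENTRY.pack_into(image, i * ENTRY.size, name.encode("ascii"),
                    offset, capacity, len(blob), zlib.crc32(blob))


def load_blob(image, name):
    """Read a blob back and verify it against the CRC32 in the directory."""
    _, offset, _, size, crc = find_entry(image, name)
    blob = bytes(image[offset:offset + size])
    if zlib.crc32(blob) != crc:
        raise ValueError("CRC32 mismatch for blob " + name)
    return blob
```

The price of this simplicity is the one already mentioned: the
worst-case size of every blob has to be known when the image is
created, and the directory is limited to what fits into one sector.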

     Stefan