From: Stefan Berger
Date: Thu, 15 Sep 2011 08:34:55 -0400
Subject: Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]
To: Stefan Hajnoczi
Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers,
 "Michael S. Tsirkin"
Message-ID: <4E71F0EF.6070803@linux.vnet.ibm.com>
References: <4E70DEE8.8090908@linux.vnet.ibm.com>

On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger wrote:
>> One property of the blobstore is that it has a certain required size for
>> accommodating all blobs of the devices that want to store their blobs in
>> it. The assumption is that the size of these blobs is known a priori to
>> the writer of the device code, and all devices can register their space
>> requirements with the blobstore during device initialization. Gathering
>> all the registered blobs' sizes, plus knowing the overhead of the layout
>> of the data on the disk, then lets QEMU calculate the total required
>> (minimum) size that the image has to have to accommodate all blobs in a
>> particular blobstore.
> Libraries like tdb or gdbm come to mind. We should be careful not to
> reinvent cpio/tar or FAT :).

Sure. As long as these dbs allow us to override open(), close(), read(),
write() and seek() with bdrv ops, we could recycle any of them. Maybe we
can build something smaller than those...
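Just to make the size calculation quoted above concrete, here is a minimal
sketch; it is not the actual code, and the structure name and the two
overhead constants are made-up assumptions:

    /* Illustrative sketch only: the structure and the overhead constants
     * are assumptions, not the actual QEMU blobstore layout. */
    #include <stddef.h>

    #define BLOB_HEADER_OVERHEAD   64   /* assumed per-blob on-disk header */
    #define STORE_HEADER_OVERHEAD 512   /* assumed store-wide header/directory */

    struct blob_reg {
        const char  *name;      /* e.g. "permstate", "savestate" */
        unsigned int maxsize;   /* registered a priori by the device code */
    };

    /* Summing all registered blob sizes plus the layout overhead yields
     * the minimum image size the blobstore drive must provide. */
    static unsigned int nvram_required_size(const struct blob_reg *blobs,
                                            size_t n)
    {
        unsigned int total = STORE_HEADER_OVERHEAD;
        size_t i;

        for (i = 0; i < n; i++) {
            total += BLOB_HEADER_OVERHEAD + blobs[i].maxsize;
        }
        return total;
    }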
> What about live migration? If each VM has a LUN assigned on a SAN
> then these qcow2 files add a new requirement for a shared file system.

Well, one can still block-migrate these. The user of course has to know
whether shared storage is set up or not and pass the appropriate flags to
libvirt for the migration. I know it works (modulo some problems when using
encrypted QCOW2) since I've been testing with it.

> Perhaps it makes sense to include the blobstore in the VM state data
> instead? If you take that approach then the blobstore will get
> snapshotted *into* the existing qcow2 images. Then you don't need a
> shared file system for migration to work.

It could be an option. However, if the user has a raw image for the VM, we
still need the NVRAM emulation for the TPM, for example. So we need to store
the persistent data somewhere, but raw is not prepared for that. Even if
snapshotting doesn't work at all, we need to be able to persist the devices'
data.

> Can you share your design for the actual QEMU API that the TPM code
> will use to manipulate the blobstore? Is it designed to work in the
> event loop while QEMU is running, or is it for rare I/O on
> startup/shutdown?

Everything is kind of changing now, but here's what I have right now:

    tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode);
    if (!tb->s.tpm_ltpms->nvram) {
        fprintf(stderr, "Could not find nvram.\n");
        return errcode;
    }

    nvram_register_blob(tb->s.tpm_ltpms->nvram,
                        NVRAM_ENTRY_PERMSTATE,
                        tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
    nvram_register_blob(tb->s.tpm_ltpms->nvram,
                        NVRAM_ENTRY_SAVESTATE,
                        tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
    nvram_register_blob(tb->s.tpm_ltpms->nvram,
                        NVRAM_ENTRY_VOLASTATE,
                        tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));

    rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);

The above first sets up the NVRAM using the drive's id, i.e., the
'-tpmdev ...,nvram=my-bs,' parameter. This establishes the NVRAM.
Subsequently the blobs to be written into the NVRAM are registered.
nvram_start() then reconciles the registered NVRAM blobs with those found
on disk, and if everything fits together the result is 'rc = 0' and the
NVRAM is ready to go. Other devices can then do the same, either with the
same NVRAM or with another one. (It is called NVRAM now, after renaming it
from blobstore.)

Reading from the NVRAM is, in the case of the TPM, a rare event. It happens
in the context of QEMU's main thread:

    if (nvram_read_data(tpm_ltpms->nvram, NVRAM_ENTRY_PERMSTATE,
                        &tpm_ltpms->permanent_state.buffer,
                        &tpm_ltpms->permanent_state.size,
                        0, NULL, NULL) ||
        nvram_read_data(tpm_ltpms->nvram, NVRAM_ENTRY_SAVESTATE,
                        &tpm_ltpms->save_state.buffer,
                        &tpm_ltpms->save_state.size,
                        0, NULL, NULL)) {
        tpm_ltpms->had_fatal_error = true;
        return;
    }

The above reads the data of two blobs synchronously. This happens during
startup.

Writes depend on what the user does with the TPM. He can trigger lots of
updates to persistent state if he performs certain operations, e.g.,
persisting keys inside the TPM.

    rc = nvram_write_data(tpm_ltpms->nvram,
                          what, tsb->buffer, tsb->size,
                          VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F,
                          NULL, NULL);

The above writes a TPM blob into the NVRAM. This is triggered by the TPM
thread, which notifies the QEMU main thread to write the blob into the
NVRAM. I do this synchronously at the moment, not using the last two
parameters for a callback after completion but the two flags: the first is
to notify the main thread, the second is to wait for the completion of the
request (using a condition internally).

Here are the protos:

    VNVRAM *nvram_setup(const char *drive_id, int *errcode);
    int nvram_start(VNVRAM *, bool fail_on_encrypted_drive);
    int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type,
                            unsigned int maxsize);
    unsigned int nvram_get_totalsize(VNVRAM *bs);
    unsigned int nvram_get_totalsize_kb(VNVRAM *bs);

    typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write,
                                 unsigned char **data, unsigned int len);

    int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type,
                         const unsigned char *data, unsigned int len,
                         int flags, NVRAMRWFinishCB cb, void *opaque);

As said, things are changing right now, so this is to give an impression...

   Stefan

> Stefan
>
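Purely as an illustration of how the callback parameters of
nvram_write_data() might eventually be used instead of
VNVRAM_WAIT_COMPLETION_F, here is a sketch; the callback name, the opaque
type, and the exact flag usage are assumptions, not the actual code:

    /* Hypothetical completion callback matching the NVRAMRWFinishCB
     * typedef above; the opaque type and error handling are assumptions. */
    static void tpm_nvram_write_done(void *opaque, int errcode, bool is_write,
                                     unsigned char **data, unsigned int len)
    {
        TPMLTPMsState *tpm_ltpms = opaque;   /* hypothetical opaque type */

        if (errcode) {
            tpm_ltpms->had_fatal_error = true;
        }
    }

    /* Caller side: request the write and return without blocking; the
     * callback runs once the main thread has completed the request. */
    rc = nvram_write_data(tpm_ltpms->nvram, what, tsb->buffer, tsb->size,
                          VNVRAM_ASYNC_F /* notify main thread, don't wait */,
                          tpm_nvram_write_done, tpm_ltpms);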