[argh, just posted this to qemu-trivial -- it's not trivial]
Hello!
I am posting this message to revive the previous discussions
about the design of vNVRAM / blobstore, cc'ing (at least) those
who participated in this discussion back then.
The first goal of the implementation is to provide vNVRAM
storage for a software implementation of a TPM to store its
different blobs in. Some of the data that the TPM writes into
persistent memory needs to survive a power-down / power-up cycle
of a virtual machine, therefore this type of persistent storage
is needed. So that the vNVRAM does not become a roadblock for VM
migration, we would make use of block device migration and layer
the vNVRAM on top of the block device, thus using virtual
machine images for storing the vNVRAM data.
Besides the TPM blobs, the vNVRAM should of course also be able
to accommodate other use cases where persistent data is stored
in NVRAM, BBRAM (battery-backed RAM) or EEPROM. As far as I
know, more recent machines with UEFI also have such types of
persistent memory. I believe the current design of the vNVRAM
layer accommodates other use cases as well, though additional
'glue devices' would need to be implemented to interact with
this vNVRAM layer.
Here are references to the previous discussion:
http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg01791.html
http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg01967.html
Two aspects of the vNVRAM seem of primary interest: its API and
how the data is organized in the virtual machine image; its
inner workings are left aside for now.
API of the vNVRAM:
------------------
The following functions and data structures are important for
devices:
enum NVRAMRWOp {
    NVRAM_OP_READ,
    NVRAM_OP_WRITE,
    NVRAM_OP_FORMAT
};
/**
 * Callback function a device must provide for reading and writing
 * of blobs as well as for indicating to the NVRAM layer the maximum
 * blob size of the given entry. Due to the layout of the data in the
 * NVRAM, the device must always write a blob with the size indicated
 * during formatting.
 * @op: Indication of the NVRAM operation
 * @v: Input visitor in case of a read operation, output visitor in
 *     case of a write or format operation
 * @entry_name: Unique name of the NVRAM entry
 * @opaque: Opaque data previously provided when registering the
 *          NVRAM entry
 * @errp: Pointer to an Error pointer for the visitor to indicate
 *        errors
 */
typedef int (*NVRAMRWData)(enum NVRAMRWOp op, Visitor *v,
                           const NVRAMEntryName *entry_name,
                           void *opaque, Error **errp);
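To make the callback contract concrete, here is a minimal sketch
of what a device-side implementation could look like. Everything
device-specific in it is made up for illustration: 'TPMState',
'TPM_BLOB_MAX_SIZE', the 'perm_blob' buffer, and the byte-wise
visit_type_uint8() serialization (including its argument order)
are my assumptions, not part of the proposal.

/* Sketch of a device callback; device state and serialization
 * scheme are hypothetical. */
static int tpm_nvram_rwdata(enum NVRAMRWOp op, Visitor *v,
                            const NVRAMEntryName *entry_name,
                            void *opaque, Error **errp)
{
    TPMState *s = opaque;   /* hypothetical device state */
    int i;

    switch (op) {
    case NVRAM_OP_READ:
        /* Input visitor: deserialize the fixed-size blob into the
         * device. */
        for (i = 0; i < TPM_BLOB_MAX_SIZE && !error_is_set(errp); i++) {
            visit_type_uint8(v, &s->perm_blob[i], NULL, errp);
        }
        break;
    case NVRAM_OP_WRITE:
    case NVRAM_OP_FORMAT:
        /* Output visitor: always write exactly the size indicated
         * during formatting; formatting writes a maximally-sized
         * blob so the NVRAM layer can reserve the space. */
        for (i = 0; i < TPM_BLOB_MAX_SIZE && !error_is_set(errp); i++) {
            visit_type_uint8(v, &s->perm_blob[i], NULL, errp);
        }
        break;
    }
    return error_is_set(errp) ? -1 : 0;
}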
/**
 * nvram_setup:
 * @drive_id: The ID of the drive to be used as NVRAM. Following the
 *            command line '-drive if=none,id=tpm-bs,file=<file>',
 *            'tpm-bs' would have to be passed.
 * @errcode: Pointer to an integer for an error code
 * @resetfn: Device reset function
 * @dev: The DeviceState to be passed to the device reset function
 *       @resetfn
 *
 * This function returns a pointer to a VNVRAM, or NULL in case an
 * error occurred.
 */
VNVRAM *nvram_setup(const char *drive_id, int *errcode,
                    qdev_resetfn resetfn, DeviceState *dev);
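As a usage sketch (the reset function, the variable names and the
error handling are placeholders): given
'-drive if=none,id=tpm-bs,file=nvram.img' on the command line, a
device would obtain its VNVRAM during initialization roughly like
this:

/* Sketch only: assumes '-drive if=none,id=tpm-bs,file=nvram.img'
 * was given; tpm_reset() and 'dev' (the device's DeviceState) are
 * hypothetical. */
int errcode;
VNVRAM *nvram = nvram_setup("tpm-bs", &errcode, tpm_reset, dev);
if (!nvram) {
    error_report("failed to set up NVRAM on drive 'tpm-bs': %d",
                 errcode);
    return -1;
}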
/**
* nvram_delete:
* @nvram: The NVRAM to destroy
*
* Destroy the NVRAM previously allocated using nvram_setup.
*/
int nvram_delete(VNVRAM *nvram);
/**
 * nvram_start:
 * @nvram: The NVRAM to start
 * @fail_on_encrypted_drive: Fail if the drive is encrypted but no
 *                           key was provided so far to lower layers.
 *
 * After all blobs that the device intends to write have been
 * registered with the NVRAM, this function is used to start up the
 * NVRAM. In case no error occurred, 0 is returned, an error code
 * otherwise.
 */
int nvram_start(VNVRAM *nvram, bool fail_on_encrypted_drive);
/**
* nvram_process_requests:
*
* Have the NVRAM layer process all outstanding requests and wait
* for their completion.
*/
void nvram_process_requests(void);
/**
 * nvram_register_entry:
 * @nvram: The NVRAM to register an entry with
 * @entry_name: The unique name of the blob to register
 * @rwdata_callback: Callback function for the NVRAM layer to
 *                   invoke for asynchronous requests such as
 *                   delivering the results of a read operation
 *                   or requesting the maximum size of the blob
 *                   when formatting.
 * @opaque: Data to pass to the callback function
 *
 * Register an entry for the NVRAM layer to write. In case of
 * success this function returns 0, an error code otherwise.
 */
int nvram_register_entry(VNVRAM *nvram,
                         const NVRAMEntryName *entry_name,
                         NVRAMRWData rwdata_callback, void *opaque);
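Continuing the sketch from above, a device would register each of
its blobs and then start the NVRAM. The entry name "tpm-permanent"
is made up, the callback is the hypothetical one from the earlier
sketch, and I am assuming NVRAMEntryName is string-like since its
definition is not shown here:

/* Register the blob's callback, then start the NVRAM; the cast
 * assumes NVRAMEntryName is string-like (its definition is not
 * part of this excerpt). */
if (nvram_register_entry(nvram,
                         (const NVRAMEntryName *)"tpm-permanent",
                         tpm_nvram_rwdata, s) != 0) {
    return -1;
}
if (nvram_start(nvram, true) != 0) {
    /* e.g. the drive is encrypted and no key has been provided yet */
    return -1;
}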
/**
 * nvram_deregister_entry:
 * @nvram: The NVRAM to deregister an entry from
 * @entry_name: The unique name of the entry
 *
 * Deregister an NVRAM entry previously registered with the NVRAM
 * layer. The function returns 0 on success, an error code on
 * failure.
 */
int nvram_deregister_entry(VNVRAM *nvram,
                           const NVRAMEntryName *entry_name);
/**
* nvram_had_fatal_error:
* @nvram: The NVRAM to check
*
* Returns true in case the NVRAM had a fatal error and
* is unusable, false if the device can be used.
*/
bool nvram_had_fatal_error(VNVRAM *nvram);
/**
 * nvram_write_data:
 * @nvram: The NVRAM to write the data to
 * @entry_name: The name of the blob to write
 * @flags: Flags indicating synchronous or asynchronous operation
 *         and whether to wait for completion of the operation
 * @cb: Callback to invoke for an async write
 * @opaque: Data to pass to the callback
 *
 * Write data into the NVRAM. This function will invoke the callback
 * provided in nvram_register_entry where an output visitor will be
 * provided for writing the blob. This function returns 0 in case
 * of success, an error code otherwise.
 */
int nvram_write_data(VNVRAM *nvram, const NVRAMEntryName *entry_name,
                     int flags, NVRAMRWFinishCB cb, void *opaque);
/**
 * nvram_read_data:
 * @nvram: The NVRAM to read the data from
 * @entry_name: The name of the blob to read
 * @flags: Flags indicating synchronous or asynchronous operation
 *         and whether to wait for completion of the operation
 * @cb: Callback to invoke for an async read
 * @opaque: Data to pass to the callback
 *
 * Read data from the NVRAM. This function will invoke the callback
 * provided in nvram_register_entry where an input visitor will be
 * provided for reading the data. This function returns 0 in case
 * of success, an error code otherwise.
 */
int nvram_read_data(VNVRAM *nvram, const NVRAMEntryName *entry_name,
                    int flags, NVRAMRWFinishCB cb, void *opaque);
/* flags used above */
#define VNVRAM_ASYNC_F (1 << 0)
#define VNVRAM_WAIT_COMPLETION_F (1 << 1)
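To illustrate the flags, here is a sketch of a synchronous and an
asynchronous write of the hypothetical "tpm-permanent" entry; the
NVRAMRWFinishCB signature below is my assumption, since that type
is referenced but not defined in this excerpt:

/* Assumed completion callback signature; NVRAMRWFinishCB is not
 * spelled out above. */
static void tpm_write_done(int errcode, void *opaque)
{
    if (errcode) {
        error_report("NVRAM write failed: %d", errcode);
    }
}

/* Synchronous write: wait until the blob has hit the NVRAM. */
nvram_write_data(nvram, (const NVRAMEntryName *)"tpm-permanent",
                 VNVRAM_WAIT_COMPLETION_F, NULL, NULL);

/* Asynchronous write: returns immediately; tpm_write_done() is
 * invoked once the request completes. */
nvram_write_data(nvram, (const NVRAMEntryName *)"tpm-permanent",
                 VNVRAM_ASYNC_F, tpm_write_done, s);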
Organization of the data in the virtual machine image:
-------------------------------------------------------
All data on the VM image are written as a single ASN.1 stream
with a header followed by the individual fixed-sized NVRAM
entries. The NVRAM layer creates the header during an NVRAM
formatting step that must be initiated by the user (or libvirt)
through an HMP or QMP command.
The fact that we are writing ASN.1 formatted data into the
virtual machine image is also the reason for the recent posts of
the ASN.1 visitor patch series.
/*
 * The NVRAM on-disk format is as follows:
 * Let '{' and '}' denote an ASN.1 sequence start and end.
 *
 * {
 *   NVRAM-header: "qemu-nvram"
 *   # a sequence of blobs:
 *   {
 *     1st NVRAM entry's name
 *     1st NVRAM entry's ASN.1 blob (fixed size)
 *   }
 *   {
 *     2nd NVRAM entry's name
 *     2nd NVRAM entry's ASN.1 blob (fixed size)
 *   }
 *   ...
 * }
 */
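As a concrete, made-up illustration (entry names and sizes are
mine, not part of the proposal): a TPM that registered the entries
"tpm-permanent" and "tpm-volatile" with maximum blob sizes of 2048
and 1024 bytes would produce a stream of the form

{
  "qemu-nvram"
  { "tpm-permanent" <2048-byte fixed-size ASN.1 blob> }
  { "tpm-volatile"  <1024-byte fixed-size ASN.1 blob> }
}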
NVRAM entries are read by searching for the entry identified by
its unique name. If it is found, the device's callback function
is invoked with an input visitor for the device to read the
data.
NVRAM entries are written by searching for the entry identified
by its unique name. If it is found, the device's callback
function is invoked with an output visitor positioned to where
the data needs to be written in the VM image. The device then
uses the visitor directly to write the blob.
The ASN.1 blobs have to be of fixed size, since a growing or
shrinking first blob would either require that all subsequent
blobs be moved or destroy the integrity of the ASN.1 stream.
One complication is the size requirement on the NVRAM combined
with the fact that virtual machine images typically don't grow.
Users may need a priori knowledge of how big the NVRAM has to be
for the device to work properly. In the case of the TPM, for
example, the TPM requires a virtual machine image of a certain
size to be able to write all its blobs into it. It may be
necessary for human users to start QEMU once to find out the
required size of the NVRAM image (using an HMP command) or to
get it from documentation. In the case of libvirt, the required
image size could be hard-coded into libvirt, since it will not
change anymore and is a property of the device. Another
possibility would be to use QEMU APIs to resize the image before
formatting (this at least did not work a few months ago if I
recall correctly, or did not work with all VM image formats; the
details have faded from memory...)
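To put rough numbers on this (reusing the made-up sizes from the
layout illustration above): a device registering a 2048-byte and
a 1024-byte blob needs an image of at least about 3 KB plus the
header, the entry names and the ASN.1 encoding overhead, which is
why the exact figure is most easily obtained from the running
device itself.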
I think this is enough detail for now. Please let me know of any
comments you may have. My primary concern for now is to get
clarity on the layout of the data inside the virtual machine
image. The ASN.1 visitors were written for this purpose.
Thanks and regards,
Stefan