[argh, just posted this to qemu-trivial -- it's not trivial]
Hello!
I am posting this message to revive the previous discussions
about the design of vNVRAM / blobstore, cc'ing (at least) those
who participated in this discussion back then.
The first goal of the implementation is to provide vNVRAM
storage for a software implementation of a TPM to store its
different blobs in. Some of the data that the TPM writes into
persistent memory needs to survive a power-down / power-up cycle
of a virtual machine, therefore this type of persistent storage
is needed. So that the vNVRAM does not become a roadblock for VM
migration, we would make use of block device migration and layer
the vNVRAM on top of the block device, thus using virtual
machine images for storing the vNVRAM data.
Besides the TPM blobs, the vNVRAM should of course also be able
to accommodate other use cases where persistent data is stored
in NVRAM, BBRAM (battery-backed RAM) or EEPROM. As far as I
know, more recent machines with UEFI also have such types of
persistent memory. I believe the current design of the vNVRAM
layer accommodates other use cases as well, though additional
'glue devices' would need to be implemented to interact with
this vNVRAM layer.
Here are references to the previous discussion:
http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg01791.html
http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg01967.html
Two aspects of the vNVRAM seem of primary interest: its API and
how the data is organized in the virtual machine image; its
inner workings are left aside for now.
API of the vNVRAM:
------------------
The following functions and data structures are important for
devices:
enum NVRAMRWOp {
    NVRAM_OP_READ,
    NVRAM_OP_WRITE,
    NVRAM_OP_FORMAT
};
/**
 * Callback function a device must provide for reading and writing
 * of blobs as well as for indicating to the NVRAM layer the maximum
 * blob size of the given entry. Due to the layout of the data in the
 * NVRAM, the device must always write a blob with the size indicated
 * during formatting.
 * @op: Indication of the NVRAM operation
 * @v: Input visitor in case of a read operation, output visitor in
 *     case of a write or format operation
 * @entry_name: Unique name of the NVRAM entry
 * @opaque: Opaque data previously provided when registering the
 *          NVRAM entry
 * @errp: Pointer to an Error pointer for the visitor to indicate
 *        errors
 */
typedef int (*NVRAMRWData)(enum NVRAMRWOp op, Visitor *v,
                           const NVRAMEntryName *entry_name,
                           void *opaque, Error **errp);
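To make the callback contract concrete, here is a minimal sketch
of what a device-side implementation could look like. Everything
device-specific in it is made up for illustration: 'TPMState',
'TPM_BLOB_MAX_SIZE', the 'perm_blob' buffer, and the byte-wise
visit_type_uint8() serialization (including its argument order)
are my assumptions, not part of the proposal.

/* Sketch of a device callback; device state and serialization
 * scheme are hypothetical. */
static int tpm_nvram_rwdata(enum NVRAMRWOp op, Visitor *v,
                            const NVRAMEntryName *entry_name,
                            void *opaque, Error **errp)
{
    TPMState *s = opaque;   /* hypothetical device state */
    int i;

    switch (op) {
    case NVRAM_OP_READ:
        /* Input visitor: deserialize the fixed-size blob into the
         * device. */
        for (i = 0; i < TPM_BLOB_MAX_SIZE && !error_is_set(errp); i++) {
            visit_type_uint8(v, &s->perm_blob[i], NULL, errp);
        }
        break;
    case NVRAM_OP_WRITE:
    case NVRAM_OP_FORMAT:
        /* Output visitor: always write exactly the size indicated
         * during formatting; formatting writes a maximally-sized
         * blob so the NVRAM layer can reserve the space. */
        for (i = 0; i < TPM_BLOB_MAX_SIZE && !error_is_set(errp); i++) {
            visit_type_uint8(v, &s->perm_blob[i], NULL, errp);
        }
        break;
    }
    return error_is_set(errp) ? -1 : 0;
}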
/**
 * nvram_setup:
 * @drive_id: The ID of the drive to be used as NVRAM. Following the
 *            command line '-drive if=none,id=tpm-bs,file=<file>',
 *            'tpm-bs' would have to be passed.
 * @errcode: Pointer to an integer for an error code
 * @resetfn: Device reset function
 * @dev: The DeviceState to be passed to the device reset function
 *       @resetfn
 *
 * This function returns a pointer to a VNVRAM, or NULL in case an
 * error occurred.
 */
VNVRAM *nvram_setup(const char *drive_id, int *errcode,
                    qdev_resetfn resetfn, DeviceState *dev);
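As a usage sketch (the reset function, the variable names and the
error handling are placeholders): given
'-drive if=none,id=tpm-bs,file=nvram.img' on the command line, a
device would obtain its VNVRAM during initialization roughly like
this:

/* Sketch only: assumes '-drive if=none,id=tpm-bs,file=nvram.img'
 * was given; tpm_reset() and 'dev' (the device's DeviceState) are
 * hypothetical. */
int errcode;
VNVRAM *nvram = nvram_setup("tpm-bs", &errcode, tpm_reset, dev);
if (!nvram) {
    error_report("failed to set up NVRAM on drive 'tpm-bs': %d",
                 errcode);
    return -1;
}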
/**
* nvram_delete:
* @nvram: The NVRAM to destroy
*
* Destroy the NVRAM previously allocated using nvram_setup.
*/
int nvram_delete(VNVRAM *nvram);
/**
 * nvram_start:
 * @nvram: The NVRAM to start
 * @fail_on_encrypted_drive: Fail if the drive is encrypted but no
 *                           key was provided so far to lower layers.
 *
 * After all blobs that the device intends to write have been
 * registered with the NVRAM, this function is used to start up the
 * NVRAM. In case no error occurred, 0 is returned, an error code
 * otherwise.
 */
int nvram_start(VNVRAM *nvram, bool fail_on_encrypted_drive);
/**
* nvram_process_requests:
*
* Have the NVRAM layer process all outstanding requests and wait
* for their completion.
*/
void nvram_process_requests(void);
/**
 * nvram_register_entry:
 * @nvram: The NVRAM to register an entry with
 * @entry_name: The unique name of the blob to register
 * @rwdata_callback: Callback function for the NVRAM layer to
 *                   invoke for asynchronous requests such as
 *                   delivering the results of a read operation
 *                   or requesting the maximum size of the blob
 *                   when formatting.
 * @opaque: Data to pass to the callback function
 *
 * Register an entry for the NVRAM layer to write. In case of
 * success this function returns 0, an error code otherwise.
 */
int nvram_register_entry(VNVRAM *nvram,
                         const NVRAMEntryName *entry_name,
                         NVRAMRWData rwdata_callback, void *opaque);
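Continuing the sketch from above, a device would register each of
its blobs and then start the NVRAM. The entry name "tpm-permanent"
is made up, the callback is the hypothetical one from the earlier
sketch, and I am assuming NVRAMEntryName is string-like since its
definition is not shown here:

/* Register the blob's callback, then start the NVRAM; the cast
 * assumes NVRAMEntryName is string-like (its definition is not
 * part of this excerpt). */
if (nvram_register_entry(nvram,
                         (const NVRAMEntryName *)"tpm-permanent",
                         tpm_nvram_rwdata, s) != 0) {
    return -1;
}
if (nvram_start(nvram, true) != 0) {
    /* e.g. the drive is encrypted and no key has been provided yet */
    return -1;
}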
/**
 * nvram_deregister_entry:
 * @nvram: The NVRAM to deregister an entry from
 * @entry_name: The unique name of the entry
 *
 * Deregister an NVRAM entry previously registered with the NVRAM
 * layer. The function returns 0 on success, an error code on
 * failure.
 */
int nvram_deregister_entry(VNVRAM *nvram,
                           const NVRAMEntryName *entry_name);
/**
* nvram_had_fatal_error:
* @nvram: The NVRAM to check
*
* Returns true in case the NVRAM had a fatal error and
* is unusable, false if the device can be used.
*/
bool nvram_had_fatal_error(VNVRAM *nvram);
/**
 * nvram_write_data:
 * @nvram: The NVRAM to write the data to
 * @entry_name: The name of the blob to write
 * @flags: Flags indicating synchronous or asynchronous operation
 *         and whether to wait for completion of the operation
 * @cb: Callback to invoke for an async write
 * @opaque: Data to pass to the callback
 *
 * Write data into the NVRAM. This function will invoke the callback
 * provided in nvram_register_entry where an output visitor will be
 * provided for writing the blob. This function returns 0 in case
 * of success, an error code otherwise.
 */
int nvram_write_data(VNVRAM *nvram, const NVRAMEntryName *entry_name,
                     int flags, NVRAMRWFinishCB cb, void *opaque);
/**
 * nvram_read_data:
 * @nvram: The NVRAM to read the data from
 * @entry_name: The name of the blob to read
 * @flags: Flags indicating synchronous or asynchronous operation
 *         and whether to wait for completion of the operation
 * @cb: Callback to invoke for an async read
 * @opaque: Data to pass to the callback
 *
 * Read data from the NVRAM. This function will invoke the callback
 * provided in nvram_register_entry where an input visitor will be
 * provided for reading the data. This function returns 0 in case
 * of success, an error code otherwise.
 */
int nvram_read_data(VNVRAM *nvram, const NVRAMEntryName *entry_name,
                    int flags, NVRAMRWFinishCB cb, void *opaque);
/* flags used above */
#define VNVRAM_ASYNC_F (1 << 0)
#define VNVRAM_WAIT_COMPLETION_F (1 << 1)
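To illustrate the flags, here is a sketch of a synchronous and an
asynchronous write of the hypothetical "tpm-permanent" entry; the
NVRAMRWFinishCB signature below is my assumption, since that type
is referenced but not defined in this excerpt:

/* Assumed completion callback signature; NVRAMRWFinishCB is not
 * spelled out above. */
static void tpm_write_done(int errcode, void *opaque)
{
    if (errcode) {
        error_report("NVRAM write failed: %d", errcode);
    }
}

/* Synchronous write: wait until the blob has hit the NVRAM. */
nvram_write_data(nvram, (const NVRAMEntryName *)"tpm-permanent",
                 VNVRAM_WAIT_COMPLETION_F, NULL, NULL);

/* Asynchronous write: returns immediately; tpm_write_done() is
 * invoked once the request completes. */
nvram_write_data(nvram, (const NVRAMEntryName *)"tpm-permanent",
                 VNVRAM_ASYNC_F, tpm_write_done, s);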
Organization of the data in the virtual machine image:
-------------------------------------------------------
All data on the VM image are written as a single ASN.1 stream
with a header followed by the individual fixed-sized NVRAM
entries. The NVRAM layer creates the header during an NVRAM
formatting step that must be initiated by the user (or libvirt)
through an HMP or QMP command.
The fact that we are writing ASN.1 formatted data into the
virtual machine image is also the reason for the recent posts of
the ASN.1 visitor patch series.
/*
 * The NVRAM on-disk format is as follows:
 * Let '{' and '}' denote an ASN.1 sequence start and end.
 *
 * {
 *   NVRAM-header: "qemu-nvram"
 *   # a sequence of blobs:
 *   {
 *     1st NVRAM entry's name
 *     1st NVRAM entry's ASN.1 blob (fixed size)
 *   }
 *   {
 *     2nd NVRAM entry's name
 *     2nd NVRAM entry's ASN.1 blob (fixed size)
 *   }
 *   ...
 * }
 */
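As a concrete, made-up illustration (entry names and sizes are
mine, not part of the proposal): a TPM that registered the entries
"tpm-permanent" and "tpm-volatile" with maximum blob sizes of 2048
and 1024 bytes would produce a stream of the form

{
  "qemu-nvram"
  { "tpm-permanent" <2048-byte fixed-size ASN.1 blob> }
  { "tpm-volatile"  <1024-byte fixed-size ASN.1 blob> }
}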
NVRAM entries are read by searching for the entry identified by
its unique name. If it is found, the device's callback function
is invoked with an input visitor for the device to read the
data.
NVRAM entries are written by searching for the entry identified
by its unique name. If it is found, the device's callback
function is invoked with an output visitor positioned to where
the data needs to be written in the VM image. The device then
uses the visitor directly to write the blob.
The ASN.1 blobs have to be of fixed size, since a growing or
shrinking first blob would either require that all subsequent
blobs be moved or destroy the integrity of the ASN.1 stream.
One complication is the size requirement on the NVRAM combined
with the fact that virtual machine images typically don't grow.
Users may need a priori knowledge of how big the NVRAM has to be
for the device to work properly. In the case of the TPM, for
example, the TPM requires a virtual machine image of a certain
size to be able to write all its blobs into it. It may be
necessary for human users to start QEMU once to find out the
required size of the NVRAM image (using an HMP command) or to
get it from documentation. In the case of libvirt, the required
image size could be hard-coded into libvirt, since it will not
change anymore and is a property of the device. Another
possibility would be to use QEMU APIs to resize the image before
formatting (this at least did not work a few months ago if I
recall correctly, or did not work with all VM image formats; the
details have faded from memory...)
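To put rough numbers on this (reusing the made-up sizes from the
layout illustration above): a device registering a 2048-byte and
a 1024-byte blob needs an image of at least about 3 KB plus the
header, the entry names and the ASN.1 encoding overhead, which is
why the exact figure is most easily obtained from the running
device itself.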
I think this is enough detail for now. Please let me know of any
comments you may have. My primary concern for now is to get
clarity on the layout of the data inside the virtual machine
image. The ASN.1 visitors were written for this purpose.
Thanks and regards,
Stefan