* [virtio-comment] [PATCH v2 1/8] admin: Add theory of operation for device migration
2023-10-17 20:06 [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Parav Pandit
@ 2023-10-17 20:06 ` Parav Pandit
2023-10-17 20:06 ` [virtio-comment] [PATCH v2 2/8] admin: Redefine reserved2 as command specific output Parav Pandit
` (8 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Parav Pandit @ 2023-10-17 20:06 UTC
To: virtio-comment, mst, cohuck
Cc: sburla, shahafs, maorg, yishaih, lingshan.zhu, jasowang,
Parav Pandit
Virtual machines commonly use one or more passthrough PCI VF devices
through a generic kernel framework.
A passthrough PCI VF device is fully owned by the virtual machine
device driver. This passthrough device controls its own device
reset flow, basic functionality such as PCI VF function level reset,
and the rest of the virtio device functionality such as the control vq,
config space access, and data path descriptor handling.
Additionally, VM live migration using a precopy method is also widely used.
To support VM live migration for such passthrough virtio member devices,
the owner PCI PF device administers the device migration flow.
This patch introduces the basic theory of operation, which describes the flow
and the supporting administration commands.
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
v0->v1:
- addressed comments from Jason
- simplified commit log to remove wording of flow
- added link to the device reset section
- addressed comments from Michael
---
admin-cmds-device-migration.tex | 95 +++++++++++++++++++++++++++++++++
admin.tex | 1 +
2 files changed, 96 insertions(+)
create mode 100644 admin-cmds-device-migration.tex
diff --git a/admin-cmds-device-migration.tex b/admin-cmds-device-migration.tex
new file mode 100644
index 0000000..d172130
--- /dev/null
+++ b/admin-cmds-device-migration.tex
@@ -0,0 +1,95 @@
+\subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device / Device groups / Group
+administration commands / Device Migration}
+
+In some systems, there is a need to migrate a running virtual machine
+from one system to another. A running virtual machine has one or more
+passthrough virtio member devices attached to it. A passthrough device
+is entirely operated by the guest virtual machine. For example, with
+the SR-IOV group type, a group member VF device undergoes device reset
+\ref{sec:Basic Facilities of a Virtio Device / Device Reset}
+and may also undergo PCI function level reset (FLR). Such operations
+are under the control of the guest virtual machine, which must comply with the
+device reset requirements and the PCI standard; at the same time those
+operations must not obstruct the device migration. In such a scenario,
+a group owner device can provide the administration command interface
+to facilitate the device migration related operations.
+
+When a virtual machine migrates from one hypervisor to another hypervisor,
+these hypervisors are called the source and destination hypervisor, respectively.
+In such a scenario, the source hypervisor administers the
+member device to suspend the device and preserve the device context.
+Subsequently, the destination hypervisor administers the member device to
+set up a device context and resume the member device. The source hypervisor
+reads the member device context and the destination hypervisor writes the member
+device context. The method to transfer the member device context from the source
+to the destination hypervisor is outside the scope of this specification.
+
+The member device can be in any of three migration modes. The owner driver
+sets the member device to one of the following modes during the device migration flow.
+
+\begin{tabularx}{\textwidth}{ |l||l|X| }
+\hline
+Value & Name & Description \\
+\hline \hline
+0x0 & Active &
+ It is the default mode after instantiation of the member device. \\
+\hline
+0x1 & Stop &
+ In this mode, the member device does not send any notifications,
+ and it does not access any driver memory.
+ The member device may still receive driver notifications in this mode, so
+ the member device context and device configuration space may change. \\
+\hline
+0x2 & Freeze &
+ In this mode, the member device does not accept any driver notifications,
+ it ignores any device configuration space writes,
+ and the device context does not change. The
+ member device is not accessed in the system through the virtio interface. \\
+\hline
+\hline
+0x03-0xFF & - & reserved for future use \\
+\hline
+\end{tabularx}
+
+When the owner driver wants to stop the operation of the
+device, the owner driver sets the device mode to \field{Stop}. Once the
+device is in the \field{Stop} mode, the device does not initiate any notifications
+and does not access any driver memory. Since the member driver may still be
+active and may send further driver notifications to the device, the device
+context may be updated. When the member driver has stopped accessing the
+device, the owner driver sets the device to \field{Freeze} mode indicating
+to the device that no more driver access occurs. In the \field{Freeze} mode,
+no more changes occur in the device context. At this point, the device ensures
+that there will not be any update to the device context.
+
+The member device has a device context which the owner driver can either
+read or write. The member device context consists of any device specific
+data which is needed by the device to resume its operation when the device mode
+is changed from \field{Stop} to \field{Active} or from \field{Freeze}
+to \field{Active}.
+
+Once the device context is read, it is cleared from the device. Typically, on
+the source hypervisor, the owner driver reads the device context once when
+the device is in \field{Active} or \field{Stop} mode and later once the member
+device is in \field{Freeze} mode.
+
+Typically, the device context is read and written one time on the source and
+the destination hypervisor respectively once the device is in \field{Freeze}
+mode. On the destination hypervisor, after writing the device context,
+when the device mode is set to \field{Active}, the device uses the most recently
+set device context and resumes the device operation.
+
+In an alternative flow, on the source hypervisor the owner driver may choose
+to read the device context a first time while the device is in \field{Active} mode
+and a second time once the device is in \field{Freeze} mode. Similarly, the owner
+driver on the destination hypervisor writes the device context a first time while
+the device is still running in \field{Active} mode on the source hypervisor and writes
+the device context a second time while the device is in \field{Freeze} mode.
+This flow may result in a very short setup time as the device context likely
+has minimal changes from the previously written device context. This flow may
+reduce the device migration time significantly and may have near constant
+device activation time regardless of the number of virtqueues, resources and
+passthrough devices in use by the migrating virtual machine.
+
+The owner driver can discard any partially read or written device context when
+a device migration flow needs to be aborted.
diff --git a/admin.tex b/admin.tex
index 0803c26..6eeef58 100644
--- a/admin.tex
+++ b/admin.tex
@@ -297,6 +297,7 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti
might differ between different group types.
\input{admin-cmds-legacy-interface.tex}
+\input{admin-cmds-device-migration.tex}
\devicenormative{\subsubsection}{Group administration commands}{Basic Facilities of a Virtio Device / Device groups / Group administration commands}
--
2.34.1
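
The three migration modes in the table above can be restated as a small set of predicates. The following is only a sketch of one reading of that table, not text proposed for the specification: the enum and function names are hypothetical, and the assumption that an Active device is the only one that sends notifications or accesses driver memory is inferred from the Stop and Freeze rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mode values taken from the table; the identifier names are hypothetical. */
enum virtio_dev_mode {
    VIRTIO_DEV_MODE_ACTIVE = 0x0, /* default mode after instantiation */
    VIRTIO_DEV_MODE_STOP   = 0x1,
    VIRTIO_DEV_MODE_FREEZE = 0x2,
};

/* Values 0x03-0xFF are reserved for future use. */
bool dev_mode_is_valid(uint8_t mode)
{
    return mode <= VIRTIO_DEV_MODE_FREEZE;
}

/* Stop and Freeze: no device notifications, no driver memory access.
 * Assumption: an Active device is free to do both. */
bool dev_mode_device_may_access_driver_memory(uint8_t mode)
{
    return mode == VIRTIO_DEV_MODE_ACTIVE;
}

/* Freeze: the device does not accept driver notifications. */
bool dev_mode_accepts_driver_notifications(uint8_t mode)
{
    return mode == VIRTIO_DEV_MODE_ACTIVE || mode == VIRTIO_DEV_MODE_STOP;
}

/* Only in Freeze is the device context guaranteed not to change. */
bool dev_mode_context_may_change(uint8_t mode)
{
    return mode != VIRTIO_DEV_MODE_FREEZE;
}

/* Next mode in the typical source-side flow: Active -> Stop -> Freeze. */
uint8_t dev_mode_next_on_source(uint8_t mode)
{
    assert(dev_mode_is_valid(mode));
    return mode == VIRTIO_DEV_MODE_FREEZE ? mode : (uint8_t)(mode + 1);
}
```

Under this reading, a final read of the device context on the source is only stable once dev_mode_context_may_change() returns false, which is why the last read happens in Freeze mode.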
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
* [virtio-comment] [PATCH v2 2/8] admin: Redefine reserved2 as command specific output
From: Parav Pandit @ 2023-10-17 20:06 UTC
To: virtio-comment, mst, cohuck
Cc: sburla, shahafs, maorg, yishaih, lingshan.zhu, jasowang,
Parav Pandit
Currently, when a command needs to return two distinct types of data in
the result, such as one consumed by the driver and another to be zero
copied to some user buffers, the driver needs to prepare an
extra descriptor for the driver consumed field. When such a field is
<= 4 bytes, the extra descriptor is an overhead.
virtio_admin_command already has 4 reserved bytes in the device
writable area. Use them as command specific output.
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
admin.tex | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/admin.tex b/admin.tex
index 6eeef58..c86813d 100644
--- a/admin.tex
+++ b/admin.tex
@@ -90,8 +90,7 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti
/* Device-writable part */
le16 status;
le16 status_qualifier;
- /* unused, reserved for future extensions */
- u8 reserved2[4];
+ u8 command_specific_output[4];
u8 command_specific_result[];
};
\end{lstlisting}
@@ -192,11 +191,15 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti
\hline
\end{tabularx}
-Each command uses a different \field{command_specific_data} and
-\field{command_specific_result} structures and the length of
+Each command uses a different \field{command_specific_data},
+\field{command_specific_output} and
+\field{command_specific_result} fields. The length of
\field{command_specific_data} and \field{command_specific_result}
-depends on these structures and is described separately or is
-implicit in the structure description.
+depends on the command and is described separately or is
+implicit in the structure description. The \field{command_specific_output}
+describes any command specific output which is up to 4 bytes in size. The
+\field{command_specific_output} contains one or more command specific
+fields.
Before sending any group administration commands to the device, the driver
needs to communicate to the device which commands it is going to
--
2.34.1
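
To illustrate how a driver might consume the redefined 4-byte area without an extra descriptor, here is a minimal sketch. It models only the device-writable fixed part shown in the diff, as raw bytes. The struct and function names are hypothetical, and the little-endian encoding of the command specific fields is an assumption based on the le16 fields next to it; the patch itself only says the area holds one or more command specific fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Device-writable fixed part from the diff, modeled as raw bytes so that
 * the layout does not depend on host endianness. Name is hypothetical. */
struct admin_cmd_dev_writable_hdr {
    uint8_t status[2];                  /* le16 status */
    uint8_t status_qualifier[2];        /* le16 status_qualifier */
    uint8_t command_specific_output[4]; /* was reserved2[4] */
};

/* Decode the first nbytes (at most 4) of command_specific_output as a
 * little-endian unsigned integer. Assumption: fields are little-endian. */
uint32_t admin_cmd_output_le(const struct admin_cmd_dev_writable_hdr *hdr,
                             size_t nbytes)
{
    uint32_t v = 0;

    assert(nbytes <= sizeof(hdr->command_specific_output));
    for (size_t i = 0; i < nbytes; i++)
        v |= (uint32_t)hdr->command_specific_output[i] << (8 * i);
    return v;
}
```

A command that returns a single 32-bit length in this area would call admin_cmd_output_le(hdr, 4), while the bulk data stays in command_specific_result and can be handed to a user buffer untouched.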
* [virtio-comment] [PATCH v2 3/8] device-context: Define the device context fields for device migration
From: Parav Pandit @ 2023-10-17 20:06 UTC
To: virtio-comment, mst, cohuck
Cc: sburla, shahafs, maorg, yishaih, lingshan.zhu, jasowang,
Parav Pandit
Define the device context and its fields for the purpose of device
migration. The device context is read and written by the owner driver
on the source and destination hypervisor respectively.
Device context fields are expected to grow rapidly after this initial
version to cover many details of the device.
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
---
changelog:
v1->v2:
- addressed comments from Michael
- dropped layout from the enums and definition
- defined more practical fields type range of 16-bit
- split the range to generic and device type range
- added assumptions and device context extension sections for future
proofing
v0->v1:
- enrich device context to cover feature bits, device configuration
fields
- corrected alignment of device context fields
---
content.tex | 1 +
device-context.tex | 189 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 190 insertions(+)
create mode 100644 device-context.tex
diff --git a/content.tex b/content.tex
index 0a62dce..2698931 100644
--- a/content.tex
+++ b/content.tex
@@ -503,6 +503,7 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
\input{admin.tex}
+\input{device-context.tex}
\chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
diff --git a/device-context.tex b/device-context.tex
new file mode 100644
index 0000000..0921f1c
--- /dev/null
+++ b/device-context.tex
@@ -0,0 +1,189 @@
+\section{Device Context}\label{sec:Basic Facilities of a Virtio Device / Device Context}
+
+The device context holds the information that an owner driver can use
+to set up a member device and resume its operation. The device context
+of a member device is read or written by the owner driver using
+administration commands.
+
+The device context mainly consists of two types of fields. One type is generic
+device agnostic fields, identified by the \field{type} range 0 to 0xFFF. The
+second type is device type specific fields; the device type specific range
+is reserved from 0x1000 to 0x1FFF.
+
+\begin{lstlisting}
+struct virtio_dev_ctx_field_tlv {
+ le16 type;
+ u8 reserved[6];
+ le64 length;
+ u8 value[];
+};
+
+struct virtio_dev_ctx {
+ le32 field_count;
+ struct virtio_dev_ctx_field_tlv fields[];
+};
+
+\end{lstlisting}
+
+The \field{struct virtio_dev_ctx} is the device context of a member device.
+The \field{field_count} indicates how many instances of
+\field{struct virtio_dev_ctx_field_tlv} are present.
+
+The \field{struct virtio_dev_ctx_field_tlv} consists of \field{type} indicating
+what data is contained in the \field{value} of length \field{length}.
+The valid values for \field{type} can be found in the following table:
+
+\begin{table}
+\caption{\label{tab:Device Context Fields} Device Context Fields}
+\begin{tabularx}{\textwidth}{ |l||l|X| }
+\hline
+type & Name & Description \\
+\hline \hline
+0x0 & VIRTIO_DEV_CTX_PCI_COMMON_CFG & Provides common configuration space of device for PCI transport \\
+\hline
+0x1 & VIRTIO_DEV_CTX_DEV_FEATURES & Provides device features \\
+\hline
+0x2 & VIRTIO_DEV_CTX_PCI_VQ_CFG & Provides Virtqueue configuration for PCI transport \\
+\hline
+0x3 & VIRTIO_DEV_CTX_VQ_SPLIT_RUNTIME_CFG & Provides Queue run time state \\
+\hline
+0x4 & VIRTIO_DEV_CTX_VQ_SPLIT_DEV_OWN_DESC & Provides list of virtqueue descriptors owned by device \\
+\hline
+0x5 - 0xFFF & - & Generic device agnostic range reserved for future \\
+\hline
+0x1000 & VIRTIO_DEV_CTX_DEV_CFG & Provides device specific configuration \\
+\hline
+0x1001 - 0x1FFF & - & Device type specific range reserved for future \\
+\hline
+0x3000 - 0xFFFF & - & Reserved for future \\
+\hline
+\end{tabularx}
+\end{table}
+
+\subsection{Device Context Fields}\label{sec:Basic Facilities of a Virtio Device / Device Context / Device Context Fields}
+
+\subsubsection{PCI Common Configuration Context}
+\label{par:Basic Facilities of a Virtio Device / Device Context / Device Context Fields/ PCI Common Configuration Context}
+
+For the field VIRTIO_DEV_CTX_PCI_COMMON_CFG, \field{type} is set to 0x0.
+The \field{value} is in format of \field{struct virtio_pci_common_cfg}.
+The \field{length} is the length of \field{struct virtio_pci_common_cfg}.
+
+\subsubsection{Device Features Context}
+\label{par:Basic Facilities of a Virtio Device / Device Context / Device Context Fields/ Device Features Context}
+
+For the field VIRTIO_DEV_CTX_DEV_FEATURES, \field{type} is set to 0x1.
+The \field{value} holds the device feature bits listed in
+\ref{sec:Basic Facilities of a Virtio Device / Feature Bits} in the format of \field{struct virtio_dev_ctx_features}.
+The \field{length} is the length of the \field{value}.
+
+\begin{lstlisting}
+struct virtio_dev_ctx_features {
+ le64 feature_bits[];
+};
+\end{lstlisting}
+
+\subsubsection{PCI Virtqueue Configuration Context}
+\label{par:Basic Facilities of a Virtio Device / Device Context / Device Context Fields/ PCI Virtqueue Configuration Context}
+
+For the field VIRTIO_DEV_CTX_PCI_VQ_CFG, \field{type} is set to 0x2.
+The \field{value} is in format of \field{struct virtio_dev_ctx_pci_vq_cfg}.
+The \field{length} is the length of \field{struct virtio_dev_ctx_pci_vq_cfg}.
+
+\begin{lstlisting}
+struct virtio_dev_ctx_pci_vq_cfg {
+ le16 vq_index;
+ le16 queue_size;
+ le16 queue_msix_vector;
+ le64 queue_desc;
+ le64 queue_driver;
+ le64 queue_device;
+};
+\end{lstlisting}
+
+One or multiple entries of PCI Virtqueue Configuration Context may exist, each such
+entry corresponds to a unique virtqueue identified by the \field{vq_index}.
+
+\subsubsection{Virtqueue Split Mode Runtime Context}
+\label{par:Basic Facilities of a Virtio Device / Device Context / Device Context Fields/ Virtqueue Split Mode Runtime Context}
+
+For the field VIRTIO_DEV_CTX_VQ_SPLIT_RUNTIME_CFG, \field{type} is set to 0x3.
+The \field{value} is in format of \field{struct virtio_dev_ctx_vq_split_runtime}.
+The \field{length} is the length of \field{struct virtio_dev_ctx_vq_split_runtime}.
+
+\begin{lstlisting}
+struct virtio_dev_ctx_vq_split_runtime {
+ le16 vq_index;
+ le16 dev_avail_idx;
+ u8 enabled;
+};
+\end{lstlisting}
+
+The \field{dev_avail_idx} indicates the next available index of the virtqueue from which
+the device must start processing the available ring.
+
+One or multiple entries of Virtqueue Split Mode Runtime Context may exist, each such
+entry corresponds to a unique virtqueue identified by the \field{vq_index}.
+
+\subsubsection{Virtqueue Split Mode Device owned Descriptors Context}
+
+For the field VIRTIO_DEV_CTX_VQ_SPLIT_DEV_OWN_DESC, \field{type} is set to 0x4.
+The \field{value} is in format of \field{struct virtio_dev_ctx_vq_split_dev_descs}.
+The \field{length} is the length of \field{struct virtio_dev_ctx_vq_split_dev_descs}.
+
+\begin{lstlisting}
+struct virtio_dev_ctx_vq_split_dev_descs {
+ le16 vq_index;
+ le16 desc_count;
+ le16 desc_idx[];
+};
+\end{lstlisting}
+
+The \field{desc_idx} contains \field{desc_count} indices of the descriptors of the
+virtqueue identified by \field{vq_index} which are owned by the device.
+
+One or multiple entries of Virtqueue Split Mode Device owned Descriptors Context may exist, each such
+entry corresponds to a unique virtqueue identified by the \field{vq_index}.
+
+\subsubsection{Device Configuration Context}
+\label{par:Basic Facilities of a Virtio Device / Device Context / Device Context Fields/ Device Configuration Context}
+
+For the field VIRTIO_DEV_CTX_DEV_CFG, \field{type} is set to 0x1000.
+The \field{value} is in format of device specific configuration listed
+in each of the device's device configuration layout section.
+For example, for File System Device, \field{value} is in format of
+\field{struct virtio_fs_config}.
+The \field{length} is the length of the device configuration data of
+\field{value}.
+
+\subsubsection{Device Context Extensions}
+Various considerations are necessary when creating a new device context field or
+when extending the device context field structure.
+
+1. How to define a new device context field? \\
+If the new field is generic for all the device types or most of the device types,
+it should be added under the generic field range. If the new field is unique to
+a device type, it should be added under the device type specific range. \\
+
+2. When to define a new device context field? \\
+When the device context field for a specific field does not exist, one should
+define a new device context field. \\
+
+3. How to avoid duplication of device context field definition with device
+ specific structures which may be present as control vq data structures? \\
+Each device should reuse any existing field definition that may exist as part
+of device control virtqueue or any other request structure. \\
+
+4. How to extend the existing device context field definition? \\
+When an element is missing in an already defined field, a new element must be added at
+the end of the device context field. A new element MUST NOT be added at the beginning or in
+the middle of the field structure. Any element which is already present MUST NOT
+be removed. \\
+
+\subsubsection{Assumptions}
+For the SR-IOV group type, some hypervisors do not permit the driver to access
+PCI configuration space and MSI-X Table space directly. Such hypervisors handle the
+query and saving of these fields without needing them in the device context.
+Hence, this version of the specification does not include them in the device context. A future
+extension of the device context may further include them with a new field type for
+each of the fields.
--
2.34.1
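
As an illustration of the TLV layout defined above, the following sketch walks a device context buffer and counts the fields of a given type. It is a rough example under stated assumptions, not part of the patch: the function and macro names are hypothetical, the structures are assumed to be packed (the first TLV starts right after the 4-byte field_count, and values are not padded), and all integers are assumed little-endian as declared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEV_CTX_HDR_LEN     4u  /* le32 field_count */
#define DEV_CTX_TLV_HDR_LEN 16u /* le16 type + u8 reserved[6] + le64 length */

/* Read an n-byte (n <= 8) little-endian unsigned integer. */
static uint64_t get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    assert(n <= 8);
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Count the TLVs of the given type in a device context buffer.
 * Returns -1 if the buffer is shorter than its headers and lengths claim. */
int dev_ctx_count_fields(const uint8_t *buf, size_t len, uint16_t type)
{
    uint32_t field_count;
    size_t off = DEV_CTX_HDR_LEN;
    int matches = 0;

    if (len < DEV_CTX_HDR_LEN)
        return -1;
    field_count = (uint32_t)get_le(buf, 4);

    for (uint32_t i = 0; i < field_count; i++) {
        uint16_t t;
        uint64_t vlen;

        if (len - off < DEV_CTX_TLV_HDR_LEN)
            return -1;
        t = (uint16_t)get_le(buf + off, 2);
        vlen = get_le(buf + off + 8, 8); /* skip type and reserved[6] */
        off += DEV_CTX_TLV_HDR_LEN;
        if (vlen > len - off)
            return -1;
        off += (size_t)vlen;
        if (t == type)
            matches++;
    }
    return matches;
}
```

Because several entries of the same type may exist (for example one PCI virtqueue configuration entry per vq_index), a consumer has to iterate over every TLV rather than stop at the first match.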
* [virtio-comment] [PATCH v2 4/8] admin: Add device migration admin commands
From: Parav Pandit @ 2023-10-17 20:06 UTC
To: virtio-comment, mst, cohuck
Cc: sburla, shahafs, maorg, yishaih, lingshan.zhu, jasowang,
Parav Pandit
A passthrough device is mapped to the guest VM. A passthrough device
accessed by the driver can undergo its own device reset and for PCI
transport it can undergo its PCI FLR while the guest VM migration is
ongoing.
The passthrough device may not have any direct channel through which
device migration related administrative tasks can be done, and even if
it has one, such administrative tasks must not be interrupted by the
device reset or VF FLR flow initiated by the passthrough device.
Hence, the owner driver, which administers the member devices,
facilitates the device migration flow.
Add device migration administration commands that the owner driver can use
for the passthrough device.
Subsequent patch defines the device context in detail.
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
---
changelog:
v1->v2:
- addressed comments from Michael
- updated commit log to refer to device context in later patch
- moved admin command table opcode to this (right) patch
- added command to query supported fields of the device context
---
admin-cmds-device-migration.tex | 233 +++++++++++++++++++++++++++++++-
admin.tex | 16 ++-
2 files changed, 247 insertions(+), 2 deletions(-)
diff --git a/admin-cmds-device-migration.tex b/admin-cmds-device-migration.tex
index d172130..bbe5902 100644
--- a/admin-cmds-device-migration.tex
+++ b/admin-cmds-device-migration.tex
@@ -66,7 +66,8 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device /
read or write. The member device context consist of any device specific
data which is needed by the device to resume its operation when the device mode
is changed from \field{Stop} to \field{Active} or from \field{Freeze}
-to \field{Active}.
+to \field{Active}. The device context is described in section
+\ref{sec:Basic Facilities of a Virtio Device / Device Context}.
Once the device context is read, it is cleared from the device. Typically, on
the source hypervisor, the owner driver reads the device context once when
@@ -93,3 +94,233 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device /
The owner driver can discard any partially read or written device context when
any of the device migration flow should be aborted.
+
+The owner driver uses the following device migration group administration commands.
+
+\begin{enumerate}
+\item Device Mode Get Command
+\item Device Mode Set Command
+\item Device Context Size Get Command
+\item Device Context Read Command
+\item Device Context Write Command
+\item Device Context Supported Fields Query Command
+\item Device Context Discard Command
+\end{enumerate}
+
+These commands are currently only defined for the SR-IOV group type.
+
+\paragraph{Device Mode Get Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Mode Get Command}
+
+This command reads the mode of the device.
+For the command VIRTIO_ADMIN_CMD_DEV_MODE_GET, \field{opcode}
+is set to 0x7.
+The \field{group_member_id} refers to the member device to be accessed.
+This command has no command specific data.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_dev_mode_get_result {
+ u8 mode;
+};
+\end{lstlisting}
+
+When the command completes successfully, \field{command_specific_result}
+is in the format \field{struct virtio_admin_cmd_dev_mode_get_result}
+returned by the device, in which the device sets the \field{mode} value to
+one of \field{Active}, \field{Stop} or \field{Freeze}.
+
+\paragraph{Device Mode Set Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Mode Set Command}
+
+This command sets the mode of the device.
+The \field{command_specific_data} is in the format
+\field{struct virtio_admin_cmd_dev_mode_set_data} describing the new device mode.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_dev_mode_set_data {
+ u8 mode;
+};
+\end{lstlisting}
+
+For the command VIRTIO_ADMIN_CMD_DEV_MODE_SET, \field{opcode} is set to 0x8.
+The \field{group_member_id} refers to the member device to be accessed.
+The \field{mode} is set to either \field{Active} or \field{Stop} or
+\field{Freeze}.
+
+This command has no command specific result. When the command completes
+successfully, the device is set to the new \field{mode}. When the command fails,
+the device stays in the previous mode.
+
+\paragraph{Device Context Size Get Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Context Size Get Command}
+
+This command returns the remaining estimated device context size. The
+driver can query the remaining estimated device context size
+for the current mode or for the \field{Freeze} mode. While
+reading the device context using VIRTIO_ADMIN_CMD_DEV_CTX_READ command, the
+actual device context size may differ from what is returned by
+this command. After reading the device context using command
+VIRTIO_ADMIN_CMD_DEV_CTX_READ, the remaining estimated context size
+usually decreases by the amount of device context read by the driver using
+VIRTIO_ADMIN_CMD_DEV_CTX_READ command. If the device context is updated
+rapidly, the remaining estimated context size may also increase even after
+reading the device context using VIRTIO_ADMIN_CMD_DEV_CTX_READ command.
+
+For the command VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET, \field{opcode} is set to 0x9.
+The \field{group_member_id} refers to the member device to be accessed.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_dev_ctx_size_get_data {
+ u8 freeze_mode;
+};
+\end{lstlisting}
+
+The \field{command_specific_data} is in the format
+\field{struct virtio_admin_cmd_dev_ctx_size_get_data}.
+When \field{freeze_mode} is set to 1, the device returns the estimated
+device context size for when the device will be in \field{Freeze} mode.
+As the device context is read from the device, the remaining estimated
+context size may decrease. For example, when the member device mode is
+\field{Stop} and the device has an estimated total device context size
+of 12KB, the device returns 12KB for the first
+VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET command. Once the driver has
+read 8KB of device context data using the
+VIRTIO_ADMIN_CMD_DEV_CTX_READ command, the remaining data is
+4KB, hence the device returns 4KB in the subsequent
+VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET command.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_dev_ctx_size_get_result {
+ le64 size;
+};
+\end{lstlisting}
+
+When the command completes successfully, \field{command_specific_result} is in
+the format \field{struct virtio_admin_cmd_dev_ctx_size_get_result}.
+
+Once the device context is fully read, this command returns zero for
+\field{size} until the new device context is generated.
+
+\paragraph{Device Context Read Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Context Read Command}
+
+This command reads the current device context.
+For the command VIRTIO_ADMIN_CMD_DEV_CTX_READ, \field{opcode} is set to 0xa.
+The \field{group_member_id} refers to the member device to be accessed.
+
+This command has no command specific data.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_dev_ctx_rd_len {
+ le32 context_len;
+};
+
+struct virtio_admin_cmd_dev_ctx_rd_result {
+ u8 data[];
+};
+\end{lstlisting}
+
+When the command completes successfully, \field{command_specific_result}
+is in the format \field{struct virtio_admin_cmd_dev_ctx_rd_result}
+returned by the device containing the device context data and
+\field{command_specific_output} is in the format of
+\field{struct virtio_admin_cmd_dev_ctx_rd_len} containing the length of
+context data returned by the device in the command response. When the length
+returned is zero or when the returned context data is less than the data requested by
+the driver, the device does not have any device context data left that it
+can report; at this point the device context stream ends.
+
+The driver can read the whole device context data using one or multiple
+commands. When the device context does not fit in the
+\field{command_specific_result}, the driver reads the remaining
+bytes using one or more subsequent commands.
+
+\paragraph{Device Context Write Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Context Write Command}
+
+This command writes the device context data. The device context can be written
+only when the device mode is \field{Freeze}.
+
+For the command VIRTIO_ADMIN_CMD_DEV_CTX_WRITE, \field{opcode}
+is set to 0xb.
+The \field{group_member_id} refers to the member device to be accessed.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_dev_ctx_wr_data {
+ u8 data[];
+};
+\end{lstlisting}
+
+The \field{command_specific_data} is in the format
+\field{struct virtio_admin_cmd_dev_ctx_wr_data} containing
+the device context data to be written.
+
+This command has no command specific result.
+The device fails the command when it is executed while the device mode
+is other than \field{Freeze}.
+
+The written device context is effective when the device mode is changed
+from \field{Freeze} to \field{Stop} or from \field{Freeze} to \field{Active}.
+
+The driver can write the whole device context using one or multiple
+commands. When the device context does not fit in the
+\field{command_specific_data} of one command, the driver writes the
+remaining bytes using one or more subsequent commands.
+
+\paragraph{Device Context Supported Fields Query Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Context Supported Fields Query Command}
+
+This command reads the supported fields of the device context.
+Each listed \field{type} of the device context of
+\ref{sec:Basic Facilities of a Virtio Device / Device Context} is represented
+as one entry in the command response. When the device supports a given \field{type} for the member
+device, the corresponding entry is set in the command response.
+
+For the command VIRTIO_ADMIN_CMD_DEV_CTX_FIELDS_QUERY, \field{opcode} is set to 0xc.
+The \field{group_member_id} refers to the member device to be accessed.
+
+This command has no command specific data.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_dev_ctx_supported_field {
+ le16 type;
+ u8 reserved[6];
+ le64 length;
+};
+
+struct virtio_admin_cmd_dev_ctx_supported_fields_result {
+ struct virtio_admin_cmd_dev_ctx_supported_field fields[];
+};
+\end{lstlisting}
+
+When the command completes successfully, \field{command_specific_result}
+is in the format \field{struct virtio_admin_cmd_dev_ctx_supported_fields_result}.
+Each entry in array \field{fields} represents the supported \field{type}
+and its length as described in \ref{tab:Device Context Fields}.
+
+\paragraph{Device Context Discard Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Context Discard Command}
+
+This command discards any partial device context that is yet to be read
+by the driver, and it also discards any device context that is partially written.
+The driver can use this command to abort a device context migration
+flow in which partial device context read or write operations may
+have occurred.
+
+For the command VIRTIO_ADMIN_CMD_DEV_CTX_DISCARD, \field{opcode}
+is set to 0xd.
+The \field{group_member_id} refers to the member device to be accessed.
+
+This command has no command specific data.
+This command has no command specific result.
+
+Once this command completes successfully, the device context is
+discarded. If the device context that is discarded was part of the write
+operation, once this command completes, the device functions as if the device
+context was never written. If the device context that is discarded was part
+of the read operation, once this command completes, the device functions as if
+the device context was never read in the given device mode. Once the device
+context is discarded, in subsequent VIRTIO_ADMIN_CMD_DEV_CTX_READ command,
+the device returns new device context entry. Once the device context is
+discarded, subsequent VIRTIO_ADMIN_CMD_DEV_CTX_WRITE command writes a new device
+context.
diff --git a/admin.tex b/admin.tex
index c86813d..142692c 100644
--- a/admin.tex
+++ b/admin.tex
@@ -126,7 +126,21 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti
\hline
0x0006 & VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO & Query the notification region information \\
\hline
-0x0007 - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd} \\
+0x0007 & VIRTIO_ADMIN_CMD_DEV_MODE_GET & Query the device mode \\
+\hline
+0x0008 & VIRTIO_ADMIN_CMD_DEV_MODE_SET & Set the device mode \\
+\hline
+0x0009 & VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET & Query the device context size \\
+\hline
+0x000a & VIRTIO_ADMIN_CMD_DEV_CTX_READ & Read the device context data \\
+\hline
+0x000b & VIRTIO_ADMIN_CMD_DEV_CTX_WRITE & Write the device context data \\
+\hline
+0x000c & VIRTIO_ADMIN_CMD_DEV_CTX_FIELDS_QUERY & Query Supported fields of device context \\
+\hline
+0x000d & VIRTIO_ADMIN_CMD_DEV_CTX_DISCARD & Clear the device context data \\
+\hline
+0x000e - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd} \\
\hline
0x8000 - 0xFFFF & - & Reserved for future commands (possibly using a different structure) \\
\hline
--
2.34.1
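
Illustration only, not part of the patch: the size query, read and discard
commands described above can be sketched in C roughly as follows. This is a
minimal sketch under stated assumptions. struct mock_dev and the mock_ctx_*
and read_whole_ctx helpers are hypothetical names that stand in for a real
admin virtqueue; they model only the behaviour the patch text describes
(remaining size shrinks as data is read, a zero or short read ends the
stream, discard makes the device behave as if nothing was read).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mock of a member device that holds a device context. */
struct mock_dev {
    const uint8_t *ctx; /* complete device context */
    size_t ctx_len;     /* total context size in bytes */
    size_t rd_off;      /* bytes already returned to the driver */
};

/* Stand-in for VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET: remaining context size. */
static uint64_t mock_ctx_size_get(const struct mock_dev *d)
{
    return d->ctx_len - d->rd_off;
}

/* Stand-in for VIRTIO_ADMIN_CMD_DEV_CTX_READ: fills buf, returns context_len. */
static uint32_t mock_ctx_read(struct mock_dev *d, uint8_t *buf, uint32_t buf_len)
{
    size_t left = d->ctx_len - d->rd_off;
    uint32_t n = left < buf_len ? (uint32_t)left : buf_len;

    memcpy(buf, d->ctx + d->rd_off, n);
    d->rd_off += n;
    return n;
}

/* Stand-in for VIRTIO_ADMIN_CMD_DEV_CTX_DISCARD: as if nothing was read. */
static void mock_ctx_discard(struct mock_dev *d)
{
    d->rd_off = 0;
}

/* Driver side: read the whole context using commands of at most chunk bytes. */
static size_t read_whole_ctx(struct mock_dev *d, uint8_t *out, size_t out_cap,
                             uint32_t chunk)
{
    size_t total = 0;

    assert(chunk > 0);
    while (total < out_cap) {
        size_t room = out_cap - total;
        uint32_t want = room < chunk ? (uint32_t)room : chunk;
        uint32_t got = mock_ctx_read(d, out + total, want);

        total += got;
        if (got < want) /* zero or short read: the context stream ended */
            break;
    }
    return total;
}
```

With a 12KB mock context, reading 8KB first leaves 4KB reported by the size
query, which matches the example in the patch text.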
* [virtio-comment] [PATCH v2 5/8] admin: Add requirements of device migration commands
2023-10-17 20:06 [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Parav Pandit
` (3 preceding siblings ...)
2023-10-17 20:06 ` [virtio-comment] [PATCH v2 4/8] admin: Add device migration admin commands Parav Pandit
@ 2023-10-17 20:06 ` Parav Pandit
2023-10-17 20:06 ` [virtio-comment] [PATCH v2 6/8] admin: Add theory of operation for write recording commands Parav Pandit
` (4 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Parav Pandit @ 2023-10-17 20:06 UTC (permalink / raw)
To: virtio-comment, mst, cohuck
Cc: sburla, shahafs, maorg, yishaih, lingshan.zhu, jasowang,
Parav Pandit
Add device and driver side requirements for the device migration
commands.
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
v1->v2:
- fixed spelling from membe to member
- removed device requirement line of FLR making the device active
as it was incorrectly written to mix operational and admin state
- added requirements to clarify flr, device reset, pm and admin commands
- group sriov requirements
- added description for device config space access in stop mode
- removed stale requirement around pci ids
- made device context write command requirements more robust
for future and backward compatibility
---
admin-cmds-device-migration.tex | 141 ++++++++++++++++++++++++++++++++
1 file changed, 141 insertions(+)
diff --git a/admin-cmds-device-migration.tex b/admin-cmds-device-migration.tex
index bbe5902..5cd9ec7 100644
--- a/admin-cmds-device-migration.tex
+++ b/admin-cmds-device-migration.tex
@@ -324,3 +324,144 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device /
the device returns new device context entry. Once the device context is
discarded, subsequent VIRTIO_ADMIN_CMD_DEV_CTX_WRITE command writes a new device
context.
+
+\devicenormative{\paragraph}{Device Migration}{Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration}
+
+A device MUST either support all of, or none of
+VIRTIO_ADMIN_CMD_DEV_MODE_GET,
+VIRTIO_ADMIN_CMD_DEV_MODE_SET,
+VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET,
+VIRTIO_ADMIN_CMD_DEV_CTX_READ,
+VIRTIO_ADMIN_CMD_DEV_CTX_WRITE and
+VIRTIO_ADMIN_CMD_DEV_CTX_DISCARD commands.
+
+When the device \field{mode} supplied in the command
+VIRTIO_ADMIN_CMD_DEV_MODE_SET is the same as the current mode of the device, the device
+MUST complete the command successfully.
+
+The device MUST fail the command VIRTIO_ADMIN_CMD_DEV_MODE_SET when the \field{mode}
+is other than \field{Active} or \field{Stop} or \field{Freeze}.
+
+When changing the device mode using the command VIRTIO_ADMIN_CMD_DEV_MODE_SET,
+if the command fails, the device MUST retain the current device mode.
+
+The device MUST fail VIRTIO_ADMIN_CMD_DEV_MODE_SET command when \field{mode}
+is set to \field{Active} or \field{Stop} and if the device context is
+partially read or written using VIRTIO_ADMIN_CMD_DEV_CTX_READ and
+VIRTIO_ADMIN_CMD_DEV_CTX_WRITE commands respectively.
+
+When VIRTIO_ADMIN_CMD_DEV_CTX_READ command is received multiple times
+in a given mode, and when the complete device context is already read by the
+driver, on subsequent reception of command VIRTIO_ADMIN_CMD_DEV_CTX_READ,
+the device MUST complete the command successfully with
+\field{context_len} set to zero.
+
+The device MUST support reading the device context when the device is
+in any mode \field{Active} or \field{Stop} or \field{Freeze} using command
+VIRTIO_ADMIN_CMD_DEV_CTX_READ.
+
+When the device is in any mode, and if the device context is read
+partially using VIRTIO_ADMIN_CMD_DEV_CTX_READ command, the device MUST discard
+the device context when VIRTIO_ADMIN_CMD_DEV_CTX_DISCARD command is executed;
+in subsequent execution of VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET and
+VIRTIO_ADMIN_CMD_DEV_CTX_READ, the device MUST return the remaining
+estimated device context size and the device context respectively for the
+current mode as if VIRTIO_ADMIN_CMD_DEV_CTX_READ was never received by the
+device for the current device mode.
+
+The device MUST support writing the complete device context multiple times
+by the command VIRTIO_ADMIN_CMD_DEV_CTX_WRITE.
+
+The device MUST fail VIRTIO_ADMIN_CMD_DEV_CTX_WRITE command when the device
+mode is not \field{Freeze}.
+
+For the SR-IOV group type,
+\begin{itemize}
+\item the device MUST NOT initiate any PCI transaction
+ when the device mode is not \field{Active}.
+\item the device MUST finish all the outstanding PCI transactions before completing
+ the command VIRTIO_ADMIN_CMD_DEV_MODE_SET.
+\item when the device mode is \field{Stop}, the device MUST accept driver
+ notifications and the device MAY update any fields of the device context.
+\item the device MUST respond with valid values for PCI read requests when
+ the device mode is \field{Stop}.
+\item the device MUST function the same for the PCI architected interfaces
+ regardless of the device mode.
+\item the device MUST NOT generate any PCI PME when the device is
+ not in \field{Active} mode.
+\item the device MUST NOT update any fields of the device context when the
+ device is in \field{Freeze} mode; the device MAY update fields of the
+ device context when the device transitions from \field{Stop} to
+ \field{Freeze} mode.
+\end{itemize}
+
+When the device mode is not \field{Active},
+\begin{itemize}
+\item the device MUST NOT access any virtqueue memory or any memory referred
+ to by the virtqueue.
+
+\item the device MUST NOT generate any configuration change notification.
+\end{itemize}
+
+When the device is in \field{Freeze} mode, and if any device context is
+written partially by VIRTIO_ADMIN_CMD_DEV_CTX_WRITE, the device MUST discard
+the device context when VIRTIO_ADMIN_CMD_DEV_CTX_DISCARD
+command is executed, i.e. the device functions as if the command
+VIRTIO_ADMIN_CMD_DEV_CTX_WRITE was never received.
+
+For the SR-IOV group type,
+\begin{itemize}
+\item when the device is in \field{Freeze} mode, any
+write access to virtio configuration space MUST NOT update any fields and any
+configuration space read MAY return any value.
+
+\item for the VIRTIO_PCI_CAP_PCI_CFG capability area,
+the device MUST ignore writes when the device mode is set to \field{Freeze}
+and on receiving the reads, the device MUST function the same regardless of
+whether the device mode is \field{Active} or \field{Stop} or \field{Freeze}.
+
+\item the VF device MUST respond to commands
+VIRTIO_ADMIN_CMD_DEV_MODE_SET, VIRTIO_ADMIN_CMD_DEV_CTX_WRITE and
+VIRTIO_ADMIN_CMD_DEV_CTX_READ after the VF FLR completes in the device, if the VF FLR
+is in progress when the device receives any of these commands.
+
+\item the VF device MUST respond to commands
+VIRTIO_ADMIN_CMD_DEV_MODE_SET, VIRTIO_ADMIN_CMD_DEV_CTX_WRITE and
+VIRTIO_ADMIN_CMD_DEV_CTX_READ after the device reset completes in the device, if the
+device reset is in progress when the device receives any of these commands.
+
+\item the VF device MUST respond to commands
+VIRTIO_ADMIN_CMD_DEV_MODE_SET, VIRTIO_ADMIN_CMD_DEV_CTX_WRITE and
+VIRTIO_ADMIN_CMD_DEV_CTX_READ after the device power management state transition completes
+in the device, if the power management state transition is in progress
+when the device receives any of these commands.
+\end{itemize}
+
+The device MUST respond with an error for the command
+VIRTIO_ADMIN_CMD_DEV_CTX_WRITE, if there is a mismatch in the
+device context field length between the field provided in the
+VIRTIO_ADMIN_CMD_DEV_CTX_WRITE data and the length of the field
+in the device.
+
+\drivernormative{\paragraph}{Device Migration}{Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration}
+
+The driver SHOULD read the complete device context using one or multiple
+VIRTIO_ADMIN_CMD_DEV_CTX_READ commands.
+
+The driver MAY write the device context before changing the device mode from
+\field{Freeze} to \field{Stop} or from \field{Freeze} to \field{Active};
+the driver MUST write a complete device context using one or multiple
+VIRTIO_ADMIN_CMD_DEV_CTX_WRITE commands.
+
+The driver MUST NOT change the device mode to \field{Stop} or \field{Active}
+in the command VIRTIO_ADMIN_CMD_DEV_MODE_SET when device context is
+partially written.
+
+For the SR-IOV group type, the driver SHOULD NOT access device configuration
+space described in section
+\ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
+when the device mode is set to \field{Freeze} or \field{Stop}.
+
+For the SR-IOV group type, the driver MUST NOT write into the
+VIRTIO_PCI_CAP_PCI_CFG capability area when the device mode is set to
+\field{Freeze}.
--
2.34.1
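
Illustration only, not part of the patch: a minimal C sketch of how a device
model might apply the mode related requirements above. The numeric mode
values, struct member_dev and the dev_* helpers are hypothetical and exist
only for this example. The order of the checks (invalid mode, then same mode,
then partial context) is an assumption, since the requirements do not state
which rule wins when several apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder numeric values, chosen only for this illustration. */
enum dev_mode { MODE_ACTIVE = 0, MODE_STOP = 1, MODE_FREEZE = 2 };

struct member_dev {
    enum dev_mode mode;
    bool ctx_partial; /* device context partially read or written */
};

/* Model of VIRTIO_ADMIN_CMD_DEV_MODE_SET: 0 on success, -1 on failure.
 * On failure the current device mode is retained. */
static int dev_mode_set(struct member_dev *d, int mode)
{
    assert(d != NULL);
    if (mode != MODE_ACTIVE && mode != MODE_STOP && mode != MODE_FREEZE)
        return -1; /* unknown mode */
    if ((enum dev_mode)mode == d->mode)
        return 0;  /* same mode: complete successfully */
    if (d->ctx_partial && (mode == MODE_ACTIVE || mode == MODE_STOP))
        return -1; /* partially read or written context */
    d->mode = (enum dev_mode)mode;
    return 0;
}

/* Model of the VIRTIO_ADMIN_CMD_DEV_CTX_WRITE precondition. */
static int dev_ctx_write_allowed(const struct member_dev *d)
{
    assert(d != NULL);
    return d->mode == MODE_FREEZE ? 0 : -1;
}

/* Model of VIRTIO_ADMIN_CMD_DEV_CTX_DISCARD: drop any partial context. */
static void dev_ctx_discard(struct member_dev *d)
{
    assert(d != NULL);
    d->ctx_partial = false;
}
```

In this model a driver that wants to leave Freeze mode after an aborted
transfer first issues the discard, after which the mode change succeeds.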
* [virtio-comment] [PATCH v2 6/8] admin: Add theory of operation for write recording commands
2023-10-17 20:06 [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Parav Pandit
` (4 preceding siblings ...)
2023-10-17 20:06 ` [virtio-comment] [PATCH v2 5/8] admin: Add requirements of device migration commands Parav Pandit
@ 2023-10-17 20:06 ` Parav Pandit
2023-10-17 20:06 ` [virtio-comment] [PATCH v2 7/8] admin: Add " Parav Pandit
` (3 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Parav Pandit @ 2023-10-17 20:06 UTC (permalink / raw)
To: virtio-comment, mst, cohuck
Cc: sburla, shahafs, maorg, yishaih, lingshan.zhu, jasowang,
Parav Pandit
During a device migration flow (typically in a precopy phase of the
live migration), a device may write to the guest memory. Some
iommu/hypervisor may not be able to track these written pages.
These pages need to be migrated from source to destination hypervisor.
A device which writes to these pages provides the page address record
of the written pages to the owner device. The owner driver starts write
recording for the device and queries all the page addresses written by
the device.
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
---
changelog:
v1->v2:
- addressed comments from Michael
- replaced iova with physical address
---
admin-cmds-device-migration.tex | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/admin-cmds-device-migration.tex b/admin-cmds-device-migration.tex
index 5cd9ec7..fba3a6b 100644
--- a/admin-cmds-device-migration.tex
+++ b/admin-cmds-device-migration.tex
@@ -95,6 +95,21 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device /
The owner driver can discard any partially read or written device context when
any of the device migration flow should be aborted.
+During the device migration flow, a passthrough device may write data to the
+guest virtual machine's memory. The source hypervisor needs to keep track of
+this written memory to migrate it to the destination hypervisor.
+Some systems may not be able to keep track of such memory write addresses at
+hypervisor level. In such a scenario, a device records and reports these
+written memory addresses to the owner device. The owner driver enables write
+recording for one or more physical address ranges per device during device
+migration flow. The owner driver periodically queries these written physical
+address records from the device. As the driver reads the written address records,
+the device clears those records from the device.
+Once the device reports zero or small number of written address records, the device
+mode is set to \field{Stop} or \field{Freeze}. Once the device is set to \field{Stop}
+or \field{Freeze} mode, and once all the IOVA records are read, the driver stops
+the write recording in the device.
+
The owner driver uses following device migration group administration commands.
\begin{enumerate}
--
2.34.1
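
Illustration only, not part of the patch: the write recording flow described
in the theory of operation can be sketched in C as below. This is a rough
model under stated assumptions. struct mock_dev, mock_run, mock_records_read
and migrate_dirty_tracking are hypothetical names, and the halving of
next_burst merely simulates a guest workload whose write rate converges. The
driver starts recording, polls until few records are returned, stops the
device, drains the remaining records and then stops recording.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of a member device that records its memory writes. */
struct mock_dev {
    int recording;       /* write recording is started */
    int active;          /* 1 while the device may still write guest memory */
    uint32_t pending;    /* write records not yet read by the driver */
    uint32_t next_burst; /* records produced before the next poll */
};

/* Simulate the device writing guest memory between two driver polls. */
static void mock_run(struct mock_dev *d)
{
    if (d->active && d->recording) {
        d->pending += d->next_burst;
        d->next_burst /= 2; /* assumed converging workload */
    }
}

/* Stand-in for the write records read command: read and clear up to max. */
static uint32_t mock_records_read(struct mock_dev *d, uint32_t max)
{
    uint32_t n = d->pending < max ? d->pending : max;

    d->pending -= n;
    return n;
}

/* Driver side flow; returns the total number of write records read. */
static uint64_t migrate_dirty_tracking(struct mock_dev *d, uint32_t threshold,
                                       uint32_t batch)
{
    uint64_t total = 0;
    uint32_t n;

    assert(batch > 0);
    d->recording = 1;             /* start write recording */
    do {                          /* precopy phase: poll periodically */
        mock_run(d);
        n = mock_records_read(d, batch);
        total += n;
    } while (n > threshold);      /* few records left: time to stop */
    d->active = 0;                /* set device mode to Stop or Freeze */
    while ((n = mock_records_read(d, batch)) > 0)
        total += n;               /* drain the remaining records */
    d->recording = 0;             /* stop write recording */
    return total;
}
```

The threshold and batch parameters are tuning knobs of this sketch; the patch
text only says the mode changes once the device reports zero or a small
number of records.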
* [virtio-comment] [PATCH v2 7/8] admin: Add write recording commands
2023-10-17 20:06 [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Parav Pandit
` (5 preceding siblings ...)
2023-10-17 20:06 ` [virtio-comment] [PATCH v2 6/8] admin: Add theory of operation for write recording commands Parav Pandit
@ 2023-10-17 20:06 ` Parav Pandit
2023-10-17 20:06 ` [virtio-comment] [PATCH v2 8/8] admin: Add requirements of write reporting commands Parav Pandit
` (2 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Parav Pandit @ 2023-10-17 20:06 UTC (permalink / raw)
To: virtio-comment, mst, cohuck
Cc: sburla, shahafs, maorg, yishaih, lingshan.zhu, jasowang,
Parav Pandit
When migrating a virtual machine with passthrough
virtio devices, the virtio device may write into the guest
memory. Some systems may not be able to keep track of these
pages efficiently.
To facilitate such a system, a device provides the record
of pages which are written by the device.
The owner driver configures the member device with a list of address
ranges for which it expects write recording and reporting by the device.
The owner driver periodically queries the written page address records,
which get cleared from the device upon reading.
As the number of write records reduces over time, at some point the device
mode is set to FREEZE, after which write recording is stopped.
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
---
changelog:
v1->v2:
- addressed comments from Michael
- merged theory of operation changes to previous patch
- replaced iova with physical address
- renamed iova range with a page
- reworded and simplified wording using page
---
admin-cmds-device-migration.tex | 129 +++++++++++++++++++++++++++++++-
admin.tex | 10 ++-
2 files changed, 135 insertions(+), 4 deletions(-)
diff --git a/admin-cmds-device-migration.tex b/admin-cmds-device-migration.tex
index fba3a6b..992d6ec 100644
--- a/admin-cmds-device-migration.tex
+++ b/admin-cmds-device-migration.tex
@@ -106,9 +106,8 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device /
address records from the device. As the driver reads the written address records,
the device clears those records from the device.
Once the device reports zero or small number of written address records, the device
-mode is set to \field{Stop} or \field{Freeze}. Once the device is set to \field{Stop}
-or \field{Freeze} mode, and once all the IOVA records are read, the driver stops
-the write recording in the device.
+mode is set to \field{Stop} or \field{Freeze}. Once all the physical address records
+are read, the driver stops the write recording in the device.
The owner driver uses following device migration group administration commands.
@@ -120,6 +119,9 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device /
\item Device Context Write Command
\item Device Context Supported Fields Query Command
\item Device Context Discard Command
+\item Device Write Records Start Command
+\item Device Write Records Stop Command
+\item Device Write Records Read Command
\end{enumerate}
These commands are currently only defined for the SR-IOV group type.
@@ -340,6 +342,127 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device /
discarded, subsequent VIRTIO_ADMIN_CMD_DEV_CTX_WRITE command writes a new device
context.
+\paragraph{Device Write Record Capabilities Query Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Write Record Capabilities Query Command}
+
+This command reads the device write record capabilities.
+For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORD_CAP_QUERY, \field{opcode}
+is set to 0xf.
+The \field{group_member_id} refers to the member device to be accessed.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_dev_write_record_cap_result {
+ le32 supported_page_size_bitmap;
+ le32 supported_ranges;
+};
+\end{lstlisting}
+
+When the command completes successfully, \field{command_specific_result}
+is in the format \field{struct virtio_admin_cmd_dev_write_record_cap_result}
+returned by the device. The \field{supported_page_size_bitmap} indicates
+the page sizes, i.e. the physical address range granularities, at which the device can record writes.
+The minimum page size granularity is 4KB. Each bit represents a
+supported page size. Bit 0 corresponds to 4KB, bit 1 corresponds to 8KB,
+and so on up to bit 31, which corresponds to 8TB. The device supports one or more page sizes.
+For each supported page size, the device sets the corresponding bit in the
+\field{supported_page_size_bitmap}. The \field{supported_ranges}
+indicates the number of unique (non overlapping) physical address ranges in page granularity
+that can be recorded by the device.
+
+\paragraph{Device Write Records Start Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Write Records Start Command}
+
+This command starts the write recording in the device for the specified
+physical address ranges.
+
+For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START, \field{opcode}
+is set to 0x10.
+The \field{group_member_id} refers to the member device to be accessed.
+
+The \field{command_specific_data} is in the format
+\field{struct virtio_admin_cmd_write_record_start_data}.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_write_record_start_entry {
+ le64 page_address;
+ le64 page_count;
+};
+
+struct virtio_admin_cmd_write_record_start_data {
+ le64 page_size;
+ le32 count;
+ u8 reserved[4];
+ struct virtio_admin_cmd_write_record_start_entry entries[];
+};
+
+\end{lstlisting}
+
+The \field{count} indicates the number of valid \field{entries}.
+The \field{page_address} indicates the start physical address.
+The \field{page_count} indicates the number of pages of size \field{page_size}
+to record, starting from \field{page_address}. All the \field{entries}
+are unique non overlapping page entries.
+Whenever the device writes to memory in the supplied address range, the
+device records the physical address of the page in which the write
+occurred. These write records can be read by the driver using
+VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ command.
+
+This command has no command specific result.
+
+\paragraph{Device Write Record Stop Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Write Record Stop Command}
+
+This command stops the write recording in the device that was
+previously started using VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command.
+
+For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_STOP, \field{opcode}
+is set to 0x11.
+The \field{group_member_id} refers to the member device to be accessed.
+
+This command does not have any command specific data.
+This command has no command specific result.
+
+\paragraph{Device Write Records Read Command}
+\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Write Records Read Command}
+
+This command reads the device write records for which the write recording was
+previously started using VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command.
+
+For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ, \field{opcode}
+is set to 0x12.
+The \field{group_member_id} refers to the member device to be accessed.
+
+\begin{lstlisting}
+struct virtio_admin_cmd_write_records_read_data {
+ le64 page_address;
+ le64 length;
+};
+
+struct virtio_admin_cmd_dev_write_records_cnt {
+ le32 count;
+};
+
+struct virtio_admin_cmd_dev_write_records_result {
+ le64 address_entries[];
+};
+\end{lstlisting}
+
+The \field{command_specific_data} is in the format
+\field{struct virtio_admin_cmd_write_records_read_data}. The driver
+sets the \field{page_address} indicating the start page address for up to the
+\field{length} number of bytes. The supplied physical address range can be the
+same as or smaller than the range supplied when write recording was started by
+the driver in VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command. The \field{length}
+must be equal to or a multiple of one of the page sizes reported by the device in the
+\field{supported_page_size_bitmap}.
+
+When the command completes successfully, \field{command_specific_output} is in
+the format of \field{struct virtio_admin_cmd_dev_write_records_cnt} containing the number
+of write records returned by the device and \field{command_specific_result} is
+in the format of \field{struct virtio_admin_cmd_dev_write_records_result}.
+When the command completes successfully, the write records which are returned
+in the result are cleared from the device.
+
\devicenormative{\paragraph}{Device Migration}{Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration}
A device MUST either support all of, or none of
diff --git a/admin.tex b/admin.tex
index 142692c..41cabfe 100644
--- a/admin.tex
+++ b/admin.tex
@@ -140,7 +140,15 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti
\hline
0x000d & VIRTIO_ADMIN_CMD_DEV_CTX_DISCARD & Clear the device context data \\
\hline
-0x000e - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd} \\
+0x000f & VIRTIO_ADMIN_CMD_DEV_WRITE_RECORD_CAP_QUERY & Query Write recording capabilities \\
+\hline
+0x0010 & VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START & Start Write recording in the device \\
+\hline
+0x0011 & VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_STOP & Stop write recording in the device \\
+\hline
+0x0012 & VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ & Read and clear write records from the device \\
+\hline
+0x0013 - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd} \\
\hline
0x8000 - 0xFFFF & - & Reserved for future commands (possibly using a different structure) \\
\hline
--
2.34.1
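
Illustration only, not part of the patch: a small C sketch of how a driver
might decode \field{supported_page_size_bitmap} and build one entry for the
write records start command. The wr_* names and struct wr_start_entry are
hypothetical. The sketch uses host-endian integers and omits the le32/le64
conversions, and rounding the range outward to whole pages is an assumption
of this example, since the patch does not state an alignment rule.

```c
#include <assert.h>
#include <stdint.h>

#define WR_MIN_PAGE_SHIFT 12 /* bit 0 of the bitmap corresponds to 4KB */

/* Page size in bytes for one bit of supported_page_size_bitmap. */
static uint64_t wr_page_size_of_bit(unsigned int bit)
{
    assert(bit < 32);
    return 1ULL << (WR_MIN_PAGE_SHIFT + bit);
}

/* Smallest page size set in the bitmap, or 0 when no bit is set. */
static uint64_t wr_smallest_page_size(uint32_t bitmap)
{
    for (unsigned int bit = 0; bit < 32; bit++)
        if (bitmap & (1u << bit))
            return wr_page_size_of_bit(bit);
    return 0;
}

/* Host-endian mirror of the start entry (page_address, page_count). */
struct wr_start_entry {
    uint64_t page_address;
    uint64_t page_count;
};

/* Cover the byte range [addr, addr + len) with whole pages of page_size. */
static struct wr_start_entry wr_make_entry(uint64_t addr, uint64_t len,
                                           uint64_t page_size)
{
    struct wr_start_entry e;
    uint64_t first, end;

    assert(len > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    first = addr & ~(page_size - 1);                     /* round down */
    end = (addr + len + page_size - 1) & ~(page_size - 1); /* round up */
    e.page_address = first;
    e.page_count = (end - first) / page_size;
    return e;
}
```

Since bit n corresponds to 4KB shifted left by n, bit 31 works out to 2^43
bytes.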
* [virtio-comment] [PATCH v2 8/8] admin: Add requirements of write reporting commands
2023-10-17 20:06 [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Parav Pandit
` (6 preceding siblings ...)
2023-10-17 20:06 ` [virtio-comment] [PATCH v2 7/8] admin: Add " Parav Pandit
@ 2023-10-17 20:06 ` Parav Pandit
2023-10-18 0:53 ` [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Jason Wang
2023-10-18 1:56 ` Zhu, Lingshan
9 siblings, 0 replies; 14+ messages in thread
From: Parav Pandit @ 2023-10-17 20:06 UTC (permalink / raw)
To: virtio-comment, mst, cohuck
Cc: sburla, shahafs, maorg, yishaih, lingshan.zhu, jasowang,
Parav Pandit
Add device and driver requirements for the write reporting commands.
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
- addressed comments from Michael
- renamed iova range to a page
- removed duplicate device requirement
- allow stopping write recording multiple times even if it is stopped
so migration driver can start cleanly at beginning
---
admin-cmds-device-migration.tex | 36 +++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/admin-cmds-device-migration.tex b/admin-cmds-device-migration.tex
index 992d6ec..01826c7 100644
--- a/admin-cmds-device-migration.tex
+++ b/admin-cmds-device-migration.tex
@@ -581,6 +581,34 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device /
VIRTIO_ADMIN_CMD_DEV_CTX_WRITE data and the length of the field
in the device.
+A device MUST either support all of, or none of
+VIRTIO_ADMIN_CMD_DEV_WRITE_RECORD_CAP_QUERY,
+VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START,
+VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_STOP and
+VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ commands.
+
+If the device supports the VIRTIO_ADMIN_CMD_DEV_WRITE_RECORD_CAP_QUERY
+command, the device MUST set at least one bit in
+\field{supported_page_size_bitmap} and set a non-zero value in
+\field{supported_ranges}.
+
+The device MUST fail the VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ command
+if write recording has not been started by the driver.
+
+The device MUST complete the VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_STOP command
+successfully, even if write recording was never started by the driver
+or was already stopped previously.
+
+For the SR-IOV group type, a VF function level reset (FLR) of the VF
+member device MUST NOT stop write recording on the VF device and MUST NOT
+clear any write records already gathered by the owner device.
+
+The device MUST clear the write records which are returned in the
+VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ result. If a new write record is
+created for the same page after VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ
+completes, the device MUST report such a write record as a
+new entry.
+
\drivernormative{\paragraph}{Device Migration}{Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration}
The driver SHOULD read the complete device context using one or multiple
@@ -603,3 +631,11 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device /
For the SR-IOV group type, the driver MUST NOT write into the
VIRTIO_PCI_CAP_PCI_CFG capability area when the device mode is set to
\field{Freeze}.
+
+The driver MUST NOT invoke VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START
+for overlapping page ranges; each page range supplied in the command
+MUST be unique.
+
+If write recording is started by the driver using the
+VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command, the driver MUST explicitly
+stop the write recording using the VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_STOP command.
--
2.34.1
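The requirements added by this patch can be read as a small state machine.
Below is a minimal C sketch of that reading, not an implementation of the
commands themselves. Every name in it (write_recorder, wr_start, wr_stop,
wr_read, wr_note_write, wr_member_flr, page_range, ranges_unique) is
hypothetical and does not come from the spec. The fixed capacity, the
one-pending-record-per-page policy and the "START drops stale records"
behaviour are assumptions made only to keep the sketch small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none come from the virtio spec. */
#define WR_MAX_RECORDS 64 /* arbitrary capacity chosen for the sketch */
#define WR_OK 0
#define WR_ERR (-1)

struct write_recorder {
    bool started;                   /* set by START, cleared by STOP */
    size_t count;                   /* number of pending write records */
    uint64_t pages[WR_MAX_RECORDS]; /* page addresses not yet read */
};

struct page_range {
    uint64_t start; /* first byte of the range */
    uint64_t len;   /* length in bytes, assumed non-zero */
};

/* Driver side: ranges passed to START must not overlap.
 * Assumes start + len does not wrap around. */
bool ranges_unique(const struct page_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (r[i].start < r[j].start + r[j].len &&
                r[j].start < r[i].start + r[i].len)
                return false;
    return true;
}

/* START. Assumption of this sketch: a fresh start drops stale records. */
int wr_start(struct write_recorder *w)
{
    if (!w->started) {
        w->started = true;
        w->count = 0;
    }
    return WR_OK;
}

/* STOP: succeeds even if never started or already stopped. */
int wr_stop(struct write_recorder *w)
{
    w->started = false;
    return WR_OK;
}

/* Called whenever the member device writes to a page. */
void wr_note_write(struct write_recorder *w, uint64_t page)
{
    if (!w->started)
        return;
    for (size_t i = 0; i < w->count; i++)
        if (w->pages[i] == page)
            return; /* already pending (assumed: one record per page) */
    assert(w->count < WR_MAX_RECORDS); /* sketch has no overflow policy */
    w->pages[w->count++] = page;
}

/* VF FLR: deliberately leaves recording state and records untouched. */
void wr_member_flr(struct write_recorder *w)
{
    (void)w;
}

/* READ: fails if not started; returned records are cleared, so a later
 * write to the same page shows up again as a new entry. */
int wr_read(struct write_recorder *w, uint64_t *out, size_t max)
{
    if (!w->started)
        return WR_ERR;
    size_t n = w->count < max ? w->count : max;
    for (size_t i = 0; i < n; i++)
        out[i] = w->pages[i];
    for (size_t i = n; i < w->count; i++)
        w->pages[i - n] = w->pages[i]; /* keep records not yet returned */
    w->count -= n;
    return (int)n;
}
```

In this model a migration driver can call wr_stop() unconditionally before
wr_start() to begin from a clean state, which is the case the changelog
above calls out.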
* Re: [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands
2023-10-17 20:06 [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Parav Pandit
` (7 preceding siblings ...)
2023-10-17 20:06 ` [virtio-comment] [PATCH v2 8/8] admin: Add requirements of write reporting commands Parav Pandit
@ 2023-10-18 0:53 ` Jason Wang
2023-10-18 4:02 ` Parav Pandit
2023-10-18 1:56 ` Zhu, Lingshan
9 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2023-10-18 0:53 UTC (permalink / raw)
To: Parav Pandit
Cc: virtio-comment, mst, cohuck, sburla, shahafs, maorg, yishaih,
lingshan.zhu
On Wed, Oct 18, 2023 at 4:07 AM Parav Pandit <parav@nvidia.com> wrote:
>
> This series introduces administration commands for member device migration
> for PCI transport; when needed it can be extended for other transports
> too.
>
> Use case requirements:
> ======================
> 1. A hypervisor system needs to provide a PCI VF as passthrough
> device to the guest virtual machine and also support live
> migration of this virtual machine.
> A passthrough device has typically only PCI configuration space
> and MSI-X table emulated. No virtio native interface offered
> by the virtio member device is trapped and/or emulated.
> 2. A virtual machine may have one or more such passthrough
> virtio devices.
> 3. A virtual machine may have other PCI passthrough device
> which may also interact with virtio device.
> 4. A hypervisor runs a generic device type agnostic driver with
> extension to support device migration.
> 5. A PCI VF passthrough device needs to support transparent
> device reset and PCI FLR while the device migration is
> ongoing.
> 6. An owner driver is not involved in mediating device operations
> for the passthrough device at the virtio interface level.
> 7. The mechanism is generic enough to apply to a large family of
> virtio devices and does not involve trapping any virtio
> device interfaces for the passthrough devices.
>
> Overview:
> =========
> The above use case requirements are met by the PCI PF group owner driver
> facilitating the member device migration functionality using
> administration commands.
>
> There are three major functionalities.
>
> 1. Suspend and resume the device operation
> 2. Read and Write the device context containing all the information
> that can be transferred from source to destination to migrate to
> a member device
> 3. Track pages written by the device while device migration is
> ongoing
>
> This comprehensive series introduces 4 infrastructure pieces
> covering PCI transport, peer to peer PCI devices, page write tracking
> (aka dirty page tracking) and generic virtio device context.
>
> 1. Device mode get,set (active, stop, freeze)
> 2. Device context read and write
> 3. Defines device context and compatibility command
> 4. Write reporting to track page addresses
>
> This series enables virtio PCI SR-IOV member device to member device
> migration. It can also be used, if/when needed, to migrate between a PCI
> SR-IOV member device and a software composed PCI device, given software
> which can parse and compose a software based PCI virtio device.
>
> Example flow:
> =============
> Source hypervisor:
> 1. Instructs device to start tracking pages it is writing
> 2. Periodically query the addresses of the written pages
> 3. Suspend the device operation
> 4. Read the device context and transfer to destination hypervisor
>
> Destination hypervisor:
> 5. Write the device context received from source
> 6. Resume the device that has newly written device context
>
> Patch summary:
> ==============
> patch-1: Adds theory of operation for device migration commands
> patch-2: Redefine reserved2 to command output field
> patch-3: Defines short device context for split virtqueues
> patch-4: Adds device migration commands
> patch-5: Adds requirements for device migration commands
> patch-6: Adds theory of operation for write reporting commands
> patch-7: Adds write reporting commands
> patch-8: Adds requirements for write reporting commands
>
> It also takes inspiration from the similar idea presented at KVM Forum
> at [1].
>
> Please review.
>
> Changelog:
> ==========
> v1->v2:
> - Addressed comments from Michael and Jason
> - replaced iova to page/physical address range in write recording commands
> - several device specific requirements added to clarify, interaction of
> device reset, FLR, PCI PM and admin commands
> - added device context fields query command to learn compatibility
> - split device context field type range into generic and device specific
> - added device context extension section to maintain backward and future
> compatibility
> - several rewording in theory of operation
> - added requirements to cover config space read/write interaction with
> device context commands
> - added assumption about pci config space and msix table not present in
> device context, which can be added when hypervisor need them
> v0->v1:
> - enrich device context to cover device configuration layout, feature bits
> - fixed alignment of device context fields
> - added missing Sign-off for the joint work done with Satananda
> - added link to the github issue
A lot of questions were not answered in V1, do you expect me to repeat
it again here?
Thanks
* RE: [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands
2023-10-18 0:53 ` [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Jason Wang
@ 2023-10-18 4:02 ` Parav Pandit
0 siblings, 0 replies; 14+ messages in thread
From: Parav Pandit @ 2023-10-18 4:02 UTC (permalink / raw)
To: Jason Wang
Cc: virtio-comment@lists.oasis-open.org, mst@redhat.com,
cohuck@redhat.com, sburla@marvell.com, Shahaf Shuler,
Maor Gottlieb, Yishai Hadas, lingshan.zhu@intel.com
> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Jason Wang
> Sent: Wednesday, October 18, 2023 6:24 AM
>
> A lot of questions were not answered in V1, do you expect me to repeat it again
> here?
>
No, the questions posted on 10/16 and 10/17 will be answered.
There were already many suggestions from you and Michael to take care of before they got lost in 100 long emails, so the patches had to progress.
Hence, v2 covers most comments and suggestions up to 10/15.
* Re: [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands
2023-10-17 20:06 [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Parav Pandit
` (8 preceding siblings ...)
2023-10-18 0:53 ` [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands Jason Wang
@ 2023-10-18 1:56 ` Zhu, Lingshan
2023-10-18 4:04 ` Parav Pandit
9 siblings, 1 reply; 14+ messages in thread
From: Zhu, Lingshan @ 2023-10-18 1:56 UTC (permalink / raw)
To: Parav Pandit, virtio-comment, mst, cohuck
Cc: sburla, shahafs, maorg, yishaih, jasowang
On 10/18/2023 4:06 AM, Parav Pandit wrote:
> This series introduces administration commands for member device migration
> for PCI transport; when needed it can be extended for other transports
> too.
>
> Use case requirements:
> ======================
> 1. A hypervisor system needs to provide a PCI VF as passthrough
> device to the guest virtual machine and also support live
> migration of this virtual machine.
> A passthrough device has typically only PCI configuration space
> and MSI-X table emulated. No virtio native interface offered
> by the virtio member device is trapped and/or emulated.
> 2. A virtual machine may have one or more such passthrough
> virtio devices.
> 3. A virtual machine may have other PCI passthrough device
> which may also interact with virtio device.
> 4. A hypervisor runs a generic device type agnostic driver with
> extension to support device migration.
> 5. A PCI VF passthrough device needs to support transparent
> device reset and PCI FLR while the device migration is
> ongoing.
> 6. An owner driver is not involved in mediating device operations
> for the passthrough device at the virtio interface level.
> 7. The mechanism is generic enough to apply to a large family of
> virtio devices and does not involve trapping any virtio
> device interfaces for the passthrough devices.
>
> Overview:
> =========
> The above use case requirements are met by the PCI PF group owner driver
> facilitating the member device migration functionality using
> administration commands.
>
> There are three major functionalities.
>
> 1. Suspend and resume the device operation
> 2. Read and Write the device context containing all the information
> that can be transferred from source to destination to migrate to
> a member device
> 3. Track pages written by the device while device migration is
> ongoing
>
> This comprehensive series introduces 4 infrastructure pieces
> covering PCI transport, peer to peer PCI devices, page write tracking
> (aka dirty page tracking) and generic virtio device context.
>
> 1. Device mode get,set (active, stop, freeze)
> 2. Device context read and write
> 3. Defines device context and compatibility command
> 4. Write reporting to track page addresses
>
> This series enables virtio PCI SR-IOV member device to member device
> migration. It can also be used, if/when needed, to migrate between a PCI
> SR-IOV member device and a software composed PCI device, given software
> which can parse and compose a software based PCI virtio device.
>
> Example flow:
> =============
> Source hypervisor:
> 1. Instructs device to start tracking pages it is writing
> 2. Periodically query the addresses of the written pages
> 3. Suspend the device operation
> 4. Read the device context and transfer to destination hypervisor
>
> Destination hypervisor:
> 5. Write the device context received from source
> 6. Resume the device that has newly written device context
>
> Patch summary:
> ==============
> patch-1: Adds theory of operation for device migration commands
> patch-2: Redefine reserved2 to command output field
> patch-3: Defines short device context for split virtqueues
> patch-4: Adds device migration commands
> patch-5: Adds requirements for device migration commands
> patch-6: Adds theory of operation for write reporting commands
> patch-7: Adds write reporting commands
> patch-8: Adds requirements for write reporting commands
>
> It also takes inspiration from the similar idea presented at KVM Forum
> at [1].
>
> Please review.
>
> Changelog:
> ==========
> v1->v2:
> - Addressed comments from Michael and Jason
> - replaced iova to page/physical address range in write recording commands
> - several device specific requirements added to clarify, interaction of
> device reset, FLR, PCI PM and admin commands
> - added device context fields query command to learn compatibility
> - split device context field type range into generic and device specific
> - added device context extension section to maintain backward and future
> compatibility
> - several rewording in theory of operation
> - added requirements to cover config space read/write interaction with
> device context commands
> - added assumption about pci config space and msix table not present in
> device context, which can be added when hypervisor need them
> v0->v1:
> - enrich device context to cover device configuration layout, feature bits
> - fixed alignment of device context fields
> - added missing Sign-off for the joint work done with Satananda
> - added link to the github issue
>
> [1] https://static.sched.com/hosted_files/kvmforum2022/3a/KVM22-Migratable-Vhost-vDPA.pdf
>
> Parav Pandit (8):
> admin: Add theory of operation for device migration
> admin: Redefine reserved2 as command specific output
> device-context: Define the device context fields for device migration
> admin: Add device migration admin commands
> admin: Add requirements of device migration commands
> admin: Add theory of operation for write recording commands
> admin: Add write recording commands
> admin: Add requirements of write reporting commands
>
> admin-cmds-device-migration.tex | 641 ++++++++++++++++++++++++++++++++
> admin.tex | 40 +-
> content.tex | 1 +
> device-context.tex | 189 ++++++++++
> 4 files changed, 864 insertions(+), 7 deletions(-)
> create mode 100644 admin-cmds-device-migration.tex
> create mode 100644 device-context.tex
There were many discussions and questions on your V1 that did not
receive effective answers from you.
We also didn't see these issues addressed in your V2. So, what's the
purpose of releasing this V2?
Thanks
* RE: [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands
2023-10-18 1:56 ` Zhu, Lingshan
@ 2023-10-18 4:04 ` Parav Pandit
2023-10-18 6:04 ` Michael S. Tsirkin
0 siblings, 1 reply; 14+ messages in thread
From: Parav Pandit @ 2023-10-18 4:04 UTC (permalink / raw)
To: Zhu, Lingshan, virtio-comment@lists.oasis-open.org,
mst@redhat.com, cohuck@redhat.com
Cc: sburla@marvell.com, Shahaf Shuler, Maor Gottlieb, Yishai Hadas,
jasowang@redhat.com
> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, October 18, 2023 7:27 AM
>
> > Changelog:
> > ==========
> > v1->v2:
> > - Addressed comments from Michael and Jason
> > - replaced iova to page/physical address range in write recording
> > commands
> > - several device specific requirements added to clarify, interaction of
> > device reset, FLR, PCI PM and admin commands
> > - added device context fields query command to learn compatibility
> > - split device context field type range into generic and device
> > specific
> > - added device context extension section to maintain backward and future
> > compatibility
> > - several rewording in theory of operation
> > - added requirements to cover config space read/write interaction with
> > device context commands
> > - added assumption about pci config space and msix table not present in
> > device context, which can be added when hypervisor need them
> We also didn't see these issues addressed in your V2. So, what's the purpose of
> releasing this V2?
The purpose is to cover the comments and improvements from Michael and Jason as listed above.
Only the questions from 10/16 and 10/17 are yet to be covered.
I will answer them this week.
* Re: [virtio-comment] [PATCH v2 0/8] Introduce device migration support commands
2023-10-18 4:04 ` Parav Pandit
@ 2023-10-18 6:04 ` Michael S. Tsirkin
0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2023-10-18 6:04 UTC (permalink / raw)
To: Parav Pandit
Cc: Zhu, Lingshan, virtio-comment@lists.oasis-open.org,
cohuck@redhat.com, sburla@marvell.com, Shahaf Shuler,
Maor Gottlieb, Yishai Hadas, jasowang@redhat.com
On Wed, Oct 18, 2023 at 04:04:15AM +0000, Parav Pandit wrote:
>
>
> > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > Sent: Wednesday, October 18, 2023 7:27 AM
> >
> > > Changelog:
> > > ==========
> > > v1->v2:
> > > - Addressed comments from Michael and Jason
> > > - replaced iova to page/physical address range in write recording
> > > commands
> > > - several device specific requirements added to clarify, interaction of
> > > device reset, FLR, PCI PM and admin commands
> > > - added device context fields query command to learn compatibility
> > > - split device context field type range into generic and device
> > > specific
> > > - added device context extension section to maintain backward and future
> > > compatibility
> > > - several rewording in theory of operation
> > > - added requirements to cover config space read/write interaction with
> > > device context commands
> > > - added assumption about pci config space and msix table not present in
> > > device context, which can be added when hypervisor need them
>
> > We also didn't see these issues addressed in your V2. So, what's the purpose of
> > releasing this V2?
> The purpose is to cover the comments and improvements from Michael and Jason as listed above.
> Only the questions from 10/16 and 10/17 are yet to be covered.
> I will answer them this week.
Maybe mention this in the cover letter next time. Thanks!