* Re: [PATCH v3] virtio-iommu: Rework the bypass feature
2021-10-22 12:12 [PATCH v3] virtio-iommu: Rework the bypass feature Jean-Philippe Brucker
@ 2021-10-31 17:50 ` Eric Auger
2021-11-03 1:14 ` Tian, Kevin
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Eric Auger @ 2021-10-31 17:50 UTC (permalink / raw)
To: Jean-Philippe Brucker, virtio-dev, kevin.tian
Cc: sebastien.boeuf, cohuck, mst
Hi Jean,
On 10/22/21 2:12 PM, Jean-Philippe Brucker wrote:
> The VIRTIO_IOMMU_F_BYPASS feature is awkward to use and incomplete.
> Although it is implemented by QEMU, it is not supported by any driver as
> far as I know. Replace it with a new VIRTIO_IOMMU_F_BYPASS_CONFIG
> feature.
>
> Two features are missing from virtio-iommu:
>
> * The ability for an hypervisor to start the device in bypass mode. The
> wording for VIRTIO_IOMMU_F_BYPASS is not clear enough to allow it at
> the moment, because it only specifies the behavior after feature
> negotiation.
>
> * The ability for a guest to set individual endpoints in bypass mode
> when bypass is globally disabled. An OS should have the ability to
> allow only endpoints it trusts to bypass the IOMMU, while keeping DMA
> disabled for endpoints it isn't even aware of. At the moment this can
> only be emulated by creating identity mappings.
>
> The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a 'bypass' config field
> that allows to enable and disable bypass globally. It also adds a new
> flag for the ATTACH request.
>
> * The hypervisor can start the VM with bypass enabled or, if it knows
> that the software stack supports it, disabled. The 'bypass' config
> fields is initialized to 0 or 1. It is sticky and isn't affected by
> device reset.
>
> * Generally the firmware won't have an IOMMU driver and will need to be
> started in bypass mode, so the bootloader and kernel can be loaded
> from storage endpoint.
>
> For more security, the firmware could implement a minimal virtio-iommu
> driver that reuses existing virtio support and only touches the config
> space. It could enable PCI bus mastering in bridges only for the
> endpoints that need it, enable global IOMMU bypass by flipping a bit,
> then tear everything down before handing control over to the OS. This
> prevents vulnerability windows where a malicious endpoint reprograms
> the IOMMU while the OS is configuring it [1].
>
> The isolation provided by vIOMMUs has mainly been used for securely
> assigning endpoints to untrusted applications so far, while kernel DMA
> bypasses the IOMMU. But we can expect boot security to become as
> important in virtualization as it presently is on bare-metal systems,
> where some devices are untrusted and must never be able to access
> memory that wasn't assigned to them.
>
> * The OS can enable and disable bypass globally. It can then enable
> bypass for individual endpoints by attaching them to bypass domains,
> using the new VIRTIO_IOMMU_ATTACH_F_BYPASS flag. It can disable bypass
> by attaching them to normal domains.
>
> [1] IOMMU protection against I/O attacks: a vulnerability and a proof of concept
> Morgan, B., Alata, É., Nicomette, V. et al.
> https://link.springer.com/article/10.1186/s13173-017-0066-7
>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/119
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>
> The virtio-iommu spec with colored diff is available at
> https://jpbrucker.net/virtio-iommu/spec-bypass/virtio-iommu-f-bypass-config-v3-diff.pdf
>
> v1: https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07817.html
> v2: https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07898.html
> v3:
> * Normative statement about device reset vs. system reset - the bypass
> bit is sticky across device reset to avoid the vulnerability described
> above, but restored on system reset (a term also used by the virtio
> memory device).
> * Explain that the field bypass is in effect as long as the new feature
> is offered, even when not accepted by the driver
> * Another clarification about the state of an endpoint after detach.
> ---
> conformance.tex | 1 -
> virtio-iommu.tex | 111 ++++++++++++++++++++++++++++++++++++++---------
> 2 files changed, 90 insertions(+), 22 deletions(-)
>
> diff --git a/conformance.tex b/conformance.tex
> index c2d7959..5c10ab9 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -504,7 +504,6 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> \begin{itemize}
> \item \ref{devicenormative:Device Types / IOMMU Device / Feature bits}
> \item \ref{devicenormative:Device Types / IOMMU Device / Device configuration layout}
> -\item \ref{devicenormative:Device Types / IOMMU Device / Device Initialization}
> \item \ref{devicenormative:Device Types / IOMMU Device / Device operations}
> \item \ref{devicenormative:Device Types / IOMMU Device / Device operations / ATTACH request}
> \item \ref{devicenormative:Device Types / IOMMU Device / Device operations / DETACH request}
> diff --git a/virtio-iommu.tex b/virtio-iommu.tex
> index 08b358a..f8cbe88 100644
> --- a/virtio-iommu.tex
> +++ b/virtio-iommu.tex
> @@ -59,14 +59,21 @@ \subsection{Feature bits}\label{sec:Device Types / IOMMU Device / Feature bits}
> VIRTIO_IOMMU_F_MAP_UNMAP is supported.}
>
> \item[VIRTIO_IOMMU_F_BYPASS (3)]
> - When not attached to a domain, endpoints downstream of the IOMMU
> - can access the guest-physical address space.
> + Endpoints that are not attached to a domain are in bypass mode.
>
> \item[VIRTIO_IOMMU_F_PROBE (4)]
> The PROBE request is available.
>
> \item[VIRTIO_IOMMU_F_MMIO (5)]
> The VIRTIO_IOMMU_MAP_F_MMIO flag is available.
> +
> +\item[VIRTIO_IOMMU_F_BYPASS_CONFIG (6)]
> + Field \field{bypass} of struct virtio_iommu_config determines
> + whether endpoints that are not attached to a domain are in
> + bypass mode. Flag VIRTIO_IOMMU_ATTACH_F_BYPASS determines
> + whether endpoints that are attached to a domain are in bypass
> + mode.
> +
> \end{description}
>
> \drivernormative{\subsubsection}{Feature bits}{Device Types / IOMMU Device / Feature bits}
> @@ -79,6 +86,11 @@ \subsection{Feature bits}\label{sec:Device Types / IOMMU Device / Feature bits}
>
> The device SHOULD offer feature bit VIRTIO_IOMMU_F_MAP_UNMAP.
>
> +The VIRTIO_IOMMU_F_BYPASS_CONFIG feature supersedes
> +VIRTIO_IOMMU_F_BYPASS. New devices SHOULD NOT offer
> +VIRTIO_IOMMU_F_BYPASS. Devices SHOULD NOT offer both
> +VIRTIO_IOMMU_F_BYPASS and VIRTIO_IOMMU_F_BYPASS_CONFIG.
> +
> \subsection{Device configuration layout}\label{sec:Device Types / IOMMU Device / Device configuration layout}
>
> The \field{page_size_mask} field is always present. Availability of the
> @@ -97,12 +109,20 @@ \subsection{Device configuration layout}\label{sec:Device Types / IOMMU Device /
> le32 end;
> } domain_range;
> le32 probe_size;
> + u8 bypass;
> + u8 reserved[3];
> };
> \end{lstlisting}
>
> \drivernormative{\subsubsection}{Device configuration layout}{Device Types / IOMMU Device / Device configuration layout}
>
> -The driver MUST NOT write to device configuration fields.
> +When the VIRTIO_IOMMU_F_BYPASS_CONFIG feature is negotiated, the
> +driver MAY write to \field{bypass}. The driver MUST NOT write to
> +any other device configuration field.
> +
> +The driver MUST NOT write a value different than 0 or 1 to
> +\field{bypass}. The driver SHOULD ignore bits 1-7 of
> +\field{bypass}.
>
> \devicenormative{\subsubsection}{Device configuration layout}{Device Types / IOMMU Device / Device configuration layout}
>
> @@ -110,17 +130,18 @@ \subsection{Device configuration layout}\label{sec:Device Types / IOMMU Device /
> the page granularity. The device MAY set more than one bit in
> \field{page_size_mask}.
>
> +If the device offers the VIRTIO_IOMMU_F_BYPASS_CONFIG feature, it
> +MAY initialize the \field{bypass} field to 1. Field
> +\field{bypass} SHOULD NOT change on device reset, but SHOULD be
> +restored to its initial value on system reset.
> +
> +The device MUST NOT present a value different than 0 or 1 in
> +\field{bypass}.
> +
> \subsection{Device initialization}\label{sec:Device Types / IOMMU Device / Device initialization}
>
> When the device is reset, endpoints are not attached to any domain.
>
> -If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, all accesses from
> -unattached endpoints are allowed and translated by the IOMMU using the
> -identity function. If the feature is not negotiated, any memory access
> -from an unattached endpoint fails. Upon attaching an endpoint in
> -bypass mode to a new domain, any memory access from the endpoint fails,
> -since the domain does not contain any mapping.
> -
> Future devices might support more modes of operation besides MAP/UNMAP.
> Drivers verify that devices set VIRTIO_IOMMU_F_MAP_UNMAP and fail
> gracefully if they don't.
> @@ -134,11 +155,6 @@ \subsection{Device initialization}\label{sec:Device Types / IOMMU Device / Devic
> VIRTIO_IOMMU_T_PROBE request for each endpoint before attaching the
> endpoint to a domain.
>
> -\devicenormative{\subsubsection}{Device Initialization}{Device Types / IOMMU Device / Device Initialization}
> -
> -If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
> -device SHOULD NOT let endpoints access the guest-physical address space.
> -
> \subsection{Device operations}\label{sec:Device Types / IOMMU Device / Device operations}
>
> Driver send requests on the request virtqueue, notifies the device and
> @@ -229,6 +245,27 @@ \subsection{Device operations}\label{sec:Device Types / IOMMU Device / Device op
> 64-bit address space (\texttt{start = 0, end = 0xffffffff ffffffff})
> \end{itemize}
>
> +An endpoint is in bypass mode if:
> +\begin{itemize}
> + \item the VIRTIO_IOMMU_F_BYPASS_CONFIG feature is offered and:
> + \begin{itemize}
> + \item config field \field{bypass} is 1 and the endpoint is
> + not attached to a domain. This applies even if the driver
> + does not accept the VIRTIO_IOMMU_F_BYPASS_CONFIG feature
> + and the device initializes field \field{bypass} to 1.
> +
> + or
> + \item the endpoint is attached to a domain with
> + VIRTIO_IOMMU_ATTACH_F_BYPASS.
> + \end{itemize}
> + or
> + \item the VIRTIO_IOMMU_F_BYPASS feature is negotiated and the
> + endpoint is not attached to a domain.
> +\end{itemize}
> +
> +All accesses from an endpoint in bypass mode are allowed and
> +translated by the IOMMU using the identity function.
> +
> \drivernormative{\subsubsection}{Device operations}{Device Types / IOMMU Device / Device operations}
>
> The driver SHOULD set field \field{reserved} of struct
> @@ -259,6 +296,9 @@ \subsection{Device operations}\label{sec:Device Types / IOMMU Device / Device op
> virtio_iommu_req_head and SHOULD set field \field{reserved} of struct
> virtio_iommu_req_tail to zero.
>
> +The device SHOULD NOT let unattached endpoints that are not in
> +bypass mode access the guest-physical address space.
> +
> \subsubsection{ATTACH request}\label{sec:Device Types / IOMMU Device / Device operations / ATTACH request}
>
> \begin{lstlisting}
> @@ -266,9 +306,12 @@ \subsubsection{ATTACH request}\label{sec:Device Types / IOMMU Device / Device op
> struct virtio_iommu_req_head head;
> le32 domain;
> le32 endpoint;
> - u8 reserved[8];
> + le32 flags;
> + u8 reserved[4];
> struct virtio_iommu_req_tail tail;
> };
> +
> +#define VIRTIO_IOMMU_ATTACH_F_BYPASS (1 << 0)
> \end{lstlisting}
>
> Attach an endpoint to a domain. \field{domain} uniquely identifies a
> @@ -294,6 +337,10 @@ \subsubsection{ATTACH request}\label{sec:Device Types / IOMMU Device / Device op
> attached to a single domain at a time. Endpoints attached to different
> domains are isolated from each other.
>
> +When the VIRTIO_IOMMU_F_BYPASS_CONFIG is negotiated, the driver
> +can set the VIRTIO_IOMMU_ATTACH_F_BYPASS flag to create a bypass
> +domain. Endpoints attached to this domain are in bypass mode.
> +
> \drivernormative{\paragraph}{ATTACH request}{Device Types / IOMMU Device / Device operations / ATTACH request}
>
> The driver SHOULD set \field{reserved} to zero.
> @@ -301,11 +348,20 @@ \subsubsection{ATTACH request}\label{sec:Device Types / IOMMU Device / Device op
> The driver SHOULD ensure that endpoints that cannot be isolated from each
> other are attached to the same domain.
>
> +If the domain already exists and is a bypass domain, the driver
> +SHOULD set the VIRTIO_IOMMU_ATTACH_F_BYPASS flag. If the domain
> +exists and is not a bypass domain, the driver SHOULD NOT set the
> +VIRTIO_IOMMU_ATTACH_F_BYPASS flag.
> +
> \devicenormative{\paragraph}{ATTACH request}{Device Types / IOMMU Device / Device operations / ATTACH request}
>
> If the \field{reserved} field of an ATTACH request is not zero, the device
> MUST reject the request and set \field{status} to VIRTIO_IOMMU_S_INVAL.
>
> +If the device does not recognize a \field{flags} bit, it MUST
> +reject the request and set \field{status} to
> +VIRTIO_IOMMU_S_INVAL.
> +
> If the endpoint identified by \field{endpoint} doesn't exist, the device
> MUST reject the request and set \field{status} to VIRTIO_IOMMU_S_NOENT.
>
> @@ -314,6 +370,10 @@ \subsubsection{ATTACH request}\label{sec:Device Types / IOMMU Device / Device op
> \field{endpoint} to the domain. If it cannot do so, the device MUST reject
> the request and set \field{status} to VIRTIO_IOMMU_S_UNSUPP.
>
> +If the domain already exists and the VIRTIO_IOMMU_ATTACH_F_BYPASS
> +flag is not consistent with that domain, the device SHOULD reject
> +the request and set \field{status} to VIRTIO_IOMMU_S_INVAL.
> +
> If the endpoint identified by \field{endpoint} is already attached to
> another domain, then the device SHOULD first detach it from that domain
> and attach it to the one identified by \field{domain}. In that case the
> @@ -342,11 +402,10 @@ \subsubsection{DETACH request}
> };
> \end{lstlisting}
>
> -Detach an endpoint from a domain. When this request completes, the
> -endpoint cannot access any mapping from that domain anymore. If feature
> -VIRTIO_IOMMU_F_BYPASS has been negotiated, then once this request
> -completes all accesses from the endpoint are allowed and translated by the
> -IOMMU using the identity function.
> +Detach an endpoint from a domain. When this request completes,
> +the endpoint cannot access any mapping from that domain anymore.
> +However the endpoint may then be in bypass mode and access the
> +guest-physical address space.
>
> After all endpoints have been successfully detached from a domain, it
> ceases to exist and its ID can be reused by the driver for another domain.
> @@ -433,6 +492,8 @@ \subsubsection{MAP request}\label{sec:Device Types / IOMMU Device / Device opera
>
> The driver SHOULD set undefined \field{flags} bits to zero.
>
> +The driver SHOULD NOT send MAP requests on a bypass domain.
> +
> \field{virt_end} MUST be strictly greater than \field{virt_start}.
>
> The driver SHOULD set the VIRTIO_IOMMU_MAP_F_MMIO flag when the physical
> @@ -461,6 +522,9 @@ \subsubsection{MAP request}\label{sec:Device Types / IOMMU Device / Device opera
> If \field{domain} does not exist, the device SHOULD reject the request and
> set \field{status} to VIRTIO_IOMMU_S_NOENT.
>
> +If the domain is a bypass domain, the device SHOULD reject the
> +request and set \field{status} to VIRTIO_IOMMU_S_INVAL.
> +
> The device MUST NOT allow writes to a range mapped without the
> VIRTIO_IOMMU_MAP_F_WRITE flag. However, if the underlying architecture
> does not support write-only mappings, the device MAY allow reads to a
> @@ -538,6 +602,8 @@ \subsubsection{UNMAP request}\label{sec:Device Types / IOMMU Device / Device ope
> or be outside any mapping. The last address of a range MUST either be the
> last address of a mapping or be outside any mapping.
>
> +The driver SHOULD NOT send UNMAP requests on a bypass domain.
> +
> \devicenormative{\paragraph}{UNMAP request}{Device Types / IOMMU Device / Device operations / UNMAP request}
>
> If the \field{reserved} field of an UNMAP request is not zero, the device
> @@ -547,6 +613,9 @@ \subsubsection{UNMAP request}\label{sec:Device Types / IOMMU Device / Device ope
> If \field{domain} does not exist, the device SHOULD set the request
> \field{status} to VIRTIO_IOMMU_S_NOENT.
>
> +If the domain is a bypass domain, the device SHOULD reject the
> +request and set \field{status} to VIRTIO_IOMMU_S_INVAL.
> +
> If a mapping affected by the range is not covered in its entirety by the
> range (the UNMAP request would split the mapping), then the device SHOULD
> set the request \field{status} to VIRTIO_IOMMU_S_RANGE, and SHOULD NOT
Looks good to me
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
^ permalink raw reply [flat|nested] 6+ messages in thread* RE: [PATCH v3] virtio-iommu: Rework the bypass feature
2021-10-22 12:12 [PATCH v3] virtio-iommu: Rework the bypass feature Jean-Philippe Brucker
2021-10-31 17:50 ` Eric Auger
@ 2021-11-03 1:14 ` Tian, Kevin
2021-11-03 13:01 ` [virtio-dev] " Cornelia Huck
2021-11-11 10:26 ` [virtio-dev] " Halil Pasic
3 siblings, 0 replies; 6+ messages in thread
From: Tian, Kevin @ 2021-11-03 1:14 UTC (permalink / raw)
To: Jean-Philippe Brucker, virtio-dev@lists.oasis-open.org,
eric.auger@redhat.com
Cc: Boeuf, Sebastien, cohuck@redhat.com, mst@redhat.com
> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Friday, October 22, 2021 8:12 PM
>
> The VIRTIO_IOMMU_F_BYPASS feature is awkward to use and incomplete.
> Although it is implemented by QEMU, it is not supported by any driver as
> far as I know. Replace it with a new VIRTIO_IOMMU_F_BYPASS_CONFIG
> feature.
>
> Two features are missing from virtio-iommu:
>
> * The ability for an hypervisor to start the device in bypass mode. The
> wording for VIRTIO_IOMMU_F_BYPASS is not clear enough to allow it at
> the moment, because it only specifies the behavior after feature
> negotiation.
>
> * The ability for a guest to set individual endpoints in bypass mode
> when bypass is globally disabled. An OS should have the ability to
> allow only endpoints it trusts to bypass the IOMMU, while keeping DMA
> disabled for endpoints it isn't even aware of. At the moment this can
> only be emulated by creating identity mappings.
>
> The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a 'bypass' config field
> that allows to enable and disable bypass globally. It also adds a new
> flag for the ATTACH request.
>
> * The hypervisor can start the VM with bypass enabled or, if it knows
> that the software stack supports it, disabled. The 'bypass' config
> fields is initialized to 0 or 1. It is sticky and isn't affected by
> device reset.
>
> * Generally the firmware won't have an IOMMU driver and will need to be
> started in bypass mode, so the bootloader and kernel can be loaded
> from storage endpoint.
>
> For more security, the firmware could implement a minimal virtio-iommu
> driver that reuses existing virtio support and only touches the config
> space. It could enable PCI bus mastering in bridges only for the
> endpoints that need it, enable global IOMMU bypass by flipping a bit,
> then tear everything down before handing control over to the OS. This
> prevents vulnerability windows where a malicious endpoint reprograms
> the IOMMU while the OS is configuring it [1].
>
> The isolation provided by vIOMMUs has mainly been used for securely
> assigning endpoints to untrusted applications so far, while kernel DMA
> bypasses the IOMMU. But we can expect boot security to become as
> important in virtualization as it presently is on bare-metal systems,
> where some devices are untrusted and must never be able to access
> memory that wasn't assigned to them.
>
> * The OS can enable and disable bypass globally. It can then enable
> bypass for individual endpoints by attaching them to bypass domains,
> using the new VIRTIO_IOMMU_ATTACH_F_BYPASS flag. It can disable
> bypass
> by attaching them to normal domains.
>
> [1] IOMMU protection against I/O attacks: a vulnerability and a proof of
> concept
> Morgan, B., Alata, É., Nicomette, V. et al.
> https://link.springer.com/article/10.1186/s13173-017-0066-7
>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/119
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>
> The virtio-iommu spec with colored diff is available at
> https://jpbrucker.net/virtio-iommu/spec-bypass/virtio-iommu-f-bypass-
> config-v3-diff.pdf
>
> v1: https://www.mail-archive.com/virtio-dev@lists.oasis-
> open.org/msg07817.html
> v2: https://www.mail-archive.com/virtio-dev@lists.oasis-
> open.org/msg07898.html
> v3:
> * Normative statement about device reset vs. system reset - the bypass
> bit is sticky across device reset to avoid the vulnerability described
> above, but restored on system reset (a term also used by the virtio
> memory device).
> * Explain that the field bypass is in effect as long as the new feature
> is offered, even when not accepted by the driver
> * Another clarification about the state of an endpoint after detach.
> ---
> conformance.tex | 1 -
> virtio-iommu.tex | 111 ++++++++++++++++++++++++++++++++++++++---------
> 2 files changed, 90 insertions(+), 22 deletions(-)
>
> diff --git a/conformance.tex b/conformance.tex
> index c2d7959..5c10ab9 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -504,7 +504,6 @@ \section{Conformance
> Targets}\label{sec:Conformance / Conformance Targets}
> \begin{itemize}
> \item \ref{devicenormative:Device Types / IOMMU Device / Feature bits}
> \item \ref{devicenormative:Device Types / IOMMU Device / Device
> configuration layout}
> -\item \ref{devicenormative:Device Types / IOMMU Device / Device
> Initialization}
> \item \ref{devicenormative:Device Types / IOMMU Device / Device
> operations}
> \item \ref{devicenormative:Device Types / IOMMU Device / Device
> operations / ATTACH request}
> \item \ref{devicenormative:Device Types / IOMMU Device / Device
> operations / DETACH request}
> diff --git a/virtio-iommu.tex b/virtio-iommu.tex
> index 08b358a..f8cbe88 100644
> --- a/virtio-iommu.tex
> +++ b/virtio-iommu.tex
> @@ -59,14 +59,21 @@ \subsection{Feature bits}\label{sec:Device Types /
> IOMMU Device / Feature bits}
> VIRTIO_IOMMU_F_MAP_UNMAP is supported.}
>
> \item[VIRTIO_IOMMU_F_BYPASS (3)]
> - When not attached to a domain, endpoints downstream of the IOMMU
> - can access the guest-physical address space.
> + Endpoints that are not attached to a domain are in bypass mode.
>
> \item[VIRTIO_IOMMU_F_PROBE (4)]
> The PROBE request is available.
>
> \item[VIRTIO_IOMMU_F_MMIO (5)]
> The VIRTIO_IOMMU_MAP_F_MMIO flag is available.
> +
> +\item[VIRTIO_IOMMU_F_BYPASS_CONFIG (6)]
> + Field \field{bypass} of struct virtio_iommu_config determines
> + whether endpoints that are not attached to a domain are in
> + bypass mode. Flag VIRTIO_IOMMU_ATTACH_F_BYPASS determines
> + whether endpoints that are attached to a domain are in bypass
> + mode.
> +
> \end{description}
>
> \drivernormative{\subsubsection}{Feature bits}{Device Types / IOMMU
> Device / Feature bits}
> @@ -79,6 +86,11 @@ \subsection{Feature bits}\label{sec:Device Types /
> IOMMU Device / Feature bits}
>
> The device SHOULD offer feature bit VIRTIO_IOMMU_F_MAP_UNMAP.
>
> +The VIRTIO_IOMMU_F_BYPASS_CONFIG feature supersedes
> +VIRTIO_IOMMU_F_BYPASS. New devices SHOULD NOT offer
> +VIRTIO_IOMMU_F_BYPASS. Devices SHOULD NOT offer both
> +VIRTIO_IOMMU_F_BYPASS and VIRTIO_IOMMU_F_BYPASS_CONFIG.
> +
> \subsection{Device configuration layout}\label{sec:Device Types / IOMMU
> Device / Device configuration layout}
>
> The \field{page_size_mask} field is always present. Availability of the
> @@ -97,12 +109,20 @@ \subsection{Device configuration
> layout}\label{sec:Device Types / IOMMU Device /
> le32 end;
> } domain_range;
> le32 probe_size;
> + u8 bypass;
> + u8 reserved[3];
> };
> \end{lstlisting}
>
> \drivernormative{\subsubsection}{Device configuration layout}{Device Types
> / IOMMU Device / Device configuration layout}
>
> -The driver MUST NOT write to device configuration fields.
> +When the VIRTIO_IOMMU_F_BYPASS_CONFIG feature is negotiated, the
> +driver MAY write to \field{bypass}. The driver MUST NOT write to
> +any other device configuration field.
> +
> +The driver MUST NOT write a value different than 0 or 1 to
> +\field{bypass}. The driver SHOULD ignore bits 1-7 of
> +\field{bypass}.
>
> \devicenormative{\subsubsection}{Device configuration layout}{Device
> Types / IOMMU Device / Device configuration layout}
>
> @@ -110,17 +130,18 @@ \subsection{Device configuration
> layout}\label{sec:Device Types / IOMMU Device /
> the page granularity. The device MAY set more than one bit in
> \field{page_size_mask}.
>
> +If the device offers the VIRTIO_IOMMU_F_BYPASS_CONFIG feature, it
> +MAY initialize the \field{bypass} field to 1. Field
> +\field{bypass} SHOULD NOT change on device reset, but SHOULD be
> +restored to its initial value on system reset.
> +
> +The device MUST NOT present a value different than 0 or 1 in
> +\field{bypass}.
> +
> \subsection{Device initialization}\label{sec:Device Types / IOMMU Device /
> Device initialization}
>
> When the device is reset, endpoints are not attached to any domain.
>
> -If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, all accesses from
> -unattached endpoints are allowed and translated by the IOMMU using the
> -identity function. If the feature is not negotiated, any memory access
> -from an unattached endpoint fails. Upon attaching an endpoint in
> -bypass mode to a new domain, any memory access from the endpoint fails,
> -since the domain does not contain any mapping.
> -
> Future devices might support more modes of operation besides
> MAP/UNMAP.
> Drivers verify that devices set VIRTIO_IOMMU_F_MAP_UNMAP and fail
> gracefully if they don't.
> @@ -134,11 +155,6 @@ \subsection{Device initialization}\label{sec:Device
> Types / IOMMU Device / Devic
> VIRTIO_IOMMU_T_PROBE request for each endpoint before attaching the
> endpoint to a domain.
>
> -\devicenormative{\subsubsection}{Device Initialization}{Device Types /
> IOMMU Device / Device Initialization}
> -
> -If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
> -device SHOULD NOT let endpoints access the guest-physical address space.
> -
> \subsection{Device operations}\label{sec:Device Types / IOMMU Device /
> Device operations}
>
> Driver send requests on the request virtqueue, notifies the device and
> @@ -229,6 +245,27 @@ \subsection{Device operations}\label{sec:Device
> Types / IOMMU Device / Device op
> 64-bit address space (\texttt{start = 0, end = 0xffffffff ffffffff})
> \end{itemize}
>
> +An endpoint is in bypass mode if:
> +\begin{itemize}
> + \item the VIRTIO_IOMMU_F_BYPASS_CONFIG feature is offered and:
> + \begin{itemize}
> + \item config field \field{bypass} is 1 and the endpoint is
> + not attached to a domain. This applies even if the driver
> + does not accept the VIRTIO_IOMMU_F_BYPASS_CONFIG feature
> + and the device initializes field \field{bypass} to 1.
> +
> + or
> + \item the endpoint is attached to a domain with
> + VIRTIO_IOMMU_ATTACH_F_BYPASS.
> + \end{itemize}
> + or
> + \item the VIRTIO_IOMMU_F_BYPASS feature is negotiated and the
> + endpoint is not attached to a domain.
> +\end{itemize}
> +
> +All accesses from an endpoint in bypass mode are allowed and
> +translated by the IOMMU using the identity function.
> +
> \drivernormative{\subsubsection}{Device operations}{Device Types /
> IOMMU Device / Device operations}
>
> The driver SHOULD set field \field{reserved} of struct
> @@ -259,6 +296,9 @@ \subsection{Device operations}\label{sec:Device
> Types / IOMMU Device / Device op
> virtio_iommu_req_head and SHOULD set field \field{reserved} of struct
> virtio_iommu_req_tail to zero.
>
> +The device SHOULD NOT let unattached endpoints that are not in
> +bypass mode access the guest-physical address space.
> +
> \subsubsection{ATTACH request}\label{sec:Device Types / IOMMU Device /
> Device operations / ATTACH request}
>
> \begin{lstlisting}
> @@ -266,9 +306,12 @@ \subsubsection{ATTACH request}\label{sec:Device
> Types / IOMMU Device / Device op
> struct virtio_iommu_req_head head;
> le32 domain;
> le32 endpoint;
> - u8 reserved[8];
> + le32 flags;
> + u8 reserved[4];
> struct virtio_iommu_req_tail tail;
> };
> +
> +#define VIRTIO_IOMMU_ATTACH_F_BYPASS (1 << 0)
> \end{lstlisting}
>
> Attach an endpoint to a domain. \field{domain} uniquely identifies a
> @@ -294,6 +337,10 @@ \subsubsection{ATTACH request}\label{sec:Device
> Types / IOMMU Device / Device op
> attached to a single domain at a time. Endpoints attached to different
> domains are isolated from each other.
>
> +When the VIRTIO_IOMMU_F_BYPASS_CONFIG is negotiated, the driver
> +can set the VIRTIO_IOMMU_ATTACH_F_BYPASS flag to create a bypass
> +domain. Endpoints attached to this domain are in bypass mode.
> +
> \drivernormative{\paragraph}{ATTACH request}{Device Types / IOMMU
> Device / Device operations / ATTACH request}
>
> The driver SHOULD set \field{reserved} to zero.
> @@ -301,11 +348,20 @@ \subsubsection{ATTACH request}\label{sec:Device
> Types / IOMMU Device / Device op
> The driver SHOULD ensure that endpoints that cannot be isolated from each
> other are attached to the same domain.
>
> +If the domain already exists and is a bypass domain, the driver
> +SHOULD set the VIRTIO_IOMMU_ATTACH_F_BYPASS flag. If the domain
> +exists and is not a bypass domain, the driver SHOULD NOT set the
> +VIRTIO_IOMMU_ATTACH_F_BYPASS flag.
> +
> \devicenormative{\paragraph}{ATTACH request}{Device Types / IOMMU
> Device / Device operations / ATTACH request}
>
> If the \field{reserved} field of an ATTACH request is not zero, the device
> MUST reject the request and set \field{status} to VIRTIO_IOMMU_S_INVAL.
>
> +If the device does not recognize a \field{flags} bit, it MUST
> +reject the request and set \field{status} to
> +VIRTIO_IOMMU_S_INVAL.
> +
> If the endpoint identified by \field{endpoint} doesn't exist, the device
> MUST reject the request and set \field{status} to VIRTIO_IOMMU_S_NOENT.
>
> @@ -314,6 +370,10 @@ \subsubsection{ATTACH request}\label{sec:Device
> Types / IOMMU Device / Device op
> \field{endpoint} to the domain. If it cannot do so, the device MUST reject
> the request and set \field{status} to VIRTIO_IOMMU_S_UNSUPP.
>
> +If the domain already exists and the VIRTIO_IOMMU_ATTACH_F_BYPASS
> +flag is not consistent with that domain, the device SHOULD reject
> +the request and set \field{status} to VIRTIO_IOMMU_S_INVAL.
> +
> If the endpoint identified by \field{endpoint} is already attached to
> another domain, then the device SHOULD first detach it from that domain
> and attach it to the one identified by \field{domain}. In that case the
> @@ -342,11 +402,10 @@ \subsubsection{DETACH request}
> };
> \end{lstlisting}
>
> -Detach an endpoint from a domain. When this request completes, the
> -endpoint cannot access any mapping from that domain anymore. If feature
> -VIRTIO_IOMMU_F_BYPASS has been negotiated, then once this request
> -completes all accesses from the endpoint are allowed and translated by the
> -IOMMU using the identity function.
> +Detach an endpoint from a domain. When this request completes,
> +the endpoint cannot access any mapping from that domain anymore.
> +However the endpoint may then be in bypass mode and access the
> +guest-physical address space.
>
> After all endpoints have been successfully detached from a domain, it
> ceases to exist and its ID can be reused by the driver for another domain.
> @@ -433,6 +492,8 @@ \subsubsection{MAP request}\label{sec:Device Types
> / IOMMU Device / Device opera
>
> The driver SHOULD set undefined \field{flags} bits to zero.
>
> +The driver SHOULD NOT send MAP requests on a bypass domain.
> +
> \field{virt_end} MUST be strictly greater than \field{virt_start}.
>
> The driver SHOULD set the VIRTIO_IOMMU_MAP_F_MMIO flag when the
> physical
> @@ -461,6 +522,9 @@ \subsubsection{MAP request}\label{sec:Device Types
> / IOMMU Device / Device opera
> If \field{domain} does not exist, the device SHOULD reject the request and
> set \field{status} to VIRTIO_IOMMU_S_NOENT.
>
> +If the domain is a bypass domain, the device SHOULD reject the
> +request and set \field{status} to VIRTIO_IOMMU_S_INVAL.
> +
> The device MUST NOT allow writes to a range mapped without the
> VIRTIO_IOMMU_MAP_F_WRITE flag. However, if the underlying architecture
> does not support write-only mappings, the device MAY allow reads to a
> @@ -538,6 +602,8 @@ \subsubsection{UNMAP request}\label{sec:Device
> Types / IOMMU Device / Device ope
> or be outside any mapping. The last address of a range MUST either be the
> last address of a mapping or be outside any mapping.
>
> +The driver SHOULD NOT send UNMAP requests on a bypass domain.
> +
> \devicenormative{\paragraph}{UNMAP request}{Device Types / IOMMU
> Device / Device operations / UNMAP request}
>
> If the \field{reserved} field of an UNMAP request is not zero, the device
> @@ -547,6 +613,9 @@ \subsubsection{UNMAP request}\label{sec:Device
> Types / IOMMU Device / Device ope
> If \field{domain} does not exist, the device SHOULD set the request
> \field{status} to VIRTIO_IOMMU_S_NOENT.
>
> +If the domain is a bypass domain, the device SHOULD reject the
> +request and set \field{status} to VIRTIO_IOMMU_S_INVAL.
> +
> If a mapping affected by the range is not covered in its entirety by the
> range (the UNMAP request would split the mapping), then the device
> SHOULD
> set the request \field{status} to VIRTIO_IOMMU_S_RANGE, and SHOULD
> NOT
> --
> 2.33.0
^ permalink raw reply [flat|nested] 6+ messages in thread* [virtio-dev] Re: [PATCH v3] virtio-iommu: Rework the bypass feature
2021-10-22 12:12 [PATCH v3] virtio-iommu: Rework the bypass feature Jean-Philippe Brucker
2021-10-31 17:50 ` Eric Auger
2021-11-03 1:14 ` Tian, Kevin
@ 2021-11-03 13:01 ` Cornelia Huck
2021-11-11 10:26 ` [virtio-dev] " Halil Pasic
3 siblings, 0 replies; 6+ messages in thread
From: Cornelia Huck @ 2021-11-03 13:01 UTC (permalink / raw)
To: Jean-Philippe Brucker, virtio-dev, eric.auger, kevin.tian
Cc: sebastien.boeuf, mst, Jean-Philippe Brucker
On Fri, Oct 22 2021, Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> The VIRTIO_IOMMU_F_BYPASS feature is awkward to use and incomplete.
> Although it is implemented by QEMU, it is not supported by any driver as
> far as I know. Replace it with a new VIRTIO_IOMMU_F_BYPASS_CONFIG
> feature.
>
> Two features are missing from virtio-iommu:
>
> * The ability for an hypervisor to start the device in bypass mode. The
> wording for VIRTIO_IOMMU_F_BYPASS is not clear enough to allow it at
> the moment, because it only specifies the behavior after feature
> negotiation.
>
> * The ability for a guest to set individual endpoints in bypass mode
> when bypass is globally disabled. An OS should have the ability to
> allow only endpoints it trusts to bypass the IOMMU, while keeping DMA
> disabled for endpoints it isn't even aware of. At the moment this can
> only be emulated by creating identity mappings.
>
> The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a 'bypass' config field
> that allows to enable and disable bypass globally. It also adds a new
> flag for the ATTACH request.
>
> * The hypervisor can start the VM with bypass enabled or, if it knows
> that the software stack supports it, disabled. The 'bypass' config
> fields is initialized to 0 or 1. It is sticky and isn't affected by
> device reset.
>
> * Generally the firmware won't have an IOMMU driver and will need to be
> started in bypass mode, so the bootloader and kernel can be loaded
> from storage endpoint.
>
> For more security, the firmware could implement a minimal virtio-iommu
> driver that reuses existing virtio support and only touches the config
> space. It could enable PCI bus mastering in bridges only for the
> endpoints that need it, enable global IOMMU bypass by flipping a bit,
> then tear everything down before handing control over to the OS. This
> prevents vulnerability windows where a malicious endpoint reprograms
> the IOMMU while the OS is configuring it [1].
>
> The isolation provided by vIOMMUs has mainly been used for securely
> assigning endpoints to untrusted applications so far, while kernel DMA
> bypasses the IOMMU. But we can expect boot security to become as
> important in virtualization as it presently is on bare-metal systems,
> where some devices are untrusted and must never be able to access
> memory that wasn't assigned to them.
>
> * The OS can enable and disable bypass globally. It can then enable
> bypass for individual endpoints by attaching them to bypass domains,
> using the new VIRTIO_IOMMU_ATTACH_F_BYPASS flag. It can disable bypass
> by attaching them to normal domains.
>
> [1] IOMMU protection against I/O attacks: a vulnerability and a proof of concept
> Morgan, B., Alata, É., Nicomette, V. et al.
> https://link.springer.com/article/10.1186/s13173-017-0066-7
>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/119
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>
> The virtio-iommu spec with colored diff is available at
> https://jpbrucker.net/virtio-iommu/spec-bypass/virtio-iommu-f-bypass-config-v3-diff.pdf
>
> v1: https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07817.html
> v2: https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg07898.html
> v3:
> * Normative statement about device reset vs. system reset - the bypass
> bit is sticky across device reset to avoid the vulnerability described
> above, but restored on system reset (a term also used by the virtio
> memory device).
> * Explain that the field bypass is in effect as long as the new feature
> is offered, even when not accepted by the driver
> * Another clarification about the state of an endpoint after detach.
> ---
> conformance.tex | 1 -
> virtio-iommu.tex | 111 ++++++++++++++++++++++++++++++++++++++---------
> 2 files changed, 90 insertions(+), 22 deletions(-)
Now looks good to me.
Michael, do you think we should start voting?
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [virtio-dev] [PATCH v3] virtio-iommu: Rework the bypass feature
2021-10-22 12:12 [PATCH v3] virtio-iommu: Rework the bypass feature Jean-Philippe Brucker
` (2 preceding siblings ...)
2021-11-03 13:01 ` [virtio-dev] " Cornelia Huck
@ 2021-11-11 10:26 ` Halil Pasic
2021-11-11 11:09 ` Jean-Philippe Brucker
3 siblings, 1 reply; 6+ messages in thread
From: Halil Pasic @ 2021-11-11 10:26 UTC (permalink / raw)
To: Jean-Philippe Brucker
Cc: virtio-dev, eric.auger, kevin.tian, sebastien.boeuf, cohuck, mst,
Halil Pasic
On Fri, 22 Oct 2021 13:12:20 +0100
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> The VIRTIO_IOMMU_F_BYPASS feature is awkward to use and incomplete.
> Although it is implemented by QEMU, it is not supported by any driver as
> far as I know. Replace it with a new VIRTIO_IOMMU_F_BYPASS_CONFIG
> feature.
Hi Jean-Philippe!
Is somebody working on the implementation of this? I would like to be
Cc-ed with the implementation patches if possible. The interaction of
s390x and IOMMU is sometimes a little tricky. I would like to make sure
nothing breaks for us. :)
Regards,
Halil
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [virtio-dev] [PATCH v3] virtio-iommu: Rework the bypass feature
2021-11-11 10:26 ` [virtio-dev] " Halil Pasic
@ 2021-11-11 11:09 ` Jean-Philippe Brucker
0 siblings, 0 replies; 6+ messages in thread
From: Jean-Philippe Brucker @ 2021-11-11 11:09 UTC (permalink / raw)
To: Halil Pasic
Cc: virtio-dev, eric.auger, kevin.tian, sebastien.boeuf, cohuck, mst
Hi Halil,
On Thu, Nov 11, 2021 at 11:26:36AM +0100, Halil Pasic wrote:
> On Fri, 22 Oct 2021 13:12:20 +0100
> Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
>
> > The VIRTIO_IOMMU_F_BYPASS feature is awkward to use and incomplete.
> > Although it is implemented by QEMU, it is not supported by any driver as
> > far as I know. Replace it with a new VIRTIO_IOMMU_F_BYPASS_CONFIG
> > feature.
>
> Hi Jean-Philippe!
>
> Is somebody working on the implementation of this? I would like to be
> Cc-ed with the implementation patches if possible.
Yes I posted the Linux support:
https://lore.kernel.org/linux-iommu/20211013121052.518113-1-jean-philippe@linaro.org/
And QEMU, which depends on the Linux header:
https://lore.kernel.org/qemu-devel/20210930185050.262759-1-jean-philippe@linaro.org/
I plan to resend after the merge window and will Cc you.
> The interaction of s390x and IOMMU is sometimes a little tricky. I would
> like to make sure nothing breaks for us. :)
Bypass should be constrained within the virtio-iommu device and driver,
and so shouldn't break s390 as far as I know. I haven't thought much about
supporting virtio-iommu for archs other than arm and x86 for the moment
(in general it should be as simple as adding it to ACPI tables or DT).
I believe power already has a PV IOMMU interface and doesn't need a
dedicated vIOMMU device, but I'm not sure where s390 stands
Thanks,
Jean
^ permalink raw reply [flat|nested] 6+ messages in thread