All of lore.kernel.org
 help / color / mirror / Atom feed
* [virtio-dev] [RFC] virtio-pmem: PMEM device spec
@ 2019-03-26 14:35 Pankaj Gupta
  2019-04-10 10:02 ` [virtio-dev] " Cornelia Huck
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Pankaj Gupta @ 2019-03-26 14:35 UTC (permalink / raw)
  To: virtio-dev
  Cc: mst, david, riel, stefanha, dan.j.williams, aarcange, cohuck,
	lcapitulino, nilal

This patch proposes a virtio specification for new
virtio pmem device. Virtio pmem is a paravirtualized 
device which allows the guest to bypass page cache.
Previous posting of kernel driver is [1] and Qemu 
device is [2]. Specification has introduction of 
virtio pmem device with other implementation details. 

I have also listed concerns with page cache side channel
attacks in previous kernel driver posting [1] and 
possible countermeasures based on discussion [4].
I have also created an virtio-spec issue [5] for this.

Request to provide feedback on device specification.

[1] https://lkml.org/lkml/2019/1/9/471 
[2] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
[3] https://lkml.org/lkml/2019/1/9/541
[4] https://lkml.org/lkml/2019/2/4/1151
[5] https://github.com/oasis-tcs/virtio-spec/issues/38

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 conformance.tex |  14 ++++++
 content.tex     |   3 ++
 virtio-pmem.tex | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)
 create mode 100644 virtio-pmem.tex

diff --git a/conformance.tex b/conformance.tex
index ad7e82e..2133a7c 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -181,6 +181,20 @@ A socket driver MUST conform to the following normative statements:
 \item \ref{drivernormative:Device Types / Socket Device / Device Operation / Device Events}
 \end{itemize}
 
+\subsection{PMEM Driver Conformance}\label{sec:Conformance / Driver Conformance / PMEM Driver Conformance}
+
+A PMEM driver MUST conform to the following normative statements:
+
+\begin{itemize}
+\item \ref{devicenormative:Device Types / PMEM Device / Device Initialization}
+\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Direct access}
+\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Virtio flush}
+\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation / Virtqueue command}
+\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue flush}
+\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue return}
+\end{itemize}
+
+
 \section{Device Conformance}\label{sec:Conformance / Device Conformance}
 
 A device MUST conform to the following normative statements:
diff --git a/content.tex b/content.tex
index ede0ef6..6077d32 100644
--- a/content.tex
+++ b/content.tex
@@ -2634,6 +2634,8 @@ Device ID  &  Virtio Device    \\
 \hline
 24         &   Memory device \\
 \hline
+25         &   PMEM device \\
+\hline
 \end{tabular}
 
 Some of the devices above are unspecified by this document,
@@ -5595,6 +5597,7 @@ descriptor for the \field{sense_len}, \field{residual},
 \input{virtio-input.tex}
 \input{virtio-crypto.tex}
 \input{virtio-vsock.tex}
+\input{virtio-pmem.tex}
 
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
diff --git a/virtio-pmem.tex b/virtio-pmem.tex
new file mode 100644
index 0000000..04e07bb
--- /dev/null
+++ b/virtio-pmem.tex
@@ -0,0 +1,134 @@
+\section{PMEM Device}\label{sec:Device Types / PMEM Device}
+
+The virtio pmem is a fake persistent memory (NVDIMM) device
+used to bypass the guest page cache and provide a virtio
+based asynchronous flush mechanism. This avoids the need
+of a separate page cache in guest and keeps page cache only
+in the host. Under memory pressure, the host makes use of
+effecient memory reclaim decisions for page cache pages
+of all the guests. This helps to reduce the memory footprint
+and fit more guests in the host system.
+
+\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
+  25
+
+\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
+\begin{description}
+\item[0] req_vq
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
+
+There are currently no feature bits defined for this device.
+
+\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
+
+\begin{lstlisting}
+struct virtio_pmem_config {
+	le64 start;
+	le64 size;
+};
+\end{lstlisting}
+
+\field{start} contains the guest physical address of the start of the
+physical address range to be hotplugged into the guest address space
+using the pmem API.
+\field{size} contains the length of this address range.
+
+\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
+
+Device hotplugs physical memory to guest address space. Persistent memory device
+is emulated with file backed memory at host side.
+
+\begin{enumerate}
+\item Guest vpmem start is read from \field{start}.
+\item Guest vpmem end is read from \field{size}.
+\end{enumerate}
+
+\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
+
+File backed memory SHOULD be memory mapped to guest address space with SHARED
+memory mapping.
+
+\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
+
+Driver hotplugs the physical memory and registers associated
+region with the pmem API. Also, configures a flush callback
+function with the corresponding region.
+
+\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access}
+
+Driver SHOULD enable filesystem direct access operations for
+read/write on the device.
+
+\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
+
+Driver SHOULD add implement a virtio based flush callback.
+Driver SHOULD disable other FLUSH/SYNC mechanisms for the device
+when virtio flush is configured.
+
+\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
+\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
+
+Driver SHOULD send VIRTIO_FLUSH command on request virtqueue,
+blocks guest userspace process utill host completes fsync/flush.
+Driver SHOULD handle multiple fsync requests on files present
+on the device.
+
+\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
+
+\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
+
+Device SHOULD handle multiple flush requests simultaneously using
+host filesystem fsync/flush call.
+
+\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return}
+
+Device SHOULD return integer '0' for success and '-1' for failure.
+These errors are converted to corresponding error codes by guest as
+per architecture.
+
+\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
+
+There could be potential security implications depending on how
+memory mapped host backing file is used. By default device emulation
+is done with SHARED mapping. There is a contract between guest and host
+process to access same backing file for read/write operations.
+
+If a malicious guest or host userspace map the same backing file,
+attacking process can make use of known cache side channel attacks
+to predict the current state of shared page cache page. If both
+attacker and victim somehow execute same shared code after a
+flush/evict call, with difference in execution timing attacker
+could infer another guest local data or host data. Though this is
+not easy and same challenges exist as with bare metal host system
+when userspace share same backing file.
+
+\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
+
+\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED}
+
+If device backing backing file is shared with multiple guests or host
+processes, this may act as a metric for page cache side channel attack.
+As a counter measure every guest should have its own(not shared with
+another guest) SHARED backing file and gets populated a per host process
+page cache pages.
+
+\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE}
+There maybe be chances of side channels attack with PRIVATE
+memory mapping similar to SHARED with read-only shared mappings.
+PRIVATE is not used for virtio pmem making this usecase
+irrelevant.
+
+\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload}
+For SHARED mapping, if workload is single application inside
+guest and there is no risk with sharing of data between guests.
+Guest sharing same backing file with SHARED mapping can be
+used as a valid configuration.
+
+\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction}
+Don't allow cache evict from guest filesystem trim/discard command
+with virtio pmem. This rules out any possibility of evict-reload
+page cache side channel attacks if backing disk is shared(SHARED)
+with mutliple guests. Though if we use per device backing file with
+shared mapping this countermeasure is not required.
-- 
2.14.3


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [virtio-dev] Re: [RFC] virtio-pmem: PMEM device spec
  2019-03-26 14:35 [virtio-dev] [RFC] virtio-pmem: PMEM device spec Pankaj Gupta
@ 2019-04-10 10:02 ` Cornelia Huck
  2019-04-10 10:33   ` Pankaj Gupta
  2019-04-10 14:42 ` Stefan Hajnoczi
  2019-06-21 20:33 ` [virtio-dev] " Michael S. Tsirkin
  2 siblings, 1 reply; 11+ messages in thread
From: Cornelia Huck @ 2019-04-10 10:02 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: virtio-dev, mst, david, riel, stefanha, dan.j.williams, aarcange,
	lcapitulino, nilal

On Tue, 26 Mar 2019 20:05:07 +0530
Pankaj Gupta <pagupta@redhat.com> wrote:

> This patch proposes a virtio specification for new
> virtio pmem device. Virtio pmem is a paravirtualized 
> device which allows the guest to bypass page cache.
> Previous posting of kernel driver is [1] and Qemu 
> device is [2]. Specification has introduction of 
> virtio pmem device with other implementation details. 
> 
> I have also listed concerns with page cache side channel
> attacks in previous kernel driver posting [1] and 
> possible countermeasures based on discussion [4].
> I have also created an virtio-spec issue [5] for this.
> 
> Request to provide feedback on device specification.
> 
> [1] https://lkml.org/lkml/2019/1/9/471 
> [2] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
> [3] https://lkml.org/lkml/2019/1/9/541
> [4] https://lkml.org/lkml/2019/2/4/1151
> [5] https://github.com/oasis-tcs/virtio-spec/issues/38
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  conformance.tex |  14 ++++++
>  content.tex     |   3 ++
>  virtio-pmem.tex | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 151 insertions(+)
>  create mode 100644 virtio-pmem.tex
> 
> diff --git a/conformance.tex b/conformance.tex
> index ad7e82e..2133a7c 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -181,6 +181,20 @@ A socket driver MUST conform to the following normative statements:
>  \item \ref{drivernormative:Device Types / Socket Device / Device Operation / Device Events}
>  \end{itemize}
>  
> +\subsection{PMEM Driver Conformance}\label{sec:Conformance / Driver Conformance / PMEM Driver Conformance}
> +
> +A PMEM driver MUST conform to the following normative statements:
> +
> +\begin{itemize}
> +\item \ref{devicenormative:Device Types / PMEM Device / Device Initialization}
> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Direct access}
> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue return}

This section must only list the _driver_normative statements. The
devicenormative statements must be moved to a PMEM device conformance
section below.

> +\end{itemize}
> +
> +
>  \section{Device Conformance}\label{sec:Conformance / Device Conformance}
>  
>  A device MUST conform to the following normative statements:
> diff --git a/content.tex b/content.tex
> index ede0ef6..6077d32 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2634,6 +2634,8 @@ Device ID  &  Virtio Device    \\
>  \hline
>  24         &   Memory device \\
>  \hline
> +25         &   PMEM device \\
> +\hline

Probably better to split this out, as you already sent a out a patch
reserving the id.

I sure hope we'll be able to process the outstanding device id
reservations quickly once 1.1 is out of the door :/ 

>  \end{tabular}
>  
>  Some of the devices above are unspecified by this document,
> @@ -5595,6 +5597,7 @@ descriptor for the \field{sense_len}, \field{residual},
>  \input{virtio-input.tex}
>  \input{virtio-crypto.tex}
>  \input{virtio-vsock.tex}
> +\input{virtio-pmem.tex}
>  
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
> diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> new file mode 100644
> index 0000000..04e07bb
> --- /dev/null
> +++ b/virtio-pmem.tex
> @@ -0,0 +1,134 @@
> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> +
> +The virtio pmem is a fake persistent memory (NVDIMM) device
> +used to bypass the guest page cache and provide a virtio
> +based asynchronous flush mechanism. This avoids the need
> +of a separate page cache in guest and keeps page cache only

s/of/for/
s/guest/the guest/
s/page cache/the page cache/

> +in the host. Under memory pressure, the host makes use of

s/makes use/is expected to make use/
?

> +effecient memory reclaim decisions for page cache pages
> +of all the guests. This helps to reduce the memory footprint
> +and fit more guests in the host system.
> +
> +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> +  25
> +
> +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> +\begin{description}
> +\item[0] req_vq
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
> +
> +There are currently no feature bits defined for this device.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> +
> +\begin{lstlisting}
> +struct virtio_pmem_config {
> +	le64 start;
> +	le64 size;
> +};
> +\end{lstlisting}
> +
> +\field{start} contains the guest physical address of the start of the
> +physical address range to be hotplugged into the guest address space
> +using the pmem API.

I think we need a reference to what the pmem API is somewhere.

> +\field{size} contains the length of this address range.
> +
> +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> +
> +Device hotplugs physical memory to guest address space. Persistent memory device

s/Device/The device/

> +is emulated with file backed memory at host side.

Is "file backed memory" generic enough, or is the concept too
unix-specific?

(I won't comment further on the memory-specific wording in here, will
leave that to folks more familiar with that area.)

> +
> +\begin{enumerate}
> +\item Guest vpmem start is read from \field{start}.
> +\item Guest vpmem end is read from \field{size}.
> +\end{enumerate}
> +
> +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
> +
> +File backed memory SHOULD be memory mapped to guest address space with SHARED
> +memory mapping.
> +
> +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
> +
> +Driver hotplugs the physical memory and registers associated

s/Driver/The driver/

(more instances of that below)

> +region with the pmem API. Also, configures a flush callback
> +function with the corresponding region.
> +
> +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access}
> +
> +Driver SHOULD enable filesystem direct access operations for
> +read/write on the device.
> +
> +\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> +
> +Driver SHOULD add implement a virtio based flush callback.
> +Driver SHOULD disable other FLUSH/SYNC mechanisms for the device
> +when virtio flush is configured.
> +
> +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +
> +Driver SHOULD send VIRTIO_FLUSH command on request virtqueue,
> +blocks guest userspace process utill host completes fsync/flush.

s/utill/until/

> +Driver SHOULD handle multiple fsync requests on files present
> +on the device.
> +
> +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
> +
> +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +
> +Device SHOULD handle multiple flush requests simultaneously using

s/Device/The device/

(more below)

> +host filesystem fsync/flush call.
> +
> +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return}
> +
> +Device SHOULD return integer '0' for success and '-1' for failure.
> +These errors are converted to corresponding error codes by guest as
> +per architecture.

I think that is an implementation-specific detail that does not touch
the device/driver interface?

> +
> +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
> +
> +There could be potential security implications depending on how
> +memory mapped host backing file is used. By default device emulation
> +is done with SHARED mapping. There is a contract between guest and host
> +process to access same backing file for read/write operations.
> +
> +If a malicious guest or host userspace map the same backing file,
> +attacking process can make use of known cache side channel attacks
> +to predict the current state of shared page cache page. If both
> +attacker and victim somehow execute same shared code after a
> +flush/evict call, with difference in execution timing attacker
> +could infer another guest local data or host data. Though this is
> +not easy and same challenges exist as with bare metal host system
> +when userspace share same backing file.
> +
> +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
> +
> +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED}
> +
> +If device backing backing file is shared with multiple guests or host
> +processes, this may act as a metric for page cache side channel attack.
> +As a counter measure every guest should have its own(not shared with
> +another guest) SHARED backing file and gets populated a per host process
> +page cache pages.
> +
> +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE}
> +There maybe be chances of side channels attack with PRIVATE
> +memory mapping similar to SHARED with read-only shared mappings.
> +PRIVATE is not used for virtio pmem making this usecase
> +irrelevant.
> +
> +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload}
> +For SHARED mapping, if workload is single application inside
> +guest and there is no risk with sharing of data between guests.
> +Guest sharing same backing file with SHARED mapping can be
> +used as a valid configuration.
> +
> +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction}
> +Don't allow cache evict from guest filesystem trim/discard command
> +with virtio pmem. This rules out any possibility of evict-reload
> +page cache side channel attacks if backing disk is shared(SHARED)
> +with mutliple guests. Though if we use per device backing file with
> +shared mapping this countermeasure is not required.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [virtio-dev] Re: [RFC] virtio-pmem: PMEM device spec
  2019-04-10 10:02 ` [virtio-dev] " Cornelia Huck
@ 2019-04-10 10:33   ` Pankaj Gupta
  0 siblings, 0 replies; 11+ messages in thread
From: Pankaj Gupta @ 2019-04-10 10:33 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: virtio-dev, mst, david, riel, stefanha, dan j williams, aarcange,
	lcapitulino, nilal


Hi Cornelia,

Thank you for the review. Please find my reply inline.

> 
> > This patch proposes a virtio specification for new
> > virtio pmem device. Virtio pmem is a paravirtualized
> > device which allows the guest to bypass page cache.
> > Previous posting of kernel driver is [1] and Qemu
> > device is [2]. Specification has introduction of
> > virtio pmem device with other implementation details.
> > 
> > I have also listed concerns with page cache side channel
> > attacks in previous kernel driver posting [1] and
> > possible countermeasures based on discussion [4].
> > I have also created an virtio-spec issue [5] for this.
> > 
> > Request to provide feedback on device specification.
> > 
> > [1] https://lkml.org/lkml/2019/1/9/471
> > [2] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
> > [3] https://lkml.org/lkml/2019/1/9/541
> > [4] https://lkml.org/lkml/2019/2/4/1151
> > [5] https://github.com/oasis-tcs/virtio-spec/issues/38
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  conformance.tex |  14 ++++++
> >  content.tex     |   3 ++
> >  virtio-pmem.tex | 134
> >  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 151 insertions(+)
> >  create mode 100644 virtio-pmem.tex
> > 
> > diff --git a/conformance.tex b/conformance.tex
> > index ad7e82e..2133a7c 100644
> > --- a/conformance.tex
> > +++ b/conformance.tex
> > @@ -181,6 +181,20 @@ A socket driver MUST conform to the following
> > normative statements:
> >  \item \ref{drivernormative:Device Types / Socket Device / Device Operation
> >  / Device Events}
> >  \end{itemize}
> >  
> > +\subsection{PMEM Driver Conformance}\label{sec:Conformance / Driver
> > Conformance / PMEM Driver Conformance}
> > +
> > +A PMEM driver MUST conform to the following normative statements:
> > +
> > +\begin{itemize}
> > +\item \ref{devicenormative:Device Types / PMEM Device / Device
> > Initialization}
> > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver
> > Initialization / Direct access}
> > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver
> > Initialization / Virtio flush}
> > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation /
> > Virtqueue command}
> > +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation /
> > Virtqueue flush}
> > +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation /
> > Virtqueue return}
> 
> This section must only list the _driver_normative statements. The
> devicenormative statements must be moved to a PMEM device conformance
> section below.

Sure, will change.

> 
> > +\end{itemize}
> > +
> > +
> >  \section{Device Conformance}\label{sec:Conformance / Device Conformance}
> >  
> >  A device MUST conform to the following normative statements:
> > diff --git a/content.tex b/content.tex
> > index ede0ef6..6077d32 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -2634,6 +2634,8 @@ Device ID  &  Virtio Device    \\
> >  \hline
> >  24         &   Memory device \\
> >  \hline
> > +25         &   PMEM device \\
> > +\hline
> 
> Probably better to split this out, as you already sent a out a patch
> reserving the id.

Yes. o.k.

> 
> I sure hope we'll be able to process the outstanding device id
> reservations quickly once 1.1 is out of the door :/

o.k. Thank you!

> 
> >  \end{tabular}
> >  
> >  Some of the devices above are unspecified by this document,
> > @@ -5595,6 +5597,7 @@ descriptor for the \field{sense_len},
> > \field{residual},
> >  \input{virtio-input.tex}
> >  \input{virtio-crypto.tex}
> >  \input{virtio-vsock.tex}
> > +\input{virtio-pmem.tex}
> >  
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  
> > diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> > new file mode 100644
> > index 0000000..04e07bb
> > --- /dev/null
> > +++ b/virtio-pmem.tex
> > @@ -0,0 +1,134 @@
> > +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> > +
> > +The virtio pmem is a fake persistent memory (NVDIMM) device
> > +used to bypass the guest page cache and provide a virtio
> > +based asynchronous flush mechanism. This avoids the need
> > +of a separate page cache in guest and keeps page cache only
> 
> s/of/for/
> s/guest/the guest/
> s/page cache/the page cache/

will change this as suggested.

> 
> > +in the host. Under memory pressure, the host makes use of
> 
> s/makes use/is expected to make use/
> ?

o.k

> 
> > +effecient memory reclaim decisions for page cache pages
> > +of all the guests. This helps to reduce the memory footprint
> > +and fit more guests in the host system.
> > +
> > +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> > +  25
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> > +\begin{description}
> > +\item[0] req_vq
> > +\end{description}
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature
> > bits}
> > +
> > +There are currently no feature bits defined for this device.
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM
> > Device / Device configuration layout}
> > +
> > +\begin{lstlisting}
> > +struct virtio_pmem_config {
> > +	le64 start;
> > +	le64 size;
> > +};
> > +\end{lstlisting}
> > +
> > +\field{start} contains the guest physical address of the start of the
> > +physical address range to be hotplugged into the guest address space
> > +using the pmem API.
> 
> I think we need a reference to what the pmem API is somewhere.

Sure.

> 
> > +\field{size} contains the length of this address range.
> > +
> > +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device /
> > Device Initialization}
> > +
> > +Device hotplugs physical memory to guest address space. Persistent memory
> > device
> 
> s/Device/The device/

o.k

> 
> > +is emulated with file backed memory at host side.
> 
> Is "file backed memory" generic enough, or is the concept too
> unix-specific?

I will change it to "file backed mapping"?

> 
> (I won't comment further on the memory-specific wording in here, will
> leave that to folks more familiar with that area.)
> 
> > +
> > +\begin{enumerate}
> > +\item Guest vpmem start is read from \field{start}.
> > +\item Guest vpmem end is read from \field{size}.
> > +\end{enumerate}
> > +
> > +\devicenormative{\subsubsection}{Device Initialization}{Device Types /
> > PMEM Device / Device Initialization}
> > +
> > +File backed memory SHOULD be memory mapped to guest address space with
> > SHARED
> > +memory mapping.
> > +
> > +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver /
> > Driver Initialization}
> > +
> > +Driver hotplugs the physical memory and registers associated
> 
> s/Driver/The driver/
> 
> (more instances of that below)

o.k

> 
> > +region with the pmem API. Also, configures a flush callback
> > +function with the corresponding region.
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct
> > access}{Device Types / PMEM Driver / Driver Initialization / Direct
> > access}
> > +
> > +Driver SHOULD enable filesystem direct access operations for
> > +read/write on the device.
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Virtio
> > flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> > +
> > +Driver SHOULD add implement a virtio based flush callback.
> > +Driver SHOULD disable other FLUSH/SYNC mechanisms for the device
> > +when virtio flush is configured.
> > +
> > +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver /
> > Driver Operation}
> > +\drivernormative{\subsubsection}{Driver Operation: Virtqueue
> > command}{Device Types / PMEM Driver / Driver Operation / Virtqueue
> > command}
> > +
> > +Driver SHOULD send VIRTIO_FLUSH command on request virtqueue,
> > +blocks guest userspace process utill host completes fsync/flush.
> 
> s/utill/until/

o.k

> 
> > +Driver SHOULD handle multiple fsync requests on files present
> > +on the device.
> > +
> > +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver /
> > Device Operation}
> > +
> > +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM
> > Device / Device Operation / Virtqueue flush}
> > +
> > +Device SHOULD handle multiple flush requests simultaneously using
> 
> s/Device/The device/
> 
> (more below)

o.k

> 
> > +host filesystem fsync/flush call.
> > +
> > +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM
> > Device / Device Operation / Virtqueue return}
> > +
> > +Device SHOULD return integer '0' for success and '-1' for failure.
> > +These errors are converted to corresponding error codes by guest as
> > +per architecture.
> 
> I think that is an implementation-specific detail that does not touch
> the device/driver interface?

yes. I thought we should explain about device return values and accordingly
driver handles them.

Best regards,
Pankaj

> 
> > +
> > +\subsection{Possible security implications}\label{sec:Device Types / PMEM
> > Device / Possible Security Implications}
> > +
> > +There could be potential security implications depending on how
> > +memory mapped host backing file is used. By default device emulation
> > +is done with SHARED mapping. There is a contract between guest and host
> > +process to access same backing file for read/write operations.
> > +
> > +If a malicious guest or host userspace map the same backing file,
> > +attacking process can make use of known cache side channel attacks
> > +to predict the current state of shared page cache page. If both
> > +attacker and victim somehow execute same shared code after a
> > +flush/evict call, with difference in execution timing attacker
> > +could infer another guest local data or host data. Though this is
> > +not easy and same challenges exist as with bare metal host system
> > +when userspace share same backing file.
> > +
> > +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device /
> > Possible Security Implications / Countermeasures}
> > +
> > +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device
> > / Possible Security Implications / Countermeasures / SHARED}
> > +
> > +If device backing backing file is shared with multiple guests or host
> > +processes, this may act as a metric for page cache side channel attack.
> > +As a counter measure every guest should have its own(not shared with
> > +another guest) SHARED backing file and gets populated a per host process
> > +page cache pages.
> > +
> > +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device
> > / Possible Security Implications / Countermeasures / PRIVATE}
> > +There maybe be chances of side channels attack with PRIVATE
> > +memory mapping similar to SHARED with read-only shared mappings.
> > +PRIVATE is not used for virtio pmem making this usecase
> > +irrelevant.
> > +
> > +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM
> > Device / Possible Security Implications / Countermeasures / Workload}
> > +For SHARED mapping, if workload is single application inside
> > +guest and there is no risk with sharing of data between guests.
> > +Guest sharing same backing file with SHARED mapping can be
> > +used as a valid configuration.
> > +
> > +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM
> > Device / Possible Security Implications / Countermeasures / Cache
> > eviction}
> > +Don't allow cache evict from guest filesystem trim/discard command
> > +with virtio pmem. This rules out any possibility of evict-reload
> > +page cache side channel attacks if backing disk is shared(SHARED)
> > +with mutliple guests. Though if we use per device backing file with
> > +shared mapping this countermeasure is not required.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [virtio-dev] Re: [RFC] virtio-pmem: PMEM device spec
  2019-03-26 14:35 [virtio-dev] [RFC] virtio-pmem: PMEM device spec Pankaj Gupta
  2019-04-10 10:02 ` [virtio-dev] " Cornelia Huck
@ 2019-04-10 14:42 ` Stefan Hajnoczi
  2019-04-10 14:56   ` David Hildenbrand
  2019-04-11  9:02   ` Pankaj Gupta
  2019-06-21 20:33 ` [virtio-dev] " Michael S. Tsirkin
  2 siblings, 2 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2019-04-10 14:42 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: virtio-dev, mst, david, riel, dan.j.williams, aarcange, cohuck,
	lcapitulino, nilal

[-- Attachment #1: Type: text/plain, Size: 7834 bytes --]

On Tue, Mar 26, 2019 at 08:05:07PM +0530, Pankaj Gupta wrote:
> diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> new file mode 100644
> index 0000000..04e07bb
> --- /dev/null
> +++ b/virtio-pmem.tex
> @@ -0,0 +1,134 @@
> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> +
> +The virtio pmem is a fake persistent memory (NVDIMM) device
> +used to bypass the guest page cache and provide a virtio
> +based asynchronous flush mechanism. This avoids the need

Is there anything "fake" about virtio-pmem from the perspective of the
device interface?

What you are describing is one use case.  But on a platform that doesn't
have existing physical NVDIMM interfaces, ACPI, etc maybe virtio-pmem
would be used to pass through physical NVDIMMs from the host?  If you
agree, then it might make sense to describe the device simply as a
persistent memory device and give "fake NVDIMM" as an example use case
in a separate paragraph.  This would make the text more future-proof.

> +of a separate page cache in guest and keeps page cache only
> +in the host. Under memory pressure, the host makes use of
> +effecient memory reclaim decisions for page cache pages

efficient

Please use a spell-checker like aspell or ispell.

> +of all the guests. This helps to reduce the memory footprint
> +and fit more guests in the host system.
> +
> +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> +  25
> +
> +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> +\begin{description}
> +\item[0] req_vq
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
> +
> +There are currently no feature bits defined for this device.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> +
> +\begin{lstlisting}
> +struct virtio_pmem_config {
> +	le64 start;
> +	le64 size;
> +};
> +\end{lstlisting}
> +
> +\field{start} contains the guest physical address of the start of the
> +physical address range to be hotplugged into the guest address space
> +using the pmem API.

What pmem API?  This specification shouldn't be written from a Linux
kernel internals point of view.  It should be possible to implement BSD,
Windows, etc drivers too.

Just saying that this is the physical address of the start of the
persistent memory range is enough.

> +\field{size} contains the length of this address range.

"in bytes" would be nice because it makes the units explicit.

> +
> +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> +
> +Device hotplugs physical memory to guest address space. Persistent memory device
> +is emulated with file backed memory at host side.

File-backed memory is a detail of one possible use case.  The spec
shouldn't be tied to this single use case.

> +
> +\begin{enumerate}
> +\item Guest vpmem start is read from \field{start}.
> +\item Guest vpmem end is read from \field{size}.
> +\end{enumerate}

"vpmem" is not defined.  Please use consistent terminology.

I suggest structuring this subsection as follows:

  Initialization proceeds as follows:

  \begin{enumerate}
  \item The driver reads the physical start address from \field{start}.
  \item The driver reads the length of the persistent memory range from \field{size}.
  \end{enumerate}

> +
> +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
> +
> +File backed memory SHOULD be memory mapped to guest address space with SHARED
> +memory mapping.

I'm not sure this point is appropriate for the virtio spec since it's a
device implementation detail.  This is outside the scope of the device
interface and should be dropped.

If you'd like to describe the file-backed memory use case, please do it
in a non-normative section and use the appropriate terminology
(MAP_SHARED) and explanations (shared mapping so that changes will be
written back to the file).

> +
> +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
> +
> +Driver hotplugs the physical memory and registers associated
> +region with the pmem API. Also, configures a flush callback
> +function with the corresponding region.

There are Linux internals that don't need to be in the spec.  Some
kernels may not have a similar pmem API with a flush callback,
registration, etc.

Please describe the functionality instead:

  Memory stores to the persistent memory range are not guaranteed to be
  persistent without further action.  An explicit flush command is
  required to ensure persistence.  The req_vq is used to perform flush
  commands.

> +
> +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access}
> +
> +Driver SHOULD enable filesystem direct access operations for
> +read/write on the device.

This is specific to one possible use case for virtio-pmem and it's a
driver implementation detail, not a requirement of the device interface.

Imagine a unikernel application that doesn't have a file system.  They
just want to access the persistent memory range, they don't care about
filesystem direct access.

Please focus on the requirements/constraints/assumptions of the device
interface, not the implementation details of the device emulation or the
Linux guest driver.  That way the specification will be generally useful
and will age better as the use cases change.

> +
> +\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> +
> +Driver SHOULD add implement a virtio based flush callback.

If flush is optional, how should a device that does not implement it
behave:
1. Don't implement req_vq
Or
2. Fail flush commands with a certain error code
?

> +Driver SHOULD disable other FLUSH/SYNC mechanisms for the device
> +when virtio flush is configured.

What does this mean?

> +
> +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +
> +Driver SHOULD send VIRTIO_FLUSH command on request virtqueue,
> +blocks guest userspace process utill host completes fsync/flush.

userspace processes are outside the scope of the virtio specification.
There may be no "userspace" inside the guest.  Please stick to
describing the device interface.

I think the intention here is:

  The VIRTIO_FLUSH command persists memory writes that were performed
  before the command was submitted.  Once the command completes the
  writes are guaranteed to be persistent.

And the normative part is:

  The driver MUST submit a VIRTIO_FLUSH command after performing memory
  writes that require persistence.

  The driver MUST wait for the VIRTIO_FLUSH command to complete before
  assuming previous writes are persistent.

> +Driver SHOULD handle multiple fsync requests on files present
> +on the device.

fsync is outside the scope of the device interface and, hence, this
specification.  Perhaps:

  The driver MAY have multiple VIRTIO_FLUSH commands pending, though
  each command completion only guarantees persistence of memory writes
  that were performed before the command was submitted.

> +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
> +
> +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +
> +Device SHOULD handle multiple flush requests simultaneously using
> +host filesystem fsync/flush call.

fsync/flush is an implementation detail.  I'll stop reviewing at this
point to avoid repetition :).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [virtio-dev] Re: [RFC] virtio-pmem: PMEM device spec
  2019-04-10 14:42 ` Stefan Hajnoczi
@ 2019-04-10 14:56   ` David Hildenbrand
  2019-04-11  9:12     ` Pankaj Gupta
  2019-04-11  9:02   ` Pankaj Gupta
  1 sibling, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2019-04-10 14:56 UTC (permalink / raw)
  To: Stefan Hajnoczi, Pankaj Gupta
  Cc: virtio-dev, mst, riel, dan.j.williams, aarcange, cohuck,
	lcapitulino, nilal

On 10.04.19 16:42, Stefan Hajnoczi wrote:
> On Tue, Mar 26, 2019 at 08:05:07PM +0530, Pankaj Gupta wrote:
>> diff --git a/virtio-pmem.tex b/virtio-pmem.tex
>> new file mode 100644
>> index 0000000..04e07bb
>> --- /dev/null
>> +++ b/virtio-pmem.tex
>> @@ -0,0 +1,134 @@
>> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
>> +
>> +The virtio pmem is a fake persistent memory (NVDIMM) device
>> +used to bypass the guest page cache and provide a virtio
>> +based asynchronous flush mechanism. This avoids the need
> 
> Is there anything "fake" about virtio-pmem from the perspective of the
> device interface?
> 
> What you are describing is one use case.  But on a platform that doesn't
> have existing physical NVDIMM interfaces, ACPI, etc maybe virtio-pmem
> would be used to pass through physical NVDIMMs from the host?  If you
> agree, then it might make sense to describe the device simply as a
> persistent memory device and give "fake NVDIMM" as an example use case
> in a separate paragraph.  This would make the text more future-proof.

Very good point. E.g. on x86-64, we might want to pass a real NVDIMM to
the guest using virtio-pmem, if our guest VM e.g. has no ACPI configured.

In this case, it would be helpful to indicate if the flushing interface
is to be used, or if other flushing (e.g. like NVDIMMs) is to be used.


-- 

Thanks,

David / dhildenb

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [virtio-dev] Re: [RFC] virtio-pmem: PMEM device spec
  2019-04-10 14:42 ` Stefan Hajnoczi
  2019-04-10 14:56   ` David Hildenbrand
@ 2019-04-11  9:02   ` Pankaj Gupta
  1 sibling, 0 replies; 11+ messages in thread
From: Pankaj Gupta @ 2019-04-11  9:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, mst, david, riel, dan j williams, aarcange, cohuck,
	lcapitulino, nilal


Hi Stefan,

Thank you for the review. Please find my reply inline.

> > diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> > new file mode 100644
> > index 0000000..04e07bb
> > --- /dev/null
> > +++ b/virtio-pmem.tex
> > @@ -0,0 +1,134 @@
> > +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> > +
> > +The virtio pmem is a fake persistent memory (NVDIMM) device
> > +used to bypass the guest page cache and provide a virtio
> > +based asynchronous flush mechanism. This avoids the need
> 
> Is there anything "fake" about virtio-pmem from the perspective of the
> device interface?

I think I should remove fake Keyword here.

> 
> What you are describing is one use case.  But on a platform that doesn't
> have existing physical NVDIMM interfaces, ACPI, etc maybe virtio-pmem
> would be used to pass through physical NVDIMMs from the host?  If you
> agree, then it might make sense to describe the device simply as a
> persistent memory device and give "fake NVDIMM" as an example use case
> in a separate paragraph.  This would make the text more future-proof.

Yes, good point point. I will add a separate paragraph for "fake NVDIMM"
as an example usecase. 

> 
> > +of a separate page cache in guest and keeps page cache only
> > +in the host. Under memory pressure, the host makes use of
> > +effecient memory reclaim decisions for page cache pages
> 
> efficient

o.k

> 
> Please use a spell-checker like aspell or ispell.

Sure.

> 
> > +of all the guests. This helps to reduce the memory footprint
> > +and fit more guests in the host system.
> > +
> > +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> > +  25
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> > +\begin{description}
> > +\item[0] req_vq
> > +\end{description}
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature
> > bits}
> > +
> > +There are currently no feature bits defined for this device.
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM
> > Device / Device configuration layout}
> > +
> > +\begin{lstlisting}
> > +struct virtio_pmem_config {
> > +	le64 start;
> > +	le64 size;
> > +};
> > +\end{lstlisting}
> > +
> > +\field{start} contains the guest physical address of the start of the
> > +physical address range to be hotplugged into the guest address space
> > +using the pmem API.
> 
> What pmem API?  This specification shouldn't be written from a Linux
> kernel internals point of view.  It should be possible to implement BSD,
> Windows, etc drivers too.

o.k. will change this to generic naming common to other operating systems.

> 
> Just saying that this is the physical address of the start of the
> persistent memory range is enough.
> 
> > +\field{size} contains the length of this address range.
> 
> "in bytes" would be nice because it makes the units explicit.

yes.

> 
> > +
> > +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device /
> > Device Initialization}
> > +
> > +Device hotplugs physical memory to guest address space. Persistent memory
> > device
> > +is emulated with file backed memory at host side.
> 
> File-backed memory is a detail of one possible use case.  The spec
> shouldn't be tied to this single use case.

o.k. Will try to make more generic text here as well. I think I need to consider
all the use-cases with real NVDIMM and file backed memory on regular disk?  

> 
> > +
> > +\begin{enumerate}
> > +\item Guest vpmem start is read from \field{start}.
> > +\item Guest vpmem end is read from \field{size}.
> > +\end{enumerate}
> 
> "vpmem" is not defined.  Please use consistent terminology.

o.k

> 
> I suggest structuring this subsection as follows:
> 
>   Initialization proceeds as follows:
> 
>   \begin{enumerate}
>   \item The driver reads the physical start address from \field{start}.
>   \item The driver reads the length of the persistent memory range from
>   \field{size}.
>   \end{enumerate}
> 
> > +
> > +\devicenormative{\subsubsection}{Device Initialization}{Device Types /
> > PMEM Device / Device Initialization}
> > +
> > +File backed memory SHOULD be memory mapped to guest address space with
> > SHARED
> > +memory mapping.
> 
> I'm not sure this point is appropriate for the virtio spec since it's a
> device implementation detail.  This is outside the scope of the device
> interface and should be dropped.
> 
> If you'd like to describe the file-backed memory use case, please do it
> in a non-normative section and use the appropriate terminology
> (MAP_SHARED) and explanations (shared mapping so that changes will be
> written back to the file).

I omitted MAP_SHARED because I thought its not generic to other operating system 
(e.g Windows).

> 
> > +
> > +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver /
> > Driver Initialization}
> > +
> > +Driver hotplugs the physical memory and registers associated
> > +region with the pmem API. Also, configures a flush callback
> > +function with the corresponding region.
> 
> There are Linux internals that don't need to be in the spec.  Some
> kernels may not have a similar pmem API with a flush callback,
> registration, etc.

Agree.

> 
> Please describe the functionality instead:
> 
>   Memory stores to the persistent memory range are not guaranteed to be
>   persistent without further action.  An explicit flush command is
>   required to ensure persistence.  The req_vq is used to perform flush
>   commands.

This is better.

> 
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct
> > access}{Device Types / PMEM Driver / Driver Initialization / Direct
> > access}
> > +
> > +Driver SHOULD enable filesystem direct access operations for
> > +read/write on the device.
> 
> This is specific to one possible use case for virtio-pmem and it's a
> driver implementation detail, not a requirement of the device interface.
> 
> Imagine a unikernel application that doesn't have a file system.  They
> just want to access the persistent memory range, they don't care about
> filesystem direct access.

Agree.

> 
> Please focus on the requirements/constraints/assumptions of the device
> interface, not the implementation details of the device emulation or the
> Linux guest driver.  That way the specification will be generally useful
> and will age better as the use cases change.

o.k

> 
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Virtio
> > flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> > +
> > +Driver SHOULD add implement a virtio based flush callback.
> 
> If flush is optional, how should a device that does not implement it
> behave:
> 1. Don't implement req_vq
> Or
> 2. Fail flush commands with a certain error code
> ?

I think this needs to be structured again after non ACPI real NVDIMM passthrough
use-case which you pointed above.

> 
> > +Driver SHOULD disable other FLUSH/SYNC mechanisms for the device
> > +when virtio flush is configured.
> 
> What does this mean?

Either Virtio based flush is enabled or alternative flush (e.g MAP_SYNC is enabled).

> 
> > +
> > +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver /
> > Driver Operation}
> > +\drivernormative{\subsubsection}{Driver Operation: Virtqueue
> > command}{Device Types / PMEM Driver / Driver Operation / Virtqueue
> > command}
> > +
> > +Driver SHOULD send VIRTIO_FLUSH command on request virtqueue,
> > +blocks guest userspace process utill host completes fsync/flush.
> 
> userspace processes are outside the scope of the virtio specification.
> There may be no "userspace" inside the guest.  Please stick to
> describing the device interface.
> 
> I think the intention here is:
> 
>   The VIRTIO_FLUSH command persists memory writes that were performed
>   before the command was submitted.  Once the command completes the
>   writes are guaranteed to be persistent.
> 
> And the normative part is:
> 
>   The driver MUST submit a VIRTIO_FLUSH command after performing memory
>   writes that require persistence.
> 
>   The driver MUST wait for the VIRTIO_FLUSH command to complete before
>   assuming previous writes are persistent.

Thanks! this is better.

> 
> > +Driver SHOULD handle multiple fsync requests on files present
> > +on the device.
> 
> fsync is outside the scope of the device interface and, hence, this
> specification.  Perhaps:
> 
>   The driver MAY have multiple VIRTIO_FLUSH commands pending, though
>   each command completion only guarantees persistence of memory writes
>   that were performed before the command was submitted.

o.k

> 
> > +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver /
> > Device Operation}
> > +
> > +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM
> > Device / Device Operation / Virtqueue flush}
> > +
> > +Device SHOULD handle multiple flush requests simultaneously using
> > +host filesystem fsync/flush call.
> 
> fsync/flush is an implementation detail.  I'll stop reviewing at this
> point to avoid repetition :).

Thanks you very much for in detail review. I will work on the suggestions.

Best regards,
Pankaj
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [virtio-dev] Re: [RFC] virtio-pmem: PMEM device spec
  2019-04-10 14:56   ` David Hildenbrand
@ 2019-04-11  9:12     ` Pankaj Gupta
  2019-04-11  9:14       ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Pankaj Gupta @ 2019-04-11  9:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Stefan Hajnoczi, virtio-dev, mst, riel, dan j williams, aarcange,
	cohuck, lcapitulino, nilal


> >> new file mode 100644
> >> index 0000000..04e07bb
> >> --- /dev/null
> >> +++ b/virtio-pmem.tex
> >> @@ -0,0 +1,134 @@
> >> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> >> +
> >> +The virtio pmem is a fake persistent memory (NVDIMM) device
> >> +used to bypass the guest page cache and provide a virtio
> >> +based asynchronous flush mechanism. This avoids the need
> > 
> > Is there anything "fake" about virtio-pmem from the perspective of the
> > device interface?
> > 
> > What you are describing is one use case.  But on a platform that doesn't
> > have existing physical NVDIMM interfaces, ACPI, etc maybe virtio-pmem
> > would be used to pass through physical NVDIMMs from the host?  If you
> > agree, then it might make sense to describe the device simply as a
> > persistent memory device and give "fake NVDIMM" as an example use case
> > in a separate paragraph.  This would make the text more future-proof.
> 
> Very good point. E.g. on x86-64, we might want to pass a real NVDIMM to
> the guest using virtio-pmem, if our guest VM e.g. has no ACPI configured.
> 
> In this case, it would be helpful to indicate if the flushing interface
> is to be used, or if other flushing (e.g. like NVDIMMs) is to be used.
> 

Yes. Agree. This is a very good point, will consider pass-through of 
real NVDIMM device for architectures don't support ACPI.

Yes, corresponding flushing mechanism should be mentioned but I am not sure
if it should be mentioned in device specification?

Thanks,
Pankaj   

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [virtio-dev] Re: [RFC] virtio-pmem: PMEM device spec
  2019-04-11  9:12     ` Pankaj Gupta
@ 2019-04-11  9:14       ` David Hildenbrand
  2019-04-11  9:35         ` Pankaj Gupta
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2019-04-11  9:14 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Stefan Hajnoczi, virtio-dev, mst, riel, dan j williams, aarcange,
	cohuck, lcapitulino, nilal

On 11.04.19 11:12, Pankaj Gupta wrote:
> 
>>>> new file mode 100644
>>>> index 0000000..04e07bb
>>>> --- /dev/null
>>>> +++ b/virtio-pmem.tex
>>>> @@ -0,0 +1,134 @@
>>>> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
>>>> +
>>>> +The virtio pmem is a fake persistent memory (NVDIMM) device
>>>> +used to bypass the guest page cache and provide a virtio
>>>> +based asynchronous flush mechanism. This avoids the need
>>>
>>> Is there anything "fake" about virtio-pmem from the perspective of the
>>> device interface?
>>>
>>> What you are describing is one use case.  But on a platform that doesn't
>>> have existing physical NVDIMM interfaces, ACPI, etc maybe virtio-pmem
>>> would be used to pass through physical NVDIMMs from the host?  If you
>>> agree, then it might make sense to describe the device simply as a
>>> persistent memory device and give "fake NVDIMM" as an example use case
>>> in a separate paragraph.  This would make the text more future-proof.
>>
>> Very good point. E.g. on x86-64, we might want to pass a real NVDIMM to
>> the guest using virtio-pmem, if our guest VM e.g. has no ACPI configured.
>>
>> In this case, it would be helpful to indicate if the flushing interface
>> is to be used, or if other flushing (e.g. like NVDIMMs) is to be used.
>>
> 
> Yes. Agree. This is a very good point, will consider pass-through of 
> real NVDIMM device for architectures don't support ACPI.
> 
> Yes, corresponding flushing mechanism should be mentioned but I am not sure
> if it should be mentioned in device specification?

Well there has to be a way to indicate how to flush writes, no?

> 
> Thanks,
> Pankaj   
> 


-- 

Thanks,

David / dhildenb

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [virtio-dev] Re: [RFC] virtio-pmem: PMEM device spec
  2019-04-11  9:14       ` David Hildenbrand
@ 2019-04-11  9:35         ` Pankaj Gupta
  0 siblings, 0 replies; 11+ messages in thread
From: Pankaj Gupta @ 2019-04-11  9:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Stefan Hajnoczi, virtio-dev, mst, riel, dan j williams, aarcange,
	cohuck, lcapitulino, nilal


> >>>> --- /dev/null
> >>>> +++ b/virtio-pmem.tex
> >>>> @@ -0,0 +1,134 @@
> >>>> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> >>>> +
> >>>> +The virtio pmem is a fake persistent memory (NVDIMM) device
> >>>> +used to bypass the guest page cache and provide a virtio
> >>>> +based asynchronous flush mechanism. This avoids the need
> >>>
> >>> Is there anything "fake" about virtio-pmem from the perspective of the
> >>> device interface?
> >>>
> >>> What you are describing is one use case.  But on a platform that doesn't
> >>> have existing physical NVDIMM interfaces, ACPI, etc maybe virtio-pmem
> >>> would be used to pass through physical NVDIMMs from the host?  If you
> >>> agree, then it might make sense to describe the device simply as a
> >>> persistent memory device and give "fake NVDIMM" as an example use case
> >>> in a separate paragraph.  This would make the text more future-proof.
> >>
> >> Very good point. E.g. on x86-64, we might want to pass a real NVDIMM to
> >> the guest using virtio-pmem, if our guest VM e.g. has no ACPI configured.
> >>
> >> In this case, it would be helpful to indicate if the flushing interface
> >> is to be used, or if other flushing (e.g. like NVDIMMs) is to be used.
> >>
> > 
> > Yes. Agree. This is a very good point, will consider pass-through of
> > real NVDIMM device for architectures don't support ACPI.
> > 
> > Yes, corresponding flushing mechanism should be mentioned but I am not sure
> > if it should be mentioned in device specification?
> 
> Well there has to be a way to indicate how to flush writes, no?

yes. 

Thanks,
Pankaj

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [virtio-dev] [RFC] virtio-pmem: PMEM device spec
  2019-03-26 14:35 [virtio-dev] [RFC] virtio-pmem: PMEM device spec Pankaj Gupta
  2019-04-10 10:02 ` [virtio-dev] " Cornelia Huck
  2019-04-10 14:42 ` Stefan Hajnoczi
@ 2019-06-21 20:33 ` Michael S. Tsirkin
  2019-06-24  8:58   ` Pankaj Gupta
  2 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2019-06-21 20:33 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: virtio-dev, david, riel, stefanha, dan.j.williams, aarcange,
	cohuck, lcapitulino, nilal


On Tue, Mar 26, 2019 at 08:05:07PM +0530, Pankaj Gupta wrote:
> This patch proposes a virtio specification for new
> virtio pmem device. Virtio pmem is a paravirtualized 
> device which allows the guest to bypass page cache.
> Previous posting of kernel driver is [1] and Qemu 
> device is [2]. Specification has introduction of 
> virtio pmem device with other implementation details. 
> 
> I have also listed concerns with page cache side channel
> attacks in previous kernel driver posting [1] and 
> possible countermeasures based on discussion [4].
> I have also created an virtio-spec issue [5] for this.
> 
> Request to provide feedback on device specification.
> 
> [1] https://lkml.org/lkml/2019/1/9/471 
> [2] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
> [3] https://lkml.org/lkml/2019/1/9/541
> [4] https://lkml.org/lkml/2019/2/4/1151
> [5] https://github.com/oasis-tcs/virtio-spec/issues/38
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>


So we need to rewrite all this in terms of driver/device.
This will also help remove linux-isms.

> ---
>  conformance.tex |  14 ++++++
>  content.tex     |   3 ++
>  virtio-pmem.tex | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 151 insertions(+)
>  create mode 100644 virtio-pmem.tex
> 
> diff --git a/conformance.tex b/conformance.tex
> index ad7e82e..2133a7c 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -181,6 +181,20 @@ A socket driver MUST conform to the following normative statements:
>  \item \ref{drivernormative:Device Types / Socket Device / Device Operation / Device Events}
>  \end{itemize}
>  
> +\subsection{PMEM Driver Conformance}\label{sec:Conformance / Driver Conformance / PMEM Driver Conformance}
> +
> +A PMEM driver MUST conform to the following normative statements:
> +
> +\begin{itemize}
> +\item \ref{devicenormative:Device Types / PMEM Device / Device Initialization}
> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Direct access}
> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue return}
> +\end{itemize}
> +
> +
>  \section{Device Conformance}\label{sec:Conformance / Device Conformance}
>  
>  A device MUST conform to the following normative statements:
> diff --git a/content.tex b/content.tex
> index ede0ef6..6077d32 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2634,6 +2634,8 @@ Device ID  &  Virtio Device    \\
>  \hline
>  24         &   Memory device \\
>  \hline
> +25         &   PMEM device \\
> +\hline
>  \end{tabular}
>  
>  Some of the devices above are unspecified by this document,
> @@ -5595,6 +5597,7 @@ descriptor for the \field{sense_len}, \field{residual},
>  \input{virtio-input.tex}
>  \input{virtio-crypto.tex}
>  \input{virtio-vsock.tex}
> +\input{virtio-pmem.tex}
>  
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
> diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> new file mode 100644
> index 0000000..04e07bb
> --- /dev/null
> +++ b/virtio-pmem.tex
> @@ -0,0 +1,134 @@
> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> +
> +The virtio pmem is a fake persistent memory (NVDIMM) device
> +used to bypass the guest page cache and provide a virtio
> +based asynchronous flush mechanism. This avoids the need
> +of a separate page cache in guest and keeps page cache only
> +in the host. Under memory pressure, the host makes use of
> +effecient memory reclaim decisions for page cache pages
> +of all the guests. This helps to reduce the memory footprint
> +and fit more guests in the host system.
> +
> +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> +  25
> +
> +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> +\begin{description}
> +\item[0] req_vq
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
> +
> +There are currently no feature bits defined for this device.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> +
> +\begin{lstlisting}
> +struct virtio_pmem_config {
> +	le64 start;
> +	le64 size;
> +};
> +\end{lstlisting}
> +
> +\field{start} contains the guest physical address of the start of the
> +physical address range to be hotplugged into the guest address space
> +using the pmem API.
> +\field{size} contains the length of this address range.
> +
> +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> +
> +Device hotplugs physical memory to guest address space. Persistent memory device
> +is emulated with file backed memory at host side.
> +
> +\begin{enumerate}
> +\item Guest vpmem start is read from \field{start}.
> +\item Guest vpmem end is read from \field{size}.
> +\end{enumerate}
> +
> +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
> +
> +File backed memory SHOULD be memory mapped to guest address space with SHARED
> +memory mapping.

You mean MAP_SHARED, right?  This wording is very specific to linux, and
even then it's not all that well defined (can differ between kernel
versions).  Please try to write out what exactly it means.
I also wonder about MAP_SYNC and generally persistence across crashes.


> +
> +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
> +
> +Driver hotplugs the physical memory and registers associated
> +region with the pmem API. Also, configures a flush callback
> +function with the corresponding region.
> +
> +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access}
> +
> +Driver SHOULD enable filesystem direct access operations for
> +read/write on the device.
> +
> +\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> +
> +Driver SHOULD add implement a virtio based flush callback.
> +Driver SHOULD disable other FLUSH/SYNC mechanisms for the device
> +when virtio flush is configured.
> +
> +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +
> +Driver SHOULD send VIRTIO_FLUSH command on request virtqueue,
> +blocks guest userspace process utill host completes fsync/flush.

more specifically, what does this mean? I am guessing access
to device should be blocked between VIRTIO_FLUSH being available
and VIRTIO_FLUSH being used?

Also, this is the only place that mentions VIRTIO_FLUSH.
And I think VIRTIO_PMEM_FLUSH would be a better name.

> +Driver SHOULD handle multiple fsync requests on files present
> +on the device.
> +
> +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
> +
> +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +
> +Device SHOULD handle multiple flush requests simultaneously using
> +host filesystem fsync/flush call.
> +
> +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return}
> +
> +Device SHOULD return integer '0' for success and '-1' for failure.
> +These errors are converted to corresponding error codes by guest as
> +per architecture.
> +
> +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
> +
> +There could be potential security implications depending on how
> +memory mapped host backing file is used. By default device emulation
> +is done with SHARED mapping. There is a contract between guest and host
> +process to access same backing file for read/write operations.
> +
> +If a malicious guest or host userspace map the same backing file,
> +attacking process can make use of known cache side channel attacks
> +to predict the current state of shared page cache page. If both
> +attacker and victim somehow execute same shared code after a
> +flush/evict call, with difference in execution timing attacker
> +could infer another guest local data or host data. Though this is
> +not easy and same challenges exist as with bare metal host system
> +when userspace share same backing file.

This seems a bit vague.  I think what you are trying to say is this:

if two devices actually share the same memory, this
creates a potential information leak as access patterns
of on driver could be observable by another driver.

This can happen for example if two devices are implemented
in software by a hypervisor, and two drivers are
parts of VMs running on the hypervisor.
In this case, the timing of access to device memory
might leak information about access patterns from one VM to another.

This can include, but might not be limited to:
- configurations sharing a single region of device memory (even in a read-only
  configuration)
- configurations with a shared cache between devices (e.g. linux page
  cache)
- configurations with memory deduplication techniques such as KSM 


Similar side-channels might be present if the device memory is shared
with another system, e.g. information about the hypervisor/host page cache
might leak into a VM guest.



> +
> +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
> +
> +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED}
> +
> +If device backing backing file is shared with multiple guests or host
> +processes, this may act as a metric for page cache side channel attack.
> +As a counter measure every guest should have its own(not shared with
> +another guest) SHARED backing file and gets populated a per host process
> +page cache pages.


I would say that the solution is to avoid sharing resources between
devices.  Then give examples.

> +
> +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE}
> +There maybe be chances of side channels attack with PRIVATE
> +memory mapping similar to SHARED with read-only shared mappings.
> +PRIVATE is not used for virtio pmem making this usecase
> +irrelevant.

Let's rewrite in a way that does not rely on semantics of linux
system calls. I am guessing you mean MAP_SHARED/MAP_PRIVATE,
but I am not all that sure.

> +
> +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload}
> +For SHARED mapping, if workload is single application inside
> +guest and there is no risk with sharing of data between guests.
> +Guest sharing same backing file with SHARED mapping can be
> +used as a valid configuration.
> +
> +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction}
> +Don't allow cache evict from guest filesystem trim/discard command
> +with virtio pmem. This rules out any possibility of evict-reload
> +page cache side channel attacks if backing disk is shared(SHARED)
> +with mutliple guests. Though if we use per device backing file with
> +shared mapping this countermeasure is not required.
> -- 
> 2.14.3
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [virtio-dev] [RFC] virtio-pmem: PMEM device spec
  2019-06-21 20:33 ` [virtio-dev] " Michael S. Tsirkin
@ 2019-06-24  8:58   ` Pankaj Gupta
  0 siblings, 0 replies; 11+ messages in thread
From: Pankaj Gupta @ 2019-06-24  8:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, david, riel, stefanha, dan j williams, aarcange,
	cohuck, lcapitulino, nilal


Hi Michael,

Thank you Michael for the review.

> 
> On Tue, Mar 26, 2019 at 08:05:07PM +0530, Pankaj Gupta wrote:
> > This patch proposes a virtio specification for new
> > virtio pmem device. Virtio pmem is a paravirtualized
> > device which allows the guest to bypass page cache.
> > Previous posting of kernel driver is [1] and Qemu
> > device is [2]. Specification has introduction of
> > virtio pmem device with other implementation details.
> > 
> > I have also listed concerns with page cache side channel
> > attacks in previous kernel driver posting [1] and
> > possible countermeasures based on discussion [4].
> > I have also created an virtio-spec issue [5] for this.
> > 
> > Request to provide feedback on device specification.
> > 
> > [1] https://lkml.org/lkml/2019/1/9/471
> > [2] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
> > [3] https://lkml.org/lkml/2019/1/9/541
> > [4] https://lkml.org/lkml/2019/2/4/1151
> > [5] https://github.com/oasis-tcs/virtio-spec/issues/38
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> 
> 
> So we need to rewrite all this in terms of driver/device.
> This will also help remove linux-isms.

o.k. 

> 
> > ---
> >  conformance.tex |  14 ++++++
> >  content.tex     |   3 ++
> >  virtio-pmem.tex | 134
> >  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 151 insertions(+)
> >  create mode 100644 virtio-pmem.tex
> > 
> > diff --git a/conformance.tex b/conformance.tex
> > index ad7e82e..2133a7c 100644
> > --- a/conformance.tex
> > +++ b/conformance.tex
> > @@ -181,6 +181,20 @@ A socket driver MUST conform to the following
> > normative statements:
> >  \item \ref{drivernormative:Device Types / Socket Device / Device Operation
> >  / Device Events}
> >  \end{itemize}
> >  
> > +\subsection{PMEM Driver Conformance}\label{sec:Conformance / Driver
> > Conformance / PMEM Driver Conformance}
> > +
> > +A PMEM driver MUST conform to the following normative statements:
> > +
> > +\begin{itemize}
> > +\item \ref{devicenormative:Device Types / PMEM Device / Device
> > Initialization}
> > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver
> > Initialization / Direct access}
> > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver
> > Initialization / Virtio flush}
> > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation /
> > Virtqueue command}
> > +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation /
> > Virtqueue flush}
> > +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation /
> > Virtqueue return}
> > +\end{itemize}
> > +
> > +
> >  \section{Device Conformance}\label{sec:Conformance / Device Conformance}
> >  
> >  A device MUST conform to the following normative statements:
> > diff --git a/content.tex b/content.tex
> > index ede0ef6..6077d32 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -2634,6 +2634,8 @@ Device ID  &  Virtio Device    \\
> >  \hline
> >  24         &   Memory device \\
> >  \hline
> > +25         &   PMEM device \\
> > +\hline
> >  \end{tabular}
> >  
> >  Some of the devices above are unspecified by this document,
> > @@ -5595,6 +5597,7 @@ descriptor for the \field{sense_len},
> > \field{residual},
> >  \input{virtio-input.tex}
> >  \input{virtio-crypto.tex}
> >  \input{virtio-vsock.tex}
> > +\input{virtio-pmem.tex}
> >  
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  
> > diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> > new file mode 100644
> > index 0000000..04e07bb
> > --- /dev/null
> > +++ b/virtio-pmem.tex
> > @@ -0,0 +1,134 @@
> > +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> > +
> > +The virtio pmem is a fake persistent memory (NVDIMM) device
> > +used to bypass the guest page cache and provide a virtio
> > +based asynchronous flush mechanism. This avoids the need
> > +of a separate page cache in guest and keeps page cache only
> > +in the host. Under memory pressure, the host makes use of
> > +effecient memory reclaim decisions for page cache pages
> > +of all the guests. This helps to reduce the memory footprint
> > +and fit more guests in the host system.
> > +
> > +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> > +  25
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> > +\begin{description}
> > +\item[0] req_vq
> > +\end{description}
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature
> > bits}
> > +
> > +There are currently no feature bits defined for this device.
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM
> > Device / Device configuration layout}
> > +
> > +\begin{lstlisting}
> > +struct virtio_pmem_config {
> > +	le64 start;
> > +	le64 size;
> > +};
> > +\end{lstlisting}
> > +
> > +\field{start} contains the guest physical address of the start of the
> > +physical address range to be hotplugged into the guest address space
> > +using the pmem API.
> > +\field{size} contains the length of this address range.
> > +
> > +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device /
> > Device Initialization}
> > +
> > +Device hotplugs physical memory to guest address space. Persistent memory
> > device
> > +is emulated with file backed memory at host side.
> > +
> > +\begin{enumerate}
> > +\item Guest vpmem start is read from \field{start}.
> > +\item Guest vpmem end is read from \field{size}.
> > +\end{enumerate}
> > +
> > +\devicenormative{\subsubsection}{Device Initialization}{Device Types /
> > PMEM Device / Device Initialization}
> > +
> > +File backed memory SHOULD be memory mapped to guest address space with
> > SHARED
> > +memory mapping.
> 
> You mean MAP_SHARED, right?  This wording is very specific to linux, and

yes. I will change it.

> even then it's not all that well defined (can differ between kernel
> versions).  

o.k

> Please try to write out what exactly it means.

Sure. Sounds good.

> I also wonder about MAP_SYNC and generally persistence across crashes.

MAP_SYNC is not applicable for current functionality as in this spec we
are proposing pmem emulation on regular storage at host side. For data integrity
host fsync will take care. Though we plan to support real NVDIMM with virtio-pmem
in future, for that we need to document MAP_SYNC.

> 
> 
> > +
> > +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver /
> > Driver Initialization}
> > +
> > +Driver hotplugs the physical memory and registers associated
> > +region with the pmem API. Also, configures a flush callback
> > +function with the corresponding region.
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct
> > access}{Device Types / PMEM Driver / Driver Initialization / Direct
> > access}
> > +
> > +Driver SHOULD enable filesystem direct access operations for
> > +read/write on the device.
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Virtio
> > flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> > +
> > +Driver SHOULD add implement a virtio based flush callback.
> > +Driver SHOULD disable other FLUSH/SYNC mechanisms for the device
> > +when virtio flush is configured.
> > +
> > +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver /
> > Driver Operation}
> > +\drivernormative{\subsubsection}{Driver Operation: Virtqueue
> > command}{Device Types / PMEM Driver / Driver Operation / Virtqueue
> > command}
> > +
> > +Driver SHOULD send VIRTIO_FLUSH command on request virtqueue,
> > +blocks guest userspace process utill host completes fsync/flush.
> 
> more specifically, what does this mean? I am guessing access
> to device should be blocked between VIRTIO_FLUSH being available
> and VIRTIO_FLUSH being used?
> 
> Also, this is the only place that mentions VIRTIO_FLUSH.
> And I think VIRTIO_PMEM_FLUSH would be a better name.

Sure.

> 
> > +Driver SHOULD handle multiple fsync requests on files present
> > +on the device.
> > +
> > +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver /
> > Device Operation}
> > +
> > +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM
> > Device / Device Operation / Virtqueue flush}
> > +
> > +Device SHOULD handle multiple flush requests simultaneously using
> > +host filesystem fsync/flush call.
> > +
> > +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM
> > Device / Device Operation / Virtqueue return}
> > +
> > +Device SHOULD return integer '0' for success and '-1' for failure.
> > +These errors are converted to corresponding error codes by guest as
> > +per architecture.
> > +
> > +\subsection{Possible security implications}\label{sec:Device Types / PMEM
> > Device / Possible Security Implications}
> > +
> > +There could be potential security implications depending on how
> > +memory mapped host backing file is used. By default device emulation
> > +is done with SHARED mapping. There is a contract between guest and host
> > +process to access same backing file for read/write operations.
> > +
> > +If a malicious guest or host userspace map the same backing file,
> > +attacking process can make use of known cache side channel attacks
> > +to predict the current state of shared page cache page. If both
> > +attacker and victim somehow execute same shared code after a
> > +flush/evict call, with difference in execution timing attacker
> > +could infer another guest local data or host data. Though this is
> > +not easy and same challenges exist as with bare metal host system
> > +when userspace share same backing file.
> 
> This seems a bit vague.  I think what you are trying to say is this:
> 
> if two devices actually share the same memory, this
> creates a potential information leak as access patterns
> of on driver could be observable by another driver.
> 
> This can happen for example if two devices are implemented
> in software by a hypervisor, and two drivers are
> parts of VMs running on the hypervisor.
> In this case, the timing of access to device memory
> might leak information about access patterns from one VM to another.
> 
> This can include, but might not be limited to:
> - configurations sharing a single region of device memory (even in a
> read-only
>   configuration)
> - configurations with a shared cache between devices (e.g. linux page
>   cache)
> - configurations with memory deduplication techniques such as KSM
> 
> 
> Similar side-channels might be present if the device memory is shared
> with another system, e.g. information about the hypervisor/host page cache
> might leak into a VM guest.

This is better. Thank you.

> 
> 
> 
> > +
> > +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device /
> > Possible Security Implications / Countermeasures}
> > +
> > +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device
> > / Possible Security Implications / Countermeasures / SHARED}
> > +
> > +If device backing backing file is shared with multiple guests or host
> > +processes, this may act as a metric for page cache side channel attack.
> > +As a counter measure every guest should have its own(not shared with
> > +another guest) SHARED backing file and gets populated a per host process
> > +page cache pages.
> 
> 
> I would say that the solution is to avoid sharing resources between
> devices.  Then give examples.

Sure.

> 
> > +
> > +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device
> > / Possible Security Implications / Countermeasures / PRIVATE}
> > +There maybe be chances of side channels attack with PRIVATE
> > +memory mapping similar to SHARED with read-only shared mappings.
> > +PRIVATE is not used for virtio pmem making this usecase
> > +irrelevant.
> 
> Let's rewrite in a way that does not rely on semantics of linux
> system calls. I am guessing you mean MAP_SHARED/MAP_PRIVATE,
> but I am not all that sure.

Yes, I intentionally removed MAP_* keyword as I was not sure about non linux
variants. As suggested will explain the meaning than naming which might be
specific to any particular version/OS etc. 

A very good suggestion.

> 
> > +
> > +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM
> > Device / Possible Security Implications / Countermeasures / Workload}
> > +For SHARED mapping, if workload is single application inside
> > +guest and there is no risk with sharing of data between guests.
> > +Guest sharing same backing file with SHARED mapping can be
> > +used as a valid configuration.
> > +
> > +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM
> > Device / Possible Security Implications / Countermeasures / Cache
> > eviction}
> > +Don't allow cache evict from guest filesystem trim/discard command
> > +with virtio pmem. This rules out any possibility of evict-reload
> > +page cache side channel attacks if backing disk is shared(SHARED)
> > +with mutliple guests. Though if we use per device backing file with
> > +shared mapping this countermeasure is not required.
> > --
> > 2.14.3

Best regards,
Pankaj


> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-06-24  8:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-03-26 14:35 [virtio-dev] [RFC] virtio-pmem: PMEM device spec Pankaj Gupta
2019-04-10 10:02 ` [virtio-dev] " Cornelia Huck
2019-04-10 10:33   ` Pankaj Gupta
2019-04-10 14:42 ` Stefan Hajnoczi
2019-04-10 14:56   ` David Hildenbrand
2019-04-11  9:12     ` Pankaj Gupta
2019-04-11  9:14       ` David Hildenbrand
2019-04-11  9:35         ` Pankaj Gupta
2019-04-11  9:02   ` Pankaj Gupta
2019-06-21 20:33 ` [virtio-dev] " Michael S. Tsirkin
2019-06-24  8:58   ` Pankaj Gupta

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.