From: Pankaj Gupta <pagupta@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtio-dev@lists.oasis-open.org, mst@redhat.com,
david@redhat.com, riel@surriel.com,
dan j williams <dan.j.williams@intel.com>,
aarcange@redhat.com, cohuck@redhat.com, lcapitulino@redhat.com,
nilal@redhat.com
Subject: [virtio-dev] Re: [RFC] virtio-pmem: PMEM device spec
Date: Thu, 11 Apr 2019 05:02:33 -0400 (EDT) [thread overview]
Message-ID: <952782165.20990220.1554973353161.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20190410144245.GD9868@stefanha-x1.localdomain>
Hi Stefan,
Thank you for the review. Please find my reply inline.
> > diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> > new file mode 100644
> > index 0000000..04e07bb
> > --- /dev/null
> > +++ b/virtio-pmem.tex
> > @@ -0,0 +1,134 @@
> > +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> > +
> > +The virtio pmem is a fake persistent memory (NVDIMM) device
> > +used to bypass the guest page cache and provide a virtio
> > +based asynchronous flush mechanism. This avoids the need
>
> Is there anything "fake" about virtio-pmem from the perspective of the
> device interface?
I think I should remove fake Keyword here.
>
> What you are describing is one use case. But on a platform that doesn't
> have existing physical NVDIMM interfaces, ACPI, etc maybe virtio-pmem
> would be used to pass through physical NVDIMMs from the host? If you
> agree, then it might make sense to describe the device simply as a
> persistent memory device and give "fake NVDIMM" as an example use case
> in a separate paragraph. This would make the text more future-proof.
Yes, good point point. I will add a separate paragraph for "fake NVDIMM"
as an example usecase.
>
> > +of a separate page cache in guest and keeps page cache only
> > +in the host. Under memory pressure, the host makes use of
> > +effecient memory reclaim decisions for page cache pages
>
> efficient
o.k
>
> Please use a spell-checker like aspell or ispell.
Sure.
>
> > +of all the guests. This helps to reduce the memory footprint
> > +and fit more guests in the host system.
> > +
> > +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> > + 25
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> > +\begin{description}
> > +\item[0] req_vq
> > +\end{description}
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature
> > bits}
> > +
> > +There are currently no feature bits defined for this device.
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM
> > Device / Device configuration layout}
> > +
> > +\begin{lstlisting}
> > +struct virtio_pmem_config {
> > + le64 start;
> > + le64 size;
> > +};
> > +\end{lstlisting}
> > +
> > +\field{start} contains the guest physical address of the start of the
> > +physical address range to be hotplugged into the guest address space
> > +using the pmem API.
>
> What pmem API? This specification shouldn't be written from a Linux
> kernel internals point of view. It should be possible to implement BSD,
> Windows, etc drivers too.
o.k. will change this to generic naming common to other operating systems.
>
> Just saying that this is the physical address of the start of the
> persistent memory range is enough.
>
> > +\field{size} contains the length of this address range.
>
> "in bytes" would be nice because it makes the units explicit.
yes.
>
> > +
> > +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device /
> > Device Initialization}
> > +
> > +Device hotplugs physical memory to guest address space. Persistent memory
> > device
> > +is emulated with file backed memory at host side.
>
> File-backed memory is a detail of one possible use case. The spec
> shouldn't be tied to this single use case.
o.k. Will try to make more generic text here as well. I think I need to consider
all the use-cases with real NVDIMM and file backed memory on regular disk?
>
> > +
> > +\begin{enumerate}
> > +\item Guest vpmem start is read from \field{start}.
> > +\item Guest vpmem end is read from \field{size}.
> > +\end{enumerate}
>
> "vpmem" is not defined. Please use consistent terminology.
o.k
>
> I suggest structuring this subsection as follows:
>
> Initialization proceeds as follows:
>
> \begin{enumerate}
> \item The driver reads the physical start address from \field{start}.
> \item The driver reads the length of the persistent memory range from
> \field{size}.
> \end{enumerate}
>
> > +
> > +\devicenormative{\subsubsection}{Device Initialization}{Device Types /
> > PMEM Device / Device Initialization}
> > +
> > +File backed memory SHOULD be memory mapped to guest address space with
> > SHARED
> > +memory mapping.
>
> I'm not sure this point is appropriate for the virtio spec since it's a
> device implementation detail. This is outside the scope of the device
> interface and should be dropped.
>
> If you'd like to describe the file-backed memory use case, please do it
> in a non-normative section and use the appropriate terminology
> (MAP_SHARED) and explanations (shared mapping so that changes will be
> written back to the file).
I omitted MAP_SHARED because I thought its not generic to other operating system
(e.g Windows).
>
> > +
> > +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver /
> > Driver Initialization}
> > +
> > +Driver hotplugs the physical memory and registers associated
> > +region with the pmem API. Also, configures a flush callback
> > +function with the corresponding region.
>
> There are Linux internals that don't need to be in the spec. Some
> kernels may not have a similar pmem API with a flush callback,
> registration, etc.
Agree.
>
> Please describe the functionality instead:
>
> Memory stores to the persistent memory range are not guaranteed to be
> persistent without further action. An explicit flush command is
> required to ensure persistence. The req_vq is used to perform flush
> commands.
This is better.
>
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct
> > access}{Device Types / PMEM Driver / Driver Initialization / Direct
> > access}
> > +
> > +Driver SHOULD enable filesystem direct access operations for
> > +read/write on the device.
>
> This is specific to one possible use case for virtio-pmem and it's a
> driver implementation detail, not a requirement of the device interface.
>
> Imagine a unikernel application that doesn't have a file system. They
> just want to access the persistent memory range, they don't care about
> filesystem direct access.
Agree.
>
> Please focus on the requirements/constraints/assumptions of the device
> interface, not the implementation details of the device emulation or the
> Linux guest driver. That way the specification will be generally useful
> and will age better as the use cases change.
o.k
>
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Virtio
> > flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> > +
> > +Driver SHOULD add implement a virtio based flush callback.
>
> If flush is optional, how should a device that does not implement it
> behave:
> 1. Don't implement req_vq
> Or
> 2. Fail flush commands with a certain error code
> ?
I think this needs to be structured again after non ACPI real NVDIMM passthrough
use-case which you pointed above.
>
> > +Driver SHOULD disable other FLUSH/SYNC mechanisms for the device
> > +when virtio flush is configured.
>
> What does this mean?
Either Virtio based flush is enabled or alternative flush (e.g MAP_SYNC is enabled).
>
> > +
> > +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver /
> > Driver Operation}
> > +\drivernormative{\subsubsection}{Driver Operation: Virtqueue
> > command}{Device Types / PMEM Driver / Driver Operation / Virtqueue
> > command}
> > +
> > +Driver SHOULD send VIRTIO_FLUSH command on request virtqueue,
> > +blocks guest userspace process utill host completes fsync/flush.
>
> userspace processes are outside the scope of the virtio specification.
> There may be no "userspace" inside the guest. Please stick to
> describing the device interface.
>
> I think the intention here is:
>
> The VIRTIO_FLUSH command persists memory writes that were performed
> before the command was submitted. Once the command completes the
> writes are guaranteed to be persistent.
>
> And the normative part is:
>
> The driver MUST submit a VIRTIO_FLUSH command after performing memory
> writes that require persistence.
>
> The driver MUST wait for the VIRTIO_FLUSH command to complete before
> assuming previous writes are persistent.
Thanks! this is better.
>
> > +Driver SHOULD handle multiple fsync requests on files present
> > +on the device.
>
> fsync is outside the scope of the device interface and, hence, this
> specification. Perhaps:
>
> The driver MAY have multiple VIRTIO_FLUSH commands pending, though
> each command completion only guarantees persistence of memory writes
> that were performed before the command was submitted.
o.k
>
> > +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver /
> > Device Operation}
> > +
> > +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM
> > Device / Device Operation / Virtqueue flush}
> > +
> > +Device SHOULD handle multiple flush requests simultaneously using
> > +host filesystem fsync/flush call.
>
> fsync/flush is an implementation detail. I'll stop reviewing at this
> point to avoid repetition :).
Thanks you very much for in detail review. I will work on the suggestions.
Best regards,
Pankaj
>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
next prev parent reply other threads:[~2019-04-11 9:02 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-26 14:35 [virtio-dev] [RFC] virtio-pmem: PMEM device spec Pankaj Gupta
2019-04-10 10:02 ` [virtio-dev] " Cornelia Huck
2019-04-10 10:33 ` Pankaj Gupta
2019-04-10 14:42 ` Stefan Hajnoczi
2019-04-10 14:56 ` David Hildenbrand
2019-04-11 9:12 ` Pankaj Gupta
2019-04-11 9:14 ` David Hildenbrand
2019-04-11 9:35 ` Pankaj Gupta
2019-04-11 9:02 ` Pankaj Gupta [this message]
2019-06-21 20:33 ` [virtio-dev] " Michael S. Tsirkin
2019-06-24 8:58 ` Pankaj Gupta
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=952782165.20990220.1554973353161.JavaMail.zimbra@redhat.com \
--to=pagupta@redhat.com \
--cc=aarcange@redhat.com \
--cc=cohuck@redhat.com \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=lcapitulino@redhat.com \
--cc=mst@redhat.com \
--cc=nilal@redhat.com \
--cc=riel@surriel.com \
--cc=stefanha@redhat.com \
--cc=virtio-dev@lists.oasis-open.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.