public inbox for virtio-comment@lists.linux.dev
* [PATCH v3 0/3] shared-mem: introduce page alignment restrictions
@ 2025-03-31 21:37 Sergio Lopez
  2025-03-31 21:37 ` [PATCH v3 1/3] " Sergio Lopez
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Sergio Lopez @ 2025-03-31 21:37 UTC (permalink / raw)
  To: virtio-comment; +Cc: mst, dmitry.osipenko, parav, Sergio Lopez

There's an increasing number of machines supporting multiple page sizes
and, on these machines, the host and a guest can be running with
different page sizes.

In addition to this, there might be devices that have a required and/or
preferred page size for mapping memory.

In this series we extend the "Shared Memory Regions" section with a
subsection explaining the possible existence of page alignment
restrictions when the VIRTIO_F_SHM_PAGE_SIZE feature has been negotiated.

For the device to provide the page size information to the driver, we
need to extend the PCI and MMIO transports. For the former, we borrow
8 bits from the 16 bit padding in virtio_pci_cap to hold a page_shift
field which can be used to derive the page size by using the following
formula: (page_size = 1 << (page_shift + 12)).

For MMIO, we add the SHMPageShift register at offset 0x0c4, also
holding the page_shift value. Since MMIO registers are 32 bits wide, we
could have asked the device to directly provide page_size instead of
page_shift, but it seems reasonable to be consistent across transports.

An implementation of the changes proposed in this series has been
published as an RFC to the LKML, to be used as a reference:

https://lore.kernel.org/all/20250214-virtio-shm-page-size-v2-0-aa1619e6908b@redhat.com/

v3:
 - Reintroduce the VIRTIO_F_SHM_PAGE_SIZE feature bit, but limit its
   scope to page alignment restrictions, still exposing page shift
   data in the transports unconditionally (thanks Michael S. Tsirkin)
 - Adjust the feature bits to reflect we're using another one (thanks
   Parav Pandit).

v2:
 - Remove the VIRTIO_F_SHM_PAGE_SIZE feature bit, exposing page_shift
   in the transports unconditionally (thanks Parav Pandit).
 - Didn't pick up R-b due to the significant change between revisions.

Sergio Lopez (3):
  shared-mem: introduce page alignment restrictions
  transport-pci: introduce page_shift field for SHM
  transport-mmio: introduce SHMPageShift register

 content.tex        | 10 ++++++++--
 shared-mem.tex     |  7 +++++++
 transport-mmio.tex |  8 ++++++++
 transport-pci.tex  | 10 +++++++++-
 4 files changed, 32 insertions(+), 3 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/3] shared-mem: introduce page alignment restrictions
  2025-03-31 21:37 [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Sergio Lopez
@ 2025-03-31 21:37 ` Sergio Lopez
  2025-04-01 16:42   ` Daniel Verkamp
  2025-04-02  7:32   ` Michael S. Tsirkin
  2025-03-31 21:37 ` [PATCH v3 2/3] transport-pci: introduce page_shift field for SHM Sergio Lopez
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Sergio Lopez @ 2025-03-31 21:37 UTC (permalink / raw)
  To: virtio-comment; +Cc: mst, dmitry.osipenko, parav, Sergio Lopez

Add a subsection for page alignment restrictions and introduce the
VIRTIO_F_SHM_PAGE_SIZE feature bit.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 content.tex    | 10 ++++++++--
 shared-mem.tex |  7 +++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/content.tex b/content.tex
index c17ffa6..c8826a2 100644
--- a/content.tex
+++ b/content.tex
@@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
 \begin{description}
 \item[0 to 23, and 50 to 127] Feature bits for the specific device type
 
-\item[24 to 41] Feature bits reserved for extensions to the queue and
+\item[24 to 42] Feature bits reserved for extensions to the queue and
   feature negotiation mechanisms, see \ref{sec:Reserved Feature Bits}
 
-\item[42 to 49, and 128 and above] Feature bits reserved for future extensions.
+\item[43 to 49, and 128 and above] Feature bits reserved for future extensions.
 \end{description}
 
 \begin{note}
@@ -872,6 +872,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
 	handling features reserved for future use.
 
+  \item[VIRTIO_F_SHM_PAGE_SIZE(42)] This feature indicates that the device
+  transport provides information about the supported page size and expects the
+  driver to honor the alignment restrictions derived from it when requesting
+  mappings on the corresponding shared memory region.
+  See also \ref {sec:Basic Facilities of a Virtio Device / Shared Memory Regions / Page alignment restrictions}.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
diff --git a/shared-mem.tex b/shared-mem.tex
index 6e6f6c4..dd90cb7 100644
--- a/shared-mem.tex
+++ b/shared-mem.tex
@@ -34,6 +34,13 @@ \subsection{Addressing within regions}\label{sec:Basic Facilities of a Virtio De
 The \field{shmid} may be explicit or may be inferred from the
 context of the reference.
 
+\subsection{Page alignment restrictions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions / Page alignment restrictions}
+
+If VIRTIO_F_SHM_PAGE_SIZE has been negotiated, when requesting
+the device to map memory into a shared memory region, the driver
+MUST obtain the page size information from the transport and honor
+the page alignment constrains derived from that page size.
+
 \devicenormative{\subsection}{Shared Memory Regions}{Basic Facilities of a Virtio
 Device / Shared Memory Regions}
 Shared memory regions MUST NOT expose shared memory regions which
-- 
2.49.0



* [PATCH v3 2/3] transport-pci: introduce page_shift field for SHM
  2025-03-31 21:37 [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Sergio Lopez
  2025-03-31 21:37 ` [PATCH v3 1/3] " Sergio Lopez
@ 2025-03-31 21:37 ` Sergio Lopez
  2025-03-31 21:37 ` [PATCH v3 3/3] transport-mmio: introduce SHMPageShift register Sergio Lopez
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Sergio Lopez @ 2025-03-31 21:37 UTC (permalink / raw)
  To: virtio-comment; +Cc: mst, dmitry.osipenko, parav, Sergio Lopez

Borrow 8 bits from the 16-bit padding in virtio_pci_cap to hold
a page_shift field, which is used to derive the page size supported
by the device.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 transport-pci.tex | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/transport-pci.tex b/transport-pci.tex
index a5c6719..efb174a 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -147,7 +147,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
         u8 cfg_type;    /* Identifies the structure. */
         u8 bar;         /* Where to find it. */
         u8 id;          /* Multiple capabilities of the same type */
-        u8 padding[2];  /* Pad to full dword. */
+        u8 page_shift;  /* Page shift for this shared memory region. */
+        u8 padding[1];  /* Pad to full dword. */
         le32 offset;    /* Offset within bar. */
         le32 length;    /* Length of the structure, in bytes. */
 };
@@ -222,6 +223,13 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
         of a certain type. If the device type does not specify the meaning of
         this field, its contents are undefined.
 
+\item[\field{page_shift}]
+        If \field{cfg_type} is set to VIRTIO_PCI_CAP_SHARED_MEMORY_CFG, the
+        device MUST provide the page shift corresponding to the supported page
+        size for this shared memory region. The page size is derived from the
+        page shift by using this formula:
+        \texttt{page\_size = 1 << (page\_shift + 12)}.
+        See also \ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions / Page alignment restrictions}.
 
 \item[\field{offset}]
         indicates where the structure begins relative to the base address associated
-- 
2.49.0



* [PATCH v3 3/3] transport-mmio: introduce SHMPageShift register
  2025-03-31 21:37 [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Sergio Lopez
  2025-03-31 21:37 ` [PATCH v3 1/3] " Sergio Lopez
  2025-03-31 21:37 ` [PATCH v3 2/3] transport-pci: introduce page_shift field for SHM Sergio Lopez
@ 2025-03-31 21:37 ` Sergio Lopez
  2025-04-02  7:37 ` [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Michael S. Tsirkin
  2025-04-02  8:54 ` Parav Pandit
  4 siblings, 0 replies; 14+ messages in thread
From: Sergio Lopez @ 2025-03-31 21:37 UTC (permalink / raw)
  To: virtio-comment; +Cc: mst, dmitry.osipenko, parav, Sergio Lopez

Introduce the "SHMPageShift" register which, when read, returns the
page shift value that can be used to derive the page size supported
by the shared memory regions in the device.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 transport-mmio.tex | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/transport-mmio.tex b/transport-mmio.tex
index 94a93a1..042a82c 100644
--- a/transport-mmio.tex
+++ b/transport-mmio.tex
@@ -242,6 +242,14 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Vi
     apply to the queue selected by writing to \field{QueueSel}.
   }
   \hline
+  \mmioreg{SHMPageShift}{Shared memory region page shift (8 bits)}{0x0c4}{R}{%
+    Reading from this register returns the page shift corresponding to the
+    supported page size for the memory region selected by the \field{SHMSel}
+    register. The page size is derived from the page shift by using this
+    formula: \texttt{page\_size = 1 << (page\_shift + 12)}.
+    See also \ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions / Page alignment restrictions}.
+  }
+  \hline
   \mmioreg{ConfigGeneration}{Configuration atomicity value}{0x0fc}{R}{
     Reading from this register returns a value describing a version of the device-specific configuration space (see \field{Config}).
     The driver can then access the configuration space and, when finished, read \field{ConfigGeneration} again.
-- 
2.49.0



* Re: [PATCH v3 1/3] shared-mem: introduce page alignment restrictions
  2025-03-31 21:37 ` [PATCH v3 1/3] " Sergio Lopez
@ 2025-04-01 16:42   ` Daniel Verkamp
  2025-04-02  7:32   ` Michael S. Tsirkin
  1 sibling, 0 replies; 14+ messages in thread
From: Daniel Verkamp @ 2025-04-01 16:42 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: virtio-comment, mst, dmitry.osipenko, parav

On Tue, Apr 1, 2025 at 4:29 AM Sergio Lopez <slp@redhat.com> wrote:
>
> Add a subsection for page alignment restrictions and introduce the
> VIRTIO_F_SHM_PAGE_SIZE feature bit.
>
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
[...]
> +  \item[VIRTIO_F_SHM_PAGE_SIZE(42)] This feature indicates that the device
> +  transport provides information about the supported page size and expects the
> +  driver to honor the alignment restrictions derived from it when requesting
> +  mappings on the corresponding shared memory region.
> +  See also \ref {sec:Basic Facilities of a Virtio Device / Shared Memory Regions / Page alignment restrictions}.

This wording seems reasonable, although I wonder if it would make
sense to avoid "page size" and instead use "minimum mapping
granularity" or something along those lines; there may be no actual
"page size" shared between the device and driver. That's just my
opinion, though, and I don't have a strong opinion about it, so if
"page size" has consensus, that's fine.

> +\subsection{Page alignment restrictions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions / Page alignment restrictions}
> +
> +If VIRTIO_F_SHM_PAGE_SIZE has been negotiated, when requesting
> +the device to map memory into a shared memory region, the driver
> +MUST obtain the page size information from the transport and honor
> +the page alignment constrains derived from that page size.

nit: "constrains" -> "constraints"

Thanks,
-- Daniel


* Re: [PATCH v3 1/3] shared-mem: introduce page alignment restrictions
  2025-03-31 21:37 ` [PATCH v3 1/3] " Sergio Lopez
  2025-04-01 16:42   ` Daniel Verkamp
@ 2025-04-02  7:32   ` Michael S. Tsirkin
  1 sibling, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2025-04-02  7:32 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: virtio-comment, dmitry.osipenko, parav

On Mon, Mar 31, 2025 at 05:37:09PM -0400, Sergio Lopez wrote:
> Add a subsection for page alignment restrictions and introduce the
> VIRTIO_F_SHM_PAGE_SIZE feature bit.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  content.tex    | 10 ++++++++--
>  shared-mem.tex |  7 +++++++
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/content.tex b/content.tex
> index c17ffa6..c8826a2 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>  \begin{description}
>  \item[0 to 23, and 50 to 127] Feature bits for the specific device type
>  
> -\item[24 to 41] Feature bits reserved for extensions to the queue and
> +\item[24 to 42] Feature bits reserved for extensions to the queue and
>    feature negotiation mechanisms, see \ref{sec:Reserved Feature Bits}
>  
> -\item[42 to 49, and 128 and above] Feature bits reserved for future extensions.
> +\item[43 to 49, and 128 and above] Feature bits reserved for future extensions.
>  \end{description}
>  
>  \begin{note}
> @@ -872,6 +872,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
>  	handling features reserved for future use.
>  
> +  \item[VIRTIO_F_SHM_PAGE_SIZE(42)] This feature indicates that the device
> +  transport provides information about the supported page size and expects the
> +  driver to honor the alignment restrictions derived from it when requesting
> +  mappings on the corresponding shared memory region.
> +  See also \ref {sec:Basic Facilities of a Virtio Device / Shared Memory Regions / Page alignment restrictions}.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> diff --git a/shared-mem.tex b/shared-mem.tex
> index 6e6f6c4..dd90cb7 100644
> --- a/shared-mem.tex
> +++ b/shared-mem.tex
> @@ -34,6 +34,13 @@ \subsection{Addressing within regions}\label{sec:Basic Facilities of a Virtio De
>  The \field{shmid} may be explicit or may be inferred from the
>  context of the reference.
>  
> +\subsection{Page alignment restrictions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions / Page alignment restrictions}
> +
> +If VIRTIO_F_SHM_PAGE_SIZE has been negotiated, when requesting
> +the device to map memory into a shared memory region, the driver

the shared memory region - it is this specific one.

> +MUST obtain the page size information from the transport

avoid MUST. Also not "from the transport". In a transport-specific way.

> and honor
> +the page alignment constrains derived from that page size.

Also explain here it is only supported for pci for now, to make
sure people do not go looking for it on other transports.

> +
>  \devicenormative{\subsection}{Shared Memory Regions}{Basic Facilities of a Virtio
>  Device / Shared Memory Regions}
>  Shared memory regions MUST NOT expose shared memory regions which



I have a problem with this wording in that "map" and "request mapping"
is actually a device type specific thing. Someone reading this
paragraph will not know what this term means. Note also that the
term "map" has two meanings, e.g. device-types/fs/description.tex
says 
	In cases where data transfer is
	undesirable, the device can map file contents into the DAX window shared memory
	region. 
but also
	Drivers map this shared
	memory region with writeback caching as if it were regular RAM. 

I am guessing the "request mapping" refers to the 1st but not the 2nd
type of map, but neither mentions "request mapping".

My suggestions, but really just suggestions:

- update device specific text to mention alignment where appropriate,
  using MUST there.

- Remove the MUST here, it is not good to have MUST outside non
conformance sections anyway.

- Avoid saying map since that is not
a generic term.
Just say generally
"information about the supported page size" and put all
the detail in device specific sections.
Also "device expects" is really a weird way to put things.
device does abc. driver does abc. that is all.

E.g. 
	When VIRTIO_F_SHM_PAGE_SIZE is negotiated, driver aligns
	device requests
	using shared memory regions on the supported page size of each
	memory region, in a device-specific way. The supported
	page size is retrieved in a transport-specific way. This feature
	is only supported for the PCI transport (see ref to the relevant section).


And add device specific text where it applies.


> -- 
> 2.49.0



* Re: [PATCH v3 0/3] shared-mem: introduce page alignment restrictions
  2025-03-31 21:37 [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Sergio Lopez
                   ` (2 preceding siblings ...)
  2025-03-31 21:37 ` [PATCH v3 3/3] transport-mmio: introduce SHMPageShift register Sergio Lopez
@ 2025-04-02  7:37 ` Michael S. Tsirkin
  2025-04-02  9:18   ` Sergio Lopez Pascual
  2025-04-02  8:54 ` Parav Pandit
  4 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2025-04-02  7:37 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: virtio-comment, dmitry.osipenko, parav

On Mon, Mar 31, 2025 at 05:37:08PM -0400, Sergio Lopez wrote:
> There's an incresing number of machines supporting multiple page sizes
> and, on these machines, the host and a guest can be running with
> different pages sizes.
> 
> In addition to this, there might be devices that have a required and/or
> preferred page size for mapping memory.
> 
> In this series we extend the "Shared Memory Regions" with a subsection
> explaining the posible existence of page alignment restrictions when
> the VIRTIO_F_SHM_PAGE_SIZE feature has been negotiated.
> 
> For the device to provide the page size information to the driver, we
> need to extend the PCI and MMIO transports. For the former, we borrow
> 8 bits from the 16 bit padding in virtio_pci_cap to hold a page_shift
> field which can be used to derive the page size by using the following
> formula: (page_size = 1 << (page_shift + 12)).
> 
> For MMIO, we add a the SHMPageShift register at offset 0x0c4, also
> holding the page_shift value. Since MMIO registers are 32 bit wide, we
> could have asked the device to directly provide page_size instead of
> page_shift, but seems reasonable to be consistent across transports.
> 
> An implementation of the changes proposed in this series has been
> published as an RFC to the LKML, to be used as a reference:
> 
> https://lore.kernel.org/all/20250214-virtio-shm-page-size-v2-0-aa1619e6908b@redhat.com/


Can you explain the use a bit more please?

- looks like this is exposed to userspace?
  but userspace can not be trusted not to violate the spec.
  how can that be sufficient to satisfy MUST requirements?

  what are these mappings and what are the alignment restrictions,
  in fact?


Looking at that patch, I begin to suspect that while the spec patch talks about
device mapping requests, it is actually the other way around:
the restriction is on guest virtual to guest physical mappings?


Does the feature affect other device types besides gpu, and how?







> v3:
>  - Reintroduce the VIRTIO_F_SHM_PAGE_SIZE feature bit, but limit its
>    scope to page alignment restrictions, still exposing page shift
>    data in the transports unconditionally (thanks Michael S. Tsirkin)
>  - Adjust the feature bits to reflect we're using another one (thanks
>    Parav Pandit).
> 
> v2:
>  - Remove the VIRTIO_F_SHM_PAGE_SIZE feature bit, exposing page_shift
>    in the transports unconditionally (thanks Parav Pandit).
>  - Didn't pick up R-b due to the significant change between revisions.
> 
> Sergio Lopez (3):
>   shared-mem: introduce page alignment restrictions
>   transport-pci: introduce page_shift field for SHM
>   transport-mmio: introduce SHMPageShift register
> 
>  content.tex        | 10 ++++++++--
>  shared-mem.tex     |  7 +++++++
>  transport-mmio.tex |  8 ++++++++
>  transport-pci.tex  | 10 +++++++++-
>  4 files changed, 32 insertions(+), 3 deletions(-)
> 
> -- 
> 2.49.0



* RE: [PATCH v3 0/3] shared-mem: introduce page alignment restrictions
  2025-03-31 21:37 [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Sergio Lopez
                   ` (3 preceding siblings ...)
  2025-04-02  7:37 ` [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Michael S. Tsirkin
@ 2025-04-02  8:54 ` Parav Pandit
  4 siblings, 0 replies; 14+ messages in thread
From: Parav Pandit @ 2025-04-02  8:54 UTC (permalink / raw)
  To: Sergio Lopez, virtio-comment@lists.linux.dev
  Cc: mst@redhat.com, dmitry.osipenko@collabora.com


> From: Sergio Lopez <slp@redhat.com>
> Sent: Tuesday, April 1, 2025 3:07 AM
> 
> There's an incresing number of machines supporting multiple page sizes and,
> on these machines, the host and a guest can be running with different pages
> sizes.
> 
> In addition to this, there might be devices that have a required and/or
> preferred page size for mapping memory.
> 
I do not fully understand the proposal here.
If we take the PCI transport as an example, a device has some N bytes of memory.
This memory is shared with the driver by exposing its location in the PCI BAR via the capability.
The location in the BAR may have some alignment requirements based on how the driver wants to use it.
For example, within the kernel or in user space.

One example I could think of is to map this BAR into a user space process for read/write access.
Each cpu and OS may have different restrictions on the mmap() size when mapping this BAR memory to user space.
As an example, ARM + OS may have a 64KB page size and restrict mmap() to a minimum alignment of 64KB.
An Intel cpu may do 4KB, etc.

To my knowledge, partial mapping in cpus is not possible for IOVA to PA, so the BAR must be sized and aligned to this memory.
Please correct me if this is incorrect.

Assuming such a limitation exists, the device does not have knowledge of which cpu and/or OS it is running on.
So it would just end up assuming alignment restrictions of 64KB or similar as the worst case.
And I don't see the need for any alignment negotiation.

Or are you trying to solve completely different problem?

Can you please explain the motivation and use of what you propose in this series?
I am a bit lost here. :(

> In this series we extend the "Shared Memory Regions" with a subsection
> explaining the posible existence of page alignment restrictions when the
> VIRTIO_F_SHM_PAGE_SIZE feature has been negotiated.
> 
> For the device to provide the page size information to the driver, we need to
> extend the PCI and MMIO transports. For the former, we borrow
> 8 bits from the 16 bit padding in virtio_pci_cap to hold a page_shift field
> which can be used to derive the page size by using the following
> formula: (page_size = 1 << (page_shift + 12)).
> 
> For MMIO, we add a the SHMPageShift register at offset 0x0c4, also holding
> the page_shift value. Since MMIO registers are 32 bit wide, we could have
> asked the device to directly provide page_size instead of page_shift, but
> seems reasonable to be consistent across transports.
> 
> An implementation of the changes proposed in this series has been published
> as an RFC to the LKML, to be used as a reference:
> 
> https://lore.kernel.org/all/20250214-virtio-shm-page-size-v2-0-
> aa1619e6908b@redhat.com/
> 
> v3:
>  - Reintroduce the VIRTIO_F_SHM_PAGE_SIZE feature bit, but limit its
>    scope to page alignment restrictions, still exposing page shift
>    data in the transports unconditionally (thanks Michael S. Tsirkin)
>  - Adjust the feature bits to reflect we're using another one (thanks
>    Parav Pandit).
> 
> v2:
>  - Remove the VIRTIO_F_SHM_PAGE_SIZE feature bit, exposing page_shift
>    in the transports unconditionally (thanks Parav Pandit).
>  - Didn't pick up R-b due to the significant change between revisions.
> 
> Sergio Lopez (3):
>   shared-mem: introduce page alignment restrictions
>   transport-pci: introduce page_shift field for SHM
>   transport-mmio: introduce SHMPageShift register
> 
>  content.tex        | 10 ++++++++--
>  shared-mem.tex     |  7 +++++++
>  transport-mmio.tex |  8 ++++++++
>  transport-pci.tex  | 10 +++++++++-
>  4 files changed, 32 insertions(+), 3 deletions(-)
> 
> --
> 2.49.0



* Re: [PATCH v3 0/3] shared-mem: introduce page alignment restrictions
  2025-04-02  7:37 ` [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Michael S. Tsirkin
@ 2025-04-02  9:18   ` Sergio Lopez Pascual
  2025-04-02  9:55     ` Parav Pandit
  0 siblings, 1 reply; 14+ messages in thread
From: Sergio Lopez Pascual @ 2025-04-02  9:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-comment, dmitry.osipenko, parav

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Mon, Mar 31, 2025 at 05:37:08PM -0400, Sergio Lopez wrote:
>> There's an incresing number of machines supporting multiple page sizes
>> and, on these machines, the host and a guest can be running with
>> different pages sizes.
>>
>> In addition to this, there might be devices that have a required and/or
>> preferred page size for mapping memory.
>>
>> In this series we extend the "Shared Memory Regions" with a subsection
>> explaining the posible existence of page alignment restrictions when
>> the VIRTIO_F_SHM_PAGE_SIZE feature has been negotiated.
>>
>> For the device to provide the page size information to the driver, we
>> need to extend the PCI and MMIO transports. For the former, we borrow
>> 8 bits from the 16 bit padding in virtio_pci_cap to hold a page_shift
>> field which can be used to derive the page size by using the following
>> formula: (page_size = 1 << (page_shift + 12)).
>>
>> For MMIO, we add a the SHMPageShift register at offset 0x0c4, also
>> holding the page_shift value. Since MMIO registers are 32 bit wide, we
>> could have asked the device to directly provide page_size instead of
>> page_shift, but seems reasonable to be consistent across transports.
>>
>> An implementation of the changes proposed in this series has been
>> published as an RFC to the LKML, to be used as a reference:
>>
>> https://lore.kernel.org/all/20250214-virtio-shm-page-size-v2-0-aa1619e6908b@redhat.com/
>
>
> Can you explain the use a bit more please?
>
> - looks like this is exposed to userspace?
>   but userspace can not be trusted not to violate the spec.
>   how can that be sufficient to satisfy MUST requirements?

In the virtio-gpu implementation in the RFC series it's indeed exposed
to userspace. This is because gpu drivers are supported by userspace
drivers (Mesa), and both have a very close relationship. Probably, that
patch should also make the kernel driver reject mappings that don't
comply with the alignment to ensure it satisfies the MUST requirements.

I'll do that in the non-RFC series.

>   what are these mappings and what are the alignment restrictions,
>   in fact?
>
>
> Looking at that patch, I begin to suspect that while the spec patch talks about
> device mapping requests, it is actually the other way around:
> the restriction is on guest virtual to guest physical mappings?

No. The guest can operate with smaller-than-host mappings (otherwise,
running a 4K guest kernel on a 16K host wouldn't be possible).

Shared Memory Regions tend to be used this way:

 1. The guest requests access to a certain resource owned by the host.
 The request can be for the whole resource or part of it. The request
 includes the 'offset' within the resource and the 'size' of the window
 it wants to have on it.

 2. The host maps the resource, from 'offset' to 'size' (as received by
 the guest), into a certain offset from the host's view of the mapping
 backing the Shared Memory Region.

 3. The guest receives an answer indicating the offset (completely
 different and independent from the offset that was requested within the
 resource in 1) within the Shared Memory Region where the resource has
 been mapped.

 4. The guest starts operating on the resource, now available in the
 Shared Memory Region at the offset received in 3.

The problem this change in the specification tries to address is that,
in step 2, both 'offset' and 'size' must be aligned according to the
host's limitations (in a traditional host/guest scenario, this would be
the host's page size). But the guest has no knowledge of these
limitations, so it can't adjust 'offset' and 'size' as needed.

While this example is written with virtual devices in mind, it's
certainly possible to implement physical virtio devices that have a
minimum granularity bigger than the host's.

> Does the feature affect other device types besides gpu, and how?

Every device with Shared Memory Regions is affected. The only reason we
found the issue in gpu first is that it's the most intensive user of
SHM regions. The moment we attempted to use it on a 16K page host, the
issue became clear.

For instance, fs is also affected, but it only uses SHM regions with
DAX, and the latter is barely used due to various constraints.

New devices using SHM, like virtio-media, are sure to hit this issue
sooner rather than later.

Thanks,
Sergio.



* RE: [PATCH v3 0/3] shared-mem: introduce page alignment restrictions
  2025-04-02  9:18   ` Sergio Lopez Pascual
@ 2025-04-02  9:55     ` Parav Pandit
  2025-04-02 12:22       ` Sergio Lopez Pascual
  2025-04-02 14:49       ` Michael S. Tsirkin
  0 siblings, 2 replies; 14+ messages in thread
From: Parav Pandit @ 2025-04-02  9:55 UTC (permalink / raw)
  To: Sergio Lopez Pascual, Michael S. Tsirkin
  Cc: virtio-comment@lists.linux.dev, dmitry.osipenko@collabora.com

Hi Sergio,

> From: Sergio Lopez Pascual <slp@redhat.com>
> Sent: Wednesday, April 2, 2025 2:48 PM
> 
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Mon, Mar 31, 2025 at 05:37:08PM -0400, Sergio Lopez wrote:
> >> There's an incresing number of machines supporting multiple page
> >> sizes and, on these machines, the host and a guest can be running
> >> with different pages sizes.
> >>
> >> In addition to this, there might be devices that have a required
> >> and/or preferred page size for mapping memory.
> >>
> >> In this series we extend the "Shared Memory Regions" with a
> >> subsection explaining the posible existence of page alignment
> >> restrictions when the VIRTIO_F_SHM_PAGE_SIZE feature has been
> negotiated.
> >>
> >> For the device to provide the page size information to the driver, we
> >> need to extend the PCI and MMIO transports. For the former, we borrow
> >> 8 bits from the 16 bit padding in virtio_pci_cap to hold a page_shift
> >> field which can be used to derive the page size by using the
> >> following
> >> formula: (page_size = 1 << (page_shift + 12)).
> >>
> >> For MMIO, we add the SHMPageShift register at offset 0x0c4, also
> >> holding the page_shift value. Since MMIO registers are 32 bits wide,
> >> we could have asked the device to directly provide page_size instead
> >> of page_shift, but it seems reasonable to be consistent across transports.
> >>
> >> An implementation of the changes proposed in this series has been
> >> published as an RFC to the LKML, to be used as a reference:
> >>
> >> https://lore.kernel.org/all/20250214-virtio-shm-page-size-v2-0-aa1619
> >> e6908b@redhat.com/
> >
> >
> > Can you explain the use a bit more please?
> >
> > - looks like this is exposed to userspace?
> >   but userspace can not be trusted not to violate the spec.
> >   how can that be sufficient to satisfy MUST requirements?
> 
> In the virtio-gpu implementation in the RFC series it's indeed exposed to
> userspace. This is because gpu drivers are supported by userspace drivers
> (Mesa), and both have a very close relationship. Probably, that patch should
> also make the kernel driver reject mappings that don't comply with the
> alignment to ensure it satisfies the MUST requirements.
> 
> I'll do that in the non-RFC series.
> 
> >   what are these mappings and what are the alignment restrictions,
> >   in fact?
> >
> >
> > Looking at that patch, I begin to suspect that while the spec patch
> > talks about device mapping requests, it is actually the other way around:
> > the restriction is on guest virtual to guest physical mappings?
> 
> No. The guest can operate in smaller-than-host mappings (otherwise, running
> a 4K guest kernel on a 16k host wouldn't be possible).
> 
> Shared Memory Regions tend to be used this way:
> 
>  1. The guest requests access to certain resource owned by the host. The
> request can be for the whole resource or part of it. The request  includes the
> 'offset' within the resource, and the 'size' of the window  it wants to have on
> it.
> 
>  2. The host maps the resource, from 'offset' to 'size' (as received by  the
> guest), into a certain offset from the host's view of the mapping  backing the
> Shared Memory Region.
> 
>  3. The guest receives an answer indicating the offset (completely  different
> and independent from the offset that was requested within the resource in 1)
> within the Shared Memory Region where the resource has  been mapped.
> 
>  4. The guest starts operating on the resource, now available in the  Shared
> Memory Region at the offset received in 3.
> 
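[Editor's note: the four steps above can be modeled roughly like this. All type and field names are hypothetical; the real request/response layout is device-specific.]

```rust
// Step 1: the guest asks for a window into a host-owned resource.
#[allow(dead_code)]
struct MapRequest {
    offset: u64, // offset within the resource
    size: u64,   // size of the requested window
}

// Step 3: the host answers with an offset inside the SHM region,
// unrelated to the offset requested within the resource.
struct MapReply {
    shm_offset: u64,
}

// Step 4: the guest dereferences shm_base + reply.shm_offset.
// (Step 2, the actual mapping, happens entirely on the host side.)
fn resource_addr(shm_base: u64, reply: &MapReply) -> u64 {
    shm_base + reply.shm_offset
}

fn main() {
    let reply = MapReply { shm_offset: 0x8000 };
    assert_eq!(resource_addr(0x1000_0000, &reply), 0x1000_8000);
}
```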
> The problem this change in the specification tries to address is that, in step 2,
> both 'offset' and 'size' must be aligned according to the host's limitations (in a
> traditional host/guest scenario, this would be the host's page size). But the guest
> has no knowledge of these limitations, so it can't adjust 'offset' and 'size' as
> needed.
> 
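[Editor's note: concretely, a guest that does learn the device's granularity could widen its request before issuing it. An illustrative sketch; the helper is not part of the proposal.]

```rust
// Widen a (offset, size) window so both edges fall on the device's
// page granularity: round the start down and the end up.
// page_size must be a power of two.
fn align_request(offset: u64, size: u64, page_size: u64) -> (u64, u64) {
    let start = offset & !(page_size - 1);
    let end = (offset + size + page_size - 1) & !(page_size - 1);
    (start, end - start)
}

fn main() {
    // A 4 KiB-aligned guest request widened to 16 KiB host granularity.
    let (off, len) = align_request(0x5000, 0x2000, 0x4000);
    assert_eq!((off, len), (0x4000, 0x4000));
}
```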
> While this example is written thinking of virtual devices, it's certainly possible
> to implement physical virtio devices that may have a minimum granularity
> bigger than the host's.
> 
> > Does the feature affect other device types besides gpu, and how?
> 
> Every device with Shared Memory Regions is affected. The only reason we
> found the issue in gpu first is because it's the most intense user of SHM
> regions. The moment we attempted to use it on a 16K page host, the issue
> was made clear.
> 
> For instance, fs is also affected, but only uses SHM regions with DAX, and the
> latter is barely used due to various constraints.
> 
> New devices using SHM, like virtio-media, are sure to hit this issue sooner
> rather than later.
> 
I understand the issue much better now. Thanks a lot for the detailed explanation.
Given that all the resource carve-out (window access), resource allocation and mapping is done through a channel outside of shared memory,
and this is not about the page size but about the resource mapping size,

I believe these resource mapping size and offset restrictions should be communicated via the same communication channel that does this resource mapping plumbing.

Plumbing this into the shared memory capability does not look like the right place, as it has no knowledge of steps #1 to #4 you walk through.
Please consider using your existing resource mapping channel to communicate such restrictions.

> Thanks,
> Sergio.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v3 0/3] shared-mem: introduce page alignment restrictions
  2025-04-02  9:55     ` Parav Pandit
@ 2025-04-02 12:22       ` Sergio Lopez Pascual
  2025-04-02 12:36         ` Parav Pandit
  2025-04-02 14:49       ` Michael S. Tsirkin
  1 sibling, 1 reply; 14+ messages in thread
From: Sergio Lopez Pascual @ 2025-04-02 12:22 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: virtio-comment@lists.linux.dev, dmitry.osipenko@collabora.com

Parav Pandit <parav@nvidia.com> writes:

> Hi Sergio,
>
>> From: Sergio Lopez Pascual <slp@redhat.com>
>> Sent: Wednesday, April 2, 2025 2:48 PM
>>
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>
>> > On Mon, Mar 31, 2025 at 05:37:08PM -0400, Sergio Lopez wrote:
>> >> There's an increasing number of machines supporting multiple page
>> >> sizes and, on these machines, the host and a guest can be running
>> >> with different page sizes.
>> >>
>> >> In addition to this, there might be devices that have a required
>> >> and/or preferred page size for mapping memory.
>> >>
>> >> In this series we extend the "Shared Memory Regions" with a
>> >> subsection explaining the possible existence of page alignment
>> >> restrictions when the VIRTIO_F_SHM_PAGE_SIZE feature has been
>> negotiated.
>> >>
>> >> For the device to provide the page size information to the driver, we
>> >> need to extend the PCI and MMIO transports. For the former, we borrow
>> >> 8 bits from the 16 bit padding in virtio_pci_cap to hold a page_shift
>> >> field which can be used to derive the page size by using the
>> >> following
>> >> formula: (page_size = 1 << (page_shift + 12)).
>> >>
>> >> For MMIO, we add the SHMPageShift register at offset 0x0c4, also
>> >> holding the page_shift value. Since MMIO registers are 32 bits wide,
>> >> we could have asked the device to directly provide page_size instead
>> >> of page_shift, but it seems reasonable to be consistent across transports.
>> >>
>> >> An implementation of the changes proposed in this series has been
>> >> published as an RFC to the LKML, to be used as a reference:
>> >>
>> >> https://lore.kernel.org/all/20250214-virtio-shm-page-size-v2-0-aa1619
>> >> e6908b@redhat.com/
>> >
>> >
>> > Can you explain the use a bit more please?
>> >
>> > - looks like this is exposed to userspace?
>> >   but userspace can not be trusted not to violate the spec.
>> >   how can that be sufficient to satisfy MUST requirements?
>>
>> In the virtio-gpu implementation in the RFC series it's indeed exposed to
>> userspace. This is because gpu drivers are supported by userspace drivers
>> (Mesa), and both have a very close relationship. Probably, that patch should
>> also make the kernel driver reject mappings that don't comply with the
>> alignment to ensure it satisfies the MUST requirements.
>>
>> I'll do that in the non-RFC series.
>>
>> >   what are these mappings and what are the alignment restrictions,
>> >   in fact?
>> >
>> >
>> > Looking at that patch, I begin to suspect that while the spec patch
>> > talks about device mapping requests, it is actually the other way around:
>> > the restriction is on guest virtual to guest physical mappings?
>>
>> No. The guest can operate in smaller-than-host mappings (otherwise, running
>> a 4K guest kernel on a 16k host wouldn't be possible).
>>
>> Shared Memory Regions tend to be used this way:
>>
>>  1. The guest requests access to certain resource owned by the host. The
>> request can be for the whole resource or part of it. The request  includes the
>> 'offset' within the resource, and the 'size' of the window  it wants to have on
>> it.
>>
>>  2. The host maps the resource, from 'offset' to 'size' (as received by  the
>> guest), into a certain offset from the host's view of the mapping  backing the
>> Shared Memory Region.
>>
>>  3. The guest receives an answer indicating the offset (completely  different
>> and independent from the offset that was requested within the resource in 1)
>> within the Shared Memory Region where the resource has  been mapped.
>>
>>  4. The guest starts operating on the resource, now available in the  Shared
>> Memory Region at the offset received in 3.
>>
>> The problem this change in the specification tries to address is that, in step 2,
>> both 'offset' and 'size' must be aligned according to the host's limitations (in a
>> traditional host/guest scenario, this would be the host's page size). But the guest
>> has no knowledge of these limitations, so it can't adjust 'offset' and 'size' as
>> needed.
>>
>> While this example is written thinking of virtual devices, it's certainly possible
>> to implement physical virtio devices that may have a minimum granularity
>> bigger than the host's.
>>
>> > Does the feature affect other device types besides gpu, and how?
>>
>> Every device with Shared Memory Regions is affected. The only reason we
>> found the issue in gpu first is because it's the most intense user of SHM
>> regions. The moment we attempted to use it on a 16K page host, the issue
>> was made clear.
>>
>> For instance, fs is also affected, but only uses SHM regions with DAX, and the
>> latter is barely used due to various constraints.
>>
>> New devices using SHM, like virtio-media, are sure to hit this issue sooner
>> rather than later.
>>
> I understand the issue much better now. Thanks a lot for the detailed explanation.
> Given that all the resource carve-out (window access), resource allocation and mapping is done through a channel outside of shared memory,
> and this is not about the page size but about the resource mapping size,
>
> I believe these resource mapping size and offset restrictions should be communicated via the same communication channel that does this resource mapping plumbing.
>
> Plumbing this into the shared memory capability does not look like the right place, as it has no knowledge of steps #1 to #4 you walk through.
> Please consider using your existing resource mapping channel to communicate such restrictions.

I'm afraid I have to disagree. Today, the transports provide the
'offset' and 'size' for each SHM region, because back in the day it was
deemed to be all the information needed to make use of them. Now we know
that having the 'minimum granularity' allowed for each mapping in the
region is as important as having 'offset' and 'size'.

I can't think of a reason why, given that those three pieces of
information are needed to make use of a SHM region, two of them must be
provided in the transport, but the third must be obtained in a
device-specific way.

Thanks,
Sergio.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v3 0/3] shared-mem: introduce page alignment restrictions
  2025-04-02 12:22       ` Sergio Lopez Pascual
@ 2025-04-02 12:36         ` Parav Pandit
  0 siblings, 0 replies; 14+ messages in thread
From: Parav Pandit @ 2025-04-02 12:36 UTC (permalink / raw)
  To: Sergio Lopez Pascual, Michael S. Tsirkin
  Cc: virtio-comment@lists.linux.dev, dmitry.osipenko@collabora.com


> From: Sergio Lopez Pascual <slp@redhat.com>
> Sent: Wednesday, April 2, 2025 5:53 PM
> 
> Parav Pandit <parav@nvidia.com> writes:
> 
> > Hi Sergio,
> >
> >> From: Sergio Lopez Pascual <slp@redhat.com>
> >> Sent: Wednesday, April 2, 2025 2:48 PM
> >>
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>
> >> > On Mon, Mar 31, 2025 at 05:37:08PM -0400, Sergio Lopez wrote:
> >> >> There's an increasing number of machines supporting multiple page
> >> >> sizes and, on these machines, the host and a guest can be running
> >> >> with different page sizes.
> >> >>
> >> >> In addition to this, there might be devices that have a required
> >> >> and/or preferred page size for mapping memory.
> >> >>
> >> >> In this series we extend the "Shared Memory Regions" with a
> >> >> subsection explaining the possible existence of page alignment
> >> >> restrictions when the VIRTIO_F_SHM_PAGE_SIZE feature has been
> >> negotiated.
> >> >>
> >> >> For the device to provide the page size information to the driver,
> >> >> we need to extend the PCI and MMIO transports. For the former, we
> >> >> borrow
> >> >> 8 bits from the 16 bit padding in virtio_pci_cap to hold a
> >> >> page_shift field which can be used to derive the page size by
> >> >> using the following
> >> >> formula: (page_size = 1 << (page_shift + 12)).
> >> >>
> >> >> For MMIO, we add the SHMPageShift register at offset 0x0c4, also
> >> >> holding the page_shift value. Since MMIO registers are 32 bits
> >> >> wide, we could have asked the device to directly provide page_size
> >> >> instead of page_shift, but it seems reasonable to be consistent across
> transports.
> >> >>
> >> >> An implementation of the changes proposed in this series has been
> >> >> published as an RFC to the LKML, to be used as a reference:
> >> >>
> >> >> https://lore.kernel.org/all/20250214-virtio-shm-page-size-v2-0-aa1
> >> >> 619
> >> >> e6908b@redhat.com/
> >> >
> >> >
> >> > Can you explain the use a bit more please?
> >> >
> >> > - looks like this is exposed to userspace?
> >> >   but userspace can not be trusted not to violate the spec.
> >> >   how can that be sufficient to satisfy MUST requirements?
> >>
> >> In the virtio-gpu implementation in the RFC series it's indeed
> >> exposed to userspace. This is because gpu drivers are supported by
> >> userspace drivers (Mesa), and both have a very close relationship.
> >> Probably, that patch should also make the kernel driver reject
> >> mappings that don't comply with the alignment to ensure it satisfies the
> MUST requirements.
> >>
> >> I'll do that in the non-RFC series.
> >>
> >> >   what are these mappings and what are the alignment restrictions,
> >> >   in fact?
> >> >
> >> >
> >> > Looking at that patch, I begin to suspect that while the spec patch
> >> > talks about device mapping requests, it is actually the other way around:
> >> > the restriction is on guest virtual to guest physical mappings?
> >>
> >> No. The guest can operate in smaller-than-host mappings (otherwise,
> >> running a 4K guest kernel on a 16k host wouldn't be possible).
> >>
> >> Shared Memory Regions tend to be used this way:
> >>
> >>  1. The guest requests access to certain resource owned by the host.
> >> The request can be for the whole resource or part of it. The request
> >> includes the 'offset' within the resource, and the 'size' of the
> >> window  it wants to have on it.
> >>
> >>  2. The host maps the resource, from 'offset' to 'size' (as received
> >> by  the guest), into a certain offset from the host's view of the
> >> mapping  backing the Shared Memory Region.
> >>
> >>  3. The guest receives an answer indicating the offset (completely
> >> different and independent from the offset that was requested within
> >> the  resource in 1) within the Shared Memory Region where the resource
> has  been mapped.
> >>
> >>  4. The guest starts operating on the resource, now available in the
> >> Shared Memory Region at the offset received in 3.
> >>
> >> The problem this change in the specification tries to address is
> >> that, in step 2, both 'offset' and 'size' must be aligned according
> >> to the host's limitations (in a traditional host/guest scenario, this
> >> would be the host's page size). But the guest has no knowledge of these
> >> limitations, so it can't adjust 'offset' and 'size' as needed.
> >>
> >> While this example is written thinking of virtual devices, it's
> >> certainly possible to implement physical virtio devices that may have
> >> a minimum granularity bigger than the host's.
> >>
> >> > Does the feature affect other device types besides gpu, and how?
> >>
> >> Every device with Shared Memory Regions is affected. The only reason
> >> we found the issue in gpu first is because it's the most intense user
> >> of SHM regions. The moment we attempted to use it on a 16K page host,
> >> the issue was made clear.
> >>
> >> For instance, fs is also affected, but only uses SHM regions with
> >> DAX, and the latter is barely used due to various constraints.
> >>
> >> New devices using SHM, like virtio-media, are sure to hit this issue
> >> sooner rather than later.
> >>
> > I understand the issue much better now. Thanks a lot for the detailed
> explanation.
> > Given that all the resource carve-out (window access), resource
> > allocation and mapping is done through a channel outside of shared
> > memory, and this is not about the page size but about the resource
> > mapping size,
> >
> > I believe these resource mapping size and offset restrictions should be
> communicated via the same communication channel that does this resource
> mapping plumbing.
> >
> > Plumbing this into the shared memory capability does not look like the right place,
> as it has no knowledge of steps #1 to #4 you walk through.
> > Please consider using your existing resource mapping channel to
> communicate such restrictions.
> 
> I'm afraid I have to disagree. Today, the transports provide the 'offset' and
> 'size' for each SHM region, because back in the day it was deemed to be all
> the information needed to make use of them. Now we know that having the
> 'minimum granularity' allowed for each mapping in the region is as important
> as having 'offset' and 'size'.
> 
> I can't think of a reason why, given that those three pieces of information are
> needed to make use of a SHM region, two of them must be provided in the
> transport, but the third must be obtained in a device-specific way.

Because the whole resource window partitioning and subdivision is done in a device-specific way.
Otherwise, the shm region itself does not need to be aligned to anything from the device side.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/3] shared-mem: introduce page alignment restrictions
  2025-04-02  9:55     ` Parav Pandit
  2025-04-02 12:22       ` Sergio Lopez Pascual
@ 2025-04-02 14:49       ` Michael S. Tsirkin
  2025-04-02 16:28         ` Sergio Lopez Pascual
  1 sibling, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2025-04-02 14:49 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Sergio Lopez Pascual, virtio-comment@lists.linux.dev,
	dmitry.osipenko@collabora.com

On Wed, Apr 02, 2025 at 09:55:20AM +0000, Parav Pandit wrote:
> Hi Sergio,
> 
> > From: Sergio Lopez Pascual <slp@redhat.com>
> > Sent: Wednesday, April 2, 2025 2:48 PM
> > 
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > 
> > > On Mon, Mar 31, 2025 at 05:37:08PM -0400, Sergio Lopez wrote:
> > >> There's an increasing number of machines supporting multiple page
> > >> sizes and, on these machines, the host and a guest can be running
> > >> with different page sizes.
> > >>
> > >> In addition to this, there might be devices that have a required
> > >> and/or preferred page size for mapping memory.
> > >>
> > >> In this series we extend the "Shared Memory Regions" with a
> > >> subsection explaining the possible existence of page alignment
> > >> restrictions when the VIRTIO_F_SHM_PAGE_SIZE feature has been
> > negotiated.
> > >>
> > >> For the device to provide the page size information to the driver, we
> > >> need to extend the PCI and MMIO transports. For the former, we borrow
> > >> 8 bits from the 16 bit padding in virtio_pci_cap to hold a page_shift
> > >> field which can be used to derive the page size by using the
> > >> following
> > >> formula: (page_size = 1 << (page_shift + 12)).
> > >>
> > >> For MMIO, we add the SHMPageShift register at offset 0x0c4, also
> > >> holding the page_shift value. Since MMIO registers are 32 bits wide,
> > >> we could have asked the device to directly provide page_size instead
> > >> of page_shift, but it seems reasonable to be consistent across transports.
> > >>
> > >> An implementation of the changes proposed in this series has been
> > >> published as an RFC to the LKML, to be used as a reference:
> > >>
> > >> https://lore.kernel.org/all/20250214-virtio-shm-page-size-v2-0-aa1619
> > >> e6908b@redhat.com/
> > >
> > >
> > > Can you explain the use a bit more please?
> > >
> > > - looks like this is exposed to userspace?
> > >   but userspace can not be trusted not to violate the spec.
> > >   how can that be sufficient to satisfy MUST requirements?
> > 
> > In the virtio-gpu implementation in the RFC series it's indeed exposed to
> > userspace. This is because gpu drivers are supported by userspace drivers
> > (Mesa), and both have a very close relationship. Probably, that patch should
> > also make the kernel driver reject mappings that don't comply with the
> > alignment to ensure it satisfies the MUST requirements.
> > 
> > I'll do that in the non-RFC series.
> > 
> > >   what are these mappings and what are the alignment restrictions,
> > >   in fact?
> > >
> > >
> > > Looking at that patch, I begin to suspect that while the spec patch
> > > talks about device mapping requests, it is actually the other way around:
> > > the restriction is on guest virtual to guest physical mappings?
> > 
> > No. The guest can operate in smaller-than-host mappings (otherwise, running
> > a 4K guest kernel on a 16k host wouldn't be possible).
> > 
> > Shared Memory Regions tend to be used this way:
> > 
> >  1. The guest requests access to certain resource owned by the host. The
> > request can be for the whole resource or part of it. The request  includes the
> > 'offset' within the resource, and the 'size' of the window  it wants to have on
> > it.
> > 
> >  2. The host maps the resource, from 'offset' to 'size' (as received by  the
> > guest), into a certain offset from the host's view of the mapping  backing the
> > Shared Memory Region.
> > 
> >  3. The guest receives an answer indicating the offset (completely  different
> > and independent from the offset that was requested within the resource in 1)
> > within the Shared Memory Region where the resource has  been mapped.
> > 
> >  4. The guest starts operating on the resource, now available in the  Shared
> > Memory Region at the offset received in 3.
> > 
> > The problem this change in the specification tries to address is that, in step 2,
> > both 'offset' and 'size' must be aligned according to the host's limitations (in a
> > traditional host/guest scenario, this would be the host's page size). But the guest
> > has no knowledge of these limitations, so it can't adjust 'offset' and 'size' as
> > needed.
> > 
> > While this example is written thinking of virtual devices, it's certainly possible
> > to implement physical virtio devices that may have a minimum granularity
> > bigger than the host's.
> > 
> > > Does the feature affect other device types besides gpu, and how?
> > 
> > Every device with Shared Memory Regions is affected. The only reason we
> > found the issue in gpu first is because it's the most intense user of SHM
> > regions. The moment we attempted to use it on a 16K page host, the issue
> > was made clear.
> > 
> > For instance, fs is also affected, but only uses SHM regions with DAX, and the
> > latter is barely used due to various constraints.
> > 
> > New devices using SHM, like virtio-media, are sure to hit this issue sooner
> > rather than later.
> > 
> I understand the issue much better now. Thanks a lot for the detailed explanation.
> Given that all the resource carve-out (window access), resource allocation and mapping is done through a channel outside of shared memory,
> and this is not about the page size but about the resource mapping size,
> 
> I believe these resource mapping size and offset restrictions should be communicated via the same communication channel that does this resource mapping plumbing.
> 
> Plumbing this into the shared memory capability does not look like the right place, as it has no knowledge of steps #1 to #4 you walk through.
> Please consider using your existing resource mapping channel to communicate such restrictions.

I think the channel does not matter much. We provide the size, so why not
the granularity? But the documentation as written is clearly
insufficient: there's no explanation of how the field is
used besides some opaque talk about "map", which is never
defined anywhere.

Sent some suggestions already.


> > Thanks,
> > Sergio.
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/3] shared-mem: introduce page alignment restrictions
  2025-04-02 14:49       ` Michael S. Tsirkin
@ 2025-04-02 16:28         ` Sergio Lopez Pascual
  0 siblings, 0 replies; 14+ messages in thread
From: Sergio Lopez Pascual @ 2025-04-02 16:28 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: virtio-comment@lists.linux.dev, dmitry.osipenko@collabora.com

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Apr 02, 2025 at 09:55:20AM +0000, Parav Pandit wrote:
>> Hi Sergio,
>>
>> > From: Sergio Lopez Pascual <slp@redhat.com>
>> > Sent: Wednesday, April 2, 2025 2:48 PM
>> >
>> > "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >
>> > > On Mon, Mar 31, 2025 at 05:37:08PM -0400, Sergio Lopez wrote:
>> > >> There's an increasing number of machines supporting multiple page
>> > >> sizes and, on these machines, the host and a guest can be running
>> > >> with different page sizes.
>> > >>
>> > >> In addition to this, there might be devices that have a required
>> > >> and/or preferred page size for mapping memory.
>> > >>
>> > >> In this series we extend the "Shared Memory Regions" with a
>> > >> subsection explaining the possible existence of page alignment
>> > >> restrictions when the VIRTIO_F_SHM_PAGE_SIZE feature has been
>> > negotiated.
>> > >>
>> > >> For the device to provide the page size information to the driver, we
>> > >> need to extend the PCI and MMIO transports. For the former, we borrow
>> > >> 8 bits from the 16 bit padding in virtio_pci_cap to hold a page_shift
>> > >> field which can be used to derive the page size by using the
>> > >> following
>> > >> formula: (page_size = 1 << (page_shift + 12)).
>> > >>
>> > >> For MMIO, we add the SHMPageShift register at offset 0x0c4, also
>> > >> holding the page_shift value. Since MMIO registers are 32 bits wide,
>> > >> we could have asked the device to directly provide page_size instead
>> > >> of page_shift, but it seems reasonable to be consistent across transports.
>> > >>
>> > >> An implementation of the changes proposed in this series has been
>> > >> published as an RFC to the LKML, to be used as a reference:
>> > >>
>> > >> https://lore.kernel.org/all/20250214-virtio-shm-page-size-v2-0-aa1619
>> > >> e6908b@redhat.com/
>> > >
>> > >
>> > > Can you explain the use a bit more please?
>> > >
>> > > - looks like this is exposed to userspace?
>> > >   but userspace can not be trusted not to violate the spec.
>> > >   how can that be sufficient to satisfy MUST requirements?
>> >
>> > In the virtio-gpu implementation in the RFC series it's indeed exposed to
>> > userspace. This is because gpu drivers are supported by userspace drivers
>> > (Mesa), and both have a very close relationship. Probably, that patch should
>> > also make the kernel driver reject mappings that don't comply with the
>> > alignment to ensure it satisfies the MUST requirements.
>> >
>> > I'll do that in the non-RFC series.
>> >
>> > >   what are these mappings and what are the alignment restrictions,
>> > >   in fact?
>> > >
>> > >
>> > > Looking at that patch, I begin to suspect that while the spec patch
>> > > talks about device mapping requests, it is actually the other way around:
>> > > the restriction is on guest virtual to guest physical mappings?
>> >
>> > No. The guest can operate in smaller-than-host mappings (otherwise, running
>> > a 4K guest kernel on a 16k host wouldn't be possible).
>> >
>> > Shared Memory Regions tend to be used this way:
>> >
>> >  1. The guest requests access to certain resource owned by the host. The
>> > request can be for the whole resource or part of it. The request  includes the
>> > 'offset' within the resource, and the 'size' of the window  it wants to have on
>> > it.
>> >
>> >  2. The host maps the resource, from 'offset' to 'size' (as received by  the
>> > guest), into a certain offset from the host's view of the mapping  backing the
>> > Shared Memory Region.
>> >
>> >  3. The guest receives an answer indicating the offset (completely  different
>> > and independent from the offset that was requested within the resource in 1)
>> > within the Shared Memory Region where the resource has  been mapped.
>> >
>> >  4. The guest starts operating on the resource, now available in the  Shared
>> > Memory Region at the offset received in 3.
>> >
>> > The problem this change in the specification tries to address is that, in step 2,
>> > both 'offset' and 'size' must be aligned according to the host's limitations (in a
>> > traditional host/guest scenario, this would be the host's page size). But the guest
>> > has no knowledge of these limitations, so it can't adjust 'offset' and 'size' as
>> > needed.
>> >
>> > While this example is written thinking of virtual devices, it's certainly possible
>> > to implement physical virtio devices that may have a minimum granularity
>> > bigger than the host's.
>> >
>> > > Does the feature affect other device types besides gpu, and how?
>> >
>> > Every device with Shared Memory Regions is affected. The only reason we
>> > found the issue in gpu first is because it's the most intense user of SHM
>> > regions. The moment we attempted to use it on a 16K page host, the issue
>> > was made clear.
>> >
>> > For instance, fs is also affected, but only uses SHM regions with DAX, and the
>> > latter is barely used due to various constraints.
>> >
>> > New devices using SHM, like virtio-media, are sure to hit this issue sooner
>> > rather than later.
>> >
>> I understand the issue much better now. Thanks a lot for the detailed explanation.
>> Given that all the resource carve-out (window access), resource allocation and mapping is done through a channel outside of shared memory,
>> and this is not about the page size but about the resource mapping size,
>>
>> I believe these resource mapping size and offset restrictions should be communicated via the same communication channel that does this resource mapping plumbing.
>>
>> Plumbing this into the shared memory capability does not look like the right place, as it has no knowledge of steps #1 to #4 you walk through.
>> Please consider using your existing resource mapping channel to communicate such restrictions.
>
> I think the channel does not matter much. We provide the size, so why not
> the granularity? But the documentation as written is clearly
> insufficient: there's no explanation of how the field is
> used besides some opaque talk about "map", which is never
> defined anywhere.

While working on adding per-device alignment restrictions as you
suggested, I've found there's a precedent for a device already
negotiating the alignment in a device-specific way in fs DAX:

---
The driver maps a file range into the DAX window using the
FUSE_SETUPMAPPING request. Alignment constraints for FUSE_SETUPMAPPING
and FUSE_REMOVEMAPPING requests are communicated during FUSE_INIT
negotiation.
---

The structure exchanged in FUSE_INIT does indeed contain such field
(map_alignment):

---
pub struct InitOut {
    pub major: u32,
    pub minor: u32,
    pub max_readahead: u32,
    pub flags: u32,
    pub max_background: u16,
    pub congestion_threshold: u16,
    pub max_write: u32,
    pub time_gran: u32,
    pub max_pages: u16,
    pub map_alignment: u16,
    pub flags2: u32,
    pub unused: [u32; 7],
}
---
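[Editor's note: a driver-side check against map_alignment might look like the following sketch. It assumes, as in the Linux FUSE client, that the field carries a log2 exponent (e.g. 12 for 4 KiB); the helper name is illustrative.]

```rust
// Reject FUSE_SETUPMAPPING-style requests whose offset or length
// violate the alignment advertised by the device in FUSE_INIT.
// map_alignment is assumed to be a log2 exponent, per the Linux
// FUSE client's interpretation.
fn mapping_ok(offset: u64, len: u64, map_alignment: u16) -> bool {
    let align = 1u64 << map_alignment;
    offset % align == 0 && len % align == 0
}

fn main() {
    assert!(mapping_ok(0x4000, 0x4000, 12));   // 4 KiB-aligned: accepted
    assert!(!mapping_ok(0x1200, 0x4000, 12));  // misaligned offset: rejected
}
```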

Given this precedent, I think it makes sense to keep providing the
alignment information in a device-specific way.

Thanks,
Sergio.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-04-02 16:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-31 21:37 [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Sergio Lopez
2025-03-31 21:37 ` [PATCH v3 1/3] " Sergio Lopez
2025-04-01 16:42   ` Daniel Verkamp
2025-04-02  7:32   ` Michael S. Tsirkin
2025-03-31 21:37 ` [PATCH v3 2/3] transport-pci: introduce page_shift field for SHM Sergio Lopez
2025-03-31 21:37 ` [PATCH v3 3/3] transport-mmio: introduce SHMPageShift register Sergio Lopez
2025-04-02  7:37 ` [PATCH v3 0/3] shared-mem: introduce page alignment restrictions Michael S. Tsirkin
2025-04-02  9:18   ` Sergio Lopez Pascual
2025-04-02  9:55     ` Parav Pandit
2025-04-02 12:22       ` Sergio Lopez Pascual
2025-04-02 12:36         ` Parav Pandit
2025-04-02 14:49       ` Michael S. Tsirkin
2025-04-02 16:28         ` Sergio Lopez Pascual
2025-04-02  8:54 ` Parav Pandit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox